digger/CLAUDE.md
Randa 5500c111d9 chore: add sprint skill + self-review gate to the dev loop
- CLAUDE.md dev loop: after opening the PR, wait for CI green, then
  self-review the diff (/code-review) before saying it's ready; never
  claim ready before CI-green + self-review; still never self-merge.
- .claude/skills/sprint: thin per-project skill so "start sprint N" /
  "groom sprint N" / "continue the sprint" deterministically runs the
  groom -> per-issue dev-loop -> retro flow, enforcing the guardrails
  (plan is source of truth, groom-first, CI-green + self-review before
  ready, human merges). Composes with the superpowers skills.
- docs/sprints/README.md: fold the CI-green + self-review gate into the
  per-sprint Definition of Done.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-07-01 17:19:35 +04:00


File-Ingestion Search Pipeline

Authoritative spec — read before planning or any architectural decision: @docs/digger-brief.md

Always

  • Use the superpowers brainstorm → plan → execute workflow and TDD (red → green → refactor).
  • Do NOT build features until I approve the plan. Create the Forgejo V1 milestone + issues only AFTER the design is approved.
  • The v1 plan (docs/superpowers/plans/2026-07-01-digger-v1-plan.md) is the source of truth: whenever a Forgejo issue is added, changed, or removed, update the plan to match (and vice versa) — they must not drift.
  • Before starting a sprint, groom it into docs/sprints/sprint-<n>-<slug>.md, built from the plan + current repo state + the previous sprint's retro (see docs/sprints/README.md). Groom only the sprint about to start, not all of them. If grooming changes scope, flow it back: groom doc → plan doc → Forgejo issues. Close each sprint with a short retro in its groom doc; it feeds the next sprint's grooming.
  • Every branch gets its own git worktree at ../<repo>.worktrees/<issue>-<slug> (sibling to the repo), branched as <type>/<issue>-<slug> (feat|fix|chore|docs|refactor|test) off the latest main. Never work on main. Remove the worktree and delete the branch after merge.
  • Dev loop: pick a Forgejo issue (via the Forgejo MCP) → worktree + branch → tests-first → PR (Closes #N) → wait for CI to go green → self-review my own diff (via /code-review or superpowers:requesting-code-review; fix blocking findings, and if I push fixes, re-confirm CI is green) → only then say it's ready → I approve and merge. Never self-merge. Never claim a PR is ready before CI is green and the self-review pass is done.
  • Prefer small, functional PRs. I review PRs myself, so keep each one small, self-contained, and independently green — that makes review much easier. If an issue/task is large, split it into multiple PRs (one issue may span several); each PR should build and pass on its own.
  • Every sprint ships a working end-to-end slice.
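
The worktree and branch naming rule above can be sketched as a small helper. This is only an illustration: the function name `worktree_plan`, the issue number 12, and the slug `ir-schema` are hypothetical, and the sketch assumes the local `main` has already been updated to the latest upstream state.

```python
from pathlib import Path

# Branch types allowed by the naming rule.
BRANCH_TYPES = {"feat", "fix", "chore", "docs", "refactor", "test"}


def worktree_plan(repo: Path, branch_type: str, issue: int, slug: str) -> dict:
    """Build the worktree path, branch name, and git commands for one issue.

    Hypothetical helper: it only computes names and argv lists and runs nothing.
    """
    if branch_type not in BRANCH_TYPES:
        raise ValueError(f"unknown branch type: {branch_type!r}")

    name = f"{issue}-{slug}"
    # ../<repo>.worktrees/<issue>-<slug>, a sibling of the repo directory.
    path = repo.parent / f"{repo.name}.worktrees" / name
    # <type>/<issue>-<slug>
    branch = f"{branch_type}/{name}"

    return {
        "path": path,
        "branch": branch,
        # Create the worktree on a new branch starting from main
        # (assumes main was fetched and fast-forwarded beforehand).
        "add": ["git", "worktree", "add", "-b", branch, str(path), "main"],
        # After the PR is merged: remove the worktree, then delete the branch.
        # `git branch -d` refuses unmerged branches, so a squash merge may
        # need `-D` instead.
        "cleanup": [
            ["git", "worktree", "remove", str(path)],
            ["git", "branch", "-d", branch],
        ],
    }


plan = worktree_plan(Path("/src/digger"), "feat", 12, "ir-schema")
print(plan["branch"])
print(" ".join(plan["add"]))
```

The argv lists can be passed to `subprocess.run` as they are; keeping the naming logic separate from execution makes the convention easy to unit-test.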

Invariants (never compromise)

  • Pipeline runs standalone without Meilisearch; the search backend is swappable behind an interface.
  • Every heavy dependency is swappable behind an interface: the search engine (Sink/SearchProvider), the model runtime (ModelBackend), and the extraction engine including Docling (Extractor). DoclingDocument never escapes the extractor; the IR is the sole contract.
  • The intermediate representation (IR) is the contract — keep it stable and engine-agnostic.
  • All model inference (OCR / ASR / embeddings) is local; no file content leaves the machine.
  • v1 = keyword search only; design for vector/hybrid but keep it switched off.
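
A minimal sketch of how these invariants might look in code, assuming Python and `typing.Protocol`. The interface names `Extractor`, `Sink`, and `SearchProvider` come from the list above; every method name, the `IRDocument` fields, and the stub classes are hypothetical, and the real IR schema is whatever docs/decisions/ defines.

```python
from dataclasses import dataclass, field
from typing import Iterable, Protocol


@dataclass(frozen=True)
class IRDocument:
    """Hypothetical engine-agnostic IR record (fields are assumptions)."""
    source_path: str
    text: str
    metadata: dict = field(default_factory=dict)


class Extractor(Protocol):
    # Engine-specific types (e.g. DoclingDocument) stay inside the
    # implementation; only IRDocument crosses this boundary.
    def extract(self, path: str) -> IRDocument: ...


class Sink(Protocol):
    def write(self, doc: IRDocument) -> None: ...


class SearchProvider(Protocol):
    def search(self, query: str) -> list[IRDocument]: ...


class StubExtractor:
    """Stand-in extractor that fabricates text instead of reading files."""
    def extract(self, path: str) -> IRDocument:
        return IRDocument(source_path=path, text=f"contents of {path}")


class InMemorySink:
    """Keyword-only sink so the pipeline can run without Meilisearch."""
    def __init__(self) -> None:
        self.docs: list[IRDocument] = []

    def write(self, doc: IRDocument) -> None:
        self.docs.append(doc)

    def search(self, query: str) -> list[IRDocument]:
        # Case-insensitive substring match; no vector or hybrid ranking.
        needle = query.lower()
        return [d for d in self.docs if needle in d.text.lower()]


def run_pipeline(paths: Iterable[str], extractor: Extractor, sink: Sink) -> int:
    """Extract each path to IR and hand it to the sink; return the count."""
    count = 0
    for path in paths:
        sink.write(extractor.extract(path))
        count += 1
    return count


sink = InMemorySink()
run_pipeline(["a.txt", "b.txt"], StubExtractor(), sink)
print([d.source_path for d in sink.search("a.txt")])
```

Because `run_pipeline` depends only on the protocols, a Meilisearch-backed sink or a Docling-backed extractor could be swapped in without touching the pipeline or the IR.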

Layout

  • docs/digger-brief.md — full spec
  • docs/research/ — subagent findings
  • docs/decisions/ — ADRs, IR schema, Meilisearch settings