Planning-session kickoff
The design phase is complete and approved. This file is the prompt to start a fresh session for the planning phase (turn the approved design into a sprint-based v1 plan and create the Forgejo milestones).
Status: the design PR is merged — the approved docs are on main, so a fresh session
reads them directly.
Where the decisions live (the planning session must treat these as the source of truth):
- README.md: orientation + the swappability invariant
- decisions/README.md → ADRs 0001–0010, plus the contract artifacts ir-schema.json and meilisearch-settings.json
- research/SYNTHESIS.md: per-layer decisions, resolved trade-offs (§4), version pins (§5)
- digger-brief.md: the full spec (Section 14 = confirmed parameters)
- ../CLAUDE.md: the working agreement (worktrees, TDD, Forgejo dev loop, never self-merge)
The prompt (copy-paste into a new session)
We're at the planning gate for "digger" — a local-first, cross-platform file-ingestion
search pipeline. The design phase is complete and approved; research + design are in the repo.
READ THESE FIRST (they are the source of truth — do not re-derive decisions):
- README.md — orientation + the swappability invariant
- docs/decisions/README.md, then ADRs 0001–0010, plus the two contract artifacts
docs/decisions/ir-schema.json and docs/decisions/meilisearch-settings.json
- docs/research/SYNTHESIS.md — per-layer decisions, resolved trade-offs (§4), version pins (§5)
- docs/digger-brief.md — the full spec (Section 14 = confirmed parameters)
- CLAUDE.md — the working agreement (worktrees, TDD, Forgejo dev loop, never self-merge)
TASK (do NOT write feature code; stop at the approval gate):
1. Use the superpowers writing-plans skill to turn the approved design into an agile,
sprint-based v1 plan where EVERY sprint ships a working end-to-end slice. Commit the
plan doc on its own branch/worktree and open a PR (per CLAUDE.md).
2. Create, via the Forgejo MCP on forgejo_admin/digger: a v1 milestone with every task as
an issue (title, description, acceptance criteria, labels, assigned sprint), plus a
COARSE v2 milestone for deferred work.
3. Present the plan + milestones to me for approval, then STOP.
BAKE INTO THE PLAN (from the ADRs):
- Sprint 0 (skeleton): scaffolding; GREEN Forgejo CI from the first commit (Linux unit +
Meilisearch-integration tiers on the Docker runner). **The Windows CI runner wiring is
in-scope as explicit Sprint-0 infra issues** — the Win11 KVM VM is already provisioned and
ready, so only the Forgejo wiring remains (see "WINDOWS CI RUNNER" below). The `heavy`
(real-model OCR/ASR) runner stays an unregistered placeholder for now. Then: the seven
interfaces (Source/Extractor/ModelBackend/Transformer/Sink/SearchProvider/StateStore);
FileSink (IR→disk, standalone works day one); config + SQLite StateStore; Meilisearch index
auto-create + settings + the dormant userProvided embedder; a trivial walk →
stub-extractor → IR → Meilisearch → search slice.
- Sprint 1 (Priority 1, scanned docs): real OCR end-to-end. Make the OCR-harness BAKE-OFF
(Docling-VLM via ApiVlmOptions vs the thin Ollama wrapper) an explicit EARLY issue,
benchmarked on a real Arabic corpus. Chunking transformer (docling-core HybridChunker for
Docling-sourced records; segment packer for OCR/ASR). Docling for native-digital + OOXML
(with the thin OfficeMetadataAugmenter) + the FastAPI/HTMX SearchProvider UI as slices.
- Later sprint (Priority 3, A/V): v1 uses Docling's built-in ASR (preset overridden to
large-v3; segment-level timestamps).
- v2 milestone (coarse, low-detail): legacy binary Office (.doc/.xls/.ppt) + Access; the
dedicated faster-whisper extractor (word-level timestamps, VAD, diarization); vector/hybrid
search (finalize embedding model + dimensions before writing vectors); .msg/ZIP via
Unstructured; watched-folder/scheduled mode.
WINDOWS CI RUNNER (Sprint-0 infra — the Win11 KVM VM is READY; only the Forgejo wiring
remains, so create these as concrete, pickup-able issues; see ADR 0010):
- Register the runner: install the community `Crown0815/Forgejo-runner-windows-builder`
(pin v12.12.0) on the Win11 VM as `act_runner`, register it against the Forgejo instance
with a `windows` label using an admin registration token, run it as a Windows service, and
add a Defender/AV exception for the binary. Lands in ../forgejo-stack.
AC: the runner shows online with the `windows` label in Forgejo admin.
- Vendor the host toolchain: a native Windows runner executes jobs on the host (not in
containers), so install the pinned Python + git + the unit-tier build deps on the VM and
document the host setup. Lands in ../forgejo-stack.
AC: `python`, `git`, and the unit-tier deps resolve on the runner.
- Wire the Windows job: add a `runs-on: windows` job to digger's .forgejo/workflows running
the Windows-only path/encoding/file-lock unit tier (host-native), and confirm one green
run on a PR. Lands in digger.
AC: the Windows job runs and passes on a PR to main.
HONOR THE INVARIANTS: the IR (CanonicalDocument) is the sole contract; Docling/Meilisearch/
models are swappable behind interfaces; all inference local; v1 keyword-only (vectors dormant);
fail-isolated (deferred formats → status: skipped, never a crash); CI green from commit one.
Surface any genuinely open trade-off to me instead of silently deciding.
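Outside the prompt text itself, the fail-isolation invariant above (deferred formats become `status: skipped`, never a crash) can be illustrated with a minimal Python sketch. Everything named here is hypothetical and not taken from the ADRs: `DEFERRED_SUFFIXES`, `IngestResult`, `ingest_all`, and the `"indexed"` / `"failed"` status labels are assumptions for illustration. Only the `skipped` status and the deferred extensions (.doc/.xls/.ppt, .msg, ZIP) come from the text above.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Iterable, List

# Extensions drawn from the v2 (deferred) list in the prompt; the set name is hypothetical.
DEFERRED_SUFFIXES = {".doc", ".xls", ".ppt", ".msg", ".zip"}


@dataclass
class IngestResult:
    path: Path
    status: str       # "skipped" is from the invariant; "indexed"/"failed" are assumed labels
    detail: str = ""


def ingest_all(paths: Iterable[Path], extract: Callable[[Path], object]) -> List[IngestResult]:
    """Process every path; a deferred or broken file never aborts the run."""
    results: List[IngestResult] = []
    for path in paths:
        if path.suffix.lower() in DEFERRED_SUFFIXES:
            # Deferred formats are recorded, not attempted.
            results.append(IngestResult(path, "skipped", "deferred format"))
            continue
        try:
            extract(path)
        except Exception as exc:
            # Isolate the failure to this one file and keep going.
            results.append(IngestResult(path, "failed", str(exc)))
        else:
            results.append(IngestResult(path, "indexed"))
    return results
```

The design point is that the loop owns the error boundary: an extractor may raise freely, and the caller still gets one result row per input file.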
Status of the Windows CI runner
The Windows 11 KVM VM is already provisioned and ready. The remaining work — registering
the Crown0815 runner with the Forgejo instance (windows label), vendoring the host
toolchain, and wiring the Windows CI job — is now planned as the Sprint-0 WINDOWS CI RUNNER
issues in the prompt above (targeting ../forgejo-stack + digger's workflow), so it can be
picked up like any other issue. The only human touch left is providing a Forgejo admin
registration token when the register-the-runner issue is worked. See
ADR 0010.
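
The vendor-the-host-toolchain acceptance criterion (`python` and `git` resolve on the runner) could be checked with a small script along these lines. This is a sketch under stated assumptions, not the project's actual check: `REQUIRED_TOOLS` and `missing_tools` are hypothetical names, and the unit-tier dependencies are project-specific and deliberately not covered here.

```python
import shutil
from typing import Callable, Iterable, List, Optional

# Tool names taken from the acceptance criterion; the constant name is hypothetical.
REQUIRED_TOOLS = ("python", "git")


def missing_tools(
    tools: Iterable[str] = REQUIRED_TOOLS,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the tools that do not resolve on PATH, in the order given."""
    return [tool for tool in tools if which(tool) is None]
```

A host-native CI step could call `missing_tools()` and fail the job when the returned list is non-empty. The `which` parameter exists only so the lookup can be stubbed in tests.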