Randa 016f17e121 docs(plan): make Windows CI runner wiring in-scope Sprint-0 issues
The Win11 KVM VM is provisioned and ready, so the runner wiring is no longer
a human prerequisite. Turn it into three concrete, pickup-able Sprint-0 infra
issues (register the Crown0815 runner with a windows label; vendor the host
toolchain; wire the runs-on: windows CI job), each with acceptance criteria.
Also refresh the status note now that the design PR is merged.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-07-01 16:39:51 +04:00

# Planning-session kickoff

The design phase is complete and approved. This file holds the prompt for starting a fresh planning-phase session. That session's job is to turn the approved design into a sprint-based v1 plan and to create the Forgejo milestones.

Status: the design PR is merged, so the approved docs are on main and a fresh session can read them directly.

Where the decisions live (the planning session must treat these as the source of truth):


## The prompt (copy-paste into a new session)

We're at the planning gate for "digger" — a local-first, cross-platform file-ingestion
search pipeline. The design phase is complete and approved; research + design are in the repo.

READ THESE FIRST (they are the source of truth — do not re-derive decisions):
- README.md — orientation + the swappability invariant
- docs/decisions/README.md, then ADRs 0001–0010, plus the two contract artifacts
  docs/decisions/ir-schema.json and docs/decisions/meilisearch-settings.json
- docs/research/SYNTHESIS.md — per-layer decisions, resolved trade-offs (§4), version pins (§5)
- docs/digger-brief.md — the full spec (Section 14 = confirmed parameters)
- CLAUDE.md — the working agreement (worktrees, TDD, Forgejo dev loop, never self-merge)

TASK (do NOT write feature code; stop at the approval gate):
1. Use the superpowers writing-plans skill to turn the approved design into an agile,
   sprint-based v1 plan where EVERY sprint ships a working end-to-end slice. Commit the
   plan doc on its own branch/worktree and open a PR (per CLAUDE.md).
2. Create, via the Forgejo MCP on forgejo_admin/digger: a v1 milestone with every task as
   an issue (title, description, acceptance criteria, labels, assigned sprint), plus a
   COARSE v2 milestone for deferred work.
3. Present the plan + milestones to me for approval, then STOP.

BAKE INTO THE PLAN (from the ADRs):
- Sprint 0 (skeleton): scaffolding; GREEN Forgejo CI from the first commit (Linux unit +
  Meilisearch-integration tiers on the Docker runner). **The Windows CI runner wiring is
  in-scope as explicit Sprint-0 infra issues** — the Win11 KVM VM is already provisioned and
  ready, so only the Forgejo wiring remains (see "WINDOWS CI RUNNER" below). The `heavy`
  (real-model OCR/ASR) runner stays an unregistered placeholder for now. Then: the seven
  interfaces (Source/Extractor/ModelBackend/Transformer/Sink/SearchProvider/StateStore);
  FileSink (IR→disk, standalone works day one); config + SQLite StateStore; Meilisearch index
  auto-create + settings + the dormant userProvided embedder; a trivial walk →
  stub-extractor → IR → Meilisearch → search slice.
- Sprint 1 (Priority 1, scanned docs): real OCR end-to-end. Make the OCR-harness BAKE-OFF
  (Docling-VLM via ApiVlmOptions vs the thin Ollama wrapper) an explicit EARLY issue,
  benchmarked on a real Arabic corpus. Chunking transformer (docling-core HybridChunker for
  Docling-sourced records; segment packer for OCR/ASR). Docling for native-digital + OOXML
  (with the thin OfficeMetadataAugmenter) + the FastAPI/HTMX SearchProvider UI as slices.
- Later sprint (Priority 3, A/V): v1 uses Docling's built-in ASR (preset overridden to
  large-v3; segment-level timestamps).
- v2 milestone (coarse, low-detail): legacy binary Office (.doc/.xls/.ppt) + Access; the
  dedicated faster-whisper extractor (word-level timestamps, VAD, diarization); vector/hybrid
  search (finalize embedding model + dimensions before writing vectors); .msg/ZIP via
  Unstructured; watched-folder/scheduled mode.

WINDOWS CI RUNNER (Sprint-0 infra — the Win11 KVM VM is READY; only the Forgejo wiring
remains, so create these as concrete, pickup-able issues; see ADR 0010):
- Register the runner: install the community `Crown0815/Forgejo-runner-windows-builder`
  (pin v12.12.0) on the Win11 VM as `act_runner`, register it against the Forgejo instance
  with a `windows` label using an admin registration token, run it as a Windows service, and
  add a Defender/AV exception for the binary. Lands in ../forgejo-stack.
  AC: the runner shows online with the `windows` label in Forgejo admin.
- Vendor the host toolchain: a native Windows runner executes jobs on the host (not in
  containers), so install the pinned Python + git + the unit-tier build deps on the VM and
  document the host setup. Lands in ../forgejo-stack.
  AC: `python`, `git`, and the unit-tier deps resolve on the runner.
- Wire the Windows job: add a `runs-on: windows` job to digger's .forgejo/workflows running
  the Windows-only path/encoding/file-lock unit tier (host-native), and confirm one green
  run on a PR. Lands in digger.
  AC: the Windows job runs and passes on a PR to main.

HONOR THE INVARIANTS: the IR (CanonicalDocument) is the sole contract; Docling/Meilisearch/
models are swappable behind interfaces; all inference local; v1 keyword-only (vectors dormant);
fail-isolated (deferred formats → status: skipped, never a crash); CI green from commit one.
Surface any genuinely open trade-off to me instead of silently deciding.
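The invariants the prompt asks the plan to honor can be illustrated with a small, stdlib-only Python sketch. It is not part of the copy-paste prompt and it is not digger's code. The following names are all hypothetical assumptions:

- the fields on `CanonicalDocument`;
- the method names on `Source`, `Extractor`, and `Sink`;
- the `DirectorySource` and `StubExtractor` helpers;
- the `run` function;
- the set of deferred suffixes.

The sketch shows a walk → stub-extractor → IR → FileSink slice. Deferred formats come back as `status: skipped` instead of crashing the run. The other four interfaces are omitted.

```python
"""Hypothetical sketch of three digger invariants: the IR as the sole contract,
components swappable behind interfaces, and fail-isolated ingestion."""
from __future__ import annotations

import json
import tempfile
from dataclasses import asdict, dataclass, field
from pathlib import Path
from typing import Iterable, Protocol

# Assumption: suffixes for the v2-deferred formats (legacy Office, Access, .msg, ZIP).
DEFERRED_SUFFIXES = {".doc", ".xls", ".ppt", ".mdb", ".accdb", ".msg", ".zip"}


@dataclass
class CanonicalDocument:
    """Hypothetical stand-in for the IR; the real schema is ir-schema.json."""
    source_path: str
    text: str = ""
    status: str = "ok"  # assumed values: "ok" | "skipped" | "error"
    metadata: dict = field(default_factory=dict)


# Three of the seven interfaces; the method names are illustrative assumptions.
class Source(Protocol):
    def walk(self) -> Iterable[Path]: ...


class Extractor(Protocol):
    def extract(self, path: Path) -> CanonicalDocument: ...


class Sink(Protocol):
    def write(self, doc: CanonicalDocument) -> None: ...


class DirectorySource:
    """Returns every regular file under a root directory, in sorted order."""
    def __init__(self, root: Path) -> None:
        self.root = root

    def walk(self) -> Iterable[Path]:
        return sorted(p for p in self.root.rglob("*") if p.is_file())


class StubExtractor:
    """Sprint-0-style stub: reads UTF-8 text and skips deferred formats."""
    def extract(self, path: Path) -> CanonicalDocument:
        if path.suffix.lower() in DEFERRED_SUFFIXES:
            return CanonicalDocument(str(path), status="skipped",
                                     metadata={"reason": "deferred format"})
        return CanonicalDocument(str(path), text=path.read_text(encoding="utf-8"))


class FileSink:
    """Writes each IR record to disk as JSON; needs no search engine."""
    def __init__(self, out_dir: Path) -> None:
        self.out_dir = out_dir
        out_dir.mkdir(parents=True, exist_ok=True)

    def write(self, doc: CanonicalDocument) -> None:
        target = self.out_dir / (Path(doc.source_path).name + ".json")
        target.write_text(json.dumps(asdict(doc), ensure_ascii=False), encoding="utf-8")


def run(source: Source, extractor: Extractor, sink: Sink) -> list[CanonicalDocument]:
    """Walk -> extract -> sink; a bad file is recorded, never fatal to the run."""
    docs = []
    for path in source.walk():
        try:
            doc = extractor.extract(path)
        except Exception as exc:  # assumption: unexpected failures become status "error"
            doc = CanonicalDocument(str(path), status="error",
                                    metadata={"reason": repr(exc)})
        sink.write(doc)
        docs.append(doc)
    return docs


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp) / "in"
        root.mkdir()
        (root / "notes.txt").write_text("hello digger", encoding="utf-8")
        (root / "legacy.doc").write_bytes(b"\xd0\xcf\x11\xe0")  # start of an OLE2 header
        for d in run(DirectorySource(root), StubExtractor(), FileSink(Path(tmp) / "ir")):
            print(Path(d.source_path).name, d.status)
```

Run as a script, it prints `legacy.doc skipped` and then `notes.txt ok`. `run` depends only on the three protocols. Swapping `FileSink` for a Meilisearch-backed sink would therefore leave the source and extractor untouched, which is the kind of swappability the invariants call for.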

## Status of the Windows CI runner

The Windows 11 KVM VM is already provisioned and ready. The remaining work is now planned as the Sprint-0 WINDOWS CI RUNNER issues in the prompt above:

- registering the Crown0815 runner with the Forgejo instance under the `windows` label;
- vendoring the host toolchain;
- wiring the Windows CI job.

Those issues target ../forgejo-stack and digger's workflow, so they can be picked up like any other issue. The only human step left is supplying a Forgejo admin registration token when the register-the-runner issue is worked. See ADR 0010.
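The vendor-the-host-toolchain issue's AC is that `python`, `git`, and the unit-tier deps resolve on the runner. That check could be scripted as the first step of the Windows job. Below is a minimal sketch, assuming Python is already on the VM's PATH. `missing_toolchain` and `main` are hypothetical names. `pytest` is only a placeholder for the real unit-tier dependency list, which this document does not spell out.

```python
"""Hypothetical acceptance check for the vendored Windows host toolchain."""
from __future__ import annotations

import importlib.util
import shutil

# The executables come from the issue's AC; the module list is a placeholder
# because this document does not enumerate digger's unit-tier dependencies.
REQUIRED_EXECUTABLES = ("python", "git")
REQUIRED_MODULES = ("pytest",)


def missing_toolchain(executables=REQUIRED_EXECUTABLES,
                      modules=REQUIRED_MODULES) -> list[str]:
    """Return every required name that does not resolve on this host."""
    # shutil.which searches PATH (and PATHEXT on Windows) for an executable.
    missing = [exe for exe in executables if shutil.which(exe) is None]
    # find_spec returns None for a top-level module that is not importable.
    missing += [mod for mod in modules if importlib.util.find_spec(mod) is None]
    return missing


def main() -> int:
    """Print a one-line report and return a process exit code (0 means ready)."""
    problems = missing_toolchain()
    if problems:
        print("missing: " + ", ".join(problems))
        return 1
    print("toolchain OK")
    return 0
```

Wired in with `sys.exit(main())` as the script's entry point, a missing tool would fail the `runs-on: windows` job before the unit tier starts. That keeps a broken host setup distinct from a genuine test failure.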