Five-issue post-mortem of wiring up the Android APK workflow on Forgejo 10 + act_runner. Closed on run #176 by switching ROOT_URL to forgejo.localhost.
Forgejo 10 passes ROOT_URL to job containers as the Actions runtime URL, and localhost:3000 is unreachable from inside the job's network namespace. Inject ACTIONS_RUNTIME_URL=http://forgejo:3000/ via runner.envs.
Job containers got ROOT_URL (host-facing localhost) as ACTIONS_RUNTIME_URL, which is unreachable from inside their network namespace. Use LOCAL_ROOT_URL=http://forgejo:3000/ for in-cluster traffic.
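Hotfix for PR #1. yes | sdkmanager --licenses was failing with exit status 141 (128 + SIGPIPE) under pipefail: sdkmanager stops reading, yes is killed by SIGPIPE, and pipefail propagates that status. Wrap yes in a brace group so its SIGPIPE exit does not fail the RUN.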
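A minimal Python sketch reproduces the failure with `head` standing in for sdkmanager. It assumes bash plus a `yes`/`head` pair on the host, and it assumes the brace group contains `|| true` (a brace group alone does not change the exit status).

```python
import subprocess


def pipeline_status(script: str) -> int:
    """Run `script` in bash with pipefail enabled and return its exit status."""
    # subprocess restores SIGPIPE to its default action in the child
    # (restore_signals=True), so `yes` is killed by SIGPIPE once `head`
    # exits, the same way it is inside a Dockerfile RUN.
    result = subprocess.run(
        ["bash", "-c", "set -o pipefail; " + script],
        stdout=subprocess.DEVNULL,
    )
    return result.returncode


# Bare pipeline: `yes` dies of SIGPIPE (128 + 13 = 141) and pipefail reports it.
bare = pipeline_status("yes | head -n 1")
# Brace group with `|| true` (assumed): the group exits 0 even though `yes` was killed.
wrapped = pipeline_status("{ yes || true; } | head -n 1")
print(bare, wrapped)
```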
Adds an opt-in Android job image so workflows like exifcleaner-web's APK build skip the ~3-5 min cold Android SDK install on every run.
- runner-image-android/Dockerfile inherits forgejo-stack/job:latest, adds JDK 17 Temurin + Android cmdline-tools + platforms;android-35 + build-tools;35.0.0
- setup.sh builds forgejo-stack/job-android:latest after the main image (gated on RUNNER_BUILD_ANDROID_IMAGE, default true)
- .env.example documents the new vars; .gitignore adds .worktrees/
Consumers opt in via container: forgejo-stack/job-android:latest at the job level. Default forgejo-stack/job:latest stays slim.
Drops the manual refresh dance. A new `runner-refresh` compose service
polls Forgejo every RUNNER_REFRESH_INTERVAL seconds (default 300),
fetches yarn.lock + package.json from each repo in
RUNNER_CACHE_SEED_REPOS, hashes them, and rebuilds
forgejo-stack/job:latest whenever the hash changes. No runner restart
is needed: forgejo-runner uses forcePull=false, so each new job
container is created (`docker create`) from the local tag and picks
up the refreshed image.
- scripts/refresh-runner-image.sh: idempotent; hash-compares against
cache-seed/.last-fetched.sha256 to skip rebuilds when nothing
changed. Uses `sed` (not `grep -oP`) so it works under busybox
inside docker:cli's alpine base.
- docker-compose.yml: adds the `runner-refresh` service (docker:cli
+ docker.sock + project bind-mount + bash/curl install). Idles
via `sleep infinity` when RUNNER_CACHE_SEED_REPOS is unset, so
the service is safe to leave running on stacks that don't pre-warm.
- setup.sh: one-time prime after Forgejo is healthy so fresh installs
bake the cache before the runner takes its first job. Subsequent
refreshes are driven by the service.
- .env.example: documents RUNNER_CACHE_SEED_REPOS and
RUNNER_REFRESH_INTERVAL.
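The hash-compare step in scripts/refresh-runner-image.sh can be sketched in Python as below. The stamp path cache-seed/.last-fetched.sha256 comes from the text; the hashing scheme, the "<repo>/<file>" keys, and the function names are assumptions for illustration, not the script's actual logic.

```python
import hashlib
import tempfile
from pathlib import Path


def lockfile_digest(seed_files: dict) -> str:
    """SHA-256 over fetched lockfiles, keyed by hypothetical "<repo>/<file>" labels.

    Sorting the keys makes the digest independent of the order in which
    repos are listed; length-prefixing keeps file boundaries unambiguous.
    """
    h = hashlib.sha256()
    for name in sorted(seed_files):
        data = seed_files[name]
        h.update(name.encode() + b"\0")
        h.update(len(data).to_bytes(8, "big"))
        h.update(data)
    return h.hexdigest()


def needs_rebuild(digest: str, stamp: Path) -> bool:
    """True when no digest was recorded yet or the recorded one differs."""
    return not stamp.exists() or stamp.read_text().strip() != digest


def record_digest(digest: str, stamp: Path) -> None:
    # Call only after `docker build` succeeds, so a failed build retries next tick.
    stamp.write_text(digest + "\n")


# Usage: simulate refresh ticks with a lockfile change in between.
stamp = Path(tempfile.mkdtemp()) / ".last-fetched.sha256"
tick1 = lockfile_digest({"me/app/yarn.lock": b"lock-v1", "me/app/package.json": b"{}"})
first_tick_rebuilds = needs_rebuild(tick1, stamp)
record_digest(tick1, stamp)
unchanged_skips = not needs_rebuild(tick1, stamp)
tick2 = lockfile_digest({"me/app/package.json": b"{}", "me/app/yarn.lock": b"lock-v2"})
changed_rebuilds = needs_rebuild(tick2, stamp)
```

Recording the digest only after a successful build is what makes the tick idempotent: an unchanged lockfile is a no-op, and a failed build is retried on the next interval.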
Verified end-to-end: pushed a yarn.lock mutation → refresh tick
detected diff → rebuilt image in ~25s → second tick reported
"lockfiles unchanged; image current".
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The shared toolcache from 0df4a84 was mounted on /opt/acttoolcache,
but actions/setup-* writes new tools to AGENT_TOOLSDIRECTORY
(=/opt/hostedtoolcache in catthehacker images), so the cache never
persisted across jobs. Combined with no Node 22 baked in and yarn
cache evictions on the runner artifactcache, an e2e job that used
to finish in ~3min ballooned to 11+min when actions/cache restore
silently failed and yarn install fell back to the npm registry over
the host's WireGuard tunnel.
- docker-compose.yml + .env.example: rename volume
forgejo-stack-acttoolcache → forgejo-stack-hostedtoolcache, mount
at /opt/hostedtoolcache so setup-node downloads actually persist.
Update valid_volumes allow-list.
- runner-image/Dockerfile: bake Node 22.22.3 at
/opt/hostedtoolcache/node/22.22.3/x64 with the .complete sentinel
so setup-node@v4 finds it locally and skips the ~30s nodejs.org
download per job. Set YARN_CACHE_FOLDER=/opt/forgejo-stack/yarn-cache
and pre-warm it via an opt-in cache-seed/ directory: drop the
consumer repo's yarn.lock + package.json into it before building,
and the entire yarn cache for that lockfile ships in the image,
immune to artifactcache flakiness.
- setup.sh: seed the hostedtoolcache volume explicitly from the
built job image. Docker only auto-initialises a named volume from
the *first* container that mounts it, and runner-1 (forgejo/runner:6)
has no baked toolcache — without explicit seeding, the empty
volume would shadow Node 22 in every job container.
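Why the .complete sentinel matters: @actions/tool-cache treats a version as cached only when both the tool directory and a sibling `<arch>.complete` marker file exist. The Python sketch below models that exact-version lookup under that assumption. The real library also resolves semver ranges, which is omitted here.

```python
from __future__ import annotations

import tempfile
from pathlib import Path


def find_cached_tool(cache_root: str, tool: str, version: str, arch: str) -> str | None:
    """Exact-version lookup modelled on @actions/tool-cache.

    A tool is treated as cached only when the directory
    <root>/<tool>/<version>/<arch> AND the sibling marker file
    <root>/<tool>/<version>/<arch>.complete both exist.
    """
    tool_dir = Path(cache_root, tool, version, arch)
    marker = tool_dir.with_name(tool_dir.name + ".complete")
    if tool_dir.is_dir() and marker.is_file():
        return str(tool_dir)
    return None


# Usage: a baked Node directory is ignored until the sentinel exists.
root = tempfile.mkdtemp()  # stands in for /opt/hostedtoolcache
node_dir = Path(root, "node", "22.22.3", "x64")
node_dir.mkdir(parents=True)
before = find_cached_tool(root, "node", "22.22.3", "x64")
node_dir.with_name("x64.complete").touch()
after = find_cached_tool(root, "node", "22.22.3", "x64")
```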
Verified: post-fix Run #97 quick job is 24s (vs ~37s to 1m4s pre-fix);
yarn install drops from 678s back to 2.69s; setup-node logs
"Found in cache @ /opt/hostedtoolcache/node/22.22.3/x64" instead
of "Acquiring 22.22.3 - x64 from github.com/...".
Existing installations: `docker volume rm forgejo-stack-acttoolcache`
to remove the now-orphaned old volume after pulling.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Lets Forgejo Actions runners survive on tunneled hosts and stop
re-downloading Node/Go/Python every job.
- runner-image/Dockerfile: catthehacker/runner-22.04 + yarn classic +
Playwright apt deps, built locally as forgejo-stack/job:latest.
Default `ubuntu-latest` jobs now resolve to this image, so workflows
using `actions/setup-node@v4` with `cache: 'yarn'` work out of the
box.
- docker-compose.yml: runner entrypoint generates /data/config.yaml
that pins job containers to the compose network (so
actions/checkout can reach `forgejo:3000`), overrides labels +
capacity without re-registration, and mounts a shared
acttoolcache volume. forgejo-runner's `valid_volumes: []` default
silently drops `-v` mounts with `is not a valid volume, will be
ignored` — the generated config now allow-lists the cache volume
so `actions/setup-*` actually shares downloads across jobs.
- INTERNAL_NETWORK_MTU plumbs the bridge MTU. Default stays 1500
for bare-metal LAN; lower it (1326-1420) when the host's default
route is a WireGuard/OpenVPN tunnel — otherwise tarball downloads
inside jobs silently hang at "Fetching packages".
- setup.sh builds the job image before `docker compose up` if it's
not already present locally.
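The `valid_volumes: []` pitfall can be illustrated with a small allow-list filter. forgejo-runner's `container.valid_volumes` accepts glob patterns. This Python sketch approximates the matching with `fnmatch`, which may differ in detail from the runner's own glob rules, and it returns the ignored specs instead of logging them.

```python
from __future__ import annotations

from fnmatch import fnmatchcase


def filter_volumes(requested: list[str], valid_volumes: list[str]) -> tuple[list[str], list[str]]:
    """Split "-v source:target" specs into (kept, ignored) by a glob allow-list.

    Approximation only: forgejo-runner's matcher is glob-based too, but its
    exact rules may differ from fnmatch.
    """
    kept, ignored = [], []
    for spec in requested:
        source = spec.split(":", 1)[0]
        if any(fnmatchcase(source, pattern) for pattern in valid_volumes):
            kept.append(spec)
        else:
            ignored.append(spec)  # the runner reports these as "not a valid volume"
    return kept, ignored


mount = "forgejo-stack-acttoolcache:/opt/acttoolcache"
default_result = filter_volumes([mount], [])                              # valid_volumes: []
allowed_result = filter_volumes([mount], ["forgejo-stack-acttoolcache"])  # allow-listed
```

With an empty allow-list nothing matches, so every `-v` mount is dropped. That is why the generated config has to name the cache volume explicitly.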
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Wraps Forgejo 10's built-in migration API to import GitHub repos
into the local stack. Supports single-repo, bulk by owner, and a
--force replace mode for idempotent re-runs.
Plan defects fixed inline:
- use repo_owner (string) instead of the deprecated uid integer
- handle -h/--help before token/env checks so help works on a fresh
stack (the plan's `sed -n '2,11p'` happened after load_env)
- replace fragile sed-based usage with an awk comment-block extractor
that survives edits to the header
- fix --include-private URL: previous template appended `?per_page=`
to a base that already contained `?affiliation=...`, producing a
malformed query string
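The two API-level fixes above can be sketched in Python. `with_query` merges parameters into an existing query string instead of appending a second `?`. `migrate_payload` builds a subset of the body for Forgejo's `POST /api/v1/repos/migrate`, using `repo_owner` rather than `uid`. Both function names and the example repository are hypothetical.

```python
from __future__ import annotations

from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_query(url: str, **params: object) -> str:
    """Merge params into url's existing query instead of blindly appending '?...'."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.extend((key, str(value)) for key, value in params.items())
    return urlunsplit(parts._replace(query=urlencode(query)))


def migrate_payload(clone_addr: str, repo_name: str, repo_owner: str,
                    auth_token: str | None = None, private: bool = False) -> dict:
    """Body for POST /api/v1/repos/migrate (subset of fields)."""
    payload = {
        "clone_addr": clone_addr,
        "repo_name": repo_name,
        "repo_owner": repo_owner,  # owner name string, not the deprecated integer uid
        "service": "github",
        "private": private,
    }
    if auth_token:
        payload["auth_token"] = auth_token  # needed for private GitHub repos
    return payload


listing_url = with_query("https://api.github.com/user/repos?affiliation=owner", per_page=100)
payload = migrate_payload("https://github.com/octocat/Hello-World.git", "Hello-World", "forgejo_admin")
```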
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Five Forgejo 10 API behaviors that diverged from the plan's
assumptions, all caught and worked around in the live code:
- Workflows in <owner>/.forgejo do NOT propagate to other repos →
added per-repo opt-in via scripts/bootstrap-repo.sh.
- Contents API: POST creates, PUT updates (the plan used PUT for both,
which returns 422 "[SHA]: Required" for new files).
- Secrets API: empty data is rejected with 422 → DELETE the secret
for missing keys instead of "set to empty".
- Variables: POST creates, PUT updates; not interchangeable.
- No GET on /user/actions/secrets (only PUT/DELETE).
Discovered while smoke-testing Phase 4: Forgejo 10 does NOT propagate
workflows from <owner>/.forgejo to other repos under the same owner
(verified by opening a PR in a fresh repo with no local .forgejo/ dir
— zero actions tasks fired despite forgejo_admin/.forgejo containing
both review workflows). This is a divergence from the plan's
assumption and from GitHub's behavior with .github default repos.
Mitigation:
- scripts/bootstrap-repo.sh: idempotent per-repo sync that fetches the
current claude/gemini workflow files from forgejo_admin/.forgejo and
pushes them into <target>/.forgejo/workflows/. Falls back to local
templates/ if the org repo is unreachable.
- smoke-test.sh: e2e PR check now invokes bootstrap-repo.sh on the
temp repo before opening the PR so the bots can actually fire.
- Memory file: documents the per-repo opt-in step so future Claude
Code sessions in ~/space/ know to run bootstrap-repo.sh on each
repo before expecting bot reviews.
If/when upstream Forgejo adds org-default workflow propagation,
bootstrap-repo.sh becomes optional but harmless to keep around.
The new section creates a temp repo, opens a PR, polls for bot
comments for up to 240s, and PASSes only when BOTH claude-bot and
gemini-bot have commented. The temp repo is deleted in a RETURN
trap so failures don't leak state.
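The polling logic can be sketched in Python with the clock and the comment fetch injected, so it runs without a live stack. `fetch_commenters` is a hypothetical stand-in for a call to the issue comments endpoint (`GET /api/v1/repos/{owner}/{repo}/issues/{index}/comments`) that collects each comment author's login. In Python, the RETURN-trap cleanup would map to a `try/finally` around the call.

```python
import time
from typing import Callable, Iterable


def wait_for_bots(
    fetch_commenters: Callable[[], Iterable[str]],
    required=("claude-bot", "gemini-bot"),
    timeout: float = 240.0,
    interval: float = 10.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until every required bot has commented, or give up after `timeout` seconds."""
    deadline = clock() + timeout
    missing = set(required)
    while True:
        missing -= set(fetch_commenters())
        if not missing:
            return True  # PASS only when BOTH bots have commented
        if clock() >= deadline:
            return False
        sleep(interval)


# Usage with a fake clock and canned comment authors (no network).
now = [0.0]


def fake_sleep(seconds: float) -> None:
    now[0] += seconds


ticks = iter([[], ["claude-bot"], ["claude-bot", "gemini-bot"]])
both = wait_for_bots(lambda: next(ticks), clock=lambda: now[0], sleep=fake_sleep)
now[0] = 0.0
only_claude = wait_for_bots(lambda: ["claude-bot"], clock=lambda: now[0], sleep=fake_sleep)
```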
When ANTHROPIC_API_KEY/GEMINI_API_KEY are not set in .env, the
check WARNs and returns 0 (still counted as PASS) — the bots can't
function without the LLM keys, so failing this check would just be
noise. Adding the keys and re-running setup.sh + smoke-test.sh
exercises the live path without code changes.
ensure_secrets uses Forgejo 10's user-scoped endpoints:
/user/actions/secrets/<name> PUT (create-or-update)
/user/actions/variables/<name> POST (create), PUT (update)
User-scoped means: every workflow run in any repo owned by the admin
user inherits these secrets/vars — exactly the "org-default" effect we
want for a single-user instance.
Plan defects fixed:
- Plan used PUT for variables, but Forgejo 10 distinguishes POST (create)
vs PUT (update); set_user_var now PUTs first, falls back to POST on 404.
- Plan tried to PUT empty values for missing API keys, but Forgejo rejects
empty `data` with 422 [Data]: Required. set_user_secret now DELETEs the
secret when the key is empty (so workflows see ${{ secrets.X }} expand
to the empty string and short-circuit cleanly), warning the user how
to enable the provider.
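The create/update and empty-value handling can be sketched in Python against a hypothetical transport callable `(method, path, json_body) -> status`. The endpoint paths and the `data` field come from the text. The `value` body field for variables is an assumption based on the Gitea-compatible API.

```python
from __future__ import annotations

from typing import Callable, Optional

# Hypothetical transport: (method, path, json_body) -> HTTP status code.
Transport = Callable[[str, str, Optional[dict]], int]


def set_user_var(request: Transport, name: str, value: str) -> int:
    """Update a user-scoped Actions variable; create it on 404."""
    path = f"/api/v1/user/actions/variables/{name}"
    status = request("PUT", path, {"value": value})
    if status == 404:  # PUT only updates existing variables
        status = request("POST", path, {"value": value})
    return status


def set_user_secret(request: Transport, name: str, value: str) -> int:
    """Create/update a user-scoped secret, or delete it when the value is empty."""
    path = f"/api/v1/user/actions/secrets/{name}"
    if not value:
        return request("DELETE", path, None)  # empty `data` is rejected with 422
    return request("PUT", path, {"data": value})


# Usage against an in-memory fake of the endpoints.
variables: dict = {}
calls: list = []


def fake(method: str, path: str, body: Optional[dict]) -> int:
    name = path.rsplit("/", 1)[-1]
    calls.append((method, name))
    if "/variables/" in path:
        if method == "PUT":
            if name not in variables:
                return 404
            variables[name] = body["value"]
            return 204
        if method == "POST":
            variables[name] = body["value"]
            return 201
    return 204  # secrets: PUT and DELETE accepted in this fake


created = set_user_var(fake, "CLAUDE_BOT_ENABLED", "true")
updated = set_user_var(fake, "CLAUDE_BOT_ENABLED", "false")
set_user_secret(fake, "ANTHROPIC_API_KEY", "")
```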
ensure_org_workflows_repo creates forgejo_admin/.forgejo (auto_init,
private, default_branch=main) on first run, then syncs four files:
workflows/{claude,gemini}-review.yml and prompts/{claude,gemini}-review.md.
Plan defect fixed: the plan used PUT for both create and update via the
contents API. On Forgejo 10, PUT requires `sha` (update only) and POST
creates new files; sending PUT with no sha returns 422 "[SHA]: Required".
push_file now probes the existing file in one round-trip and chooses
POST (no sha) vs PUT (with sha), and short-circuits when the on-server
content already matches the local b64 — so re-running setup.sh produces
no spurious commits in the .forgejo repo.
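push_file's decision can be sketched in Python as a pure function over the probe result. `existing` is assumed to be the JSON from `GET /api/v1/repos/{owner}/{repo}/contents/{path}` (None on 404). `plan_push` is a hypothetical name, and the sketch compares decoded bytes rather than raw base64, which also tolerates line-wrapped base64.

```python
from __future__ import annotations

import base64


def plan_push(local: bytes, existing: dict | None, message: str, branch: str = "main"):
    """Pick the contents-API call for one file, or None if it is already current.

    `existing` is the JSON from GET /api/v1/repos/{owner}/{repo}/contents/{path},
    or None when that GET returned 404.
    """
    body = {"content": base64.b64encode(local).decode(), "message": message, "branch": branch}
    if existing is None:
        return "POST", body  # create: no sha
    if base64.b64decode(existing.get("content") or "") == local:
        return None  # identical: skip, no spurious commit
    return "PUT", {**body, "sha": existing["sha"]}  # update: sha is required


remote = {"sha": "abc123", "content": base64.b64encode(b"on: pull_request\n").decode()}
create = plan_push(b"on: pull_request\n", None, "sync workflow")
noop = plan_push(b"on: pull_request\n", remote, "sync workflow")
update = plan_push(b"on: [pull_request]\n", remote, "sync workflow")
```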
ensure_bot_user is idempotent: skips create when /users/<name>
returns 200; skips token-mint when the env var is already set.
Tokens are minted with scopes [write:repository, write:issue], which
we verified Forgejo 10 accepts; without those scopes the bot can't
post PR comments. Token creation requires basic auth (admin tokens
can't mint tokens for other users), so we curl with -u directly.
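The basic-auth token mint (the `curl -u` call) can be sketched in Python with `urllib.request`. The sketch builds the request without sending it. Which account's credentials are used, and the token name, are assumptions; the endpoint and scopes come from the text.

```python
import base64
import json
import urllib.request


def token_request(base_url: str, auth_user: str, auth_password: str,
                  owner: str, token_name: str, scopes: list) -> urllib.request.Request:
    """Build (without sending) POST /api/v1/users/{owner}/tokens using basic auth."""
    creds = base64.b64encode(f"{auth_user}:{auth_password}".encode()).decode()
    body = json.dumps({"name": token_name, "scopes": list(scopes)}).encode()
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/api/v1/users/{owner}/tokens",
        data=body,
        method="POST",
        headers={"Authorization": f"Basic {creds}", "Content-Type": "application/json"},
    )


req = token_request("http://localhost:3000/", "claude-bot", "s3cret",
                    "claude-bot", "pr-review", ["write:repository", "write:issue"])
# urllib.request.urlopen(req) would send it; the new token comes back in the JSON response.
```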
claude-review.md targets correctness/security/clarity/tests; gemini's
prompt targets architecture/perf/maintainability so the two bots have
complementary lenses rather than redundant ones. Per-repo overrides
under .forgejo/prompts/ take precedence at workflow time.
These workflows fire on every PR (open/sync) for any repo under the
admin user, sourced from the org-level .forgejo repo. They:
- gate on vars.{CLAUDE,GEMINI}_BOT_ENABLED + skip-bot-review label
- prefer per-repo .forgejo/prompts/{name}-review.md, falling back to
the org-level prompt fetched from .forgejo/prompts/
- compute diff against base.sha and send "PROMPT + diff" to the LLM
- post the result as a PR comment via the bot's scoped token
- skip cleanly (without erroring) when the LLM API key is unset
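The workflows implement this in YAML steps. The decision logic is sketched below in Python for clarity. It assumes the *_BOT_ENABLED variables hold the string "true"; the function names are hypothetical.

```python
from __future__ import annotations

import tempfile
from pathlib import Path


def should_review(bot: str, variables: dict, labels: list, api_key: str | None) -> bool:
    """Gate one bot on its *_BOT_ENABLED variable, the skip label, and its API key."""
    if variables.get(f"{bot.upper()}_BOT_ENABLED", "").lower() != "true":
        return False
    if "skip-bot-review" in labels:
        return False
    return bool(api_key)  # no key: skip cleanly instead of failing the job


def resolve_prompt(bot: str, repo_root: Path, org_prompt: str) -> str:
    """Prefer the repo's .forgejo/prompts/<bot>-review.md over the org-level prompt."""
    override = repo_root / ".forgejo" / "prompts" / f"{bot}-review.md"
    return override.read_text() if override.is_file() else org_prompt


repo = Path(tempfile.mkdtemp())
enabled = should_review("claude", {"CLAUDE_BOT_ENABLED": "true"}, [], "sk-test")
skipped = should_review("claude", {"CLAUDE_BOT_ENABLED": "true"}, ["skip-bot-review"], "sk-test")
no_key = should_review("gemini", {"GEMINI_BOT_ENABLED": "true"}, [], None)
fallback = resolve_prompt("claude", repo, "ORG PROMPT")
(repo / ".forgejo" / "prompts").mkdir(parents=True)
(repo / ".forgejo" / "prompts" / "claude-review.md").write_text("REPO PROMPT")
repo_prompt = resolve_prompt("claude", repo, "ORG PROMPT")
```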
- forgejo-mcp image is ronmi/forgejo-mcp (Docker Hub), not ghcr.io
- env vars are FORGEJOMCP_SERVER / FORGEJOMCP_TOKEN (no separator)
- image is FROM scratch (no shell, no sleep, no exec), so the service runs
in http mode for liveness; Claude Code uses an ephemeral docker run in stdio mode per session
- POST /users/{user}/tokens requires basic auth, not token auth
- smoke test container check uses docker inspect
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds three new checks to smoke-test.sh (total: 7 PASS):
- forgejo-mcp container is running (state inspect, since the FROM-scratch
image has no shell for 'docker compose exec' to attach to)
- forgejo MCP server registered in ~/space/.mcp.json
- Claude Code reference memory file exists
The plan's check_mcp_container did 'docker compose exec ... /bin/sh' but
ronmi/forgejo-mcp ships no shell — replaced with a docker inspect on the
container's State.Status.
Adds three idempotent functions to bootstrap:
- ensure_mcp_token: creates a Forgejo access token for MCP, scoped 'all'.
Uses HTTP basic auth (NOT token auth) because Forgejo restricts
/users/{user}/tokens to basic-auth callers (tokens cannot mint tokens).
- ensure_mcp_registered: writes ~/space/.mcp.json with a stdio launcher that
spawns 'docker run --rm -i ... ronmi/forgejo-mcp:latest stdio' per session.
The launcher joins forgejo-stack_internal so it can reach Forgejo at
http://forgejo:3000.
- ensure_memory_written: writes ~/.claude/projects/-home-luffy-space/memory/
forgejo-local.md plus an idempotent index entry in MEMORY.md.
Also adds FORGEJO_MCP_HTTP_PORT=8181 to .env.example for the long-running
HTTP MCP endpoint exposed by the compose sidecar.
Verified: a fresh stdio invocation completes the MCP 'initialize' handshake
and reports tools.listChanged capability.
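An idempotent .mcp.json registration can be sketched in Python. The `{"mcpServers": {name: {"command", "args"}}}` shape is Claude Code's project config format. The source elides the exact docker flags ("..."), so the `--network` and `-e` arguments below are assumptions based on the network and env var names in the text. `register_forgejo_mcp` is a hypothetical name, not the shell function itself. Note that this sketch stores the token in plaintext in the file.

```python
import json
import tempfile
from pathlib import Path


def forgejo_server_entry(token: str, network: str = "forgejo-stack_internal") -> dict:
    """stdio launcher: one throwaway forgejo-mcp container per Claude Code session."""
    return {
        "command": "docker",
        "args": [
            "run", "--rm", "-i",
            "--network", network,  # lets the container reach http://forgejo:3000
            "-e", "FORGEJOMCP_SERVER=http://forgejo:3000",
            "-e", f"FORGEJOMCP_TOKEN={token}",
            "ronmi/forgejo-mcp:latest", "stdio",
        ],
    }


def register_forgejo_mcp(path: Path, token: str) -> bool:
    """Merge the "forgejo" entry into .mcp.json; return True only if the file changed."""
    config = json.loads(path.read_text()) if path.exists() else {}
    servers = config.setdefault("mcpServers", {})
    entry = forgejo_server_entry(token)
    if servers.get("forgejo") == entry:
        return False  # already registered: re-runs are no-ops
    servers["forgejo"] = entry  # other servers in the file are preserved
    path.write_text(json.dumps(config, indent=2) + "\n")
    return True


mcp_json = Path(tempfile.mkdtemp()) / ".mcp.json"
mcp_json.write_text(json.dumps({"mcpServers": {"other": {"command": "true"}}}))
first = register_forgejo_mcp(mcp_json, "tok123")
second = register_forgejo_mcp(mcp_json, "tok123")
saved = json.loads(mcp_json.read_text())
```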
The ronmi/forgejo-mcp image is FROM scratch (no shell, no utilities — just
/forgejo-mcp). The plan's 'sleep infinity' entrypoint is therefore impossible,
so we run the binary in http mode instead: long-lived, smoke-testable via curl,
exposed on FORGEJO_MCP_HTTP_PORT (default 8181). Claude Code itself uses stdio
mode via a separate 'docker run --rm -i ... stdio' invocation registered in
.mcp.json by bootstrap-forgejo.sh.
Note env var names are FORGEJOMCP_SERVER / FORGEJOMCP_TOKEN (single underscore),
not FORGEJO_URL / FORGEJO_TOKEN as the plan suggested.
- Forgejo 10 has no /admin/runners list endpoint; use data/runner/.runner
state file as source of truth instead
- runner container needs group_add for docker.sock access (uid 1000 vs
root:docker 660)
- shell vars in the compose entrypoint need $$ so Compose doesn't interpolate them itself
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Plan defect noted: plan called for GET /api/v1/admin/runners to count
registered runners by name, but Forgejo 10's API has no runner-list
endpoint. Switched to a local check: data/runner/.runner is non-empty
(written by forgejo-runner on successful register) and the runner
container is in 'running' state.
Patches two plan defects discovered during implementation:
1. Plan called for GET /api/v1/admin/runners to detect existing
registration, but Forgejo 10.0.3's API only exposes
/admin/runners/registration-token (no list endpoint). All such GETs
404. Switched the idempotency check to inspect the runner's local
state file data/runner/.runner, which forgejo-runner writes on
successful register and refuses to overwrite.
2. The runner image runs as uid 1000:1000 and could not access
/var/run/docker.sock (root:docker 660). Added group_add with the
host's docker group GID (default 984, overridable via DOCKER_GID)
so the container's user gets supplementary access to the socket.
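Both fixes reduce to small local checks, sketched here in Python. The non-empty .runner test follows the text. Falling back to the host's `docker` group via `grp` before the 984 default is an assumption, since the text only says the GID defaults to 984 and is overridable via DOCKER_GID. The .runner contents written below are placeholders.

```python
import grp
import os
import tempfile
from pathlib import Path


def runner_registered(state_file: Path) -> bool:
    """Idempotency check: forgejo-runner writes a non-empty .runner file on register."""
    return state_file.is_file() and state_file.stat().st_size > 0


def docker_gid(env=os.environ, default: int = 984) -> int:
    """GID for compose group_add: $DOCKER_GID, else the host's docker group, else default."""
    override = env.get("DOCKER_GID")
    if override:
        return int(override)
    try:
        return grp.getgrnam("docker").gr_gid
    except KeyError:  # no docker group on this host
        return default


state = Path(tempfile.mkdtemp()) / ".runner"
missing = runner_registered(state)
state.write_text("")
empty = runner_registered(state)
state.write_text('{"uuid": "placeholder"}')
present = runner_registered(state)
```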
- FORGEJO_ADMIN_USER cannot be 'admin' (Forgejo reserves it)
- START_SSH_SERVER must be false (image already runs openssh)
- fill_if_empty must strip inline comments from .env.example values
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Use forgejo_admin instead of admin (admin is reserved in Forgejo)
- Disable Forgejo built-in SSH server; container's openssh-server
already binds :22 inside the container
- Make fill_if_empty in setup.sh treat 'KEY= # comment' lines in
.env.example as empty so secrets are actually generated
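The comment-stripping rule can be sketched in Python (setup.sh itself is bash). Whitespace followed by `#` starts an inline comment, so `KEY= # comment` parses as empty while a `#` inside a value, such as `p#ss`, survives. The key names and the 32-byte hex secret are hypothetical, and quoted values are not handled.

```python
from __future__ import annotations

import re
import secrets


def parse_env_line(line: str) -> tuple[str, str] | None:
    """Return (key, value) for 'KEY=value  # comment'; None for blanks/comments."""
    line = line.strip()
    if not line or line.startswith("#") or "=" not in line:
        return None
    key, _, raw = line.partition("=")
    # Whitespace followed by '#' starts an inline comment, so 'KEY= # note' -> ''.
    value = re.split(r"\s+#", " " + raw, maxsplit=1)[0].strip()
    return key.strip(), value


def fill_if_empty(lines: list, keys: set, generate=lambda: secrets.token_hex(32)) -> list:
    """Replace empty values for the given keys with freshly generated secrets."""
    out = []
    for line in lines:
        parsed = parse_env_line(line)
        if parsed and parsed[0] in keys and parsed[1] == "":
            out.append(f"{parsed[0]}={generate()}")
        else:
            out.append(line)
    return out


example = ["POSTGRES_PASSWORD= # generated by setup.sh", "ANTHROPIC_API_KEY="]
filled = fill_if_empty(example, {"POSTGRES_PASSWORD"}, generate=lambda: "x" * 8)
```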
26 tasks across six phases (compose stack, Actions runner, MCP/memory,
bot review workflows, GitHub migration, README/Caddy/restore mode).
Each phase ends with the smoke test green; smoke test grows alongside
implementation as the integration test of record.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Covers a portable Docker Compose stack (Forgejo + Postgres + runner +
forgejo-mcp), org-level Claude/Gemini review workflows triggered on PR
open and push, MCP-first Claude Code integration with deferred skill,
three migration scenarios, GitHub-to-Forgejo migration helper, and a
smoke test that doubles as living documentation.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>