fix(runner): shared toolcache + Node 22 + yarn pre-warm

The shared toolcache from 0df4a84 was mounted on /opt/acttoolcache, but actions/setup-* writes new tools to AGENT_TOOLSDIRECTORY (=/opt/hostedtoolcache in catthehacker images), so the cache never persisted across jobs. Combined with no Node 22 baked in and yarn cache evictions on the runner artifactcache, an e2e job that used to finish in ~3min ballooned to 11+min when actions/cache restore silently failed and yarn install fell back to the npm registry over the host's WireGuard tunnel.

- docker-compose.yml + .env.example: rename volume forgejo-stack-acttoolcache → forgejo-stack-hostedtoolcache, mount at /opt/hostedtoolcache so setup-node downloads actually persist. Update valid_volumes allow-list.
- runner-image/Dockerfile: bake Node 22.22.3 at /opt/hostedtoolcache/node/22.22.3/x64 with the .complete sentinel so setup-node@v4 finds it locally and skips the ~30s nodejs.org download per job. Set YARN_CACHE_FOLDER=/opt/forgejo-stack/yarn-cache and pre-warm it via an opt-in cache-seed/ directory: drop the consumer repo's yarn.lock + package.json there before build and the entire yarn cache for that lockfile ships in the image, immune to artifactcache flakiness.
- setup.sh: seed the hostedtoolcache volume explicitly from the built job image. Docker only auto-initialises a named volume from the *first* container that mounts it, and runner-1 (forgejo/runner:6) has no baked toolcache; without explicit seeding, the empty volume would shadow Node 22 in every job container.

Verified: post-fix Run #97 quick job is 24s (vs ~37s-1m4s pre-fix); yarn install drops from 678s back to 2.69s; setup-node logs "Found in cache @ /opt/hostedtoolcache/node/22.22.3/x64" instead of "Acquiring 22.22.3 - x64 from github.com/...".

Existing installations: `docker volume rm forgejo-stack-acttoolcache` to remove the now-orphaned old volume after pulling.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
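
Note: the toolcache layout described above (an extracted directory plus a `.complete` sentinel next to it) can be checked by hand. The following is a minimal sketch, not part of this commit; the helper name `toolcache_has` is hypothetical, and it only matches an exact version string, whereas setup-node may resolve version ranges.

```shell
#!/bin/sh
# Hypothetical helper, not part of this commit. Succeeds only when both the
# extracted tool directory and its ".complete" sentinel exist, following the
# layout <root>/<tool>/<version>/<arch> and <root>/<tool>/<version>/<arch>.complete
toolcache_has() {
  root=$1
  tool=$2
  version=$3
  arch=$4
  [ -d "$root/$tool/$version/$arch" ] && [ -f "$root/$tool/$version/$arch.complete" ]
}

# Example: check the Node version this commit bakes into the job image.
if toolcache_has /opt/hostedtoolcache node 22.22.3 x64; then
  echo "node 22.22.3 is staged"
else
  echo "node 22.22.3 is not staged"
fi
```

Run inside a job container, this reports whether the mounted volume still exposes the baked Node 22 or an empty volume is shadowing it.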
2026-05-16 09:07:37 +04:00
commit e6f9f1f9f9
parent 0df4a848af
5 changed files with 79 additions and 14 deletions
--- a/.env.example
+++ b/.env.example
@@ -37,14 +37,15 @@ RUNNER_NAME=local-runner
 # lower if the box is shared with workstation use.
 # RUNNER_CAPACITY=6
 # Extra `docker run` args forwarded into every job container. Default mounts a
-# shared `actions/setup-*` toolcache so Node/Go/Python downloads only happen
-# once. Append more `-v ...` flags if you want to share additional caches.
-# RUNNER_JOB_OPTIONS=-v forgejo-stack-acttoolcache:/opt/acttoolcache
+# shared `actions/setup-*` toolcache (=AGENT_TOOLSDIRECTORY for catthehacker
+# images) so Node/Go/Python downloads only happen once. Append more `-v ...`
+# flags if you want to share additional caches.
+# RUNNER_JOB_OPTIONS=-v forgejo-stack-hostedtoolcache:/opt/hostedtoolcache
 # Named volumes referenced from RUNNER_JOB_OPTIONS must be allow-listed here
 # (comma-separated) — forgejo-runner silently drops unlisted mounts with the
 # warning "is not a valid volume, will be ignored". Keep in sync with the
 # volume names used above.
-# RUNNER_VALID_VOLUMES=forgejo-stack-acttoolcache
+# RUNNER_VALID_VOLUMES=forgejo-stack-hostedtoolcache

 # ---------- Internal network MTU ----------
 # Default 1500 works on bare-metal LAN. Lower this when the host's default
--- a/docker-compose.yml
+++ b/docker-compose.yml
@@ -74,8 +74,10 @@ services:
      - /var/run/docker.sock:/var/run/docker.sock
      # Mounted here only so `docker compose up` materialises the named volume;
      # job containers consume it through `container.options` below (the host
-      # docker daemon sees the volume under its fixed name).
-      - acttoolcache:/opt/acttoolcache
+      # docker daemon sees the volume under its fixed name). Path must match
+      # AGENT_TOOLSDIRECTORY in catthehacker images (/opt/hostedtoolcache) so
+      # actions/setup-* find tools written by prior jobs.
+      - hostedtoolcache:/opt/hostedtoolcache
    # Add the host's docker group GID so the in-container user (uid 1000) can
    # talk to /var/run/docker.sock. Override via DOCKER_GID in .env if your
    # host's docker group is not 984.
@@ -116,15 +118,18 @@ services:
        CAPACITY="$${RUNNER_CAPACITY:-6}"
        # Extra `docker run` args forwarded to every job container. Default
        # mounts a shared toolcache so `actions/setup-node`, `setup-go`, etc.
-        # hit cache on the second job instead of re-downloading. Volume name
-        # is fixed via `name:` at the bottom of this file so renaming the
-        # compose project doesn't orphan the cache.
-        JOB_OPTIONS="$${RUNNER_JOB_OPTIONS:--v forgejo-stack-acttoolcache:/opt/acttoolcache}"
+        # hit cache on the second job instead of re-downloading. Mount path
+        # must be AGENT_TOOLSDIRECTORY (=/opt/hostedtoolcache in catthehacker
+        # images); mounting on /opt/acttoolcache is a dead end because
+        # setup-* never writes there. Volume name is fixed via `name:` at the
+        # bottom of this file so renaming the compose project doesn't orphan
+        # the cache.
+        JOB_OPTIONS="$${RUNNER_JOB_OPTIONS:--v forgejo-stack-hostedtoolcache:/opt/hostedtoolcache}"
        # Named volumes referenced from JOB_OPTIONS must be allow-listed here or
        # forgejo-runner silently drops the mount with "is not a valid volume,
        # will be ignored". Defaults match the cache volume above; override with
        # RUNNER_VALID_VOLUMES (comma-separated) if you add more.
-        VALID_VOLUMES="$${RUNNER_VALID_VOLUMES:-forgejo-stack-acttoolcache}"
+        VALID_VOLUMES="$${RUNNER_VALID_VOLUMES:-forgejo-stack-hostedtoolcache}"
        cat > /data/config.yaml <<CONFIG
        runner:
          capacity: $$CAPACITY
@@ -170,9 +175,10 @@ services:
 volumes:
  # Shared `actions/setup-*` toolcache across all job containers. First run
  # downloads, subsequent jobs hit cache. Fixed name keeps the volume stable
-  # if the compose project is renamed.
-  acttoolcache:
-    name: forgejo-stack-acttoolcache
+  # if the compose project is renamed. Path inside containers must match
+  # AGENT_TOOLSDIRECTORY (=/opt/hostedtoolcache for catthehacker images).
+  hostedtoolcache:
+    name: forgejo-stack-hostedtoolcache

 networks:
  internal:
--- a/runner-image/Dockerfile
+++ b/runner-image/Dockerfile
@@ -24,3 +24,42 @@ RUN corepack enable \
 RUN apt-get update \
 && DEBIAN_FRONTEND=noninteractive npx --yes playwright@latest install-deps chromium webkit \
 && rm -rf /var/lib/apt/lists/*
+
+# Pre-stage Node 22 in AGENT_TOOLSDIRECTORY so actions/setup-node@v4 finds it
+# locally and skips the ~30s nodejs.org download every job. Layout matches
+# @actions/tool-cache:
+#   /opt/hostedtoolcache/node/<version>/x64/   ← extracted tarball
+#   /opt/hostedtoolcache/node/<version>/x64.complete   ← sentinel
+# Bump NODE_VERSION when upstream LTS moves.
+ARG NODE_VERSION=22.22.3
+RUN mkdir -p /opt/hostedtoolcache/node/${NODE_VERSION}/x64 \
+ && curl -fsSL "https://nodejs.org/dist/v${NODE_VERSION}/node-v${NODE_VERSION}-linux-x64.tar.xz" \
+  | tar -xJ --strip-components=1 -C /opt/hostedtoolcache/node/${NODE_VERSION}/x64 \
+ && touch /opt/hostedtoolcache/node/${NODE_VERSION}/x64.complete \
+ && chmod -R a+rX /opt/hostedtoolcache
+
+# Centralised yarn cache. setup-node@v4's `cache: 'yarn'` reads `yarn cache
+# dir`, which honors YARN_CACHE_FOLDER — so this is where both the optional
+# pre-warm below AND runtime yarn invocations read/write. Persisting it via
+# the shared toolcache volume isn't strictly needed once the image is baked,
+# but workflows that add deps still benefit from the in-image starting point.
+ENV YARN_CACHE_FOLDER=/opt/forgejo-stack/yarn-cache
+
+# Optional yarn classic offline-cache pre-warm. Drop yarn.lock + package.json
+# (and any .yarnrc your workflow relies on) into runner-image/cache-seed/
+# before building; the entire yarn cache for that lockfile gets baked into
+# the image and made world-readable. With no seed files present this step is
+# a no-op. Two COPYs because Docker requires the destination directory to
+# exist before the per-file COPY can be conditional on globbing.
+RUN mkdir -p /opt/forgejo-stack/cache-seed
+COPY cache-seed/ /opt/forgejo-stack/cache-seed/
+RUN if [ -f /opt/forgejo-stack/cache-seed/yarn.lock ] \
+    && [ -f /opt/forgejo-stack/cache-seed/package.json ]; then \
+      cd /opt/forgejo-stack/cache-seed \
+      && export PATH="/opt/hostedtoolcache/node/${NODE_VERSION}/x64/bin:$PATH" \
+      && corepack prepare yarn@1.22.22 --activate >/dev/null \
+      && yarn install --frozen-lockfile --non-interactive --ignore-scripts --no-bin-links \
+      && rm -rf node_modules \
+      && chmod -R a+rX "$YARN_CACHE_FOLDER" ; \
+    fi \
+ && rm -rf /opt/forgejo-stack/cache-seed
--- a/runner-image/cache-seed/.gitignore
+++ b/runner-image/cache-seed/.gitignore
@@ -0,0 +1,5 @@
+# Don't commit the seeded lockfile/package.json — they're per-environment and
+# can be large. Build-time only: setup.sh fetches them from your Forgejo
+# instance, or drop them in by hand before `docker build`.
+*
+!.gitignore
--- a/setup.sh
+++ b/setup.sh
@@ -79,6 +79,20 @@ else
  log_info "runner job image already present: $JOB_IMAGE_TAG"
 fi

+# 4a. Seed the hostedtoolcache volume from the job image's baked
+#     /opt/hostedtoolcache. Docker only auto-initialises a named volume from
+#     the FIRST container that mounts it; runner-1 (image forgejo/runner:6)
+#     has no baked toolcache, so without explicit seeding the empty volume
+#     would shadow Node 22 in every job container. Only seeds once — to
+#     refresh after a job-image rebuild, `docker volume rm` first.
+TOOLCACHE_VOLUME="${RUNNER_HOSTEDTOOLCACHE_VOLUME:-forgejo-stack-hostedtoolcache}"
+if ! docker volume inspect "$TOOLCACHE_VOLUME" >/dev/null 2>&1; then
+  log_info "seeding $TOOLCACHE_VOLUME from $JOB_IMAGE_TAG"
+  docker volume create "$TOOLCACHE_VOLUME" >/dev/null
+  docker run --rm -v "$TOOLCACHE_VOLUME":/dst "$JOB_IMAGE_TAG" \
+    sh -c 'cp -a /opt/hostedtoolcache/. /dst/ 2>/dev/null || true; chmod -R a+rwX /dst'
+fi
+
 # 5. Bring the stack up
 log_info "starting compose stack"
 docker compose up -d
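
Note: the `.env.example` comment in this diff asks that RUNNER_VALID_VOLUMES be kept in sync with the volume names used in RUNNER_JOB_OPTIONS, since unlisted mounts are silently dropped. A minimal sketch of a pre-flight check for that follows. The function name `unlisted_volumes` is hypothetical and not part of this commit. It assumes mounts are written in the space-separated `-v name:/path` form used by the defaults, and it skips host-path bind mounts because the comment only covers named volumes.

```shell
#!/bin/sh
# Hypothetical check, not part of this commit. Prints every named volume that
# appears in the job options but is missing from the comma-separated
# allow-list, and returns non-zero if any was found.
unlisted_volumes() {
  job_options=$1
  valid_volumes=$2
  status=0
  expect_volume=0
  for word in $job_options; do
    if [ "$expect_volume" -eq 1 ]; then
      expect_volume=0
      name=${word%%:*}            # text before the first ':' is the volume name
      case $name in
        /*|.*) continue ;;        # host path (bind mount): out of scope here
      esac
      case ",$valid_volumes," in
        *",$name,"*) ;;           # allow-listed
        *) echo "$name"; status=1 ;;
      esac
    elif [ "$word" = "-v" ] || [ "$word" = "--volume" ]; then
      expect_volume=1
    fi
  done
  return "$status"
}

# Example with the defaults from this commit; prints a warning only on mismatch.
unlisted_volumes \
  "${RUNNER_JOB_OPTIONS:--v forgejo-stack-hostedtoolcache:/opt/hostedtoolcache}" \
  "${RUNNER_VALID_VOLUMES:-forgejo-stack-hostedtoolcache}" \
  || echo "warning: volumes above are not in RUNNER_VALID_VOLUMES" >&2
```

Calling this from setup.sh before `docker compose up -d` would surface a mismatch at install time rather than as an ignored mount in a job log.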