docs(postmortems): android APK CI bring-up (#7)
Five-issue post-mortem of wiring up the Android APK workflow on Forgejo 10 + act_runner. Closed on run #176 by switching ROOT_URL to forgejo.localhost.
parent 18dfc4e696
commit a1a26e9deb
1 changed file with 205 additions and 0 deletions
docs/postmortems/2026-05-17-android-apk-ci.md (new file, 205 lines)

# Android APK CI bring-up: post-mortem

**Date:** 2026-05-17
**Repo:** `forgejo_admin/exifcleaner-web`
**Workflow:** `.github/workflows/build-android.yml` (built in [exifcleaner-web#156](http://localhost:3000/forgejo_admin/exifcleaner-web/pulls/156))
**Outcome:** Green on run #176 after 8 dispatch attempts and 5 forgejo-stack PRs.

## Quick summary

Wiring up the first `workflow_dispatch` Android APK build against Forgejo 10 + act_runner uncovered five distinct issues, each one masking the next. The Gradle build itself worked on the first run; the rest of the failures were CI plumbing: base image, action source resolution, JDK version, artifact API version, and finally the URL Forgejo emits to job containers.

| Run | Outcome | Failure / fix |
| --- | --- | --- |
| 159 | failure | `android-actions/setup-android` not mirrored on `data.forgejo.org` |
| 162 | failure | (manual setup.sh mid-iteration, ignore) |
| 165 | failure | `actions/upload-artifact@v4` → `GHESNotSupportedError` |
| 168 | failure | `forgejo/upload-artifact@v4` fork → `ECONNREFUSED` (Forgejo 10 has no v2 endpoint) |
| 170 | failure | `actions/upload-artifact@v3` → `ECONNREFUSED 127.0.0.1:3000` |
| 172 | failure | LOCAL_ROOT_URL doesn't fix the runtime URL on Forgejo 10 |
| 174 | failure | `runner.envs.ACTIONS_RUNTIME_URL` override clobbered by act_runner auto-set |
| **176** | **success** | `ROOT_URL=http://forgejo.localhost:3000/` resolves on both sides |

End-to-end debug build: **~4.5 min with a warm Gradle cache** (5m 55s cold on the first run, 4m 33s warm). APK: **4.7 MB**.

---

## The five issues in order

### 1. Forgejo runner action mirror is incomplete

**Symptom (run #159):**

```
git clone 'https://data.forgejo.org/android-actions/setup-android' # ref=v3
Unable to clone https://data.forgejo.org/android-actions/setup-android refs/heads/v3: repository not found
```

**Diagnosis:** Forgejo's act_runner resolves every `uses:` reference at workflow-prep time, *before* evaluating step `if:` conditions. `data.forgejo.org` (Forgejo's actions registry mirror) mirrors `actions/*` but not `android-actions/*`, so even a step gated behind `if: !use_prebaked_image` still triggered the clone. The clone failed with "repository not found", and that failed the whole workflow.

**Fix:** Pin the explicit GitHub URL inline:

```yaml
uses: https://github.com/android-actions/setup-android@v3
```
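
Because the runner resolves every `uses:` up front, it can help to scan a workflow for references that still depend on the default mirror. The following is a minimal sketch, not part of the original fix: the function name and its parameter are hypothetical, and the "not mirrored" decision is a namespace heuristic (anything outside `actions/*`), not a lookup against `data.forgejo.org`.

```bash
#!/usr/bin/env bash
# Sketch: list `uses:` references that rely on the runner's default action
# source. Heuristic (an assumption, not a query against data.forgejo.org):
# anything that is not an explicit URL, a docker:// image, a local path, or
# in the `actions/` namespace is reported as a candidate for an explicit
# https://github.com/ pin.
find_unpinned_actions() {
  local workflow_file="$1"   # hypothetical parameter: path to a workflow YAML file
  grep -E '^[[:space:]]*(-[[:space:]]+)?uses:[[:space:]]*' "$workflow_file" \
    | sed -E 's/^[[:space:]]*(-[[:space:]]+)?uses:[[:space:]]*//; s/[[:space:]]+#.*$//' \
    | grep -Ev '^(https?://|docker://|\./|actions/)' \
    || true   # an empty result is not an error
}
```

Running `find_unpinned_actions .github/workflows/build-android.yml` prints one reference per line; an empty result means every reference is either explicit or in the mirrored namespace.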

**PR:** [exifcleaner-web#158](http://localhost:3000/forgejo_admin/exifcleaner-web/pulls/158).

**Generalisable lesson:** any non-`actions/*` GitHub action needs an explicit `https://github.com/...@version` URL on Forgejo. The same pattern applies to community actions in other orgs (`docker/build-push-action`, `pnpm/action-setup`, etc.).

---

### 2. Capacitor 7.6 requires JDK 21, not 17

**Symptom (run #160 area):**

```
Execution failed for task ':capacitor-android:compileDebugJavaWithJavac'.
> error: invalid source release: 21
```

**Diagnosis:** Capacitor 7.6.x's `capacitor-android` library sets `sourceCompatibility = VERSION_21` in its build.gradle, so javac from JDK 17 rejects it. (I had baked JDK 17 into `forgejo-stack/job-android:latest` based on Capacitor's older docs saying "JDK 17 required".)
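
The value can be read straight from the library's build file before choosing a JDK. This is a rough sketch under stated assumptions: the function name and parameter are hypothetical, and it is a plain text match on the first `sourceCompatibility ... VERSION_NN` declaration, so values set through variables or Gradle toolchains will be missed.

```bash
#!/usr/bin/env bash
# Sketch: print the Java release a Gradle build file asks for, based on its
# first `sourceCompatibility ... VERSION_NN` declaration. This is a text
# match, not a Gradle evaluation.
required_java_release() {
  local gradle_file="$1"   # hypothetical parameter: path to a build.gradle
  grep -Eo 'sourceCompatibility[[:space:]]*=?[[:space:]]*(JavaVersion\.)?VERSION_[0-9_]+' "$gradle_file" \
    | head -n 1 \
    | sed -E 's/.*VERSION_//; s/^1_//'   # VERSION_21 -> 21, VERSION_1_8 -> 8
}
```

In a Capacitor project the library's build file is usually found under `node_modules/@capacitor/android/` (path assumed; check your own checkout). No output means no literal declaration was found.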

**Fix:** Bump the prebaked image to `temurin-21-jdk`, and bump the `java-version` of the fallback path's `setup-java@v4` step to `21`.

**PRs:** [forgejo-stack#3](http://localhost:3000/forgejo_admin/forgejo-stack/pulls/3) (image), [exifcleaner-web#159](http://localhost:3000/forgejo_admin/exifcleaner-web/pulls/159) (workflow). The sdkmanager license-accept step also needed `{ yes || true; }` to handle SIGPIPE under `pipefail` ([forgejo-stack#2](http://localhost:3000/forgejo_admin/forgejo-stack/pulls/2)).
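
The SIGPIPE behaviour is easy to reproduce without the Android SDK. In the sketch below, `head -n 1` is a stand-in (an assumption for illustration) for the program that reads the license answers and then exits; the variable names are hypothetical.

```bash
#!/usr/bin/env bash
set -o pipefail

# `head -n 1` stands in for the license prompt reader: it reads one line and
# exits, so `yes` fails on its next write to the closed pipe. With pipefail
# on, that failure becomes the pipeline's exit status.
plain_status=0
yes | head -n 1 >/dev/null || plain_status=$?

# Wrapping `yes` in a group that always succeeds hides that failure from
# pipefail, so only the consumer's exit status decides the result.
guarded_status=0
{ yes || true; } | head -n 1 >/dev/null || guarded_status=$?

echo "plain=${plain_status} guarded=${guarded_status}"
```

On a typical Linux host the unguarded pipeline reports a non-zero status (usually 141, i.e. 128 + SIGPIPE), while the guarded one reports 0, which is why the guarded form lets a Dockerfile `RUN` step with `pipefail` complete.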

**Generalisable lesson:** check the `compileSdk` / `sourceCompatibility` in the SDK library's actual build.gradle before picking a JDK version; vendor docs lag.

---

### 3. `actions/upload-artifact@v4` rejects non-github.com origins

**Symptom (run #165):**

```
::error::@actions/artifact v2.0.0+, upload-artifact@v4+ and download-artifact@v4+ are not currently supported on GHES.
```

The Gradle build itself succeeded: `BUILD SUCCESSFUL in 5m 55s`, `app-debug.apk 4.7M, Zip archive`. Only the upload step failed.

**Diagnosis:** `@actions/artifact` v2 (used by `upload-artifact@v4`) detects non-GitHub.com hosts and throws `GHESNotSupportedError`. Forgejo gets caught in the GHES-detection net.

**Attempted fix (PR [#160](http://localhost:3000/forgejo_admin/exifcleaner-web/pulls/160)):** Switch to `https://code.forgejo.org/forgejo/upload-artifact@v4`, the Forgejo-maintained fork that "removes the GHES check". This did not get the upload working; issue 4 explains why.

---

### 4. Forgejo 10 only implements the v1 artifact API

**Symptom (run #168):**

The fork-`@v4` upload still failed:

```
Error: connect ECONNREFUSED 127.0.0.1:3000
::error::Finalize artifact upload failed: Artifact service responded with 500
```

**Diagnosis:** Confirmed Forgejo version is `10.0.3+gitea-1.22.0`. Forgejo 10 only implements the **v1** artifact API. The `forgejo/upload-artifact@v4` fork patched out the GHES detection but still speaks v2's Twirp/blob-storage protocol underneath, and Forgejo 10 doesn't expose that endpoint. Forgejo 11 added v2 support.

**Fix:** Pin to `actions/upload-artifact@v3` (last version on the v1 protocol):

```yaml
uses: https://github.com/actions/upload-artifact@v3
```
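
The version pairing can be written down as a small helper so the pin is derived from the server version instead of remembered. This is only a sketch: the function name is hypothetical, and treating majors below 10 the same as 10 is an assumption, since this post-mortem only covers Forgejo 10 and 11+.

```bash
#!/usr/bin/env bash
# Sketch: choose an upload-artifact reference from a Forgejo version string,
# following the pairing in this post-mortem (v3 on Forgejo 10, v4 on 11+).
# Assumption: majors below 10 are treated like 10; they are not covered here.
upload_artifact_ref_for() {
  local version="$1"            # e.g. "10.0.3+gitea-1.22.0"
  local major="${version%%.*}"  # text before the first dot
  case "$major" in
    ''|*[!0-9]*) echo "unrecognised version: $version" >&2; return 1 ;;
  esac
  if [ "$major" -ge 11 ]; then
    echo "actions/upload-artifact@v4"
  else
    echo "https://github.com/actions/upload-artifact@v3"
  fi
}
```

The version string itself can be read from the instance's `/api/v1/version` endpoint, for example with `curl -fsS http://forgejo.localhost:3000/api/v1/version` on a stack configured as in issue 5.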

**PR:** [exifcleaner-web#161](http://localhost:3000/forgejo_admin/exifcleaner-web/pulls/161).

**Generalisable lesson:** match the upload-artifact major version to the artifact API version Forgejo implements: v4 ↔ Forgejo 11+, v3 ↔ Forgejo 10. Bump both together.

---

### 5. ROOT_URL is unreachable from inside job containers

**Symptom (runs #170, #172, #174):**

Even with `upload-artifact@v3` (which speaks the v1 API Forgejo 10 supports), upload failed:

```
Retry limit reached for chunk at offset 0 to
http://localhost:3000/api/actions_pipeline/_apis/pipelines/workflows/.../artifacts/.../upload
Error: connect ECONNREFUSED 127.0.0.1:3000
```

**Diagnosis (the deepest one):** Forgejo's actions service emits `ROOT_URL` (`http://localhost:3000/`) as the `ACTIONS_RUNTIME_URL` it gives to job containers. Inside a job container, `localhost:3000` resolves to the container itself, where nothing listens, hence `ECONNREFUSED 127.0.0.1:3000`.

**Attempts that didn't work:**

- **`LOCAL_ROOT_URL=http://forgejo:3000/`**: Forgejo 10's actions code path uses `setting.AppURL` (= `ROOT_URL`), not `setting.LocalURL`. The env var got set, the app.ini got the entry, but the actions service ignored it. ([forgejo-stack#4](http://localhost:3000/forgejo_admin/forgejo-stack/pulls/4))
- **`runner.envs.ACTIONS_RUNTIME_URL=http://forgejo:3000/`** in the act_runner config: act_runner's own auto-set of `ACTIONS_RUNTIME_URL` (derived from the URL Forgejo sends) overrides anything in `runner.envs`. ([forgejo-stack#5](http://localhost:3000/forgejo_admin/forgejo-stack/pulls/5))

**Fix:** Make `ROOT_URL` a hostname that resolves from both sides:

```ini
FORGEJO_ROOT_URL=http://forgejo.localhost:3000/
```

With a Docker network alias on the forgejo service:

```yaml
forgejo:
  networks:
    internal:
      aliases:
        - forgejo.localhost
```

- **Browser side:** per RFC 6761, `*.localhost` resolves to `127.0.0.1` in Chrome, Firefox, and Safari without any `/etc/hosts` entry. The existing `3000:3000` port mapping forwards to the forgejo container. Works.
- **Job container side:** Docker DNS resolves the network alias to the forgejo container's internal IP. Job containers on `forgejo-stack_internal` can reach `http://forgejo.localhost:3000/`. Works.

One URL, two working resolution paths. Forgejo's emitted runtime URL is now reachable from job containers.
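
A configured `ROOT_URL` can be sanity-checked before dispatching a run. The sketch below is a pure string check with hypothetical function names: it only flags hosts that are loopback by definition, and it cannot tell whether a name such as `forgejo.localhost` actually has a network alias behind it.

```bash
#!/usr/bin/env bash
# Sketch: flag ROOT_URL values whose host is loopback-only and therefore
# points back at the job container itself. IPv6 literals and real DNS
# behaviour are out of scope for this string check.
url_host() {
  local rest="${1#*://}"   # drop the scheme
  rest="${rest%%/*}"       # drop the path
  echo "${rest%%:*}"       # drop the port
}

is_loopback_only_url() {
  case "$(url_host "$1")" in
    localhost|127.*) return 0 ;;   # loopback inside every container
    *)               return 1 ;;   # may work, if DNS or an alias resolves it
  esac
}
```

For a live check from the runner's network, something like `docker run --rm --network forgejo-stack_internal curlimages/curl -fsS http://forgejo.localhost:3000/api/v1/version` should print the server version, assuming the `curlimages/curl` image is acceptable on your host.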

**PR:** [forgejo-stack#6](http://localhost:3000/forgejo_admin/forgejo-stack/pulls/6).

**Generalisable lesson:** when self-hosting Forgejo with act_runner in docker-compose, `ROOT_URL` must be resolvable from inside the job container's network namespace. Bare `localhost` won't work for artifact uploads or any other runner→server communication on the compose network. `forgejo.localhost` plus a network alias gives you a single hostname that works in both directions without `/etc/hosts` setup.

---

## Final state

### exifcleaner-web

| PR | Title | What it added |
| --- | --- | --- |
| [#156](http://localhost:3000/forgejo_admin/exifcleaner-web/pulls/156) | feat(android): Capacitor APK wrapper + on-demand CI build | Initial scaffolding |
| [#158](http://localhost:3000/forgejo_admin/exifcleaner-web/pulls/158) | fix(ci-android): pin android-actions/setup-android to https://github.com/ | Action source resolution |
| [#159](http://localhost:3000/forgejo_admin/exifcleaner-web/pulls/159) | fix(ci-android): bump fallback setup-java to JDK 21 | Workflow JDK 21 (fallback path) |
| [#160](http://localhost:3000/forgejo_admin/exifcleaner-web/pulls/160) | fix(ci-android): use Forgejo fork of upload-artifact | Reverted by #161 |
| [#161](http://localhost:3000/forgejo_admin/exifcleaner-web/pulls/161) | fix(ci-android): pin upload-artifact to v3 for Forgejo 10 v1-API | Final upload action |

### forgejo-stack

| PR | Title | What it added |
| --- | --- | --- |
| [#1](http://localhost:3000/forgejo_admin/forgejo-stack/pulls/1) | feat(runner): add Android-augmented job image | `runner-image-android/Dockerfile` |
| [#2](http://localhost:3000/forgejo_admin/forgejo-stack/pulls/2) | fix(runner): swallow yes's SIGPIPE in android sdkmanager license step | Dockerfile build fix |
| [#3](http://localhost:3000/forgejo_admin/forgejo-stack/pulls/3) | fix(runner-android): bump JDK 17 → 21 for Capacitor 7.6 | Prebaked image JDK |
| [#4](http://localhost:3000/forgejo_admin/forgejo-stack/pulls/4) | fix(forgejo): set LOCAL_ROOT_URL | Didn't fix the issue (kept for future) |
| [#5](http://localhost:3000/forgejo_admin/forgejo-stack/pulls/5) | fix(runner): inject ACTIONS_RUNTIME_URL via runner.envs | Didn't fix the issue (kept for future) |
| [#6](http://localhost:3000/forgejo_admin/forgejo-stack/pulls/6) | fix(forgejo): use forgejo.localhost ROOT_URL | **The actual fix** |

`#4` and `#5` are kept because they're correct defense in depth even though they didn't help with the specific Forgejo 10 actions path; they may help on future Forgejo versions or for non-actions API calls.

### Files added/modified

**exifcleaner-web:**

- `.github/workflows/build-android.yml` (3 explicit `https://github.com/...` URLs, JDK 21)
- (plus everything in the original #156 scaffold)

**forgejo-stack:**

- `docker-compose.yml`: `forgejo` service gets `forgejo.localhost` network alias; `forgejo` env adds `LOCAL_ROOT_URL`; runner heredoc generates `runner.envs.ACTIONS_RUNTIME_URL`
- `.env.example`: `FORGEJO_ROOT_URL` default now `http://forgejo.localhost:3000/`, `FORGEJO_DOMAIN` default `forgejo.localhost`
- `runner-image-android/Dockerfile`: JDK 21 (was 17)

### How to repeat for a new repo

1. **Image:** ensure `forgejo-stack/job-android:latest` exists on the runner host. If not, `cd forgejo-stack && ./setup.sh` (or `docker image rm` first, then `./setup.sh`, to force a rebuild after the Dockerfile changes).
2. **forgejo-stack `.env`:** `FORGEJO_ROOT_URL=http://forgejo.localhost:3000/` (or run `setup.sh` after pulling latest master to inherit the new default).
3. **Workflow file:** use explicit `https://github.com/...@version` for any non-`actions/*` action, and pin `actions/upload-artifact@v3` while on Forgejo 10.
4. **Browser:** access Forgejo at `http://forgejo.localhost:3000/` (no `/etc/hosts` edit needed in any modern browser).

### When this changes

Bumping the stack to **Forgejo 11+** unlocks `actions/upload-artifact@v4`. Revisit issues 3 and 4 above when that happens: drop the v3 pin, drop the `forgejo/upload-artifact` fork reference, and switch back to `actions/upload-artifact@v4` from the Forgejo mirror.

The `ROOT_URL` situation (issue 5) is independent of Forgejo version; it's a docker-networking fact. `forgejo.localhost` keeps working as long as the network alias stays in place.