feat(zip): generic ZIP support with recursive inner-file cleaning (#184) #188

Merged
forgejo_admin merged 24 commits from feat/issue-184-zip-support into master 2026-05-22 16:32:03 +00:00

Summary

Closes #184.

  • New ZipStrategy — walks each archive entry via JSZip, normalizes per-entry timestamps to DOS epoch (UTC noon 1980-01-01 — local-time construction would underflow under negative UTC offsets), scrubs archive + per-entry comments, recursively dispatches every supported inner file through selectStrategy() (including nested .zip → ZipStrategy itself via a module-level injected router slot that breaks the circular static import). Encrypted archives are refused with a structured invalid-file-format error directing users to a decryption-capable tool — deviation from the brainstormed pass-through direction documented in spec §3 and gap-analysis (JSZip refuses loadAsync on encrypted archives, blocking the partial passthrough at the library level).

  • New ZipExpansion UI tree — recursive component renders the archiveEntries from StripResult; each cleaned-leaf row lazy-loads its diff on first chevron click via a new window.api.wasm.buildArchiveLeafDiff method backed by WasmProcessor.pendingLeafDiffs + ExifToolDiffStrategy. Nested-zip rows recurse via the same component with depth+1. Pagination at 100 rows, indent caps at depth 5, render-time depth limit at 20 guards against adversarial archives. Extracted MetadataDiffTable + DiffSkeleton from MetadataDiffExpansion so leaves can reuse the two-pane table without inheriting the file-table expansion chrome.

  • Forensic verification (ExifTool + mat2 cross-tool)tools/forensic/zip.ts with adversarial-independent ZIP builder, 9-surface sentinel battery, three strip paths. All 8 strict surfaces pass for ZipStrategy. ExifTool refuses all 8 because it doesn't write generic ZIPs ("Writing of ZIP files is not yet supported" — the central finding of the gap analysis). mat2 matches us on all 8. Surface 9 (encrypted-entry inner content) is a documented KNOWN_GAP for both ZipStrategy and mat2 — parity.

Test plan

  • npx tsc --noEmit — clean
  • yarn lint — clean
  • yarn check:deps — no circular deps
  • yarn test524/524 passing (14 new ZipStrategy unit + 12 new ZipExpansion component tests; all existing tests unchanged)
  • yarn build:web — clean
  • yarn build:web:standalone — clean; standalone 33.8 MB (baseline unchanged; non-WASM bundle ~1 MB)
  • npx tsx tools/forensic/zip.ts — exit 0, no UNEXPECTED survivors across all 8 strict surfaces
  • yarn test:e2e:web — added two specs in tests/e2e/web/zip-archive.spec.ts; CI will exercise them

Forensic results matrix

# Surface Expected ZipStrategy exiftool mat2
1 Archive comment DROPPED DROPPED REFUSED¹ DROPPED
2 Per-entry comment DROPPED DROPPED REFUSED¹ DROPPED
3 Per-entry extra field (0x7878) DROPPED DROPPED REFUSED¹ DROPPED
4 Per-entry timestamp NORMALIZED → 1980-01-01 NORMALIZED REFUSED¹ NORMALIZED
5 Inner JPEG EXIF Artist DROPPED DROPPED REFUSED¹ DROPPED
6 Inner PDF /Author DROPPED DROPPED REFUSED¹ DROPPED
7 Inner DOCX dc:creator DROPPED DROPPED REFUSED¹ DROPPED
8 Nested-zip archive comment DROPPED DROPPED REFUSED¹ DROPPED
9 Encrypted entry inner content KNOWN_GAP KNOWN_GAP² REFUSED KNOWN_GAP³

¹ ExifTool documented limitation: "Writing of ZIP files is not yet supported".
² v1 refuses encrypted archives outright (invalid-file-format).
³ mat2 also refuses encrypted archives — parity.

Docs

  • Spec: docs/superpowers/specs/2026-05-22-issue-184-zip-support-design.md
  • Plan: docs/superpowers/plans/2026-05-22-issue-184-zip-rollout.md
  • Gap analysis: docs/gap-analysis/zip.md
  • Forensic writeup: docs/forensic/zip.md
  • Privacy gaps: new section in docs/PRIVACY_GAPS.md
  • README: Format Support Matrix row + footnote

Out of scope (closes .md half of #184)

.md is closed as out-of-scope: markdown has no embedded metadata in the EXIF/format-structure sense. The only conceivable target would be YAML/TOML frontmatter, which is content, not metadata; stripping it would change the file's meaning. There is nothing for a FormatStrategy to do. Confirmed with maintainer 2026-05-22.


Co-Authored-By: Claude Opus 4.7 (1M context)

## Summary Closes #184. - **New `ZipStrategy`** — walks each archive entry via JSZip, normalizes per-entry timestamps to DOS epoch (UTC noon 1980-01-01 — local-time construction would underflow under negative UTC offsets), scrubs archive + per-entry comments, recursively dispatches every supported inner file through `selectStrategy()` (including nested `.zip` → ZipStrategy itself via a module-level injected router slot that breaks the circular static import). Encrypted archives are refused with a structured `invalid-file-format` error directing users to a decryption-capable tool — deviation from the brainstormed pass-through direction documented in spec §3 and gap-analysis (JSZip refuses `loadAsync` on encrypted archives, blocking the partial passthrough at the library level). - **New `ZipExpansion` UI tree** — recursive component renders the `archiveEntries` from `StripResult`; each cleaned-leaf row lazy-loads its diff on first chevron click via a new `window.api.wasm.buildArchiveLeafDiff` method backed by `WasmProcessor.pendingLeafDiffs` + `ExifToolDiffStrategy`. Nested-zip rows recurse via the same component with `depth+1`. Pagination at 100 rows, indent caps at depth 5, render-time depth limit at 20 guards against adversarial archives. Extracted `MetadataDiffTable` + `DiffSkeleton` from `MetadataDiffExpansion` so leaves can reuse the two-pane table without inheriting the file-table expansion chrome. - **Forensic verification (ExifTool + mat2 cross-tool)** — `tools/forensic/zip.ts` with adversarial-independent ZIP builder, 9-surface sentinel battery, three strip paths. **All 8 strict surfaces pass for ZipStrategy.** ExifTool refuses all 8 because it doesn't write generic ZIPs ("Writing of ZIP files is not yet supported" — the central finding of the gap analysis). mat2 matches us on all 8. Surface 9 (encrypted-entry inner content) is a documented `KNOWN_GAP` for both ZipStrategy and mat2 — parity. ## Test plan - [x] `npx tsc --noEmit` — clean - [x] `yarn lint` — clean - [x] `yarn check:deps` — no circular deps - [x] `yarn test` — **524/524 passing** (14 new ZipStrategy unit + 12 new ZipExpansion component tests; all existing tests unchanged) - [x] `yarn build:web` — clean - [x] `yarn build:web:standalone` — clean; standalone 33.8 MB (baseline unchanged; non-WASM bundle ~1 MB) - [x] `npx tsx tools/forensic/zip.ts` — exit 0, no UNEXPECTED survivors across all 8 strict surfaces - [ ] `yarn test:e2e:web` — added two specs in `tests/e2e/web/zip-archive.spec.ts`; CI will exercise them ### Forensic results matrix | # | Surface | Expected | ZipStrategy | exiftool | mat2 | |---|---|---|---|---|---| | 1 | Archive comment | DROPPED | ✅ DROPPED | ❌ REFUSED¹ | ✅ DROPPED | | 2 | Per-entry comment | DROPPED | ✅ DROPPED | ❌ REFUSED¹ | ✅ DROPPED | | 3 | Per-entry extra field (0x7878) | DROPPED | ✅ DROPPED | ❌ REFUSED¹ | ✅ DROPPED | | 4 | Per-entry timestamp | NORMALIZED → 1980-01-01 | ✅ NORMALIZED | ❌ REFUSED¹ | ✅ NORMALIZED | | 5 | Inner JPEG EXIF Artist | DROPPED | ✅ DROPPED | ❌ REFUSED¹ | ✅ DROPPED | | 6 | Inner PDF /Author | DROPPED | ✅ DROPPED | ❌ REFUSED¹ | ✅ DROPPED | | 7 | Inner DOCX dc:creator | DROPPED | ✅ DROPPED | ❌ REFUSED¹ | ✅ DROPPED | | 8 | Nested-zip archive comment | DROPPED | ✅ DROPPED | ❌ REFUSED¹ | ✅ DROPPED | | 9 | Encrypted entry inner content | KNOWN_GAP | KNOWN_GAP² | REFUSED | KNOWN_GAP³ | ¹ ExifTool documented limitation: "Writing of ZIP files is not yet supported". ² v1 refuses encrypted archives outright (`invalid-file-format`). ³ mat2 also refuses encrypted archives — parity. ## Docs - Spec: `docs/superpowers/specs/2026-05-22-issue-184-zip-support-design.md` - Plan: `docs/superpowers/plans/2026-05-22-issue-184-zip-rollout.md` - Gap analysis: `docs/gap-analysis/zip.md` - Forensic writeup: `docs/forensic/zip.md` - Privacy gaps: new section in `docs/PRIVACY_GAPS.md` - README: Format Support Matrix row + footnote ## Out of scope (closes `.md` half of #184) `.md` is closed as out-of-scope: markdown has no embedded metadata in the EXIF/format-structure sense. The only conceivable target would be YAML/TOML frontmatter, which is content, not metadata; stripping it would change the file's meaning. There is nothing for a `FormatStrategy` to do. Confirmed with maintainer 2026-05-22. --- Co-Authored-By: Claude Opus 4.7 (1M context)
forgejo_admin added 10 commits 2026-05-22 10:34:16 +00:00
Design spec for ZipStrategy: walks each entry, recursively re-dispatches
file entries through selectStrategy() (nested .zip naturally recurses),
normalizes per-entry timestamps to epoch (privacy invariant §6), scrubs
archive + per-entry comments and extra fields. Encrypted entries pass
through unmodified with a per-file warning surfaced in the UI via a new
optional `warnings` field on StripResult. Forensic plan covers 9 sentinel
surfaces including a known-gap row for encrypted-entry inner content.
Closes the .md half of the issue as out-of-scope (no embedded metadata).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds option C — recursive ZipExpansion component with lazy per-leaf
diff loading — to the ZIP support design. Each cleaned leaf in the
tree triggers ExifToolDiffStrategy on first expand (matches the
existing top-level out-of-band diff pattern). Extracts MetadataDiffTable
from MetadataDiffExpansion for reuse in tree leaves. New StripResult
fields: warnings + archiveEntries (both optional; only ZipStrategy
populates them). New WasmProcessor.buildArchiveLeafDiff method exposed
on the window.api.wasm surface.

Corrects two architectural errors in the earlier draft: StripResult has
no metadataRemoved field (the FileRow status pill is binary
"Cleaned"/"Already clean"); diffDocument is populated by WasmProcessor
out-of-band, never by strategies. Per-leaf diffs follow the same
deferred-build pattern, just keyed by entryId+path instead of entryId.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
20 tasks: gap analysis, type extensions, ZipStrategy core (walk +
recursion + encryption detection), WasmProcessor extension for per-leaf
lazy diffs, MetadataDiffTable extraction, ZipExpansion tree component,
FileRow integration, i18n, forensic battery vs ExifTool + mat2,
e2e specs, quality gates, PR.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ZipStrategy walks each entry through JSZip, normalizes per-entry timestamps
to DOS epoch (UTC noon 1980-01-01 — local-construction would underflow
under negative UTC offsets, confirmed via JSZip round-trip POC), scrubs
archive and per-entry comments, and recurses into supported inner formats
via selectStrategy (including nested .zip → ZipStrategy via the injected
router slot that breaks the static circular import).

Encrypted archives deviate from the brainstormed pass-through direction:
JSZip refuses loadAsync on any archive with encrypted entries, blocking
the partial passthrough at the library level. Shipping policy is to
return invalid-file-format with a clear "use a decryption-capable tool"
detail. Deviation documented in spec §3 and gap-analysis encrypted-entry
row; tracked as a follow-up if the byte-level walker work is greenlit.

14 unit tests cover: magic-byte verification, archive comment scrub,
per-entry epoch + comment scrub, directory entries, inner JPEG recursion,
nested-zip recursion, encrypted-archive refusal, parse-failed on
truncated input, registry routing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
WasmProcessor stashes per-leaf source + stripped bytes when a strategy
returns archiveEntries (currently only ZipStrategy). The new
buildArchiveLeafDiff method drains the stash and runs ExifToolDiffStrategy
on the leaf bytes, matching the existing top-level diff-build pattern.
Exposed via window.api.wasm.buildArchiveLeafDiff for the upcoming
ZipExpansion UI component.

Moves ArchiveEntryResult + ArchiveEntryStatus from infrastructure to
src/domain/files/archive_entry.ts so the application port can reference
them without crossing DDD layer boundaries. FileEntry state gains
optional warnings + archiveEntries fields; UPDATE_FILE_METADATA action
includes them.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Reproducible Phase-3 forensic runner at tools/forensic/zip.ts. Builds a
synthetic ZIP fixture via a self-contained byte-level builder
(independent of JSZip, per the adversarial-independence rule in
.claude/rules/format-strategy-workflow.md). Embeds 9 sentinels across
every metadata surface the gap analysis identified (archive comment,
per-entry comment, per-entry extra field, per-entry timestamp, inner
JPEG EXIF, inner PDF Info, inner DOCX dc:creator, nested-zip archive
comment, encrypted entry content).

Strips each fixture three ways — ZipStrategy in-process, exiftool
-all= -Time:All=, mat2 --inplace — then runs a recovery battery
(unzip -z / zipinfo -v / unzip -l / strings / inner exiftool + strings
per extracted entry). Exits non-zero on UNEXPECTED survivors for
ZipStrategy surfaces 1-8 (surface 9 is a documented KNOWN_GAP because
v1 refuses encrypted archives outright).

Reproduce: npx tsx tools/forensic/zip.ts
Outputs: /tmp/zip-forensic/report.json + per-fixture cleaned ZIPs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase-3 forensic writeup at docs/forensic/zip.md. Records the
sentinel-survival matrix across nine surfaces (archive comment,
per-entry comment, per-entry extra field, per-entry timestamp, inner
JPEG/PDF/DOCX, nested-zip archive comment, encrypted entry content)
for three strip paths: ZipStrategy, exiftool -all= -Time:All=, mat2.

Findings: ZipStrategy passes 8/8 strict surfaces (DROPPED or
NORMALIZED → 1980-01-01); equivalent to mat2 on this fixture.
ExifTool refuses to write generic ZIPs entirely ("Writing of ZIP
files is not yet supported"), so the comparison row records REFUSED
across the board — this is the central finding of the gap analysis,
surfaced directly in the matrix. Surface 9 (encrypted entry inner
content) is a documented KNOWN_GAP for both ZipStrategy and mat2
(both refuse encrypted archives).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
feat(zip): forensic runner, PRIVACY_GAPS, README, e2e fixture + specs (#184)
Some checks failed
CI / Lint, Typecheck & Unit Tests (pull_request) Successful in 29s
CI / E2E (Standalone single-file) (pull_request) Successful in 1m22s
CI / E2E (Web) (pull_request) Failing after 5m5s
c66eff5e74
Catches up the docs + forensic + e2e pieces that the parallel agent
left behind:

- tools/forensic/zip.ts (1273 lines) — self-contained byte-level ZIP
  builder, 9-surface sentinel battery, ExifTool + mat2 cross-tool
  comparison. Local run: 8/8 strict surfaces pass for ZipStrategy.
- docs/PRIVACY_GAPS.md — new "ZIP archives" section documenting
  encrypted-archive refusal, SFX stub preservation, multi-disk refusal.
- README.md — Format Support Matrix row for ZIP with footnote linking
  to the gap-analysis + forensic docs.
- tests/e2e/fixtures/sample-zip.zip — synthetic fixture containing one
  EXIF-tagged JPEG (sentinel "E2E-SENTINEL-INNER-JPEG") + a directory
  entry. Archive comment carries "E2E-SENTINEL-ARCHIVE-COMMENT".
- tests/e2e/web/zip-archive.spec.ts — two Playwright specs covering the
  tree render (ZipExpansion vs MetadataDiffExpansion mutual exclusion,
  directory row non-expandable) and the lazy per-leaf diff load
  (skeleton → MetadataDiffTable on chevron click).

Quality gates: typecheck clean, lint clean, madge no cycles, 524/524
vitest, both web + standalone builds successful (standalone 33.8 MB
unchanged from baseline; non-WASM bundle ~1 MB).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
forgejo_admin added 1 commit 2026-05-22 10:53:13 +00:00
fix(zip): code review pass — nested-zip key collision, dead status, race, comments (#184)
All checks were successful
CI / Lint, Typecheck & Unit Tests (pull_request) Successful in 34s
CI / E2E (Standalone single-file) (pull_request) Successful in 1m21s
CI / E2E (Web) (pull_request) Successful in 3m30s
c74c106be5
Addresses ~10 findings from the multi-agent code review:

- **pendingLeafDiffs key collision (HIGH).** Nested ZIPs with same-named
  inner entries collided on the same key, causing the wrong diff to load
  on click. Fix: compose `entryId\0fullPath` where fullPath is built
  recursively in stashArchiveLeaves with each outer archive's path
  prepended. ZipExpansion now threads a `pathPrefix` through the
  recursive tree so the UI's buildArchiveLeafDiff call passes the same
  full path the processor stashed. NUL separator matches the established
  MetadataDiffTable.makeKey convention.
- **stashArchiveLeaves leaked nested-zip parents (MEDIUM-HIGH).** Every
  cleaned entry was stashed regardless of whether it was a leaf, but
  the UI's `isNestedZip` branch never calls buildArchiveLeafDiff. Fix:
  gate the stash with `entry.entries === null` so only actual leaves
  consume the per-leaf slot.
- **Top-level diff built but never rendered for ZIPs (MEDIUM).**
  WasmProcessor.process stashed the full archive bytes into pendingDiffs
  even though FileRow always routes ZIPs to ZipExpansion. Fix: when
  archiveEntries is populated, skip the per-entry stash entirely — only
  the per-leaf stashes carry useful diff inputs.
- **Race on rapid-click during diff load (MEDIUM).** Closing a leaf row
  while its buildArchiveLeafDiff was awaiting would re-open with stale
  content when the promise resolved. Fix: use a functional setState
  guard (`fromKind: "loading"`) that ignores stale writes.
- **State-key collision on duplicate filenames (LOW).** Two entries with
  the same path in the same archive (legal per ZIP spec) shared
  ZipExpansion's expand state. Fix: include the entry index in the
  state-map key.
- **Directory entry date fragility (MEDIUM-HIGH).** `outputZip.folder()`
  + post-mutation of `.date` had a null-lookup gap that could leak
  JSZip's default `new Date()` into the directory record (privacy
  invariant §6 violation). Fix: write directory entries via
  `outputZip.file(name, "", { dir: true, date, comment })` — JSZip
  takes the options directly, no post-mutation.
- **Dead `passed-through-encrypted` status variant (MEDIUM).** Declared
  in ArchiveEntryStatus + rendered in ZipExpansion + styled in CSS + i18n
  key, but no code path ever produces it (encrypted archives are refused
  outright in v1). Removed across all 5 surfaces; the corresponding test
  in zip_expansion.test.tsx becomes a comment explaining the deferral.
- **Stale `warnings` doc-comment (LOW).** Claimed "encrypted-entry
  pass-through messages" but no current strategy emits them. Updated to
  describe the field as reserved for the byte-level walker follow-up.
- **Stale FileRow header (LOW).** Listed two expansion branches when
  there are now four. Updated to enumerate ZipExpansion + Diff
  mutual-exclusion.
- **Nested-BEM CSS classes (LOW).** `.zip-expansion__row__chevron`/path/
  status flattened to `.zip-expansion__chevron`/`__path`/`__status` per
  CLAUDE.md "BEM (mandated for all new CSS)" — single-level elements
  match every other file in `src/web/styles/`.

Skipped (below threshold or design choice):
- findEncryptedEntries early-break on streaming descriptor — encrypted
  archives are refused at the loadAsync layer regardless, so the parse-
  failed fallback is acceptable.
- Inner-strategy refusal silently passing through unsupported — by
  design per gap-analysis §"Per-source policy table".
- MAX_RENDER_DEPTH visual at depth 20 — edge case, deferred.

Quality gates: typecheck clean, lint clean, madge no cycles, 523/523
vitest (1 removed test for the dead encrypted-status path).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Author
Owner

Self code review

Ran a multi-agent code review (5 parallel reviewers covering CLAUDE.md compliance, shallow bug scan, git history, prior PR comments, code-comment compliance). Surfaced 10 issues above the >35 threshold; all fixed in c74c106.

Critical

  1. pendingLeafDiffs key collision in nested ZIPs. Two nested zips with same-named leaves (e.g. a.zip/photo.jpg + b.zip/photo.jpg) keyed to the same stash slot, causing the wrong diff to load on click. Fix: NUL-separated ${entryId}\0${fullPath} where fullPath is composed recursively from the outer-archive prefix.

    Fixed at: wasm_processor.ts#L55-L84, ZipExpansion.tsx#L42-L57

  2. stashArchiveLeaves leaked nested-ZIP parent entries. Every cleaned entry was stashed including nested-zip parents that the UI never drains. Fix: gate stash with entry.entries === null.

    Fixed at: wasm_processor.ts#L300-L350

  3. Directory-entry date fallthrough (privacy invariant §6). outputZip.folder() + post-mutation could leak JSZip's default new Date() into the directory record. Fix: use file(name, "", { dir: true, date, comment }).

    Fixed at: zip_strategy.ts#L94-L118

Medium

  1. Top-level pendingDiffs stashed-then-wasted for ZIPs. ExifTool diff ran on full archive bytes even though FileRow always routes ZIPs to ZipExpansion. Fix: skip the per-entry stash when archiveEntries is populated.

    Fixed at: wasm_processor.ts#L148-L176

  2. Race on close-during-load. Clicking a leaf while its diff was loading and then again to close → resolve forced the row back open with stale content. Fix: functional setState guard fromKind: "loading".

    Fixed at: ZipExpansion.tsx#L75-L94

  3. Dead passed-through-encrypted status variant. Declared in ArchiveEntryStatus, rendered in UI, styled in CSS, with an i18n key — but no code path ever produces it (encrypted archives are refused outright in v1). Removed across all 5 surfaces.

    Fixed at: archive_entry.ts#L11-L18

Low (polish)

  1. State-key collision on duplicate filenames (legal per ZIP spec) — pair with entry index.
  2. Stale warnings doc-comment — said "encrypted-entry pass-through messages" but no current strategy emits them. Updated to "reserved for the byte-level walker follow-up".
  3. Stale FileRow.tsx header — listed two expansion branches; now lists all four (Error, NoMetadata, ZipExpansion, MetadataDiffExpansion) + mutual-exclusion note.
  4. Nested-BEM CSS classes (.zip-expansion__row__chevron etc.) — flattened to single-element form per CLAUDE.md "BEM (mandated for all new CSS)".

Skipped (below threshold or by-design)

  • findEncryptedEntries early-break on streaming-descriptor flag — encrypted archives are refused at the loadAsync layer either way, so the parse-failed fallback is acceptable.
  • Inner-strategy refusal silently passing through as passed-through-unsupported — by design per gap-analysis §"Per-source policy table" (extension-claimed-but-magic-mismatch is a user-input fact, not a privacy event).
  • MAX_RENDER_DEPTH visual at depth 20 — edge case (zip quine territory), deferred.

Quality gates: typecheck clean, lint clean, madge no cycles, 523/523 vitest (1 test removed for the dead encrypted-status path).

🤖 Generated with Claude Code

### Self code review Ran a multi-agent code review (5 parallel reviewers covering CLAUDE.md compliance, shallow bug scan, git history, prior PR comments, code-comment compliance). Surfaced 10 issues above the >35 threshold; all fixed in [c74c106](http://forgejo.localhost:3000/forgejo_admin/exifcleaner-web/commit/c74c106be56b45b2b68d11d4516bd4ce9999bfdc). **Critical** 1. **`pendingLeafDiffs` key collision in nested ZIPs.** Two nested zips with same-named leaves (e.g. `a.zip/photo.jpg` + `b.zip/photo.jpg`) keyed to the same stash slot, causing the wrong diff to load on click. Fix: NUL-separated `${entryId}\0${fullPath}` where `fullPath` is composed recursively from the outer-archive prefix. <br>Fixed at: [`wasm_processor.ts#L55-L84`](http://forgejo.localhost:3000/forgejo_admin/exifcleaner-web/src/commit/c74c106be56b45b2b68d11d4516bd4ce9999bfdc/src/infrastructure/wasm/wasm_processor.ts#L55-L84), [`ZipExpansion.tsx#L42-L57`](http://forgejo.localhost:3000/forgejo_admin/exifcleaner-web/src/commit/c74c106be56b45b2b68d11d4516bd4ce9999bfdc/src/web/components/file-list/ZipExpansion.tsx#L42-L57) 2. **`stashArchiveLeaves` leaked nested-ZIP parent entries.** Every cleaned entry was stashed including nested-zip parents that the UI never drains. Fix: gate stash with `entry.entries === null`. <br>Fixed at: [`wasm_processor.ts#L300-L350`](http://forgejo.localhost:3000/forgejo_admin/exifcleaner-web/src/commit/c74c106be56b45b2b68d11d4516bd4ce9999bfdc/src/infrastructure/wasm/wasm_processor.ts#L300-L350) 3. **Directory-entry date fallthrough (privacy invariant §6).** `outputZip.folder()` + post-mutation could leak JSZip's default `new Date()` into the directory record. Fix: use `file(name, "", { dir: true, date, comment })`. <br>Fixed at: [`zip_strategy.ts#L94-L118`](http://forgejo.localhost:3000/forgejo_admin/exifcleaner-web/src/commit/c74c106be56b45b2b68d11d4516bd4ce9999bfdc/src/infrastructure/wasm/strategies/zip_strategy.ts#L94-L118) **Medium** 4. **Top-level pendingDiffs stashed-then-wasted for ZIPs.** ExifTool diff ran on full archive bytes even though `FileRow` always routes ZIPs to `ZipExpansion`. Fix: skip the per-entry stash when `archiveEntries` is populated. <br>Fixed at: [`wasm_processor.ts#L148-L176`](http://forgejo.localhost:3000/forgejo_admin/exifcleaner-web/src/commit/c74c106be56b45b2b68d11d4516bd4ce9999bfdc/src/infrastructure/wasm/wasm_processor.ts#L148-L176) 5. **Race on close-during-load.** Clicking a leaf while its diff was loading and then again to close → resolve forced the row back open with stale content. Fix: functional setState guard `fromKind: "loading"`. <br>Fixed at: [`ZipExpansion.tsx#L75-L94`](http://forgejo.localhost:3000/forgejo_admin/exifcleaner-web/src/commit/c74c106be56b45b2b68d11d4516bd4ce9999bfdc/src/web/components/file-list/ZipExpansion.tsx#L75-L94) 6. **Dead `passed-through-encrypted` status variant.** Declared in `ArchiveEntryStatus`, rendered in UI, styled in CSS, with an i18n key — but no code path ever produces it (encrypted archives are refused outright in v1). Removed across all 5 surfaces. <br>Fixed at: [`archive_entry.ts#L11-L18`](http://forgejo.localhost:3000/forgejo_admin/exifcleaner-web/src/commit/c74c106be56b45b2b68d11d4516bd4ce9999bfdc/src/domain/files/archive_entry.ts#L11-L18) **Low (polish)** 7. **State-key collision on duplicate filenames** (legal per ZIP spec) — pair with entry index. 8. **Stale `warnings` doc-comment** — said "encrypted-entry pass-through messages" but no current strategy emits them. Updated to "reserved for the byte-level walker follow-up". 9. **Stale `FileRow.tsx` header** — listed two expansion branches; now lists all four (Error, NoMetadata, ZipExpansion, MetadataDiffExpansion) + mutual-exclusion note. 10. **Nested-BEM CSS classes** (`.zip-expansion__row__chevron` etc.) — flattened to single-element form per CLAUDE.md "BEM (mandated for all new CSS)". **Skipped (below threshold or by-design)** - `findEncryptedEntries` early-break on streaming-descriptor flag — encrypted archives are refused at the loadAsync layer either way, so the `parse-failed` fallback is acceptable. - Inner-strategy refusal silently passing through as `passed-through-unsupported` — by design per gap-analysis §"Per-source policy table" (extension-claimed-but-magic-mismatch is a user-input fact, not a privacy event). - `MAX_RENDER_DEPTH` visual at depth 20 — edge case (zip quine territory), deferred. Quality gates: typecheck clean, lint clean, madge no cycles, 523/523 vitest (1 test removed for the dead encrypted-status path). 🤖 Generated with [Claude Code](https://claude.com/claude-code)
forgejo_admin force-pushed feat/issue-184-zip-support from c74c106be5 to 7e144d86e2 2026-05-22 11:12:14 +00:00 Compare
Author
Owner

Rebased onto master + screenshot

Rebased the branch onto the latest master (which now includes #183 FfmpegFallbackStrategy). One conflict in src/infrastructure/wasm/wasm_processor.ts because master added a diffChain: Promise<unknown> field to serialize calls into @uswriting/exiftool's shared singleton state — my new buildArchiveLeafDiff had the same race, so I extended the chain to cover it too (per-leaf calls now serialize against top-level diffs and against each other).

Quality gates (post-rebase, with the new ffmpeg dep installed):

  • yarn typecheck — clean
  • yarn lint — clean
  • yarn check:deps — no circular deps
  • yarn test537/537 passing (up from 523 — master's FfmpegFallbackStrategy adds tests)
  • yarn build:web — clean
  • yarn build:web:standalone — clean; ~50 MB now (includes ffmpeg-core.wasm at 32 MB + zeroperl.wasm at 25 MB; my ZIP changes contribute ~5 KB)

Screenshot of the ZIP diff UI (against the standalone build via Playwright):

ZIP expansion tree with inner JPEG diff

The screenshot shows:

  • Top row: sample-zip.zip cleaned (303 B → 215 B)
  • Inner cleaned-leaf row: photo.jpg, expanded — MetadataDiffTable rendering inside zip-expansion__leaf-diff. The IFD0 · 1 REMOVED group header shows the EXIF Artist field's sentinel value (E2E-SENTINEL-INNER-JPEG) struck through in the Before column with a placeholder in the After column — confirming the recursive JPEG strip + the lazy-on-expand diff load
  • Directory row: notes/ rendered as non-expandable with the "Directory" status label

Toast at the bottom is the dispatchExifToolDiffLoading cue that fires on first WASM warm-up (the screenshot was captured during the warm-up window before the toast cleared).

🤖 Generated with Claude Code

### Rebased onto master + screenshot Rebased the branch onto the latest `master` (which now includes [#183 FfmpegFallbackStrategy](http://forgejo.localhost:3000/forgejo_admin/exifcleaner-web/pulls/183)). One conflict in `src/infrastructure/wasm/wasm_processor.ts` because master added a `diffChain: Promise<unknown>` field to serialize calls into `@uswriting/exiftool`'s shared singleton state — my new `buildArchiveLeafDiff` had the same race, so I extended the chain to cover it too (per-leaf calls now serialize against top-level diffs and against each other). **Quality gates (post-rebase, with the new ffmpeg dep installed):** - `yarn typecheck` — clean - `yarn lint` — clean - `yarn check:deps` — no circular deps - `yarn test` — **537/537** passing (up from 523 — master's FfmpegFallbackStrategy adds tests) - `yarn build:web` — clean - `yarn build:web:standalone` — clean; ~50 MB now (includes ffmpeg-core.wasm at 32 MB + zeroperl.wasm at 25 MB; my ZIP changes contribute ~5 KB) **Screenshot of the ZIP diff UI** (against the standalone build via Playwright): ![ZIP expansion tree with inner JPEG diff](http://forgejo.localhost:3000/attachments/2e0bb609-d8d4-4dff-b22e-8d45b3b8b088) The screenshot shows: - Top row: `sample-zip.zip` cleaned (303 B → 215 B) - Inner cleaned-leaf row: `photo.jpg`, expanded — `MetadataDiffTable` rendering inside `zip-expansion__leaf-diff`. The `IFD0 · 1 REMOVED` group header shows the EXIF `Artist` field's sentinel value (`E2E-SENTINEL-INNER-JPEG`) struck through in the Before column with a placeholder in the After column — confirming the recursive JPEG strip + the lazy-on-expand diff load - Directory row: `notes/` rendered as non-expandable with the "Directory" status label Toast at the bottom is the `dispatchExifToolDiffLoading` cue that fires on first WASM warm-up (the screenshot was captured during the warm-up window before the toast cleared). 🤖 Generated with [Claude Code](https://claude.ai/code)
forgejo_admin force-pushed feat/issue-184-zip-support from 7e144d86e2 to 45be688702 2026-05-22 11:17:01 +00:00 Compare
forgejo_admin added 1 commit 2026-05-22 11:18:41 +00:00
i18n(zip): add Arabic + Spanish translations for ZipExpansion strings (#184)
All checks were successful
CI / Lint, Typecheck & Unit Tests (pull_request) Successful in 35s
CI / Smoke build (VITE_ENABLE_FFMPEG_FALLBACK=false) (pull_request) Successful in 50s
CI / E2E (Standalone single-file) (pull_request) Successful in 1m59s
CI / E2E (Web) (pull_request) Successful in 3m44s
838d9d010e
forgejo_admin added 1 commit 2026-05-22 11:20:14 +00:00
fix(ui): apply file-table__diff--two-pane to ZipExpansion leaf diff (#184)
All checks were successful
CI / Lint, Typecheck & Unit Tests (pull_request) Successful in 40s
CI / Smoke build (VITE_ENABLE_FFMPEG_FALLBACK=false) (pull_request) Successful in 51s
CI / E2E (Standalone single-file) (pull_request) Successful in 1m36s
CI / E2E (Web) (pull_request) Successful in 4m41s
1c94c8d314
Without the parent class, the grid layout scoped to
.file-table__diff--two-pane .file-table__diff-pane-header didn't
apply and BEFORE/AFTER labels collapsed into 'BeforeAfter' next to
each other. Fix: add the two-pane class to the leaf wrapper alongside
zip-expansion__leaf-diff so the grid columns + spacing cascade
correctly.
Author
Owner

Follow-up

Two follow-ups on the rebase:

  1. i18n gap. The 8 new zipExpansion.* keys shipped with en only, which would fall back to English for users on the Arabic-default locale. Added Spanish + Arabic translations to match the existing pattern.

  2. Screenshot showed BeforeAfter collapsed. The grid layout for the two-pane header is scoped to .file-table__diff--two-pane in file_table.css; my leaf-diff wrapper only had zip-expansion__leaf-diff, so the labels stacked next to each other without gap or columns. Fix: include file-table__diff--two-pane on the leaf wrapper so the cascade hits the existing rule. New screenshot below.

ZIP expansion tree with inner JPEG diff — BEFORE/AFTER as separate columns

🤖 Generated with Claude Code

### Follow-up Two follow-ups on the rebase: 1. **i18n gap.** The 8 new `zipExpansion.*` keys shipped with `en` only, which would fall back to English for users on the Arabic-default locale. Added Spanish + Arabic translations to match the existing pattern. 2. **Screenshot showed `BeforeAfter` collapsed.** The grid layout for the two-pane header is scoped to `.file-table__diff--two-pane` in `file_table.css`; my leaf-diff wrapper only had `zip-expansion__leaf-diff`, so the labels stacked next to each other without gap or columns. Fix: include `file-table__diff--two-pane` on the leaf wrapper so the cascade hits the existing rule. New screenshot below. ![ZIP expansion tree with inner JPEG diff — BEFORE/AFTER as separate columns](http://forgejo.localhost:3000/attachments/cc6cf70f-a36b-4320-93f4-68a30a7932f2) 🤖 Generated with [Claude Code](https://claude.ai/code)
forgejo_admin added 1 commit 2026-05-22 11:32:40 +00:00
test(forensic): real-world extra-field profile — UT/UID/GID/NTFS sub-surfaces (#184)
All checks were successful
CI / Lint, Typecheck & Unit Tests (pull_request) Successful in 31s
CI / Smoke build (VITE_ENABLE_FFMPEG_FALLBACK=false) (pull_request) Successful in 44s
CI / E2E (Standalone single-file) (pull_request) Successful in 1m35s
CI / E2E (Web) (pull_request) Successful in 3m20s
13b3327a64
Surface 3 (per-entry extra field) was previously a single sentinel
embedded in a custom 0x7878 record. Real-world ZIPs from Word's
"Save as zip", macOS Finder, 7-Zip, and Info-ZIP commonly carry
several record IDs alongside file data:

- UT extended timestamp (0x5455) — wall-clock mtime/atime/ctime
- UID/GID Unix v1 (0x7875) — creator's user/group IDs
- NTFS times (0x000a) — 100-ns high-resolution Windows timestamps
- custom 0x7878 — original sub-surface

ZipStrategy's policy ("strip every extra-field record except Zip64
0x0001") is now verified per-record-id: 3a/3b/3c/3d each carry a
distinct ASCII sentinel embedded in their data payload, and the
recovery battery confirms each is independently stripped.

BuilderEntry shape changes from a single optional extraFieldPayload
to an array of {headerId, data} records. The builder concatenates
records in LFH/CD order. All four record IDs are added to the
notes.txt entry via the new buildRealWorldExtraFields helper.

Runner output: 11/11 strict surfaces DROPPED-or-NORMALIZED on
ZipStrategy. ExifTool refuses all 11 (writes generic ZIPs not
supported). mat2 matches at 11/11. Surface 9 (encrypted inner
content) remains KNOWN_GAP — parity with mat2.

Drops the "extending the fixture is a worthwhile follow-up" caveat
from docs/forensic/zip.md and replaces it with the new per-record
sub-surface explanation + the documented Zip64 deferral (Zip64 is
only triggered by >4 GB archives; a multi-GB fixture is out of
scope).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Author
Owner

Forensic extra-field coverage extended

You called out the "real-world extra-field profiles ... worthwhile follow-up" caveat — done. Surface 3 was a single custom 0x7878 record; now four sub-surfaces, one per record ID common in real-world archives:

  • 3a. custom 0x7878SENTINEL-EXTRA-7G8H9I (existing, exercises the "unknown record = strip anyway" path)
  • 3b. UT extended timestamp (0x5455)SENTINEL-EXTRA-UT-K2L3M4 (the wall-clock mtime/atime/ctime trio Linux zip / Finder / Word's "Save as zip" write)
  • 3c. UID/GID Unix v1 (0x7875)SENTINEL-EXTRA-UIDGID-N5O6P7 (creator's uid/gid; identifies the user account that built the archive)
  • 3d. NTFS times (0x000a)SENTINEL-EXTRA-NTFS-Q8R9S0 (Windows high-resolution timestamps)

Each record carries its own distinct ASCII sentinel in its data payload; the recovery battery confirms per-record-id stripping. All four DROPPED for ZipStrategy. mat2 matches at 4/4. ExifTool refuses all 4 (consistent with the documented "writes generic ZIPs not supported" finding).

Final runner output:

PASS — all 11 (DROPPED/NORMALIZED) strict surfaces verified for ZipStrategy.

Zip64 (0x0001) — the structural exception we preserve — is not sentinel-tested because Zip64 is only triggered by >4 GB archives or entries, and a multi-GB fixture is out of scope. Documented in the writeup's "Caveats and limits" section.

🤖 Generated with Claude Code

### Forensic extra-field coverage extended You called out the "real-world extra-field profiles ... worthwhile follow-up" caveat — done. Surface 3 was a single custom 0x7878 record; now four sub-surfaces, one per record ID common in real-world archives: - **3a. custom 0x7878** — `SENTINEL-EXTRA-7G8H9I` (existing, exercises the "unknown record = strip anyway" path) - **3b. UT extended timestamp (0x5455)** — `SENTINEL-EXTRA-UT-K2L3M4` (the wall-clock mtime/atime/ctime trio Linux `zip` / Finder / Word's "Save as zip" write) - **3c. UID/GID Unix v1 (0x7875)** — `SENTINEL-EXTRA-UIDGID-N5O6P7` (creator's uid/gid; identifies the user account that built the archive) - **3d. NTFS times (0x000a)** — `SENTINEL-EXTRA-NTFS-Q8R9S0` (Windows high-resolution timestamps) Each record carries its own distinct ASCII sentinel in its `data` payload; the recovery battery confirms per-record-id stripping. All four DROPPED for ZipStrategy. mat2 matches at 4/4. ExifTool refuses all 4 (consistent with the documented "writes generic ZIPs not supported" finding). **Final runner output:** ``` PASS — all 11 (DROPPED/NORMALIZED) strict surfaces verified for ZipStrategy. ``` Zip64 (0x0001) — the *structural* exception we preserve — is not sentinel-tested because Zip64 is only triggered by >4 GB archives or entries, and a multi-GB fixture is out of scope. Documented in the writeup's "Caveats and limits" section. 🤖 Generated with [Claude Code](https://claude.ai/code)
forgejo_admin added 1 commit 2026-05-22 11:42:06 +00:00
fix(zip): output size inflation + wrong 'Already clean' status (#184)
All checks were successful
CI / Lint, Typecheck & Unit Tests (pull_request) Successful in 30s
CI / Smoke build (VITE_ENABLE_FFMPEG_FALLBACK=false) (pull_request) Successful in 46s
CI / E2E (Standalone single-file) (pull_request) Successful in 2m35s
CI / E2E (Web) (pull_request) Successful in 3m17s
594668e5b2
Three bugs hit together when dropping a ZIP containing binary-compressed
files (JPEG, PNG):

1. generateAsync without compression: "DEFLATE" defaulted to STORE, so
   every DEFLATE-compressed entry in the source became uncompressed in the
   output — output grew instead of shrinking.

2. processViaWasm used outputBytes < entry.size as the sole "cleaned"
   signal. Since output was larger (bug 1), and even with DEFLATE a ZIP of
   JPEGs won't shrink (JPEG data is already compressed), the file was tagged
   NoMetadataFound and the row showed "Already clean".

3. ZipExpansion only rendered when status === Complete. With NoMetadataFound
   the row fell through to the generic "No metadata found" notice and the
   archive tree never appeared, so users saw no diff at all.

Fixes:
- zip_strategy: generateAsync now passes compression: "DEFLATE".
- use_process_files: hasAnyCleanedEntry() walks the archiveEntries tree;
  a ZIP is Complete if any inner entry was cleaned, independent of byte
  size.
- FileRow: ZipExpansion renders whenever isComplete && hasArchiveEntries
  (covers both Complete and NoMetadataFound), so even all-unsupported ZIPs
  show the entry tree. The generic "no metadata" notice only shows for
  non-archive NoMetadataFound rows.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
forgejo_admin added 1 commit 2026-05-22 12:06:23 +00:00
fix(zip): show ExifTool output for clean inner files; clarify empty-diff message (#184)
All checks were successful
CI / Lint, Typecheck & Unit Tests (pull_request) Successful in 33s
CI / Smoke build (VITE_ENABLE_FFMPEG_FALLBACK=false) (pull_request) Successful in 58s
CI / E2E (Standalone single-file) (pull_request) Successful in 1m57s
CI / E2E (Web) (pull_request) Successful in 3m46s
8dcfcee596
When a leaf inside a ZIP has no removable metadata (already clean source
bytes), ExifTool's diff returns before=[] after=[]. Previously we showed
'No metadata detected' which is ambiguous. Now:
- before.length > 0 || after.length > 0: renders the diff table (covers
  files where after has system fields ExifTool always reports)
- both empty: shows 'No metadata — file appears already clean' (new
  zipExpansion.alreadyClean i18n key with en/es/ar), disambiguating
  from the diffFailed case

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Author
Owner

Updated ZIP diff — inner-leaf metadata diff

Fixes the empty-diff condition (#184 follow-up):

  • Before: showed "No metadata detected" whenever , even if had entries
  • After: shows the diff table when ; shows "No metadata — file appears already clean" (new i18n key with en/es/ar) only when both arrays are empty

zip-diff

### Updated ZIP diff — inner-leaf metadata diff Fixes the empty-diff condition (#184 follow-up): - **Before**: showed "No metadata detected" whenever , even if had entries - **After**: shows the diff table when ; shows "No metadata — file appears already clean" (new i18n key with en/es/ar) only when both arrays are empty ![zip-diff](http://forgejo.localhost:3000/attachments/beb1ab95-a441-4ef4-9277-0b34b06497ab)
forgejo_admin added 1 commit 2026-05-22 12:21:01 +00:00
fix(zip): harden ZipStrategy — depth limit, size cap, permissions, encryption detection (#184)
All checks were successful
CI / Lint, Typecheck & Unit Tests (pull_request) Successful in 34s
CI / Smoke build (VITE_ENABLE_FFMPEG_FALLBACK=false) (pull_request) Successful in 54s
CI / E2E (Standalone single-file) (pull_request) Successful in 1m46s
CI / E2E (Web) (pull_request) Successful in 3m43s
ec9648c19a
Seven issues from code review addressed:

1. Recursion depth limit: stripAtDepth() private method threads a depth
   counter; returns invalid-file-format at MAX_NESTING_DEPTH (10), which
   the outer strip surfaces as a warning-and-passthrough so the rest of
   the archive is still processed.

2. Decompressed size cap: running total across all entries; errors at
   MAX_DECOMPRESSED_BYTES (2 GB) to prevent ZIP-bomb DoS.

3. External attribute normalisation: unixPermissions (0o644/0o755) and
   dosPermissions (0) written explicitly on every entry. Without this,
   source archives could leak the producer's umask and DOS attribute
   flags (archive/hidden/system bits) into the output.

4. Encryption detection: replaced the hand-rolled LFH scanner with a
   try/catch on JSZip.loadAsync(). The scanner had two blind spots —
   ZIP64 entries (0xFFFFFFFF compressed size in LFH) desynchronised the
   stride math, and data-descriptor entries (GP bit 3) broke the scan on
   the first streaming entry. JSZip's detection is authoritative.

5. Asymmetric inner-failure warning: nested ZIPs that fail to strip
   (encryption, depth limit, parse error) now emit a prefixed warning
   instead of silently passing through, so the user knows the content
   escaped the scrub.

6. Compression documented: the ZIP_EPOCH comment block now calls out
   unixPermissions, dosPermissions, and DEFLATE compression as a unified
   "canonical output" policy, not separate incidental choices.

7. Router-injection warning: console.warn when injectedRouter is null so
   test mis-setup is caught faster.

Tests added: nested-encrypted warning, depth-limit propagation,
dosPermissions normalisation (file + directory).

Note: unixPermissions round-trip isn't verifiable through JSZip.loadAsync
without platform:3 in version-made-by. Tests cover dosPermissions which
does round-trip; unixPermissions are written correctly per the strategy
code even though the test can't assert them via JSZip.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
forgejo_admin added 1 commit 2026-05-22 12:23:07 +00:00
fix(zip): re-open leaf shows stale doc; top-level already-clean message (#184)
All checks were successful
CI / Lint, Typecheck & Unit Tests (pull_request) Successful in 35s
CI / Smoke build (VITE_ENABLE_FFMPEG_FALLBACK=false) (pull_request) Successful in 49s
CI / E2E (Standalone single-file) (pull_request) Successful in 1m41s
CI / E2E (Web) (pull_request) Successful in 3m22s
06fa769617
Two bugs:

1. ZipExpansion: clicking a leaf, closing, then reopening showed "No
   metadata" on the second open. The WasmProcessor stash is drained on
   first buildArchiveLeafDiff call; a second call finds no stash and
   returns null. Fix: carry the loaded doc in the "closed" LeafState so
   reopening skips the API call entirely.

2. MetadataDiffExpansion (top-level files): when ExifTool returns an
   empty diff for an already-clean file the expansion area rendered
   nothing (null), indistinguishable from "diff failed". Now renders the
   same "No metadata — file appears already clean" message used in
   ZipExpansion, using the existing zipExpansion.alreadyClean i18n key.
   Condition expanded to before.length > 0 || after.length > 0 (mirrors
   the ZipExpansion change) to show the table when after has entries.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
forgejo_admin added 1 commit 2026-05-22 12:41:38 +00:00
fix(diff): show ExifTool diff for 'Already clean' files with preserved metadata (#184)
All checks were successful
CI / Lint, Typecheck & Unit Tests (pull_request) Successful in 28s
CI / Smoke build (VITE_ENABLE_FFMPEG_FALLBACK=false) (pull_request) Successful in 39s
CI / E2E (Standalone single-file) (pull_request) Successful in 1m32s
CI / E2E (Web) (pull_request) Successful in 3m19s
7b8d81a355
When a file is processed with preserveOrientation/preserveColorProfile
enabled, output ≈ input (nothing removed → NoMetadataFound/'Already
clean'). The async ExifTool diff WAS built (showing what was preserved),
but FileRow never rendered it because the NoMetadataFound expansion path
showed a static text div instead of MetadataDiffExpansion.

Now: if diffDocument is available or diffPending, route to
MetadataDiffExpansion instead of the generic text, so users can see the
orientation tag, ICC profile fields, etc. that were intentionally kept.
Falls back to the generic text when ExifTool diff is disabled or hasn't
arrived yet.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
forgejo_admin added 1 commit 2026-05-22 14:37:08 +00:00
fix(zip): 'Already clean' status for inner entries + cache leaf diffs across re-opens (#184)
All checks were successful
CI / Lint, Typecheck & Unit Tests (pull_request) Successful in 35s
CI / Smoke build (VITE_ENABLE_FFMPEG_FALLBACK=false) (pull_request) Successful in 49s
CI / E2E (Standalone single-file) (pull_request) Successful in 1m39s
CI / E2E (Web) (pull_request) Successful in 3m31s
5bde4fed86
Two bugs:

1. Inner ZIP entries always showed 'Cleaned' even when nothing was
   removable, while the top-level UI distinguishes 'Cleaned' (bytes
   shrunk) from 'Already clean' (NoMetadataFound). Tree didn't match.
   Added 'already-clean' to ArchiveEntryStatus; ZipStrategy now picks
   between 'cleaned' and 'already-clean' based on the same byte-size
   comparison the top-level uses. Nested ZIPs are promoted to 'cleaned'
   if any deep entry was actually cleaned. Both statuses keep the bytes
   stash so the user can still expand and see the diff.

2. Collapsing the outer ZIP row and reopening it lost previously-loaded
   leaf diffs: the ZipExpansion component unmounts on collapse (losing
   its LeafState.cachedDoc), and the WasmProcessor stash was drained
   on the first call so the second buildArchiveLeafDiff returned null
   and the row rendered 'No metadata'. Fix: cache the computed
   MetadataDocument in WasmProcessor (cachedLeafDocs) after the first
   compute so subsequent calls — including after a full unmount —
   return the cached doc. Doc is small text data; the heavy bytes stash
   is still drained on first call.

Added i18n key zipExpansion.statusAlreadyClean (en/es/ar) and a unit
test verifying an already-clean inner JPEG gets the new status while
keeping the bytes stash.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
forgejo_admin added 1 commit 2026-05-22 14:48:43 +00:00
docs+test: update gap-analysis encryption section, add FileRow NoMetadataFound routing tests (#184)
All checks were successful
CI / Lint, Typecheck & Unit Tests (pull_request) Successful in 32s
CI / Smoke build (VITE_ENABLE_FFMPEG_FALLBACK=false) (pull_request) Successful in 1m6s
CI / E2E (Standalone single-file) (pull_request) Successful in 2m17s
CI / E2E (Web) (pull_request) Successful in 3m47s
5f17c9af75
- docs/gap-analysis/zip.md: replace the outdated 'requires reading the
  LFH GP-flag bit 0 directly' line with the current approach (JSZip's
  loadAsync throw), explaining why we switched (ZIP64 + data-descriptor
  blind spots in the hand-rolled scanner).

- tests/web/components/file_row.test.tsx: 4 static-render tests covering
  FileRow's NoMetadataFound expansion routing — generic text when no
  diff, skeleton when diffPending, diff table when diffDocument has
  content, alreadyClean message when diff has empty before+after. The
  routing change for these cases shipped in 7b8d81a but lacked unit
  coverage.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
forgejo_admin added 1 commit 2026-05-22 16:02:45 +00:00
fix(zip): address all 15 findings from final pre-merge review (#184)
All checks were successful
CI / Lint, Typecheck & Unit Tests (pull_request) Successful in 34s
CI / Smoke build (VITE_ENABLE_FFMPEG_FALLBACK=false) (pull_request) Successful in 45s
CI / E2E (Standalone single-file) (pull_request) Successful in 1m43s
CI / E2E (Web) (pull_request) Successful in 3m41s
0bab33c410
Privacy + security (must-fix):
- #1 ZipStrategy now pre-emits parent-folder entries with ZIP_EPOCH
  before adding any file, so JSZip's auto-create path can't leak
  `new Date()` into the central directory for paths whose parents
  aren't explicit dir entries (Info-ZIP `zip` default behaviour).
  Privacy invariant §6 violation otherwise. Test added.
- #2 WasmProcessor.clearLeafCacheForEntry({entryId}) added + wired to
  CLEAR_FILES in App.tsx, so parsed leaf MetadataDocuments don't
  linger after the user removes files. Privacy invariant §3.
- #3 MAX_DECOMPRESSED_BYTES is now a shared byteBudget object threaded
  through every stripAtDepth recursion, bounding total decompression
  to 2 GB across all levels instead of 2 GB × MAX_NESTING_DEPTH.
- #4 Pre-check JSZipObject._data.uncompressedSize against the budget
  BEFORE entry.async() decompresses; aborts ZIP bombs before the big
  Uint8Array allocation. Post-check stays as defense if CDH lies.
- #5 New archiveHasEncryptedEntries(): walks the central directory
  pre-loadAsync detecting GP-flag bit 0 (ZipCrypto), bit 6 (strong/
  AES), and compression method 99 (WinZip AE-1/AE-2). CDH-based scan
  doesn't share the LFH-scanner's ZIP64 / data-descriptor blind spots.
- #12 The loadAsync catch now pins to JSZip's exact known throw string
  ("Encrypted zip are not supported") instead of /encrypted/i regex —
  the CDH scanner is now the primary detection so this is a
  belt-and-braces fallback bounded to one specific JSZip version.

Correctness (functional bugs):
- #6 cachedLeafDocs now stores Promises (not resolved values).
  Concurrent buildArchiveLeafDiff calls for the same key join the
  same await, eliminating the drain-pending-before-cache-write race
  that left rows permanently stuck on "No metadata" after a
  close-during-load + reopen cycle.
- #7 New runLeafDiff returns discriminated LeafDiffResult
  ({kind: "ok"; doc} | {kind: "failed"}). UI's renderLeafBody
  distinguishes "Diff failed" from "Already clean" instead of
  rendering both as "No metadata". The failed state is now reachable.
- #8 Non-ZIP inner-entry status now considers walkerEntries.length >
  0 in addition to byte-shrink — Office/PDF strategies that report
  what they removed get the right "cleaned" label even when the byte
  count didn't change. Video in-place strips remain a documented
  partial since they don't emit walker entries.
- #9 Nested-ZIP status NO LONGER uses byte-shrink as a signal — only
  hasAnyCleanedInTree drives "cleaned" vs "already-clean". Permission
  normalization alone (0o755→0o644, DOS attr 0x22→0x00) can shrink
  the central directory, which was triggering false-positive
  "cleaned" labels on nested archives where no metadata was actually
  removed.
- #10 diffHasChanges now uses a multiset (count of source+name+value
  tuples) instead of Map.set which collapses duplicates. Before
  fix: before=[(A,X),(A,X)] vs after=[(A,X),(B,Y)] reported "no
  changes" because the duplicate match consumed both before entries
  while the (B,Y) addition was never inspected.
- #11 sourceBytes/strippedBytes are no longer attached to nested-ZIP
  archiveEntries — they were never consumed (stashArchiveLeaves
  skips non-leaves) and held in AppContext for the FileEntry's
  lifetime. For nested ZIPs the bytes can be hundreds of MB per
  level.

Performance + maintainability:
- #13 use_process_files.ts no longer enqueues archives onto diffEntries
  — ZIPs always returned null from buildDiffDocumentForEntry, so
  the queued IPC call + UPDATE_FILE_DIFF dispatch were pure waste.
  100 ZIPs in a batch = 100 saved round-trips.
- #14 ZipExpansion handlers (setLeafState, setLeafStateIfStill) are
  now useCallback-stable instead of fresh closures per parent render.
  Critical for ZIPs with hundreds of leaves where any state change
  was re-rendering every row.
- #15 renderStatus in ZipExpansion.tsx adds assertNever default —
  future ArchiveEntryStatus variants get a compile-time exhaustiveness
  error instead of silently rendering undefined.

API surface changes:
- LeafDiffResult discriminated union added to application port; re-
  exported from `application/index.ts` and consumed by WebApi.
- MetadataProcessorPort gained clearLeafCacheForEntry({entryId}).
- buildArchiveLeafDiff now returns Promise<LeafDiffResult> instead
  of Promise<MetadataDocument | null>.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
forgejo_admin added 1 commit 2026-05-22 16:17:05 +00:00
fix(zip): inner-entry status uses byte-content equality, not just length (#184)
All checks were successful
CI / Lint, Typecheck & Unit Tests (pull_request) Successful in 31s
CI / Smoke build (VITE_ENABLE_FFMPEG_FALLBACK=false) (pull_request) Successful in 45s
CI / E2E (Standalone single-file) (pull_request) Successful in 1m37s
CI / E2E (Web) (pull_request) Successful in 3m24s
3d5e38a4b8
Bug: a DOCX/XLSX/PDF/MP4 inside a ZIP whose only metadata change was
container-level (ZIP CDH timestamps normalized, PDF re-encoded, MP4 re-
muxed) rendered as "Already clean" on the row label even though the diff
clearly showed the cleaning. The byte-length comparison was too strict:
those strategies produce same-length output with different content.

Fix: compare strategy output bytes to input bytes byte-by-byte (after
first checking length, which short-circuits on the common shrink case).
If anything changed at all — length or content — the entry is "cleaned".
Only truly-byte-identical passthroughs (JPEG/PNG with no removable
metadata, where the strategy walks markers/chunks and emits its input
verbatim) get "already-clean".

The comparison uses DataView.getUint32 for ~4x speed on larger entries;
worst case for a ZIP of all-clean small files is sub-second overhead.

Regression test: a synthetic DOCX inside a ZIP whose only change is the
inner-CDH timestamp normalisation now correctly tags as 'cleaned'.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
forgejo_admin added 1 commit 2026-05-22 16:27:36 +00:00
docs(zip): update ArchiveEntryStatus comments to reflect byte-content semantics (#184)
All checks were successful
CI / Lint, Typecheck & Unit Tests (pull_request) Successful in 32s
CI / Smoke build (VITE_ENABLE_FFMPEG_FALLBACK=false) (pull_request) Successful in 44s
CI / E2E (Standalone single-file) (pull_request) Successful in 1m35s
CI / E2E (Web) (pull_request) Successful in 3m25s
a1679e4d65
The "cleaned" / "already-clean" doc strings still described the old
length-only heuristic. Update them to match the current byte-content
equality logic in zip_strategy.ts so future readers don't get misled.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
forgejo_admin merged commit d9e763b76e into master 2026-05-22 16:32:03 +00:00
Sign in to join this conversation.
No reviewers
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference: forgejo_admin/exifcleaner-web#188
No description provided.