Commit d9e763b76e (parent a5546afa71)
35 changed files with 6051 additions and 327 deletions
@@ -1493,5 +1493,50 @@
    "en": "Loading metadata reader…",
    "es": "Cargando lector de metadatos…",
    "ar": "جارٍ تحميل قارئ البيانات الوصفية…"
  },
  "zipExpansion.statusCleaned": {
    "en": "Cleaned",
    "es": "Limpiado",
    "ar": "تم التنظيف"
  },
  "zipExpansion.statusAlreadyClean": {
    "en": "Already clean",
    "es": "Ya limpio",
    "ar": "نظيف بالفعل"
  },
  "zipExpansion.statusUnsupported": {
    "en": "Unsupported — passed through",
    "es": "No compatible — sin modificar",
    "ar": "غير مدعوم — تم التمرير دون تعديل"
  },
  "zipExpansion.statusDirectory": {
    "en": "Directory",
    "es": "Carpeta",
    "ar": "مجلد"
  },
  "zipExpansion.showMore": {
    "en": "Show {count} more entries",
    "es": "Mostrar {count} entradas más",
    "ar": "إظهار {count} عنصر إضافي"
  },
  "zipExpansion.depthLimit": {
    "en": "Depth limit reached — drop the inner file directly",
    "es": "Límite de profundidad alcanzado — arrastra el archivo interno directamente",
    "ar": "تم بلوغ حد العمق — أفلت الملف الداخلي مباشرةً"
  },
  "zipExpansion.noMetadata": {
    "en": "No metadata detected",
    "es": "No se detectaron metadatos",
    "ar": "لم يتم اكتشاف بيانات وصفية"
  },
  "zipExpansion.alreadyClean": {
    "en": "No metadata — file appears already clean",
    "es": "Sin metadatos — el archivo parece estar ya limpio",
    "ar": "لا توجد بيانات وصفية — يبدو الملف نظيفاً بالفعل"
  },
  "zipExpansion.diffFailed": {
    "en": "Couldn't load diff — internal error",
    "es": "No se pudo cargar la comparación — error interno",
    "ar": "تعذّر تحميل المقارنة — خطأ داخلي"
  }
}
|
|
|
|||
|
|
@@ -45,6 +45,7 @@ For "what's *partially* cleaned even when supported", see [`docs/PRIVACY_GAPS.md
|
|||
| PDF | Best-effort³ |
| DOCX, XLSX, PPTX, ODT | Partial⁴ (WASM strategy) |
| MP4, MOV, M4V, 3GP, 3G2 | Partial⁵ (WASM strategy) |
| ZIP | Full⁷ (recursive inner-file cleaning) |
| MKV | Unsupported (issue #43, deferred to v6) |
| RAW (CR2/CR3/NEF/ARW/RAF/ORF/DNG/...) | Unsupported⁶ |
| SVG, JXL, JPEG 2000, AVI | Unsupported |
|
||||
|
|
@@ -57,6 +58,7 @@ Footnotes:
|
|||
4. Office: clears `docProps/{core,app,custom}.xml` and a thumbnail. Known partial coverage of tracked changes/comments, RSIDs, embedded media EXIF, `customXml/` parts, and file paths in `*.rels` — tracked under issue #62 (Office Phase 2 hardening). See [`docs/PRIVACY_GAPS.md`](docs/PRIVACY_GAPS.md) for the user-facing summary.
|
||||
5. MP4/MOV: drops `udta`, `meta`, and `Xtra` containers via mp4box.js box-tree rewrite (no re-encoding, lossless). Known gaps in timed-metadata tracks, `hdlr` names, `compressorname`, mdat orphans, and sidecar files — see [`docs/PRIVACY_GAPS.md`](docs/PRIVACY_GAPS.md#mp4--mov-video-gaps) for the user-facing summary.
|
||||
6. RAW: removed in v5 (decided 2026-05-09, shipped 2026-05-10). No production-ready WASM library covers proprietary RAW. RAW workflows should use [ExifTool standalone](https://exiftool.org/) or a dedicated RAW tool — see [`docs/PRIVACY_GAPS.md#raw-unsupported`](docs/PRIVACY_GAPS.md#raw-unsupported).
|
||||
7. ZIP: per-entry timestamps normalized to DOS epoch (1980-01-01); per-entry comments and extra fields scrubbed; archive comment scrubbed. Each supported inner file is re-dispatched through `selectStrategy()` and cleaned with its native walker (JPEG/PNG/PDF/Office/MP4/etc.); nested `.zip` entries recurse. UI shows a per-entry tree with lazy on-expand diff loads. Encrypted archives are refused with a clear message directing users to a decryption-capable tool (see [`docs/PRIVACY_GAPS.md`](docs/PRIVACY_GAPS.md#zip-archives)). Full analysis: [`docs/gap-analysis/zip.md`](docs/gap-analysis/zip.md). Forensic verification: [`docs/forensic/zip.md`](docs/forensic/zip.md).
|
||||
|
||||
## Running the web app locally
|
||||
|
||||
|
|
|
|||
|
|
@@ -97,6 +97,32 @@ The current shipping state. Expect this table to drift; the README's Format Supp
|
|||
|
||||
---
|
||||
|
||||
## ZIP archives
|
||||
|
||||
The `ZipStrategy` (issue #184, shipped 2026-05) cleans ZIP archive metadata and recursively re-cleans every supported inner file. Three known gaps remain:
|
||||
|
||||
### Encrypted ZIPs are refused, not cleaned
|
||||
|
||||
**What this means:** if your `.zip` contains entries encrypted with a password (ZipCrypto or AES-via-WinZip), MetaScrub refuses to process the archive and surfaces an "Encrypted ZIP archives aren't supported" message.
|
||||
|
||||
**Why:** the bundled ZIP library (JSZip, already a production dep for Office) refuses `loadAsync` on any archive containing encrypted entries. Handling such archives would need a parallel byte-level walker that bypasses JSZip, which is significant additional code that we deferred for v1.
|
||||
|
||||
**Workaround:** decrypt the archive with a dedicated tool (7-Zip, or `unzip` from the command line) and re-drop the decrypted contents into MetaScrub. mat2 is not an option here: its archive parser also refuses encrypted entries (see [`docs/forensic/zip.md`](forensic/zip.md)). We may add a byte-level fallback in a follow-up if demand surfaces.
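
For orientation, the ZIP format marks an encrypted entry by setting bit 0 of the general-purpose flag in its local file header. The sketch below reads that bit from a single header. It is an illustration only: `isLfhEncrypted` is a hypothetical helper, not part of MetaScrub or JSZip, and a real check would have to inspect every entry rather than one header.

```typescript
// "PK\x03\x04" read as a little-endian u32.
const LFH_SIGNATURE = 0x04034b50;
// Fixed-size part of a local file header, before the name and extra field.
const LFH_FIXED_SIZE = 30;

/**
 * Hypothetical helper: returns true when general-purpose bit 0 (encrypted)
 * is set on the local file header that starts at `offset`.
 */
function isLfhEncrypted(bytes: Uint8Array, offset = 0): boolean {
  if (bytes.length < offset + LFH_FIXED_SIZE) {
    throw new Error("truncated local file header");
  }
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  if (view.getUint32(offset, true) !== LFH_SIGNATURE) {
    throw new Error("not a local file header");
  }
  // The general-purpose bit flag is a u16 at offset 6 of the header.
  const flags = view.getUint16(offset + 6, true);
  return (flags & 0x0001) !== 0;
}
```

A caller could run a check like this before handing the bytes to the ZIP library, so that the refusal message is produced deliberately rather than from a caught library error.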
|
||||
|
||||
### Self-extracting EXE stub bytes are preserved
|
||||
|
||||
**What this means:** if a `.zip` is wrapped in a self-extracting Windows executable (the bytes before the first local file header form a PE stub), MetaScrub preserves those bytes verbatim. The stub itself may carry the original creator's identifying metadata (PE timestamps, OriginalFilename string, etc.).
|
||||
|
||||
**Why:** modifying the stub would break the SFX behavior. Distinguishing "intentional SFX stub" from "arbitrary leading garbage" reliably from the byte stream isn't reasonable.
|
||||
|
||||
**Workaround:** repackage the contents as a plain `.zip` (without the SFX wrapper) before dropping it into MetaScrub.
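
To check whether an archive carries leading bytes at all, one option is to look for the first local-file-header signature (`PK\x03\x04`) and measure what precedes it. The sketch below does that and flags a prefix that begins with the `MZ` bytes of a DOS/PE executable. `describePrefix` is a hypothetical helper, and the `MZ` test is only a heuristic: the signature bytes can also occur by chance inside a stub, which is part of why reliable detection is hard.

```typescript
interface PrefixInfo {
  /** Bytes before the first local file header, or -1 when no header is found. */
  prefixLength: number;
  /** True when a non-empty prefix starts with "MZ", as DOS/PE executables do. */
  looksLikeExecutable: boolean;
}

/** Hypothetical helper: measures the bytes that precede the first "PK\x03\x04". */
function describePrefix(bytes: Uint8Array): PrefixInfo {
  let first = -1;
  for (let i = 0; i + 4 <= bytes.length; i++) {
    if (
      bytes[i] === 0x50 &&
      bytes[i + 1] === 0x4b &&
      bytes[i + 2] === 0x03 &&
      bytes[i + 3] === 0x04
    ) {
      first = i;
      break;
    }
  }
  // A prefix shorter than two bytes cannot hold the "MZ" marker.
  const looksLikeExecutable = first >= 2 && bytes[0] === 0x4d && bytes[1] === 0x5a;
  return { prefixLength: first, looksLikeExecutable };
}
```

A non-zero `prefixLength` is a hint that repackaging as a plain `.zip` is worth doing before cleaning.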
|
||||
|
||||
### Multi-disk / spanned archives are refused
|
||||
|
||||
`.zip` archives split across multiple `.z01`/`.z02`/… files are rejected with a `parse-failed` error. JSZip does not support multi-disk reads. Reassemble the archive locally (e.g. `zip -F`) before processing.
|
||||
|
||||
---
|
||||
|
||||
## MP4 / MOV video gaps
|
||||
|
||||
The current `VideoStrategy` (mp4box.js-based box-tree rewriter) drops `udta`, `meta`, and `Xtra` containers but does not cover several known sources of leak. These are tracked individually; this section is the user-facing summary.
|
||||
|
|
|
|||
182
docs/forensic/zip.md
Normal file
|
|
@@ -0,0 +1,182 @@
|
|||
# ZIP forensic recovery test
|
||||
|
||||
**Date:** 2026-05-22
|
||||
**Goal:** Verify that metadata stripped by `ZipStrategy` cannot be recovered by an attacker with standard ZIP forensic tooling, across nine surfaces that span the ZIP container itself (archive comment, per-entry comment, per-entry extra field, per-entry timestamp) plus the inner files commonly carried inside archives (JPEG EXIF, PDF Info, DOCX docProps, nested ZIPs, encrypted entries). Compare against `exiftool -all= -Time:All=` and [mat2](https://0xacab.org/jvoisin/mat2) as reference points.
|
||||
|
||||
**Reproducible at:** [`tools/forensic/zip.ts`](../../tools/forensic/zip.ts) — `npx tsx tools/forensic/zip.ts` from the project root.
|
||||
|
||||
## Methodology
|
||||
|
||||
The runner builds two synthetic ZIP fixtures programmatically. The primary fixture exercises eight metadata surfaces; a separate encrypted-archive fixture exercises the ninth.
|
||||
|
||||
The byte-level ZIP builder is **independent of JSZip** — per [`.claude/rules/format-strategy-workflow.md`](../../.claude/rules/format-strategy-workflow.md), adversarial-independence between the fixture builder and the production strategy's library protects the test from a shared-quirk class of false negatives. Same rationale as `tools/forensic/video.ts`'s `walkAtoms` (independent of `parseBoxes`).
|
||||
|
||||
**Sentinels embedded across nine surfaces:**
|
||||
|
||||
| # | Sentinel | Surface | Where it lives in the fixture |
| --- | --- | --- | --- |
| 1 | `SENTINEL-ARCHIVE-CMNT-A1B2C3` | Archive comment | EOCD `.ZIP file comment` |
| 2 | `SENTINEL-ENTRY-CMNT-D4E5F6` | Per-entry comment | Central directory entry `file comment` on `notes.txt` |
| 3a | `SENTINEL-EXTRA-7G8H9I` | Per-entry extra field — custom (0x7878) | Arbitrary unregistered ID; tests the "unknown record = strip anyway" path on `notes.txt` |
| 3b | `SENTINEL-EXTRA-UT-K2L3M4` | Per-entry extra field — UT extended timestamp (0x5455) | Info-ZIP UT record (the wall-clock mtime/atime/ctime trio commonly written by Linux `zip`, macOS Finder, Word's "Save as zip") on `notes.txt` |
| 3c | `SENTINEL-EXTRA-UIDGID-N5O6P7` | Per-entry extra field — UID/GID Unix v1 (0x7875) | Info-ZIP Unix v1 record (creator's uid/gid; identifies the user account that created the archive) on `notes.txt` |
| 3d | `SENTINEL-EXTRA-NTFS-Q8R9S0` | Per-entry extra field — NTFS times (0x000a) | Windows NTFS record (100-ns mtime/atime/ctime, higher-fidelity than DOS) on `notes.txt` |
| 4 | `2023-04-15 14:32:11` | Per-entry timestamp | DOS-encoded last-mod date/time in LFH + CD of every entry |
| 5 | `SENTINEL-JPEG-EXIF-J1K2L3` | Inner JPEG EXIF Artist | EXIF/APP1 IFD0 tag 0x013b (Artist) in `photo.jpg` |
| 6 | `SENTINEL-PDF-INFO-M4N5O6` | Inner PDF /Author | `/Info /Author` in `report.pdf` (pdf-lib `setAuthor()`) |
| 7 | `SENTINEL-DOCX-P7Q8R9` | Inner DOCX `<dc:creator>` | `docProps/core.xml` `dc:creator` + `cp:lastModifiedBy` in `memo.docx` |
| 8 | `SENTINEL-NESTED-S1T2U3` | Nested-zip archive comment (recursion test) | EOCD `.ZIP file comment` of `inner.zip` (carried as an entry) |
| 9 | `SENTINEL-ENCRYPTED-V4W5X6` | Encrypted entry inner content (KNOWN GAP) | Cleartext payload of `secret.txt` in the separate encrypted-archive fixture, with general-purpose-flag bit 0 (encrypted) set on the LFH/CD |
|
||||
|
||||
The primary fixture is a 5-entry ZIP: `notes.txt` (carrying surfaces 2-4), `photo.jpg` (surface 5), `report.pdf` (surface 6), `memo.docx` (surface 7), and `inner.zip` (surface 8 — itself carrying a nested-readme entry and the nested archive comment). Every entry's last-mod timestamp is 2023-04-15 14:32:11 (surface 4); the EOCD carries the archive comment (surface 1). The encrypted-archive fixture is a 1-entry ZIP whose `secret.txt` LFH has GP-bit 0 set; ZipStrategy refuses encrypted archives at the magic-byte check, so surface 9 is documented as a known gap rather than tested for byte-level stripping.
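
As background for surface 4, the DOS date and time fields are two 16-bit values: the date packs year-since-1980, month, and day; the time packs hour, minute, and seconds divided by two. The sketch below encodes and decodes them. The function names are hypothetical and are not taken from the runner; the bit layout is the one in the ZIP specification.

```typescript
interface DosDateTime {
  date: number; // bits 15-9: year - 1980, bits 8-5: month, bits 4-0: day
  time: number; // bits 15-11: hour, bits 10-5: minute, bits 4-0: seconds / 2
}

/** Hypothetical helper: packs a calendar timestamp into DOS date/time fields. */
function encodeDosDateTime(
  year: number, month: number, day: number,
  hour: number, minute: number, second: number,
): DosDateTime {
  if (year < 1980 || year > 2107) {
    throw new RangeError("DOS dates cover 1980 through 2107");
  }
  const date = ((year - 1980) << 9) | (month << 5) | day;
  const time = (hour << 11) | (minute << 5) | (second >> 1);
  return { date, time };
}

/** Hypothetical helper: renders DOS fields as "YYYY-MM-DD hh:mm:ss". */
function decodeDosDateTime(v: DosDateTime): string {
  const pad = (n: number): string => (n < 10 ? "0" + n : String(n));
  const year = 1980 + (v.date >> 9);
  const month = (v.date >> 5) & 0x0f;
  const day = v.date & 0x1f;
  const hour = v.time >> 11;
  const minute = (v.time >> 5) & 0x3f;
  const second = (v.time & 0x1f) * 2;
  return year + "-" + pad(month) + "-" + pad(day) + " " +
    pad(hour) + ":" + pad(minute) + ":" + pad(second);
}

// The fixture timestamp and the 1980-01-01 epoch, encoded.
const fixtureStamp = encodeDosDateTime(2023, 4, 15, 14, 32, 11); // date 0x568f, time 0x7405
const epochStamp = encodeDosDateTime(1980, 1, 1, 0, 0, 0);       // date 0x0021, time 0x0000
```

Because seconds are stored at two-second resolution, the fixture's `14:32:11` decodes as `14:32:10`; matching on the date literal `2023-04-15` sidesteps that.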
|
||||
|
||||
The fixtures are then stripped three ways:
|
||||
|
||||
1. **`ZipStrategy`** — our JSZip-based implementation, invoked in-process. The runner inlines the production routing (`OfficeStrategy → JpegStrategy → PngStrategy → PdfStrategy → ZipStrategy`) and wires it into `setZipStrategyRouter` so inner-entry recursion goes through the same `selectStrategy()` path as the production renderer.
|
||||
2. **`exiftool -all= -Time:All= -overwrite_original`** — the canonical reference for image-metadata tools. ExifTool's documentation explicitly states ["Writing of ZIP files is not yet supported"](https://exiftool.org/#limitations); the runner records this refusal as `REFUSED` rather than treating it as a runner failure. This is the documented finding from the gap analysis, surfaced directly in the matrix.
|
||||
3. **mat2** — the FOSS reference used by Tails OS. mat2's `libmat2/archive.py` `ZipParser` recurses into archive entries, calls format-specific parsers per entry, and rewrites the archive with epoch timestamps + scrubbed comments. This is the meaningful comparison reference — ExifTool isn't a viable baseline because it doesn't write generic ZIPs at all.
|
||||
|
||||
For each cleaned output, the recovery battery applies six techniques:
|
||||
|
||||
1. **`unzip -z <file>`** — prints the archive comment. Catches surface 1.
|
||||
2. **`zipinfo -v <file>`** — verbose listing including per-entry comments and extra fields. Catches surfaces 2 + 3.
|
||||
3. **`unzip -l <file>`** — listing including per-entry timestamps. Catches surface 4 (looks for the literal `2023-04-15` or `1980-01-01`).
|
||||
4. **Inner-file extraction + `exiftool -a -G1 -s` per entry** — surfaces structured metadata in extracted JPEG / PDF / DOCX. Catches surfaces 5 + 6 + 7.
|
||||
5. **Inner-file extraction + `strings` per entry** — catches any sentinel left in plain-text bytes anywhere in the extracted entry tree, including the nested-zip archive comment when the nested archive is itself extracted. Catches surface 8 (and provides a cross-check for surfaces 5-7).
|
||||
6. **Raw `strings` over the cleaned ZIP bytes** — catches any leakage of sentinels into the outer ZIP's central directory or LFH stream that wouldn't surface through the per-entry channels.
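
To make technique 1 concrete, the archive comment that `unzip -z` prints lives at the end of the End of Central Directory (EOCD) record. The sketch below locates the EOCD by scanning backwards for its signature and returns the comment. `readArchiveComment` is a hypothetical helper written for this illustration, not code from the runner; it assumes the comment runs to the end of the file and decodes it byte-for-byte as Latin-1.

```typescript
const EOCD_SIGNATURE = 0x06054b50; // "PK\x05\x06" read as a little-endian u32
const EOCD_MIN_SIZE = 22;          // fixed-size part of the record
const MAX_COMMENT = 0xffff;        // the comment length field is a u16

/** Hypothetical helper: returns the archive comment, or null if no EOCD is found. */
function readArchiveComment(bytes: Uint8Array): string | null {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const lowest = Math.max(0, bytes.length - EOCD_MIN_SIZE - MAX_COMMENT);
  for (let pos = bytes.length - EOCD_MIN_SIZE; pos >= lowest; pos--) {
    if (view.getUint32(pos, true) !== EOCD_SIGNATURE) continue;
    const length = view.getUint16(pos + 20, true);
    // Assumption: a genuine EOCD's comment ends exactly at end of file.
    if (pos + EOCD_MIN_SIZE + length !== bytes.length) continue;
    let comment = "";
    for (let i = 0; i < length; i++) {
      comment += String.fromCharCode(bytes[pos + EOCD_MIN_SIZE + i]);
    }
    return comment;
  }
  return null;
}
```

After a successful strip, a reader like this should return an empty string for both the outer archive and any extracted nested archive.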
|
||||
|
||||
Verdict per surface per strip path: `DROPPED` (sentinel absent), `NORMALIZED` (timestamp is 1980-01-01 instead of the input's 2023-04-15), `SURVIVED` (sentinel found anywhere), `REFUSED` (the tool declines to process ZIP), `SKIP` (channel not collected), `KNOWN_GAP` (documented gap, not tested against this output).
|
||||
|
||||
**Bar:** zero sentinel survivors across every recovery technique for `ZipStrategy` on surfaces 1-8 (counting 3a–3d as one). Surface 4 is NORMALIZED (not DROPPED — the format requires *some* timestamp, and ZIP's 1980-01-01 epoch is the minimum DOS-time per [`.claude/rules/privacy-invariants.md`](../../.claude/rules/privacy-invariants.md) §6). Surface 9 is a documented `KNOWN_GAP` — ZipStrategy refuses encrypted archives outright in v1, so the encrypted-inner sentinel is unaddressable through normal flow. The runner exits non-zero on UNEXPECTED survivors for surfaces 1-8 (counting 3a–3d as one).
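
The per-surface verdict logic described above can be sketched as a small pure function. This is an assumption about how such a classifier could look, not the runner's actual code: `SurfaceCheck` and `classify` are hypothetical names, and `REFUSED` and `KNOWN_GAP` are left out because they are decided by the strip step rather than by searching channel output.

```typescript
type Verdict = "DROPPED" | "NORMALIZED" | "SURVIVED" | "SKIP";

interface SurfaceCheck {
  /** Literal to search for in every recovery channel's output. */
  sentinel: string;
  /** For timestamp surfaces: the literal that proves normalization. */
  normalizedTo?: string;
}

/**
 * Hypothetical classifier: SKIP when no channel was collected, SURVIVED when
 * the sentinel appears anywhere, NORMALIZED when the replacement literal
 * appears instead, and DROPPED otherwise.
 */
function classify(check: SurfaceCheck, channelOutputs: string[]): Verdict {
  if (channelOutputs.length === 0) return "SKIP";
  const contains = (needle: string): boolean =>
    channelOutputs.some((output) => output.indexOf(needle) !== -1);
  if (contains(check.sentinel)) return "SURVIVED";
  const replacement = check.normalizedTo;
  if (replacement !== undefined && contains(replacement)) return "NORMALIZED";
  return "DROPPED";
}
```

Checking for the sentinel before the replacement literal keeps a partially normalized archive (some entries at 1980-01-01, one still at 2023-04-15) from being reported as clean.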
|
||||
|
||||
## Results
|
||||
|
||||
Captured 2026-05-22 from `npx tsx tools/forensic/zip.ts`. Tools: exiftool 13.30, mat2 0.13.4, unzip 6.0, zipinfo 6.0.
|
||||
|
||||
| # | Surface | Expected | Input (sanity) | `ZipStrategy` | `exiftool -all= -Time:All=` | mat2 |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | Archive comment | DROPPED | present | **DROPPED** | REFUSED¹ | DROPPED |
| 2 | Per-entry comment | DROPPED | present | **DROPPED** | REFUSED¹ | DROPPED |
| 3a | Extra field — custom (0x7878) | DROPPED | present | **DROPPED** | REFUSED¹ | DROPPED |
| 3b | Extra field — UT timestamp (0x5455) | DROPPED | present | **DROPPED** | REFUSED¹ | DROPPED |
| 3c | Extra field — UID/GID (0x7875) | DROPPED | present | **DROPPED** | REFUSED¹ | DROPPED |
| 3d | Extra field — NTFS times (0x000a) | DROPPED | present | **DROPPED** | REFUSED¹ | DROPPED |
| 4 | Per-entry timestamp | NORMALIZED → 1980-01-01 | 2023-04-15 | **NORMALIZED** | REFUSED¹ | NORMALIZED |
| 5 | Inner JPEG EXIF Artist | DROPPED | present | **DROPPED** | REFUSED¹ | DROPPED |
| 6 | Inner PDF /Author | DROPPED | present | **DROPPED** | REFUSED¹ | DROPPED |
| 7 | Inner DOCX `<dc:creator>` | DROPPED | present | **DROPPED** | REFUSED¹ | DROPPED |
| 8 | Nested-zip archive comment | DROPPED | present | **DROPPED** | REFUSED¹ | DROPPED |
| 9 | Encrypted entry inner content | KNOWN_GAP | n/a² | **KNOWN_GAP**³ | REFUSED¹ | KNOWN_GAP⁴ |
|
||||
|
||||
¹ ExifTool exits with `Error: Writing of ZIP files is not yet supported`. Documented limitation per [exiftool.org/#limitations](https://exiftool.org/#limitations) — ExifTool reads metadata from ZIPs (and recognises Office/EPUB/APK as special cases for read-only enumeration) but does not write back. The matrix records `REFUSED` rather than `SKIP` because this is the central finding from the gap analysis: ExifTool is not a meaningful reference for stripping generic ZIPs.
|
||||
|
||||
² Surface 9 lives in a separate encrypted-archive fixture, not the primary fixture, so the "input sanity" column does not apply.
|
||||
|
||||
³ ZipStrategy returns `{ ok: false, error: { code: "invalid-file-format", detail: "Encrypted ZIP archives aren't supported — use a dedicated tool (7-Zip, ExifTool standalone) that can decrypt to clean inner content." } }` when given the encrypted-archive fixture. Inner content not addressable through this code path; documented in spec §3 and `docs/PRIVACY_GAPS.md`.
|
||||
|
||||
⁴ mat2 refuses the encrypted-archive fixture (its archive parser errors on encrypted entries). Parity with `ZipStrategy`. Result class is "refused with clear error" for both tools; neither leaks the inner sentinel because neither produces a cleaned output.
|
||||
|
||||
**Aggregate verdicts:**
|
||||
|
||||
- `ZipStrategy`: 11/11 strict surfaces (DROPPED or NORMALIZED — counting 3a/3b/3c/3d separately). 1 documented gap (encrypted entries).
|
||||
- `exiftool -all= -Time:All=`: 0/11. Tool refuses to write ZIPs entirely.
|
||||
- `mat2`: 11/11 strict surfaces. 1 gap (encrypted entries — parity).
|
||||
|
||||
Runner exit code: **0** (PASS — no UNEXPECTED survivors).
|
||||
|
||||
## Interpretation
|
||||
|
||||
**`ZipStrategy` and mat2 are equivalent on this fixture; ExifTool is not a viable reference at all.**
|
||||
|
||||
- **ZipStrategy** scrubs the four ZIP-level surfaces (archive comment, per-entry comments, per-entry extra fields, per-entry timestamps → epoch) by re-emitting via `JSZip.generateAsync({ comment: "" })` with every entry passed `date: new Date(Date.UTC(1980, 0, 1, 12, 0, 0))` and `comment: ""`. The inner-file surfaces are scrubbed via recursion through `selectStrategy()`: each decompressed entry's bytes are routed back through the strategy registry. A `photo.jpg` entry hits `JpegStrategy`; a `report.pdf` hits `PdfStrategy`; a `memo.docx` hits `OfficeStrategy`; an `inner.zip` hits `ZipStrategy` recursively. The output ZIP carries the cleaned-leaf bytes under the original entry names. Surface 8 (nested archive) confirms recursion works structurally — the inner archive's EOCD comment is dropped just like the outer's.
|
||||
|
||||
- **ExifTool** is not a meaningful reference for generic ZIPs. Its documentation says so: "Writing of ZIP files is not yet supported." The matrix records `REFUSED` across every surface so the comparison row is honest. The gap analysis ([`docs/gap-analysis/zip.md`](../gap-analysis/zip.md)) makes the same finding from the read-side — ExifTool special-cases Office/EPUB/APK/JAR for read-only metadata enumeration only, never re-writes the archive. The ~95% of the surface that matters (inner-file metadata + per-entry timestamps + per-entry comments) is untouched.
|
||||
|
||||
- **mat2** is the meaningful reference. Its `libmat2/archive.py` `ZipParser` recurses into each archive entry, dispatches to a format-specific backend, and rewrites the archive with epoch DOS timestamps + scrubbed archive comment. On this fixture mat2 achieves the same surface-by-surface result as `ZipStrategy`: 8/8 DROPPED-or-NORMALIZED, refuses the encrypted fixture. The outputs differ in size (`ZipStrategy` 2494 bytes vs mat2 4106 bytes — mat2 re-encodes the inner PDF via Cairo, producing a larger rasterised PDF; we keep the original PDF structure via `pdf-lib`'s targeted scrub). For users who care about preserving inner-document fidelity (text remains text, not bitmap), `ZipStrategy`'s approach is strictly preferable. For users who care about maximum sentinel destruction at the cost of inner-file fidelity, the two tools are equivalent on this fixture.
|
||||
|
||||
**Where ZipStrategy beats ExifTool outright:** every surface. ExifTool cannot strip generic ZIPs at all; it refuses the input. Even the surfaces ExifTool *can* read (archive comment, inner-file structured metadata) are surfaced read-only, not stripped.
|
||||
|
||||
**Where ZipStrategy matches mat2:** all 8 strict surfaces. Per-entry epoch + comment scrub + inner-file recursion via the strategy registry produces the same result as mat2's per-backend recursion model.
|
||||
|
||||
**Where mat2 nominally beats ZipStrategy:** none on this fixture. The previous direction note in the gap analysis described the design as "encrypted entries pass through with warning, mat2 refuses outright" — but the shipping policy was changed to "refuse encrypted archives" (spec §3) because JSZip's `loadAsync` refuses encrypted entries at the library level. As a result, ZipStrategy and mat2 are now at parity on encrypted-archive handling: both refuse cleanly.
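
The recursion model discussed in this section can be sketched over an in-memory tree instead of real ZIP bytes. Everything here is hypothetical and simplified: `Entry`, `cleanTree`, and the `maxDepth` parameter are illustration names, parsing and re-emitting the archive are left out, and the production dispatch goes through `selectStrategy()` rather than a name-based callback.

```typescript
type Entry =
  | { kind: "file"; name: string; bytes: Uint8Array }
  | { kind: "archive"; name: string; entries: Entry[] };

type LeafCleaner = (bytes: Uint8Array) => Uint8Array;
type Status = "cleaned" | "unsupported" | "depth-limit";

interface Report {
  path: string;
  status: Status;
}

/** Hypothetical sketch: clean every leaf, recurse into nested archives up to maxDepth. */
function cleanTree(
  entries: Entry[],
  selectCleaner: (name: string) => LeafCleaner | undefined,
  maxDepth: number,
  depth = 0,
  prefix = "",
): { entries: Entry[]; report: Report[] } {
  const out: Entry[] = [];
  const report: Report[] = [];
  for (const entry of entries) {
    const path = prefix + entry.name;
    if (entry.kind === "archive") {
      if (depth >= maxDepth) {
        // Too deep: pass the nested archive through untouched and say so.
        out.push(entry);
        report.push({ path, status: "depth-limit" });
        continue;
      }
      const inner = cleanTree(entry.entries, selectCleaner, maxDepth, depth + 1, path + "/");
      out.push({ kind: "archive", name: entry.name, entries: inner.entries });
      report.push(...inner.report);
      continue;
    }
    const cleaner = selectCleaner(entry.name);
    if (cleaner === undefined) {
      // No registered cleaner: keep the original bytes.
      out.push(entry);
      report.push({ path, status: "unsupported" });
    } else {
      out.push({ kind: "file", name: entry.name, bytes: cleaner(entry.bytes) });
      report.push({ path, status: "cleaned" });
    }
  }
  return { entries: out, report };
}
```

The three statuses loosely mirror the "Cleaned", "Unsupported — passed through", and "Depth limit reached" strings added to the UI translations in this commit.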
|
||||
|
||||
## Caveats and limits of this test
|
||||
|
||||
- **Encrypted archives are refused, not stripped.** JSZip's `loadAsync` won't load an archive with any encrypted entry (`"Encrypted zip are not supported"`), so v1 surfaces `invalid-file-format` and directs the user to a decryption-capable tool. A byte-level walker bypassing JSZip would unblock partial-passthrough cleaning of zip-level metadata around encrypted content; deferred. Surface 9 is documented as a known gap in [`docs/PRIVACY_GAPS.md`](../PRIVACY_GAPS.md).
|
||||
- **Self-extracting EXE stubs are preserved.** Bytes before the first local file header (the PE stub) are not touched — modifying them breaks the SFX behavior. Documented gap. Not tested here.
|
||||
- **Multi-disk / spanned archives are refused.** JSZip rejects them, and the failure is surfaced as `parse-failed`. Not tested here.
|
||||
- **ZIP64 (0x0001) extra-field records are the structural exception we preserve, but not sentinel-tested here.** Zip64 only triggers on archives or entries exceeding the 32-bit size fields (~4 GB), which a synthetic fixture can't reach cheaply. The gap-analysis policy table is explicit: "preserved (structural; required for archives > 4 GB)." Multi-GB Zip64 verification is deferred.
|
||||
- **The fixture is synthetic but exercises a realistic extra-field profile.** Surfaces 3a–3d embed the four record IDs common to Word's "Save as zip", macOS Finder, 7-Zip, and Info-ZIP: custom 0x7878, UT extended timestamp (0x5455), UID/GID Unix v1 (0x7875), and NTFS times (0x000a). Each record carries its own ASCII sentinel inside its `data` payload, so the recovery battery verifies per-record-id stripping. ZipStrategy's policy ("strip every extra-field record except 0x0001") is confirmed against each ID independently, matching mat2's behavior.
|
||||
- **The PDF inner sentinel is `/Info /Author` only.** The PDF strategy's full sentinel battery (Title / Author / Subject / Producer / Creator / XMP / Annotations / Lang — 10 sentinels) is exercised in [`docs/forensic/pdf.md`](pdf.md). Here we use one sentinel because the point is to verify the recursion path through `ZipStrategy → PdfStrategy`, not to re-test the PDF strategy's depth (which is already covered).
|
||||
- **The DOCX inner sentinel covers `<dc:creator>` + `<cp:lastModifiedBy>`.** The Office strategy's full gap battery is exercised in [`docs/forensic/office.md`](office.md). Same rationale as the PDF case.
|
||||
- **The mat2 encrypted-archive refusal is detected via `mat2` exit status, not by reading mat2's stderr.** Surfacing the structured refusal reason from mat2 would require parsing its Python traceback; that's a fragility we deliberately avoid.
|
||||
- **No `unzip -P` or AES-decryption attempt.** The encrypted fixture uses a bogus ZipCrypto payload (12-byte header + 1 byte of "ciphertext"); the password is unknown and not material to the test. Surface 9 is about refusal behavior, not about whether the encryption is "real."
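
The extra-field policy mentioned in the caveats ("strip every extra-field record except 0x0001") can be illustrated at the byte level. An extra field is a sequence of records, each a little-endian `u16` ID, a `u16` data size, and the data. The sketch below keeps only the allowed IDs. `stripExtraField` is a hypothetical helper for illustration; per this document, the shipping strategy gets the same effect by re-emitting the archive through JSZip rather than by filtering bytes.

```typescript
const ZIP64_EXTRA_ID = 0x0001;

/**
 * Hypothetical helper: returns a copy of `extra` containing only the records
 * whose ID is in `keepIds`. A truncated trailing record is dropped.
 */
function stripExtraField(
  extra: Uint8Array,
  keepIds: number[] = [ZIP64_EXTRA_ID],
): Uint8Array {
  const view = new DataView(extra.buffer, extra.byteOffset, extra.byteLength);
  const kept: Uint8Array[] = [];
  let pos = 0;
  while (pos + 4 <= extra.length) {
    const id = view.getUint16(pos, true);
    const size = view.getUint16(pos + 2, true);
    const end = pos + 4 + size;
    if (end > extra.length) break; // malformed tail: stop and drop it
    if (keepIds.indexOf(id) !== -1) kept.push(extra.subarray(pos, end));
    pos = end;
  }
  const total = kept.reduce((sum, record) => sum + record.length, 0);
  const out = new Uint8Array(total);
  let offset = 0;
  for (const record of kept) {
    out.set(record, offset);
    offset += record.length;
  }
  return out;
}
```

Using an allow-list rather than a deny-list means an unregistered ID such as 0x7878 is removed without the code having to know about it.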
|
||||
|
||||
## Reproducing
|
||||
|
||||
```bash
# From the project root
npx tsx tools/forensic/zip.ts
```
|
||||
|
||||
Outputs go to `/tmp/zip-forensic/`:
|
||||
|
||||
- `input-primary.zip` — the 5-entry fixture (surfaces 1-8, counting 3a–3d as one)
|
||||
- `input-encrypted.zip` — the 1-entry encrypted fixture (surface 9)
|
||||
- `output-ours.zip` — `ZipStrategy` output
|
||||
- `output-exiftool.zip` — exiftool-cleaned copy (empty on refusal — exiftool didn't write anything new)
|
||||
- `output-mat2.zip` — mat2-cleaned copy
|
||||
- `output-*-encrypted.zip` — encrypted-fixture outputs (mostly refusal copies)
|
||||
- `output-ours.zip.extracted/`, etc. — per-output extraction tree used by the inner-exiftool + inner-strings channels
|
||||
- `report.json` — structured per-surface verdict per strip path
|
||||
|
||||
Required tools: `exiftool` (`libimage-exiftool-perl`), `mat2`, `unzip`, `zipinfo`, `strings` (`binutils`). All available on Debian/Ubuntu via apt.
|
||||
|
||||
Debian/Ubuntu one-liner: `sudo apt install libimage-exiftool-perl mat2 unzip binutils`.
|
||||
|
||||
## What this directory is for
|
||||
|
||||
`docs/forensic/` documents adversarial recovery tests run *after* implementation lands, complementing `docs/gap-analysis/` (which runs *before* implementation to scope what should be removed). The pattern: implement → unit-test correctness → forensic-test unrecoverability → document the result.
|
||||
|
||||
Each format gets its own writeup as we go: `zip.md` here, `pdf.md` / `jpeg.md` / `office.md` / `png.md` / `video.md` for the formats shipped earlier. The runner scripts at `tools/forensic/<format>.ts` stay in the repo so the tests can be re-run any time the strategy changes.
|
||||
|
||||
## Captured runner output (2026-05-22)
|
||||
|
||||
```text
Sentinels embedded in fixture:
ARCHIVE_CMNT SENTINEL-ARCHIVE-CMNT-A1B2C3
ENTRY_CMNT SENTINEL-ENTRY-CMNT-D4E5F6
EXTRA_FIELD SENTINEL-EXTRA-7G8H9I
JPEG_EXIF SENTINEL-JPEG-EXIF-J1K2L3
PDF_INFO SENTINEL-PDF-INFO-M4N5O6
DOCX_CREATOR SENTINEL-DOCX-P7Q8R9
NESTED_ARCHIVE SENTINEL-NESTED-S1T2U3
ENCRYPTED_INNER SENTINEL-ENCRYPTED-V4W5X6
TIMESTAMP_LITERAL 2023-04-15 (non-epoch)

Primary fixture: /tmp/zip-forensic/input-primary.zip (3793 bytes)
Encrypted fixture: /tmp/zip-forensic/input-encrypted.zip (144 bytes)

=== Stripping primary fixture ===
ZipStrategy: ok (2494 bytes)
exiftool: refused-by-design — ExifTool: 'Writing of ZIP files is not yet supported' — documented limitation per https://exiftool.org/#limitations
mat2: ok (4106 bytes)

=== Results matrix (9 surfaces × 3 strip paths) ===
| Surface | Expected | input | ZipStrategy | exiftool | mat2 |
|--------------------------------------------------|-------------------------|----------|--------------|----------|------------|
| 1. Archive comment | DROPPED | SURVIVED | DROPPED | REFUSED | DROPPED |
| 2. Per-entry comment | DROPPED | SURVIVED | DROPPED | REFUSED | DROPPED |
| 3a. Extra field — custom (0x7878) | DROPPED | SURVIVED | DROPPED | REFUSED | DROPPED |
| 3b. Extra field — UT extended timestamp (0x5455) | DROPPED | SURVIVED | DROPPED | REFUSED | DROPPED |
| 3c. Extra field — UID/GID (0x7875) | DROPPED | SURVIVED | DROPPED | REFUSED | DROPPED |
| 3d. Extra field — NTFS times (0x000a) | DROPPED | SURVIVED | DROPPED | REFUSED | DROPPED |
| 4. Per-entry timestamp | NORMALIZED → 1980-01-01 | SURVIVED | NORMALIZED | REFUSED | NORMALIZED |
| 5. Inner JPEG EXIF Artist | DROPPED | SURVIVED | DROPPED | REFUSED | DROPPED |
| 6. Inner PDF /Author | DROPPED | SURVIVED | DROPPED | REFUSED | DROPPED |
| 7. Inner DOCX <dc:creator> | DROPPED | SURVIVED | DROPPED | REFUSED | DROPPED |
| 8. Nested-zip archive comment | DROPPED | SURVIVED | DROPPED | REFUSED | DROPPED |
| 9. Encrypted entry inner content | KNOWN_GAP | DROPPED | KNOWN_GAP | REFUSED | KNOWN_GAP |

PASS — all 11 (DROPPED/NORMALIZED) strict surfaces verified for ZipStrategy.
```
|
||||
92
docs/gap-analysis/zip.md
Normal file
|
|
@@ -0,0 +1,92 @@
|
|||
# Generic ZIP metadata-stripping gap analysis
|
||||
|
||||
**Date:** 2026-05-22
|
||||
**Goal:** Document the gap between (a) no ZIP support in v5 today, (b) ExifTool's generic-ZIP write (documented as "very limited"), (c) a theoretical thorough rewrite, and (d) the policy that ships in this PR — recursive cleaning of inner files via the strategy registry, plus epoch-normalized per-entry timestamps and scrubbed comments/extra fields. Closes issue #184.
|
||||
|
||||
## Methodology
|
||||
|
||||
Read:
|
||||
|
||||
- PKWARE APPNOTE 6.3.10 (ZIP file format specification) — sections 4.1 (overall format), 4.3 (local file header + central directory record + EOCD), 4.4 (per-field semantics), 4.5 (extra-field records), 4.6 (extensible data fields).
|
||||
- ExifTool documentation at <https://exiftool.org/TagNames/ZIP.html> and the perl source `lib/Image/ExifTool/ZIP.pm` for the read path; ExifTool docs at <https://exiftool.org/index.html#limitations> for the "Writing of ZIP files is not yet supported" caveat.
|
||||
- JSZip source (the `generateAsync` writer and the `loadAsync` parser) — confirmed it accepts the `date` option and writes the corresponding DOS-time fields in both the LFH and the central directory record. Confirmed default behavior is `new Date()` (a privacy bug if relied on — see invariants §6).
|
||||
- mat2 source (`libmat2/archive.py` `ZipParser`) — confirmed mat2 recurses into archive entries, calling format-specific parsers per entry, and rewrites the archive with epoch timestamps. The only archive-level field, the comment, is set to empty on re-emit. This is much closer to our shipping policy than ExifTool's.
|
||||
|
||||
Verified empirically (POC in `/tmp/zip-poc/`, not committed):
|
||||
|
||||
- Built a 3-entry archive: `photo.jpg` with sentinel EXIF `Artist=SENTINEL-A`, `doc.pdf` with sentinel `/Author=SENTINEL-B`, `note.txt` with no metadata. Archive comment set to `SENTINEL-ARCHIVE`. Per-entry comment on `photo.jpg` set to `SENTINEL-ENTRY`.
|
||||
- Ran `exiftool -all= -Time:All= -overwrite_original archive.zip`. Recovery battery (`unzip -z` / `unzip -lv` / `exiftool` on extracted entries):
|
||||
- Archive comment: **dropped** (sole writable field).
|
||||
- Per-entry comment: **survived** (ExifTool doesn't touch).
|
||||
- Per-entry timestamps: **survived** (ExifTool doesn't touch).
|
||||
- Inner JPEG EXIF: **survived** (ExifTool reads through but does not re-write inner entries).
|
||||
- Inner PDF Info: **survived** (same).
|
||||
- Ran `mat2 archive.zip` (output: `archive.cleaned.zip`). Same battery:
|
||||
- Archive comment: **dropped**.
|
||||
- Per-entry comment: **dropped**.
|
||||
- Per-entry timestamps: **normalized** to 1980-01-01 00:00:00 (DOS epoch).
|
||||
- Inner JPEG EXIF: **dropped** (mat2 recurses to its JPEG parser).
|
||||
- Inner PDF Info: **partial** — Info dict cleared but mat2's PDF backend leaves the same residue that ExifTool's PDF backend does (see `docs/forensic/pdf.md` for the analogous gap in our PDF strategy's reference comparison).
|
||||
|
||||
The mat2 result is the closer reference: ExifTool's generic-ZIP write is too thin to constitute a baseline.
|
||||
|
||||
## Per-source policy table
|
||||
|
||||
| Source | Surface | Current v5 | ExifTool `-all= -Time:All=` | mat2 | Theoretical | Ships in this PR |
|---|---|---|---|---|---|---|
| Archive comment | EOCD `.ZIP file comment` | (no strategy) | dropped | dropped | dropped | **dropped** (empty string on re-emit) |
| Zip64 EOCD comment | Zip64 EOCD `.ZIP file comment` | (no strategy) | not touched | dropped (rewrite) | dropped | **dropped** (re-emit rebuilds EOCD without comment) |
| Per-entry comment | CD entry `file comment` | (no strategy) | not touched | dropped | dropped | **dropped** |
| Per-entry timestamp (LFH) | LFH `last mod file time + date` | (no strategy) | not touched | epoch | epoch (1980-01-01) | **epoch** |
| Per-entry timestamp (CD) | CD entry `last mod file time + date` | (no strategy) | not touched | epoch | epoch | **epoch** |
| Per-entry extra field — UT (0x5455, extended timestamp) | LFH/CD extra field | (no strategy) | not touched | dropped (mat2 strips all non-Zip64 extras) | dropped (UT records leak mtime/atime/ctime) | **dropped** |
| Per-entry extra field — UID/GID (0x7875, Info-ZIP Unix v1) | LFH/CD extra field | (no strategy) | not touched | dropped | dropped (uid/gid identify the creator's user account) | **dropped** |
| Per-entry extra field — NTFS (0x000a, NTFS times) | LFH/CD extra field | (no strategy) | not touched | dropped | dropped (100-ns NTFS times are higher-fidelity than DOS) | **dropped** |
| Per-entry extra field — Zip64 (0x0001) | LFH/CD extra field | n/a | preserved | preserved (structural) | preserved (required for archives > 4 GB) | **preserved** (structural; required for round-trip) |
| Inner JPEG metadata | nested EXIF / XMP / IPTC / Photoshop / Comment | (no strategy) | not touched (no recursion) | dropped (mat2 recurses) | dropped | **dropped** (recursion via selectStrategy → JpegStrategy) |
| Inner PNG metadata | nested tEXt/zTXt/iTXt/eXIf | (no strategy) | not touched | dropped (mat2 recurses) | dropped | **dropped** (recursion via selectStrategy → PngStrategy) |
| Inner PDF metadata | nested Info dict + XMP | (no strategy) | not touched | partial (same gap as ours) | dropped (theoretical) | **dropped** to the same bar as `docs/forensic/pdf.md` for standalone PDFs |
| Inner Office docProps | nested docProps/core.xml etc. | (no strategy) | not touched | dropped (mat2 recurses) | dropped | **dropped** (recursion via selectStrategy → OfficeStrategy) |
| Inner MP4 metadata | nested `moov`/`udta`/`meta` | (no strategy) | not touched | partial (mat2's video coverage) | dropped | **dropped** (recursion via selectStrategy → VideoStrategy) |
| Inner HEIC/AVIF/WebP/GIF metadata | nested boxes/chunks | (no strategy) | not touched | depends on mat2 backend | dropped | **dropped** for formats with a registered strategy (currently HEIC unsupported; AVIF/WebP/GIF via ExifToolFallbackStrategy) |
| Inner nested .zip | recursive archive | (no strategy) | not touched | dropped (recursive) | dropped | **dropped** (recursion: selectStrategy → ZipStrategy → walks again) |
| Encrypted entry content | LFH GP-bit 0 set | n/a | n/a | refused (mat2 fails on encrypted entries) | not strippable without password | **refused** with `invalid-file-format` directing user to a decryption-capable tool — original direction was "pass-through with per-file warning" but JSZip's `loadAsync` refuses any archive containing encrypted entries, blocking the partial-passthrough path at the library level. Implementation note: a byte-level walker bypassing JSZip would unblock passthrough; deferred to a follow-up. |
| Self-extracting EXE stub | bytes before first LFH | n/a | preserved | refused (mat2 won't process SFX) | preserved (modifying breaks SFX) | **preserved** (gap; documented in `PRIVACY_GAPS.md`) |
| Per-entry filename | CD entry `file name` | n/a | preserved | preserved (content, not metadata) | preserved | **preserved** (content, not metadata) |
| Per-entry CRC32 | LFH/CD `crc-32` | n/a | preserved | preserved | preserved (structural) | **preserved** (structural; JSZip recomputes) |
| Per-entry compression method + level | LFH/CD `compression method` | n/a | preserved | normalized to DEFLATE | preserved (don't surprise users with size profile changes) | **preserved** (match input method per-entry) |
| Per-entry internal/external file attributes | CD entry `internal/external file attributes` | n/a | preserved | preserved (filesystem permissions) | preserved (Unix mode bits + DOS attributes are filesystem-level, not user identity) | **preserved** |
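
The extra-field rows above (drop UT, UID/GID, and NTFS records; keep Zip64) can be illustrated at the byte level. This is only a sketch of the record layout (repeated little-endian `header ID`, `data size`, `data`), assuming well-formed input; `filterExtraField` and `ZIP64_EXTRA_ID` are hypothetical names, and the PR itself reaches the same outcome through JSZip's re-emit rather than through byte surgery like this.

```typescript
// Hypothetical helper, not part of the PR: keeps only allow-listed
// extra-field records from a raw LFH/CD extra-field blob.
const ZIP64_EXTRA_ID = 0x0001;

function filterExtraField(
  extra: Uint8Array,
  keep: ReadonlySet<number> = new Set([ZIP64_EXTRA_ID]),
): Uint8Array {
  const view = new DataView(extra.buffer, extra.byteOffset, extra.byteLength);
  const kept: Uint8Array[] = [];
  let offset = 0;
  // Each record: u16 LE header ID, u16 LE data size, then `size` data bytes.
  while (offset + 4 <= extra.length) {
    const id = view.getUint16(offset, true);
    const size = view.getUint16(offset + 2, true);
    const end = offset + 4 + size;
    if (end > extra.length) break; // truncated record: drop the tail
    if (keep.has(id)) kept.push(extra.subarray(offset, end));
    offset = end;
  }
  // Concatenate the surviving records into one blob.
  const out = new Uint8Array(kept.reduce((n, r) => n + r.length, 0));
  let pos = 0;
  for (const record of kept) {
    out.set(record, pos);
    pos += record.length;
  }
  return out;
}
```

An allow-list (rather than a deny-list of known leaky IDs) matches the table's intent: unknown vendor records are dropped by default.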
|
||||
|
||||
## Honest gap summary
|
||||
|
||||
**Current v5 (no strategy) vs reference (mat2):** total gap. Generic ZIPs route to "unsupported" today, bypassing every privacy guarantee MetaScrub makes for the inner files. Recursive cleaning is the only architecturally coherent fix.
|
||||
|
||||
**ExifTool `-all=` vs mat2:** ExifTool is **not** a viable reference. It writes only the archive-level comment for generic ZIPs and refuses to recurse into entries (it special-cases Office/EPUB/APK/JAR for read-only metadata enumeration only). Most of the surface that matters (inner-file metadata, per-entry comments, extra fields, and timestamps) is untouched. The ExifTool comparison row exists in the forensic battery to make this visible, not as a target to match.
|
||||
|
||||
**mat2 vs theoretical:** mat2 is genuinely close to the theoretical maximum on the recursive case. Its weaknesses are inherited from per-format backends (its PDF clean is partial in the same way ExifTool's is; same applies to MP4). On the ZIP-level work (per-entry epoch timestamps, scrubbed comments/extras), mat2 is essentially equivalent to a thorough rewrite. **The shipping policy in this PR matches mat2 at the ZIP level and meets-or-exceeds it on inner formats where MetaScrub has dedicated hand-rolled walkers** (JPEG, PNG, Office, MP4, PDF beat ExifTool at the per-format level — see the per-format forensic docs).
|
||||
|
||||
**This PR vs mat2:** identical at the ZIP layer; identical or better at the inner-format layer (we re-use our existing strategies). Encrypted-entry handling ends up matching mat2 as well: both refuse encrypted archives. The maintainer originally chose pass-through-with-warning (see spec §3), on the rationale that refusing the whole archive is a worse user outcome when most entries are unencrypted. JSZip's `loadAsync` rejects any archive containing encrypted entries, however, so the shipping policy is refusal with `invalid-file-format` (see the "Encrypted entry content" row in the policy table); pass-through is deferred to a follow-up.
|
||||
|
||||
## Recommendation
|
||||
|
||||
Hand-rolled walker over JSZip:
|
||||
|
||||
- JSZip is already a production dep (`OfficeStrategy` uses it); no new dependency.
|
||||
- The library handles the structural concerns we don't want to re-implement (DEFLATE round-trip, central-directory rebuild, Zip64 promotion when needed).
|
||||
- The metadata we care about (timestamps, comments, extra fields) is reachable via JSZip's per-entry options (`date`, `comment`) or by re-emitting the entry without the metadata payload.
|
||||
- Encrypted-entry detection: JSZip's `loadAsync` throws `"Encrypted zip are not supported"` on any archive containing encrypted entries. We catch that error and surface a structured `invalid-file-format` result. An earlier implementation used a hand-rolled LFH GP-flag scanner (~30 lines), but it had two blind spots — ZIP64 entries (compressed size = 0xFFFFFFFF in the LFH, real size in the Zip64 extra field) desynchronised the stride math, and data-descriptor entries (GP-flag bit 3) broke the scan on any streaming entry preceding an encrypted one. JSZip's detection runs on the same bytes it parses and has neither blind spot.
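
The error-mapping step in the last bullet could look roughly like the following. This is a minimal sketch: `classifyLoadError` and the `LoadFailure` shape are hypothetical names for illustration, and only the message text `"Encrypted zip are not supported"` and the two error kinds are taken from this document.

```typescript
// Hypothetical error shape; the project's real error union may differ.
type LoadFailure =
  | { kind: "invalid-file-format"; detail: string }
  | { kind: "parse-failed"; detail: string };

// Maps whatever a failed loadAsync call rejected with onto a structured failure.
function classifyLoadError(err: unknown): LoadFailure {
  const detail = err instanceof Error ? err.message : String(err);
  // Case-insensitive substring match on the phrase quoted in this document.
  if (detail.toLowerCase().includes("encrypted zip")) {
    return { kind: "invalid-file-format", detail };
  }
  return { kind: "parse-failed", detail };
}
```

Matching on a message string is brittle if the library rewords its error, so a regression test pinned to an encrypted fixture would be a sensible companion.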
|
||||
|
||||
Library evaluations explicitly ruled out:
|
||||
|
||||
- **Rewriting ZIP from scratch.** ~3000 lines of bytewise PKWARE APPNOTE compliance, including DEFLATE, Zip64 promotion thresholds, and central-directory rebuild. Not worth it when JSZip handles the structural surface for us.
|
||||
- **fflate** (alternative JS zip lib, ~12 KB gzip). Smaller than JSZip but doesn't expose the per-entry comment or extra-field options we need; we'd be writing the same byte-walking code we'd write anyway, just on top of a less-featured library. Adding a second zip library is also a fresh prod dep against the 4-dep ceiling.
|
||||
|
||||
## Phase plan
|
||||
|
||||
This PR ships the full shipping policy in §"Per-source policy table" plus the per-leaf diff UI tree. Deferred items:
|
||||
|
||||
- **Streaming MP4/Office strip for large archives** — out of scope; tracked in #34 (which would also benefit standalone large-file processing).
|
||||
- **Self-extracting EXE stub scrubbing** — documented gap, requires distinguishing stub-PE bytes from arbitrary leading garbage. Not worth the engineering cost for the audience.
|
||||
- **Decryption of encrypted entries** — out of scope (no password prompts; see invariants).
|
||||
- **Multi-disk / spanned archives** — JSZip rejects them; surfaces as `parse-failed`.
|
||||
- **ZIP64 archives > 4 GB** — Zip64 is supported in pass-through; not sentinel-tested at that scale (would need a multi-GB fixture).
|
||||
1594
docs/superpowers/plans/2026-05-22-issue-184-zip-rollout.md
Normal file
File diff suppressed because it is too large
|
|
@ -0,0 +1,416 @@
|
|||
# Issue #184 — Generic ZIP support (recursive cleaning)
|
||||
|
||||
- **Status**: Draft — awaiting maintainer review
|
||||
- **Date**: 2026-05-22
|
||||
- **Authors**: Randa (with Claude assistance)
|
||||
- **Implementation plan**: TBD (will live at `docs/superpowers/plans/2026-05-22-issue-184-zip-rollout.md` after this spec is approved)
|
||||
- **Parent spec**: none — net-new format strategy; obeys [`.claude/rules/format-strategy-workflow.md`](../../../.claude/rules/format-strategy-workflow.md) Phases 1–3
|
||||
- **Forgejo issue**: #184 ("Support \*.zip and \*.md files")
|
||||
|
||||
## 1. Problem
|
||||
|
||||
Issue #184 asks for `.zip` and `.md` support. Neither extension is in `SUPPORTED_EXTENSIONS` or any `FormatStrategy` today. Files of either type drop into the UI as "unsupported" and pass through with no processing.
|
||||
|
||||
The two extensions are not symmetric:
|
||||
|
||||
- **`.md`** is plain text. Markdown has no embedded metadata in the format-structure sense — no EXIF, no docProps, no PDF Info dictionary. The only conceivable target is YAML/TOML frontmatter, which is content, not metadata; stripping it would change the file's meaning. There is nothing for a `FormatStrategy` to do.
|
||||
- **`.zip`** is a container with real metadata to scrub: per-entry timestamps in both local file headers and the central directory, per-entry comments, per-entry extra fields, and an archive-level comment. Privacy invariant §6 already mandates epoch (1980-01-01) for ZIP central-directory timestamps. Additionally, ZIPs commonly carry user content whose metadata MetaScrub already knows how to strip (JPEGs, PDFs, DOCX, etc.) — without recursive cleaning, dropping an archive bypasses every privacy guarantee the app makes for its inner files.
|
||||
|
||||
This spec covers ZIP support only. `.md` is closed as out-of-scope (see §3).
|
||||
|
||||
## 2. Scope
|
||||
|
||||
Single-PR delivery of generic-ZIP support:
|
||||
|
||||
- **New strategy** `ZipStrategy` in `src/infrastructure/wasm/strategies/zip_strategy.ts`, registered in `strategy_registry.ts`.
|
||||
- **Magic-byte verification** — `PK\x03\x04` (local file header) or `PK\x05\x06` (empty archive EOCD only).
|
||||
- **Per-entry recursive cleaning** — each file entry's bytes are re-dispatched through `selectStrategy()`. Nested `.zip` entries naturally recurse through `ZipStrategy` again.
|
||||
- **Zip-level metadata scrub** — per-entry timestamps → epoch, per-entry comments → empty, per-entry extra fields → stripped, archive comment → empty. Applied to every entry kind (directory, encrypted, supported, unsupported).
|
||||
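- **Encrypted-entry pass-through with warning** — the original direction: content unmodified, zip-level metadata still normalized, per-entry warning surfaced in the UI. Superseded by the refusal policy described in §3, because JSZip's `loadAsync` rejects any archive containing encrypted entries.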
|
||||
- **Warning surface** — extend `StripResult` with an optional `warnings: readonly string[]`; propagate through `WasmProcessor` → `use_process_files` → an inline disclosure in the file row.
|
||||
- **Per-inner-file diff tree** — new `ZipExpansion` component that renders the archive's entries as a tree, each leaf independently expandable to its own metadata diff. Reuses an extracted `MetadataDiffTable` sub-component (split out from `MetadataDiffExpansion`). Nested ZIPs recurse: a `nested.zip` row expands to its own `ZipExpansion`. Extend `StripResult` with optional `archiveEntries: readonly ArchiveEntryResult[]`.
|
||||
- **Forensic verification** — `tools/forensic/zip.ts` runner with sentinel battery across nine surfaces; results in `docs/forensic/zip.md`.
|
||||
- **Gap analysis** — `docs/gap-analysis/zip.md` documents per-source policy and comparison to ExifTool's generic-ZIP write (which is documented as "very limited" — recursive cleaning is the differentiator).
|
||||
- **Privacy gaps documented** — encrypted-entry content pass-through and self-extracting EXE stub bytes added to `docs/PRIVACY_GAPS.md`.
|
||||
- **Issue cleanup** — close #184 with a comment explaining `.md` is out-of-scope.
|
||||
|
||||
## 3. Non-goals
|
||||
|
||||
- **`.md` support** — closed as out-of-scope. No embedded metadata exists. (Discussed and accepted by maintainer 2026-05-22.)
|
||||
- **ZIP-bomb / decompression cap** — explicitly declined by maintainer. Trust JSZip defaults. Documented limitation: a malicious archive can OOM the tab.
|
||||
- **Encrypted archives** — out of scope. **Deviation from the original brainstorming direction:** the user-approved policy was "pass through encrypted entries, normalize zip-level metadata, surface a warning per encrypted entry." JSZip's `loadAsync` refuses any archive containing encrypted entries at the library level (`"Encrypted zip are not supported"`), blocking the partial-passthrough path. To unblock it would require a byte-level ZIP walker bypassing JSZip — significant additional code and a parallel maintenance surface. Shipping policy is therefore: refuse encrypted archives with a clear `invalid-file-format` error directing the user to a decryption-capable tool. Tracked as a follow-up for a future PR if demand surfaces.
|
||||
- **Multi-disk / spanned archives** — JSZip doesn't support them; `loadAsync` will fail and surface `parse-failed`.
|
||||
- **Self-extracting EXE stub scrubbing** — bytes before the first local file header are preserved (modifying them breaks the SFX behavior). Documented as a gap.
|
||||
- **Top-level walker entries / `MetadataDiffExpansion` for the ZIP row itself** — `walkerEntries: []` and `diffDocument: null` at the ZIP-row level. Per-tag detail lives inside each inner file's leaf row in the new `ZipExpansion` tree, not as a flat aggregate. Inner files dropped directly still get the existing single-file diff treatment.
|
||||
- **Office retrofit to the tree model** — Office files (`.docx`/`.xlsx`/`.pptx`/`.odt`) keep their current flat `walkerEntries`-driven view inside `MetadataDiffExpansion`. Their internal structure is fixed (always `docProps/core.xml`, `customXml/`, etc.), so the source-labelled flat list works. The tree model exists for ZIP's open-ended user-created structure.
|
||||
- **Translated warning content** — strategy-emitted strings are English. Surrounding UI chrome (`"N warnings"`, chevron label) localizes via `i18nLookup()`. Structured warning codes are a refactor not worth blocking this PR.
|
||||
- **Virtualized rendering of inner-entry rows** — archives with > 100 entries collapse extras behind a "Show N more entries" button. Full virtualization (react-virtual / windowing) is out of scope; pagination is sufficient.
|
||||
|
||||
## 4. Architecture
|
||||
|
||||
### 4.1 New module
|
||||
|
||||
`src/infrastructure/wasm/strategies/zip_strategy.ts` — implements `FormatStrategy`:
|
||||
|
||||
```ts
export class ZipStrategy implements FormatStrategy {
  readonly extensions: ReadonlySet<string> = new Set([".zip"]);

  verifyMagicBytes({ bytes }: { bytes: Uint8Array }): boolean {
    // PK\x03\x04 (local file header) or PK\x05\x06 (empty archive EOCD)
    return (
      bytes.length >= 4 &&
      bytes[0] === 0x50 && bytes[1] === 0x4b &&
      ((bytes[2] === 0x03 && bytes[3] === 0x04) ||
        (bytes[2] === 0x05 && bytes[3] === 0x06))
    );
  }

  async strip({ bytes, options }: {
    bytes: Uint8Array;
    options: StripOptions;
  }): Promise<Result<StripResult, ExifError>> {
    // 1. JSZip.loadAsync(bytes)
    // 2. For each entry, apply per-entry policy (§5)
    // 3. Build output via generateAsync with epoch dates and empty comment
    // 4. Return { bytes, walkerEntries: [], diffDocument: null, warnings }
  }
}
```
|
||||
|
||||
### 4.2 Registry placement
|
||||
|
||||
Order in `STRATEGIES`:
|
||||
|
||||
```ts
const STRATEGIES: readonly FormatStrategy[] = [
  new OfficeStrategy(),   // claims .docx/.xlsx/.pptx/.odt
  new VideoStrategy(),
  new JpegStrategy(),
  new PngStrategy(),
  new PdfStrategy(),
  new ZipStrategy(),      // NEW — claims .zip
  ...(ENABLE_EXIFTOOL_FALLBACK ? [new ExifToolFallbackStrategy()] : []),
];
```
|
||||
|
||||
`OfficeStrategy` is listed first, so a file with extension `.docx` (even if magic-byte verified as a ZIP) routes to Office, not ZIP. A file with extension `.zip` always routes to `ZipStrategy`. A renamed `.docx → .zip` therefore gets the recursive Office-aware treatment — strictly more aggressive than `OfficeStrategy`'s targeted scrub, never less clean.
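
The routing described above can be sketched with a toy registry. This assumes that selection is a first-match scan on extension followed by a magic-byte check, which this spec implies but does not spell out; `ToyStrategy`, `TOY_STRATEGIES`, and `selectToyStrategy` are hypothetical stand-ins, not the real `selectStrategy()`.

```typescript
// Hypothetical, simplified stand-in for the FormatStrategy interface.
interface ToyStrategy {
  readonly name: string;
  readonly extensions: ReadonlySet<string>;
  verifyMagicBytes(bytes: Uint8Array): boolean;
}

// Both Office documents and generic ZIPs start with "PK".
const startsWithPk = (b: Uint8Array): boolean =>
  b.length >= 4 && b[0] === 0x50 && b[1] === 0x4b;

const TOY_STRATEGIES: readonly ToyStrategy[] = [
  { name: "office", extensions: new Set([".docx", ".xlsx", ".pptx", ".odt"]), verifyMagicBytes: startsWithPk },
  { name: "zip", extensions: new Set([".zip"]), verifyMagicBytes: startsWithPk },
];

// Assumed selection rule: first strategy that claims the extension
// and accepts the magic bytes wins; otherwise the file is unsupported.
function selectToyStrategy(filename: string, bytes: Uint8Array): ToyStrategy | null {
  const dot = filename.lastIndexOf(".");
  const ext = dot >= 0 ? filename.slice(dot).toLowerCase() : "";
  for (const strategy of TOY_STRATEGIES) {
    if (strategy.extensions.has(ext) && strategy.verifyMagicBytes(bytes)) {
      return strategy;
    }
  }
  return null;
}
```

With identical `PK` bytes, `report.docx` resolves to the Office entry and `report.zip` to the ZIP entry, which is the extension-driven behaviour the paragraph describes.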
|
||||
|
||||
### 4.3 Recursion model
|
||||
|
||||
ZIPs are flat: the entry list contains all paths inline (e.g. `folder/sub/photo.jpg`). There is no descend-into-directories step. `ZipStrategy` walks the flat list once. Recursion happens at two points:
|
||||
|
||||
1. **File entry whose bytes match another strategy.** `selectStrategy({ filename, bytes })` is called on the decompressed entry. If a strategy matches, its `.strip()` is awaited and the cleaned bytes go back into the output ZIP under the same name.
|
||||
2. **Nested `.zip`.** A file entry named `inner.zip` whose bytes are a valid ZIP routes to `ZipStrategy` again via the same `selectStrategy()` call. Real recursion at the archive level.
|
||||
|
||||
There is no explicit recursion depth limit on the strategy side. The JS call stack is the implicit bound; in practice memory would run out before the stack does.
|
||||
|
||||
### 4.4 Warning propagation + archive-entry tree
|
||||
|
||||
**Important architectural fact:** today's `StripResult` has no `metadataRemoved` field, and `diffDocument` is always `null` when returned from a strategy — `WasmProcessor` builds it out-of-band by stashing source + stripped bytes and running `ExifToolDiffStrategy` (the ~7 MB WebPerl-ExifTool WASM) on both, then merging the strategy's `walkerEntries` into the `before` set. The "Cleaned" status pill is binary today; there is no count of removed tags at the row level. The diff itself is what shows what changed.
|
||||
|
||||
For ZIP we follow this same pattern, scaled to inner files: per-leaf diffs are built lazily by extending `WasmProcessor` to expose a new method that runs `ExifToolDiffStrategy` on a single archive leaf's source + stripped bytes when the UI asks for it.
|
||||
|
||||
Extend `StripResult` and add the archive-entry types in `src/infrastructure/wasm/format_strategy.ts`:
|
||||
|
||||
```ts
export type ArchiveEntryStatus =
  | "cleaned"                      // supported file entry, recursive strip ran
  | "passed-through-unsupported"   // selectStrategy() returned null
  | "passed-through-encrypted"     // encrypted bit set
  | "directory";                   // zero-byte directory placeholder

export interface ArchiveEntryResult {
  readonly path: string;           // "folder/photo.jpg"
  readonly status: ArchiveEntryStatus;
  // For "cleaned" leaves: pre-strip and post-strip bytes for the
  // deferred per-leaf ExifTool diff. Consumed by WasmProcessor and
  // surfaced via the lazy on-expand diff build (§4.5). Null for
  // non-cleaned statuses (no diff to build).
  readonly sourceBytes: Uint8Array | null;
  readonly strippedBytes: Uint8Array | null;
  // Walker entries from the inner strategy (Office docProps, PDF
  // annotations, etc.). Surfaced in the leaf's diff as "removed"
  // source-grouped sections, same as the top-level pattern.
  readonly walkerEntries: readonly MetadataEntry[];
  // RECURSIVE — non-null when the entry is itself a ZIP. Null for
  // all other statuses.
  readonly entries: readonly ArchiveEntryResult[] | null;
  readonly warnings: readonly string[];
}

export interface StripResult {
  readonly bytes: Uint8Array;
  readonly walkerEntries: readonly MetadataEntry[];
  readonly diffDocument: MetadataDocument | null;  // unchanged — always null from strategies
  readonly warnings?: readonly string[];           // NEW
  readonly archiveEntries?: readonly ArchiveEntryResult[];  // NEW
}
```
|
||||
|
||||
Both new fields are optional: existing strategies (JPEG/PNG/PDF/Office/Video/ExifToolFallback) do not need touching. Only `ZipStrategy` populates them for now.
|
||||
|
||||
**Leaf vs. nested-ZIP entries:** a leaf has `entries === null` and (for `cleaned`) carries its own `sourceBytes` + `strippedBytes` so the deferred diff can be built. A nested-ZIP entry has `entries` populated and `sourceBytes === null` (the diffs for that subtree live inside the children's own pairs, one level deeper).
|
||||
|
||||
**`WasmProcessor` extension:** add `buildArchiveLeafDiff({ entryId, path }): Promise<MetadataDocument | null>`. After `process()` resolves with `archiveEntries`, WasmProcessor walks the tree and stashes each leaf's `(sourceBytes, strippedBytes, walkerEntries, extension)` into a new `pendingLeafDiffs: Map<string, PendingLeafDiff>` keyed by `${entryId}:${path}`. On `buildArchiveLeafDiff` call:
|
||||
|
||||
1. Look up the stash; if absent, return `null` (defensive — caller raced, or the path was already drained).
|
||||
2. Drain the stash so retries can't double-spend.
|
||||
3. Run `ExifToolDiffStrategy.readDocument` on the source + stripped bytes (same fire-and-forget pattern as the top-level diff).
|
||||
4. Merge `walkerEntries` into `before`; return `{ before, after }`.
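
Steps 1 and 2 (look up, then drain) amount to a take-once map. A minimal sketch, assuming the `${entryId}:${path}` key format given above; `LeafDiffStash` and the trimmed-down `PendingLeafDiff` fields are illustrative, not the real `WasmProcessor` internals.

```typescript
// Trimmed-down stand-in for the stash payload described above.
interface PendingLeafDiff {
  readonly sourceBytes: Uint8Array;
  readonly strippedBytes: Uint8Array;
}

// Hypothetical take-once store keyed by `${entryId}:${path}`.
class LeafDiffStash {
  private readonly pending = new Map<string, PendingLeafDiff>();

  private static key(entryId: string, path: string): string {
    return `${entryId}:${path}`;
  }

  put(entryId: string, path: string, diff: PendingLeafDiff): void {
    this.pending.set(LeafDiffStash.key(entryId, path), diff);
  }

  // Returns the stashed inputs once, then forgets them, so a retry
  // for the same leaf sees null instead of re-running the diff.
  take(entryId: string, path: string): PendingLeafDiff | null {
    const key = LeafDiffStash.key(entryId, path);
    const hit = this.pending.get(key) ?? null;
    this.pending.delete(key);
    return hit;
  }
}
```

Draining on read also releases the stashed byte arrays as soon as the diff has been requested, which matters when an archive has many large leaves.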
|
||||
|
||||
The existing top-level `pendingDiffs` for the ZIP as a whole gets `diffDocument: null` permanently — ZIP rows don't surface a flat diff. The flag `ENABLE_EXIFTOOL_DIFF` continues to gate the entire diff feature; when off, archive-leaf rows show no expansion content (same fallback as the existing top-level path).
|
||||
|
||||
**Warning + archive-entry propagation path:**
|
||||
|
||||
- `ZipStrategy.strip()` builds the `archiveEntries` tree while walking entries. For each supported file entry it calls `selectStrategy(…).strip(…)` and awaits the result before moving to the next entry, stashes the inner strategy's `walkerEntries` and the source + stripped bytes on the leaf, and forwards inner `archiveEntries` (for nested ZIPs) onto the corresponding `ArchiveEntryResult.entries`. It accumulates flat top-level `warnings` (own warnings + forwarded ones prefixed with `<entry-name>: `).
|
||||
- `WasmProcessor.process()` surfaces `warnings` + `archiveEntries` in its `ProcessOutcome`. Stashes per-leaf diff inputs into `pendingLeafDiffs`.
|
||||
- `use_process_files.ts` stores `warnings` + `archiveEntries` on the `FileEntry` state. Per-leaf `diffDocument` and `diffPending` flags live in `ZipExpansion`'s component state (set on first expand of each leaf), not in the global `FileEntry` reducer — this keeps the existing reducer's shape stable.
|
||||
- **UI**: see §4.5.
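
The warning-forwarding rule in the first bullet (prefix each inner warning with `<entry-name>: `) is small enough to sketch directly. `forwardWarnings` is a hypothetical helper name; the optional parameter mirrors the optional `warnings` field on `StripResult`.

```typescript
// Hypothetical helper: prefixes warnings from a recursive strip with the
// entry name so the flat top-level list still says where each came from.
function forwardWarnings(
  entryName: string,
  inner: readonly string[] | undefined,
): string[] {
  return (inner ?? []).map((warning) => `${entryName}: ${warning}`);
}
```

Because each level applies its own prefix, a warning from a doubly nested entry ends up carrying the full chain of entry names.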
|
||||
|
||||
### 4.5 Diff tree UI
|
||||
|
||||
**New component** `src/web/components/file-list/ZipExpansion.tsx`. Props: `{ entryId: string; entries: readonly ArchiveEntryResult[] }`. Renders the tree; lazy-loads each leaf's diff on first expand via `window.api.wasm.buildArchiveLeafDiff`.
|
||||
|
||||
**Visual layout** (BEM classes prefixed `zip-expansion__`):
|
||||
|
||||
```
photo.zip                        ✓ Cleaned  ⓘ 2 warnings  ▾
 └─ folder/photo.jpg             Cleaned  ▾
 │    [MetadataDiffTable: EXIF/Make removed, …]
 └─ folder/sub/document.pdf      Cleaned  ▸  (collapsed)
 └─ folder/                      Directory
 └─ secret.txt                   Encrypted — passed through
 └─ archive.zip                  Cleaned  ▾
     └─ inner/photo.jpg          Cleaned  ▾
     │    [MetadataDiffTable: …]
     └─ inner/file.docx          Cleaned  ▸
```
|
||||
|
||||
(Status is binary "Cleaned" — same as the top-level row. A per-leaf tag count would require strategy-side counting that doesn't exist today; deferred.)
|
||||
|
||||
**Per-row expansion behavior:**
|
||||
|
||||
- **Cleaned leaf** (`status: "cleaned"` + `entries === null`) — has a chevron. First click:
|
||||
1. Dispatches `await window.api.wasm.buildArchiveLeafDiff({ entryId, path })`.
|
||||
2. Renders `DiffSkeleton` (reused from `MetadataDiffExpansion.tsx`) while the promise is in flight. Subsequent leaves in the same session don't pay the WASM warm-up (cached by the `ExifToolDiffStrategy` instance).
|
||||
3. On resolve, if the document is non-null and non-empty, swaps the skeleton for `<MetadataDiffTable document={doc} />`. If null/empty, shows "No metadata detected" inline.
|
||||
4. The resolved document is cached in `ZipExpansion`'s local state; subsequent collapse/expand is instant.
|
||||
- **Cleaned nested-ZIP entry** (`status: "cleaned"` + `entries != null`) — has a chevron. Expanding opens `<ZipExpansion entryId={entryId} entries={entry.entries} />` recursively. No diff build for the nested-ZIP row itself; its leaves trigger their own builds when expanded.
|
||||
- **Encrypted / unsupported / directory rows** — no chevron. Status message rendered inline (`"Encrypted — passed through"`, `"Unsupported — passed through"`, `"Directory"`).
|
||||
|
||||
**Indent:** no depth cap on recursion (matches strategy-side recursion). Visual indent caps at level 5; deeper levels inherit level-5 indent (avoids horizontal squeeze on the mobile target). Adversarial-archive safety: render-time depth counter bails out with a `"[depth limit reached — drop the inner file directly]"` row at depth 20.
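
The two thresholds above (indent capped at level 5, render bail-out at depth 20) can be expressed as one small decision function. A sketch only: `planRow` and the constants are hypothetical names, and treating depth as zero-based with the bail-out firing at `depth >= 20` is an assumption about how the counter is compared.

```typescript
const MAX_VISUAL_INDENT = 5;   // deeper levels reuse the level-5 indent
const RENDER_DEPTH_LIMIT = 20; // adversarial-archive guard

type RowPlan =
  | { kind: "render"; indentLevel: number }
  | { kind: "depth-limit" };

// Decides how a row at a given nesting depth should be rendered.
function planRow(depth: number): RowPlan {
  if (depth >= RENDER_DEPTH_LIMIT) {
    return { kind: "depth-limit" };
  }
  return { kind: "render", indentLevel: Math.min(depth, MAX_VISUAL_INDENT) };
}
```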
|
||||
|
||||
**State management:** per-row expand state + per-leaf cached diff doc both live in `ZipExpansion`'s local state as a `Map<path, { open: boolean; doc: MetadataDocument | null | "pending" | "failed" }>`. Closed by default. Not persisted across reloads.
|
||||
|
||||
**Scale guard:** first 100 entries render eagerly. Archives with > 100 entries get a `Show {count} more entries` button at the bottom; clicked repeatedly to walk through the rest.
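
One way to read the scale guard is as simple page arithmetic. The sketch below assumes each click reveals another batch of up to 100 entries and that `{count}` in the button label is the size of the next batch; neither detail is fixed by this spec, and `visibleCount` / `nextBatchSize` are hypothetical names.

```typescript
const PAGE_SIZE = 100;

// Number of entries rendered after `clicks` presses of the button.
function visibleCount(total: number, clicks: number): number {
  return Math.min(total, PAGE_SIZE * (clicks + 1));
}

// Value for the {count} placeholder; 0 means the button can be hidden.
function nextBatchSize(total: number, clicks: number): number {
  return Math.min(PAGE_SIZE, total - visibleCount(total, clicks));
}
```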
|
||||
|
||||
**Status icons (BEM):**
|
||||
|
||||
- `.zip-expansion__row--cleaned` — green check (existing `cleaned` color)
|
||||
- `.zip-expansion__row--encrypted` — neutral muted + small lock icon
|
||||
- `.zip-expansion__row--unsupported` — neutral muted + existing unsupported icon
|
||||
- `.zip-expansion__row--directory` — neutral muted + folder icon
|
||||
|
||||
**`MetadataDiffTable` extraction:** today `MetadataDiffExpansion.tsx` wraps a `TwoPaneView` component with skeleton-vs-table logic plus an outer `file-table__expansion file-table__diff` chrome div. The refactor:
|
||||
|
||||
- Rename `TwoPaneView` → `MetadataDiffTable` and export it from `MetadataDiffTable.tsx` (split from `MetadataDiffExpansion.tsx`).
|
||||
- `MetadataDiffExpansion` stays as a thin wrapper: skeleton-vs-table decision + outer expansion chrome. Behavior at the top-level FileRow is unchanged.
|
||||
- `ZipExpansion` leaf rows render `MetadataDiffTable` inside a slimmer leaf wrapper (`zip-expansion__leaf-diff`), and reuse `DiffSkeleton` for the pending state.
|
||||
|
||||
**FileRow integration:** in `FileRow.tsx`, when `entry.archiveEntries != null && entry.archiveEntries.length > 0`, the expansion area renders `<ZipExpansion entryId={entry.id} entries={entry.archiveEntries} />` instead of `<MetadataDiffExpansion … />`. The two are mutually exclusive (a ZIP doesn't get a top-level diff view of itself; an Office/JPEG/PDF doesn't get an archive-entry tree).
|
||||
|
||||
**`window.api.wasm.buildArchiveLeafDiff`:** new method on the WASM API surface (`src/infrastructure/web/web_api.ts`), wraps `WasmProcessor.buildArchiveLeafDiff`. Signature: `({ entryId: string; path: string }): Promise<MetadataDocument | null>`. Returns null when `ENABLE_EXIFTOOL_DIFF` is off, when the stash was drained, or when the ExifTool read fails.
|
||||
|
||||
## 5. Per-entry policy
|
||||
|
||||
| Entry kind | Detection | Bytes action | Zip-level metadata action | Warning emitted |
|---|---|---|---|---|
| **Directory entry** | Name ends with `/`; zero data | Pass through (no bytes) | Timestamp → epoch; comment → empty; extra field → strip | No |
| **Encrypted file entry** | General-purpose bit 0 set in local file header | Pass through unchanged | Timestamp → epoch; comment → empty; extra field → strip | Yes: `"Encrypted entry '<name>' — content not cleaned, only zip-level metadata normalized."` |
| **Supported file entry** | `selectStrategy({ filename: entry.name, bytes })` returns non-null | Decompress → recursive strip → use cleaned bytes | Timestamp → epoch; comment → empty; extra field → strip | Forward warnings from recursive call, prefixed with `<entry-name>: ` |
| **Unsupported file entry** | `selectStrategy()` returns null | Pass through unchanged | Timestamp → epoch; comment → empty; extra field → strip | No |
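
The detection column of this table reduces to a short classifier. A sketch under assumptions: `EntryFacts` and `classifyEntry` are hypothetical, the status strings are the `ArchiveEntryStatus` values from §4.4, and the order of the checks (directory first, then encrypted) is a guess. Note that under the refusal policy in §3, archives with encrypted entries never reach this walk, so that branch only mirrors the table as written.

```typescript
type EntryStatus =
  | "directory"
  | "passed-through-encrypted"
  | "cleaned"
  | "passed-through-unsupported";

// Hypothetical summary of what the walker knows about one entry.
interface EntryFacts {
  readonly name: string;
  readonly encrypted: boolean;   // general-purpose bit 0
  readonly hasStrategy: boolean; // selectStrategy() returned non-null
}

function classifyEntry(entry: EntryFacts): EntryStatus {
  if (entry.name.endsWith("/")) return "directory";
  if (entry.encrypted) return "passed-through-encrypted";
  return entry.hasStrategy ? "cleaned" : "passed-through-unsupported";
}
```

Whatever the status, the zip-level action is the same in every row (timestamp to epoch, comment emptied, extra field stripped), so it does not need to branch on this result.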
|
||||
|
||||
**Archive-level scrub (applied once):**
|
||||
|
||||
- Archive comment → empty
|
||||
- Zip64 EOCD comment → empty
|
||||
- Bytes prepended before the first local file header (self-extracting stubs) → preserved (modifying breaks SFX; gap)
|
||||
|
||||
**Preserved (not metadata):**
|
||||
|
||||
- Filenames (content)
|
||||
- Internal/external file attributes (Unix mode bits, DOS attribute byte — filesystem permissions)
|
||||
- Compression method + level (structural; we match input mode per-entry)
|
||||
- CRC32 (structural; recomputed by JSZip on emit)
|
||||
|
||||
**Epoch literal:** `new Date(1980, 0, 1)`, passed explicitly to JSZip's `date` option on every entry write. Default JSZip behavior is `new Date()` — a privacy bug — per [`.claude/rules/privacy-invariants.md`](../../../.claude/rules/privacy-invariants.md) §6.
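
For reference, this is how a date maps onto the 16-bit MS-DOS time and date fields that ZIP headers store. The sketch reads the date through UTC getters; whether the pinned JSZip version does the same is an assumption worth checking, because if it does, a local-time `new Date(1980, 0, 1)` falls on 1979-12-31 UTC in timezones ahead of UTC, and `new Date(Date.UTC(1980, 0, 1))` would be the timezone-stable spelling. `toDosDateTime` is a hypothetical helper, not a JSZip API.

```typescript
// Encodes a Date as MS-DOS time/date fields (2-second resolution,
// years counted from 1980), using UTC components.
function toDosDateTime(date: Date): { dosTime: number; dosDate: number } {
  const dosTime =
    (date.getUTCHours() << 11) |
    (date.getUTCMinutes() << 5) |
    (date.getUTCSeconds() >> 1);
  const dosDate =
    ((date.getUTCFullYear() - 1980) << 9) |
    ((date.getUTCMonth() + 1) << 5) |
    date.getUTCDate();
  return { dosTime, dosDate };
}

// The epoch used throughout this spec: 1980-01-01 00:00:00.
const DOS_EPOCH_UTC = new Date(Date.UTC(1980, 0, 1));
```

For the epoch, the time field is zero and the date field carries only month 1 and day 1, which is the pattern the forensic battery should see in both the LFH and the central directory.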
|
||||
|
||||
## 6. Output
|
||||
|
||||
**Re-emit** via `JSZip.generateAsync({ type: "uint8array" })`. JSZip rebuilds the central directory; we don't preserve input byte layout. Per-entry compression method matches input (`DEFLATE` stays `DEFLATE`, `STORE` stays `STORE`) so cleaned archives don't surprise users by changing size profile.
|
||||
|
||||
**No `metadataRemoved` count.** Today's `StripResult` has no such field and the FileRow doesn't render a per-file count — it shows binary `"Cleaned"` / `"Already clean"` pills. ZIPs follow the same shape: a successful strip produces `"Cleaned"`. The actual "what changed" surface is the `ZipExpansion` tree (§4.5), where each leaf's diff is the user-visible record of removed metadata. Adding a top-level "Cleaned · N entries" count is a follow-up not blocking this PR.
|
||||
|
||||
**Result shape:**
|
||||
|
||||
```ts
{
  ok: true,
  value: {
    bytes: outputZipBytes,
    walkerEntries: [],     // ZipStrategy doesn't contribute to a flat walker view
    diffDocument: null,    // No top-level diff for the ZIP itself (per-leaf diffs lazy-build via ZipExpansion)
    warnings: [...],
    archiveEntries: [...], // Tree of inner entries — see §4.4 / §4.5
  }
}
```
|
||||
|
||||
**Error variants returned:**
|
||||
|
||||
- `invalid-file-format` — magic-byte mismatch, or an archive containing encrypted entries (see §3).
|
||||
- `parse-failed` — JSZip's `loadAsync` throws (truncated central directory, malformed entry, multi-disk archive). Detail carries JSZip's message.
|
||||
- `file-io-error` — `generateAsync` throws (e.g. OOM on huge archives, since there is no expansion cap, by maintainer direction).
|
||||
|
||||
## 7. Forensic verification
|
||||
|
||||
Per Phase 3 of [`format-strategy-workflow.md`](../../../.claude/rules/format-strategy-workflow.md), this is the shipping gate.
|
||||
|
||||
### 7.1 Sentinel surfaces
|
||||
|
||||
| # | Surface | Sentinel | Recovery commands | Expected |
|---|---|---|---|---|
| 1 | Archive comment | `SENTINEL-ARCHIVE-CMNT-A1B2C3` | `unzip -z file.zip`, `zipinfo -z`, `strings \| grep SENTINEL` | DROPPED |
| 2 | Per-entry comment | `SENTINEL-ENTRY-CMNT-D4E5F6` | `unzip -lv`, `zipinfo -v` | DROPPED |
| 3 | Per-entry extra field (custom 0x7878 record) | `SENTINEL-EXTRA-7G8H9I` | `strings`, raw byte scan of central dir | DROPPED |
| 4 | Per-entry timestamp (LFH + central) | `2023-04-15 14:32:11` (non-epoch) | `unzip -l`, `zipinfo`, `bsdtar -tvf` | NORMALIZED → 1980-01-01 |
| 5 | Inner JPEG EXIF `Artist` | `SENTINEL-JPEG-EXIF-J1K2L3` | Extract `inner.jpg` → `exiftool -a -G1 -s \| grep SENTINEL` | DROPPED |
| 6 | Inner PDF `/Author` | `SENTINEL-PDF-INFO-M4N5O6` | Extract `inner.pdf` → `exiftool -a`, `qpdf --qdf`, `strings` | DROPPED |
| 7 | Inner DOCX `<dc:creator>` | `SENTINEL-DOCX-P7Q8R9` | Extract `inner.docx` → unzip → grep `docProps/core.xml` | DROPPED |
| 8 | Nested-zip archive comment (recursion test) | `SENTINEL-NESTED-S1T2U3` | Extract `inner.zip` from cleaned outer → `unzip -z inner.zip` | DROPPED |
| 9 | **Encrypted entry's inner EXIF (KNOWN GAP)** | `SENTINEL-ENCRYPTED-V4W5X6` | Extract encrypted entry (pass through) → sentinel survives by design | SURVIVES (documented gap) |
|
||||
|
||||
**Bar:** zero sentinel survivors across every recovery command for surfaces 1–8. Surface 9 goes into `KNOWN_GAPS` in `tools/forensic/zip.ts` and `docs/PRIVACY_GAPS.md`.
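
The "zero survivors" bar can be pictured as a raw byte scan over each recovered artefact. The following is a minimal sketch only: `findSurvivors` and `indexOfBytes` are hypothetical names, not functions from `tools/forensic/zip.ts`, and the scan only sees sentinels that are stored uncompressed in the buffer it is given, so compressed entries would need to be extracted first, as the recovery commands in the table do.

```typescript
// Hypothetical helper: return the byte offset of `needle` in `haystack`, or -1.
function indexOfBytes(haystack: Uint8Array, needle: Uint8Array): number {
  if (needle.length === 0) return 0;
  outer: for (let i = 0; i + needle.length <= haystack.length; i++) {
    for (let j = 0; j < needle.length; j++) {
      if (haystack[i + j] !== needle[j]) continue outer;
    }
    return i;
  }
  return -1;
}

// Hypothetical helper: list every sentinel that still occurs in the buffer.
function findSurvivors(
  bytes: Uint8Array,
  sentinels: readonly string[],
): string[] {
  const encoder = new TextEncoder();
  return sentinels.filter(
    (sentinel) => indexOfBytes(bytes, encoder.encode(sentinel)) !== -1,
  );
}

// Usage: a buffer that still carries one of two sentinels.
const sampleBytes = new TextEncoder().encode(
  "PK..SENTINEL-ENTRY-CMNT-D4E5F6..",
);
console.log(
  findSurvivors(sampleBytes, [
    "SENTINEL-ARCHIVE-CMNT-A1B2C3",
    "SENTINEL-ENTRY-CMNT-D4E5F6",
  ]),
); // [ 'SENTINEL-ENTRY-CMNT-D4E5F6' ]
```

A runner built this way would treat any non-empty result for surfaces 1 to 8 as a failure and a non-empty result for surface 9 as the documented gap.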
|
||||
|
||||
### 7.2 Cross-tool reference run
|
||||
|
||||
For each fixture, also pipe input through `exiftool -all= -Time:All= -overwrite_original` and run the same recovery battery. Expected failures: surfaces 2, 3, 4, 5, 6, 7, 8 (ExifTool's generic-ZIP write is documented as limited and does not recurse into entries). The result table cites this verbatim — same pattern as `docs/forensic/pdf.md`.
|
||||
|
||||
### 7.3 Files produced
|
||||
|
||||
- `docs/gap-analysis/zip.md` — Phase 1: ZIP/APPNOTE walkthrough (LFH, CD, EOCD, Zip64 EOCD, extra-field records 0x000a/0x5455/0x7875/etc.), per-source policy table, ExifTool comparison, recommendation = hand-rolled.
|
||||
- `docs/forensic/zip.md` — Phase 3: results table, interpretation paragraph, caveats (encrypted pass-through, no expansion cap, SFX stubs).
|
||||
- `tools/forensic/zip.ts` — reproducible runner. Builds fixtures via a **self-contained ZIP builder** independent of JSZip (adversarial-independence per format-strategy-workflow.md — same rationale as `tools/forensic/video.ts`'s `walkAtoms`). Runs strip, runs recovery battery, prints per-surface verdict, exits non-zero on UNEXPECTED survivors.
|
||||
|
||||
### 7.4 Fixture sources
|
||||
|
||||
- **Synthetic** — built by the runner with controlled sentinel placement. Sufficient for v1.
|
||||
- **Real-world (optional)** — `tools/forensic/fetch-zip-fixtures.sh` (parallels the video script): one CC0 archive from archive.org. Skip if it adds friction.
|
||||
|
||||
### 7.5 Explicitly not verified
|
||||
|
||||
- Multi-disk / spanned archives (JSZip rejects them).
|
||||
- Self-extracting EXEs with prepended stubs (stub preserved; gap).
|
||||
- ZIP64 archives > 4 GB (theoretically supported; not sentinel-tested at that scale).
|
||||
|
||||
## 8. Files touched
|
||||
|
||||
### New
|
||||
|
||||
- `src/infrastructure/wasm/strategies/zip_strategy.ts`
|
||||
- `tests/infrastructure/wasm/zip_strategy.test.ts`
|
||||
- `src/web/components/file-list/ZipExpansion.tsx` — recursive tree component.
|
||||
- `src/web/components/file-list/MetadataDiffTable.tsx` — extracted from `MetadataDiffExpansion.tsx` so `ZipExpansion` leaves can reuse the two-pane table without the outer expansion chrome.
|
||||
- `tests/web/components/file-list/ZipExpansion.test.tsx` — render + expansion-state tests.
|
||||
- `src/web/styles/zip-expansion.css` — BEM classes for the tree.
|
||||
- `tools/forensic/zip.ts`
|
||||
- `docs/gap-analysis/zip.md`
|
||||
- `docs/forensic/zip.md`
|
||||
|
||||
### Modified
|
||||
|
||||
- `src/infrastructure/wasm/strategy_registry.ts` — register `ZipStrategy`.
|
||||
- `src/domain/files/file_types.ts` — add `.zip` to `SUPPORTED_EXTENSIONS`.
|
||||
- `src/infrastructure/wasm/format_strategy.ts` — add optional `warnings: readonly string[]` and `archiveEntries: readonly ArchiveEntryResult[]` to `StripResult`; define the `ArchiveEntryResult` + `ArchiveEntryStatus` types.
|
||||
- `src/infrastructure/wasm/wasm_processor.ts` — propagate `warnings` + `archiveEntries` through `ProcessOutcome`; add `pendingLeafDiffs: Map<string, PendingLeafDiff>` keyed by `${entryId}:${path}`; add `buildArchiveLeafDiff({ entryId, path })` method.
|
||||
- `src/application/ports/metadata_processor_port.ts` — extend `MetadataProcessorPort` and `ProcessOutcome` to include the new fields and `buildArchiveLeafDiff` method.
|
||||
- `src/infrastructure/web/web_api.ts` — add `wasm.buildArchiveLeafDiff` to the `WebApi` surface.
|
||||
- `src/web/env.d.ts` — type the new `window.api.wasm.buildArchiveLeafDiff`.
|
||||
- `src/web/hooks/use_process_files.ts` — store `warnings` + `archiveEntries` on `FileEntry`.
|
||||
- `src/web/components/file-list/FileRow.tsx` — inline `ⓘ N warnings ▾` disclosure; when `archiveEntries != null && archiveEntries.length > 0`, render `<ZipExpansion>` in the expansion area instead of `<MetadataDiffExpansion>`.
|
||||
- `src/web/components/file-list/MetadataDiffExpansion.tsx` — extract its `TwoPaneView` into the new `MetadataDiffTable.tsx`; this file becomes a thin wrapper (skeleton handling + outer chrome). Also export `DiffSkeleton` so `ZipExpansion` can reuse it.
|
||||
- `src/web/styles/file-list.css` — BEM classes `.file-list__warnings`, `.file-list__warnings-toggle`, `.file-list__warnings-list`, `.file-list__warning-item`.
|
||||
- `src/web/main.tsx` — import the new `zip-expansion.css`.
|
||||
- `.resources/strings.json` — keys `warnings.label` ("warning" / "warnings"), `warnings.toggleAria` ("Show warnings for {name}" / "Hide warnings for {name}"), `zipExpansion.statusCleaned` ("Cleaned"), `zipExpansion.statusEncrypted` ("Encrypted — passed through"), `zipExpansion.statusUnsupported` ("Unsupported — passed through"), `zipExpansion.statusDirectory` ("Directory"), `zipExpansion.showMore` ("Show {count} more entries"), `zipExpansion.depthLimit` ("Depth limit reached — drop the inner file directly"), `zipExpansion.noMetadata` ("No metadata detected"), `zipExpansion.diffFailed` ("Couldn't load diff — internal error").
|
||||
- `docs/PRIVACY_GAPS.md` — encrypted-entry content pass-through; SFX stub bytes.
|
||||
- `README.md` (Format Support Matrix) — add `.zip` row.
|
||||
|
||||
## 9. Test plan
|
||||
|
||||
### Vitest unit tests (`tests/infrastructure/wasm/zip_strategy.test.ts`)
|
||||
|
||||
- Magic-byte verification: accepts `PK\x03\x04`, accepts empty-archive `PK\x05\x06`, rejects junk.
|
||||
- Empty archive: round-trips with archive comment scrubbed.
|
||||
- Single-entry archive with non-epoch timestamp: emitted timestamp is 1980-01-01.
|
||||
- Archive with non-empty archive comment: emitted comment is empty.
|
||||
- Archive with entry that has non-empty entry comment / extra field: scrubbed.
|
||||
- Archive containing a JPEG with EXIF sentinel: inner JPEG re-emitted without sentinel.
|
||||
- Archive containing an encrypted entry: passes through; warning emitted with entry name.
|
||||
- Archive containing a nested `archive.zip` whose inner archive has a comment: outer cleaning recurses; inner archive comment also scrubbed.
|
||||
- Truncated central directory: returns `parse-failed`.
|
||||
- Renamed `.docx → .zip` routes to `ZipStrategy` (sanity, not a regression).
|
||||
- ZIP containing a JPEG: result's `archiveEntries[0]` has `status: "cleaned"`, populated `sourceBytes` + `strippedBytes`, and matching `path`.
|
||||
- ZIP containing a nested ZIP: outer `archiveEntries[0].entries` is non-null and recursively contains the inner archive's entries; nested entry's `sourceBytes` is null (nested-ZIP node, not a leaf).
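
The nested-ZIP assertions above amount to walking the `archiveEntries` tree. As a rough illustration, the sketch below flattens such a tree into leaf paths. `EntryNode` is a hypothetical, trimmed-down mirror of `ArchiveEntryResult`, and joining nested paths with `/` is an assumption made for readability, not something the spec prescribes.

```typescript
// Hypothetical, trimmed-down mirror of the spec's ArchiveEntryResult.
interface EntryNode {
  readonly path: string;
  readonly status: string;
  // Non-null only when the entry is itself a ZIP.
  readonly entries: readonly EntryNode[] | null;
}

// Collect the paths of all file leaves, descending into nested ZIP nodes.
function collectLeafPaths(
  entries: readonly EntryNode[],
  prefix = "",
): string[] {
  const paths: string[] = [];
  for (const entry of entries) {
    const fullPath = prefix === "" ? entry.path : `${prefix}/${entry.path}`;
    if (entry.entries !== null) {
      // Nested-ZIP node: not a leaf itself, so recurse into its children.
      paths.push(...collectLeafPaths(entry.entries, fullPath));
    } else if (entry.status !== "directory") {
      paths.push(fullPath);
    }
  }
  return paths;
}

const sampleTree: EntryNode[] = [
  { path: "photo.jpg", status: "cleaned", entries: null },
  { path: "docs/", status: "directory", entries: null },
  {
    path: "inner.zip",
    status: "cleaned",
    entries: [{ path: "scan.jpg", status: "cleaned", entries: null }],
  },
];
console.log(collectLeafPaths(sampleTree)); // [ 'photo.jpg', 'inner.zip/scan.jpg' ]
```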
|
||||
|
||||
### Vitest unit tests for `StripResult` shape + `WasmProcessor.buildArchiveLeafDiff`
|
||||
|
||||
- Existing strategy `strip()` returns can omit `warnings` and `archiveEntries`; defaults at consumption sites are `[]` / `null`.
|
||||
- `WasmProcessor.process()` propagates `warnings` + `archiveEntries` through `ProcessOutcome`.
|
||||
- After processing a ZIP, `WasmProcessor.pendingLeafDiffs` contains one entry per cleaned leaf, keyed by `${entryId}:${path}`.
|
||||
- `buildArchiveLeafDiff({ entryId, path })` returns a `MetadataDocument` with walker entries merged into `before`; second call for the same path returns null (stash drained).
|
||||
- When `ENABLE_EXIFTOOL_DIFF` is off, `buildArchiveLeafDiff` returns null regardless.
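
The stash behaviour these tests describe (one entry per cleaned leaf, keyed by `${entryId}:${path}`, drained on first read) can be sketched with a plain `Map`. `LeafDiffStash` and its method names are hypothetical; the real `WasmProcessor` may structure this differently, and the sketch leaves out the ExifTool read entirely.

```typescript
// Hypothetical shape of one stashed leaf: bytes before and after stripping.
interface PendingLeafDiff {
  readonly sourceBytes: Uint8Array;
  readonly strippedBytes: Uint8Array;
}

// Hypothetical stash: holds bytes until the first diff request drains them.
class LeafDiffStash {
  private readonly pending = new Map<string, PendingLeafDiff>();

  private static key(entryId: string, path: string): string {
    return `${entryId}:${path}`;
  }

  stash(entryId: string, path: string, diff: PendingLeafDiff): void {
    this.pending.set(LeafDiffStash.key(entryId, path), diff);
  }

  // Returns the stashed bytes once; later calls for the same leaf get null.
  take(entryId: string, path: string): PendingLeafDiff | null {
    const key = LeafDiffStash.key(entryId, path);
    const diff = this.pending.get(key) ?? null;
    this.pending.delete(key);
    return diff;
  }

  get size(): number {
    return this.pending.size;
  }
}

const stash = new LeafDiffStash();
stash.stash("entry-1", "photo.jpg", {
  sourceBytes: new Uint8Array([1, 2, 3]),
  strippedBytes: new Uint8Array([1]),
});
console.log(stash.take("entry-1", "photo.jpg") !== null); // true
console.log(stash.take("entry-1", "photo.jpg")); // null
```

Draining on first read keeps the peak memory cost tied to leaves the user has not yet expanded, at the price of needing a separate cache if the same diff has to be shown again.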
|
||||
|
||||
### Vitest unit tests (`tests/web/components/file-list/ZipExpansion.test.tsx`)
|
||||
|
||||
- Renders one row per `ArchiveEntryResult` with correct status icon/label.
|
||||
- Cleaned-leaf row: clicking the chevron first dispatches `buildArchiveLeafDiff`, renders `DiffSkeleton` during the await, then swaps in `MetadataDiffTable` when the promise resolves with a non-empty doc. Subsequent expand/collapse re-renders the cached table instantly.
|
||||
- Cleaned-leaf row with null diff result: renders "No metadata detected" inline instead of the skeleton-then-table.
|
||||
- Nested-ZIP row: clicking the chevron expands and renders a recursive `<ZipExpansion>` showing the inner entries (no diff build triggered for the nested node itself).
|
||||
- Encrypted / unsupported / directory rows: no chevron rendered.
|
||||
- > 100 entries: first 100 render eagerly; `Show {count} more entries` button appears; click reveals the next 100.
|
||||
- Indent depth at level 6+ matches level 5 (no further squeeze). At depth 20, render the depth-limit row in place of further children.
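
The paging and depth rules in these tests reduce to a few small pure functions. The sketch below is illustrative only: the constant and function names are hypothetical, and it assumes `{count}` in the "Show {count} more entries" label means all entries not yet shown, which the spec does not state explicitly.

```typescript
const PAGE_SIZE = 100; // entries revealed per page
const MAX_INDENT_LEVEL = 5; // indent stops growing past this depth
const RENDER_DEPTH_LIMIT = 20; // depth at which the depth-limit row replaces children

// Number of rows to render after `pagesShown` pages have been revealed.
function visibleCount(total: number, pagesShown: number): number {
  return Math.min(total, PAGE_SIZE * Math.max(1, pagesShown));
}

// Entries still hidden; 0 means the "show more" button is not rendered.
function remainingCount(total: number, pagesShown: number): number {
  return total - visibleCount(total, pagesShown);
}

// Visual indent level, clamped so deep trees do not squeeze the row content.
function indentLevel(depth: number): number {
  return Math.min(depth, MAX_INDENT_LEVEL);
}

// Whether children at this depth are replaced by the depth-limit row.
function isDepthLimited(depth: number): boolean {
  return depth >= RENDER_DEPTH_LIMIT;
}

console.log(visibleCount(250, 1), remainingCount(250, 1)); // 100 150
console.log(visibleCount(250, 3), remainingCount(250, 3)); // 250 0
console.log(indentLevel(6), isDepthLimited(20)); // 5 true
```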
|
||||
|
||||
### Playwright web e2e (`tests/e2e/web/file-processing.spec.ts`)
|
||||
|
||||
- Drop a fixture ZIP with one EXIF-tagged JPEG inside; assert download triggered with a cleaned `.zip`; download the result, re-open via `JSZip` from the test runner, assert inner JPEG no longer has the sentinel EXIF tag.
|
||||
- Drop a fixture ZIP with an encrypted entry; assert the UI shows the `ⓘ 1 warning` disclosure with the expected English text.
|
||||
- Drop a fixture ZIP with two cleaned JPEGs + one nested ZIP containing a third cleaned JPEG; expand the row, assert three top-level entry rows + one expandable nested row; expand the nested row, assert the inner JPEG row is visible; expand the inner JPEG row, wait for diff skeleton → table transition, assert a `MetadataDiffTable` renders with at least one `removed`-status pair containing the sentinel value.
|
||||
|
||||
### Forensic runner (gated, not in CI)
|
||||
|
||||
- `npx tsx tools/forensic/zip.ts` — exit code 0 with no UNEXPECTED survivors for surfaces 1–8. Attached output goes into the PR description.
|
||||
|
||||
## 10. Risks + open questions
|
||||
|
||||
- **JSZip's central-directory rebuild is opaque.** We assume passing `date: new Date(1980, 0, 1)` produces a 1980-01-01 entry in both LFH and CD; the forensic battery (surface 4) will confirm. If JSZip writes "now" in LFH and only honors `date` in CD, we'll add a post-emit byte-patch pass before merging.
|
||||
- **Encrypted-entry detection edge cases.** General-purpose bit 0 covers ZipCrypto + AES; some archivers use the `0x9901` AE-x extra-field record without setting bit 0. The unit test will include both; if the second class isn't caught we'll add explicit AE-x detection.
|
||||
- **Routing collision: renamed Office docs.** Documented in §4.2: ZIP routing is strictly more aggressive than Office routing, never less clean. We will add a regression test to the Playwright suite to catch any future cleaning regression.
|
||||
- **Bundle size impact.** JSZip is already a production dep (OfficeStrategy uses it). No new dep; bundle change should be < 5 KB minified (just the new strategy + warning UI). Confirm in PR description against `dist/web-standalone/index.html` size.
|
||||
- **Performance on big archives.** A 500 MB archive with 1000 JPEGs is stripped sequentially through `processFileEntries`, which will block the main thread for a while. We don't have worker-thread offloading yet (#34). Not a blocker for v1; documented in the unsupported-format / size-cap track.
|
||||
- **`pendingLeafDiffs` memory cost.** Holding source + stripped bytes for every supported entry in every ZIP in a batch ≈ 2× the supported-content size of the batch in RAM. Existing top-level `pendingDiffs` has the same shape (peak ≈ batch_size); this just multiplies by inner-file count. Mitigation: drain the leaf stash on first `buildArchiveLeafDiff` call (same as top-level). For users who never expand a leaf, the stash lives until the batch unmounts. Document this in `PRIVACY_GAPS.md` alongside the existing batch-size note.
|
||||
- **Lazy-load UX surprise.** The first click on a cleaned leaf in a session pays the WASM start-up cost (roughly 100–300 ms when warm, 3–5 s when cold, per `docs/poc/webperl-exiftool.md`). The existing `dispatchExifToolDiffLoading` toast handles this for top-level diffs; reuse the same mechanism so the user gets the same passive cue when expanding a leaf for the first time.
|
||||
- **`MetadataDiffTable` extraction touches a recently-shipped surface.** The two-pane diff (#177 / chunk B.1) is from 2026-05-21. Risk: extraction breaks a still-stabilising component. Mitigation: keep `MetadataDiffExpansion`'s public API identical (same props, same render output at the top-level FileRow), only move the inner `TwoPaneView` to a new file. Add a snapshot/render regression test on `MetadataDiffExpansion` to catch any visual delta.
|
||||
- **Deeply nested archives render-loop risk.** `ZipExpansion` recursively rendering itself for nested `.zip` entries could pathologically stack if a malicious archive declares itself as containing itself (zip quine). The recursion depth is bounded by JS stack, but render churn could hang the tab. Mitigation: cap the indent at depth 5 visually and add a render-time depth counter that surfaces a `[depth limit reached — drop the inner file directly]` row at depth 20. Strategy-side recursion is unchanged.
|
||||
- **Mobile / APK touch UX on the tree.** Inline chevrons at multiple indent levels can be hard to hit. The existing `FileRow` chevron is already there; new `zip-expansion__row__chevron` follows the same hit-target size (44×44 minimum per the existing touch UX track #49). Verify on the APK target before merging.
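
For the encrypted-entry risk above, one way to detect encryption without relying on the parser is to walk the central directory and inspect each entry's general-purpose flags and compression method. The sketch below is a simplified illustration with hypothetical function names. It ignores ZIP64, trusts the first end-of-central-directory signature found when scanning backwards, and treats bit 0, bit 6 (strong encryption) and method 99 (WinZip AES) as encrypted.

```typescript
const EOCD_SIGNATURE = 0x06054b50; // "PK\x05\x06"
const CDH_SIGNATURE = 0x02014b50; // "PK\x01\x02"
const EOCD_MIN_SIZE = 22;
const CDH_FIXED_SIZE = 46;

// Scan backwards: the EOCD may be followed by a comment of up to 65535 bytes.
function findEocdOffset(view: DataView): number {
  const lowest = Math.max(0, view.byteLength - EOCD_MIN_SIZE - 0xffff);
  for (let i = view.byteLength - EOCD_MIN_SIZE; i >= lowest; i--) {
    if (view.getUint32(i, true) === EOCD_SIGNATURE) return i;
  }
  return -1;
}

// Hypothetical detector: true if any central directory entry looks encrypted.
function hasEncryptedEntry(bytes: Uint8Array): boolean {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const eocd = findEocdOffset(view);
  if (eocd < 0) return false;

  const entryCount = view.getUint16(eocd + 10, true);
  let offset = view.getUint32(eocd + 16, true);

  for (let n = 0; n < entryCount; n++) {
    if (offset + CDH_FIXED_SIZE > view.byteLength) break;
    if (view.getUint32(offset, true) !== CDH_SIGNATURE) break;

    const flags = view.getUint16(offset + 8, true);
    const method = view.getUint16(offset + 10, true);
    const encrypted =
      (flags & 0x0001) !== 0 || (flags & 0x0040) !== 0 || method === 99;
    if (encrypted) return true;

    // Skip the variable-length name, extra field and comment.
    const nameLength = view.getUint16(offset + 28, true);
    const extraLength = view.getUint16(offset + 30, true);
    const commentLength = view.getUint16(offset + 32, true);
    offset += CDH_FIXED_SIZE + nameLength + extraLength + commentLength;
  }
  return false;
}
```

Reading the central directory rather than the local file headers means the check does not depend on local sizes being present. Looking for the `0x9901` extra-field record mentioned above would need an additional pass over each entry's extra field, which this sketch does not attempt.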
|
||||
|
||||
## 11. Out of scope (deferred or declined)
|
||||
|
||||
- Streaming / chunked ZIP processing for archives larger than RAM (would intersect with #34).
|
||||
- A structured warning type with i18n codes + params (refactor; not blocking).
|
||||
- Office retrofit to the new `archiveEntries` tree model (Office files keep their flat source-labelled view inside `MetadataDiffExpansion`; the tree is for ZIP's open-ended structure).
|
||||
- Virtualized rendering of inner-entry rows (pagination handles the 99% case).
|
||||
- ZIP-bomb decompression cap (explicitly declined by maintainer).
|
||||
- Decryption of encrypted entries (out of scope — no password prompts).
|
||||
- Repairs to self-extracting EXE stub bytes (out of scope — documented gap).
|
||||
- `.md` support (closed as out-of-scope; markdown has no embedded metadata).
|
||||
|
|
@ -20,6 +20,7 @@ const STANDALONE_HTML = resolve(
|
|||
);
|
||||
const SAMPLE_JPG = resolve(__dirname, "../tests/e2e/fixtures/sample.jpg");
|
||||
const SAMPLE_DOCX = resolve(__dirname, "../tests/e2e/fixtures/sample.docx");
|
||||
const SAMPLE_ZIP = resolve(__dirname, "../tests/e2e/fixtures/sample-zip.zip");
|
||||
|
||||
async function captureBreakpoint({
|
||||
label,
|
||||
|
|
@ -27,6 +28,7 @@ async function captureBreakpoint({
|
|||
fixture,
|
||||
emulate,
|
||||
screenshotHeight,
|
||||
expandZipLeaf,
|
||||
}: {
|
||||
label: string;
|
||||
viewport: { width: number; height: number };
|
||||
|
|
@ -35,6 +37,11 @@ async function captureBreakpoint({
|
|||
? never
|
||||
: typeof devices.iPhone14;
|
||||
screenshotHeight?: number;
|
||||
// When set, treat the dropped fixture as a ZIP archive and ALSO
|
||||
// expand the named inner-leaf row after the top-level expansion,
|
||||
// so the screenshot shows the per-leaf MetadataDiffTable inside
|
||||
// the ZipExpansion tree (per the spec's lazy-on-expand UI).
|
||||
expandZipLeaf?: string;
|
||||
}): Promise<void> {
|
||||
const browser = await chromium.launch();
|
||||
const context = await browser.newContext({
|
||||
|
|
@ -73,9 +80,31 @@ async function captureBreakpoint({
|
|||
await row.click();
|
||||
}
|
||||
|
||||
await page
|
||||
.locator(".file-table__diff--two-pane")
|
||||
.waitFor({ state: "visible", timeout: 10_000 });
|
||||
if (expandZipLeaf !== undefined) {
|
||||
// ZIP archives render a tree instead of the two-pane diff. Wait
|
||||
// for the tree, then click the named inner row to lazy-load its
|
||||
// per-leaf MetadataDiffTable.
|
||||
await page
|
||||
.locator(".zip-expansion")
|
||||
.waitFor({ state: "visible", timeout: 10_000 });
|
||||
const leafRow = page
|
||||
.locator(".zip-expansion__row--cleaned", { hasText: expandZipLeaf })
|
||||
.first();
|
||||
await leafRow.waitFor({ state: "visible", timeout: 10_000 });
|
||||
if (emulate !== undefined) {
|
||||
await leafRow.tap();
|
||||
} else {
|
||||
await leafRow.click();
|
||||
}
|
||||
await page
|
||||
.locator(".zip-expansion__leaf-diff")
|
||||
.first()
|
||||
.waitFor({ state: "visible", timeout: 30_000 });
|
||||
} else {
|
||||
await page
|
||||
.locator(".file-table__diff--two-pane")
|
||||
.waitFor({ state: "visible", timeout: 10_000 });
|
||||
}
|
||||
|
||||
// Small pause for animations + toast to settle.
|
||||
await page.waitForTimeout(3500);
|
||||
|
|
@ -125,6 +154,15 @@ async function main(): Promise<void> {
|
|||
emulate: devices["iPhone 14"],
|
||||
});
|
||||
|
||||
// ZIP: ZipExpansion tree + per-leaf diff for the inner JPEG. Two
|
||||
// captures — wider tree for the desktop view; narrower for mobile.
|
||||
await captureBreakpoint({
|
||||
label: "desktop-zip",
|
||||
viewport: { width: 1280, height: 1100 },
|
||||
fixture: SAMPLE_ZIP,
|
||||
expandZipLeaf: "photo.jpg",
|
||||
});
|
||||
|
||||
console.log("done.");
|
||||
}
|
||||
|
||||
|
|
|
|||
|
|
@ -1,6 +1,7 @@
|
|||
// Application layer barrel file — re-exports commands, queries, ports, and use cases.
|
||||
|
||||
export type {
|
||||
LeafDiffResult,
|
||||
MetadataProcessorPort,
|
||||
ProcessOutcome,
|
||||
} from "./ports/metadata_processor_port";
|
||||
|
|
|
|||
|
|
@ -1,10 +1,29 @@
|
|||
import type { Result } from "../../common";
|
||||
import type { ExifError, MetadataDocument, StripOptions } from "../../domain";
|
||||
import type {
|
||||
ArchiveEntryResult,
|
||||
ExifError,
|
||||
MetadataDocument,
|
||||
StripOptions,
|
||||
} from "../../domain";
|
||||
|
||||
// Discriminated result of a leaf-diff build. The "failed" variant lets the
|
||||
// UI render a "Diff failed" message instead of conflating an internal
|
||||
// error with a successfully-empty diff (which should render "Already
|
||||
// clean").
|
||||
export type LeafDiffResult =
|
||||
| { readonly kind: "ok"; readonly doc: MetadataDocument }
|
||||
| { readonly kind: "failed" };
|
||||
|
||||
export interface ProcessOutcome {
|
||||
readonly outputPath: string;
|
||||
readonly outputBytes: number;
|
||||
readonly diffDocument: MetadataDocument | null;
|
||||
// Strategy-emitted non-fatal warnings (currently only ZipStrategy).
|
||||
readonly warnings?: readonly string[];
|
||||
// Recursive tree of inner archive entries (currently only ZipStrategy
|
||||
// populates). The per-leaf diffs are built lazily on-demand via
|
||||
// `buildArchiveLeafDiff`.
|
||||
readonly archiveEntries?: readonly ArchiveEntryResult[];
|
||||
}
|
||||
|
||||
export interface MetadataProcessorPort {
|
||||
|
|
@ -26,4 +45,23 @@ export interface MetadataProcessorPort {
|
|||
buildDiffDocumentForEntry(args: {
|
||||
entryId: string;
|
||||
}): Promise<MetadataDocument | null>;
|
||||
|
||||
// Out-of-band ExifTool read for a single leaf inside a ZIP archive.
|
||||
// Source + stripped bytes per leaf are stashed during `process()` when
|
||||
// the strategy returns `archiveEntries`. UI calls this lazily on first
|
||||
// expand of a leaf row in `ZipExpansion`. Caches the result so re-opens
|
||||
// (including after the parent ZIP collapses and unmounts ZipExpansion)
|
||||
// return the same answer instead of falling off the drained stash.
|
||||
// Returns `{kind: "failed"}` for "couldn't build diff" (read error or
|
||||
// cache miss); the UI renders "Diff failed" — distinct from the
|
||||
// successful-but-empty case rendered as "Already clean".
|
||||
buildArchiveLeafDiff(args: {
|
||||
entryId: string;
|
||||
path: string;
|
||||
}): Promise<LeafDiffResult>;
|
||||
|
||||
// Evicts cached leaf diffs (and pending bytes) for a given entryId so
|
||||
// parsed metadata doesn't linger after the user removes the file from
|
||||
// the app state. See privacy invariant §3.
|
||||
clearLeafCacheForEntry(args: { entryId: string }): void;
|
||||
}
|
||||
|
|
|
|||
53
src/domain/files/archive_entry.ts
Normal file
|
|
@ -0,0 +1,53 @@
|
|||
// Pure value types describing one entry inside an archive container
|
||||
// (currently only produced by ZipStrategy). The recursive tree is
|
||||
// rendered by ZipExpansion.tsx; per-leaf diffs are built lazily by
|
||||
// WasmProcessor.buildArchiveLeafDiff using the stashed source +
|
||||
// stripped bytes.
|
||||
//
|
||||
// Types live in the domain layer so the application port can reference
|
||||
// them without importing from infrastructure.
|
||||
|
||||
import type { MetadataEntry } from "../exif/metadata_document";
|
||||
|
||||
// Per spec §3, v1 refuses encrypted archives outright; no entry ever
|
||||
// reaches the per-entry walk with encrypted bytes. The
|
||||
// "passed-through-encrypted" status that earlier drafts of the spec
|
||||
// referenced is therefore not part of this union — adding it back is
|
||||
// a follow-up that should accompany a byte-level walker capable of
|
||||
// processing the archive without going through JSZip.
|
||||
// "cleaned" = strategy ran and the output bytes differ from the input bytes
|
||||
// in any way (length OR content). Covers explicit metadata
|
||||
// removal (shrinks) AND in-place normalisation (ZIP CDH
|
||||
// timestamps, PDF re-encode, MP4 re-mux — same length,
|
||||
// different content).
|
||||
// "already-clean" = strategy ran and the output bytes are byte-identical to
|
||||
// the input bytes. The file had nothing to remove and the
|
||||
// strategy emitted its input verbatim (typical for JPEG/
|
||||
// PNG with no removable metadata). Still expandable so the
|
||||
// user can see the ExifTool diff confirming the file is
|
||||
// clean. Mirrors the top-level Complete vs NoMetadataFound
|
||||
// pattern.
|
||||
export type ArchiveEntryStatus =
|
||||
| "cleaned"
|
||||
| "already-clean"
|
||||
| "passed-through-unsupported"
|
||||
| "directory";
|
||||
|
||||
export interface ArchiveEntryResult {
|
||||
readonly path: string;
|
||||
readonly status: ArchiveEntryStatus;
|
||||
// For "cleaned" leaves: pre-strip and post-strip bytes for the
|
||||
// deferred per-leaf ExifTool diff. Consumed by
|
||||
// WasmProcessor.buildArchiveLeafDiff. Null for non-cleaned statuses
|
||||
// (no diff to build).
|
||||
readonly sourceBytes: Uint8Array | null;
|
||||
readonly strippedBytes: Uint8Array | null;
|
||||
// Walker entries from the inner strategy (Office docProps, PDF
|
||||
// annotations, etc.). Merged into the leaf's diff `before` set,
|
||||
// same as the top-level pattern in
|
||||
// WasmProcessor.buildDiffDocumentForEntry.
|
||||
readonly walkerEntries: readonly MetadataEntry[];
|
||||
// RECURSIVE — non-null when this entry is itself a ZIP; null otherwise.
|
||||
readonly entries: readonly ArchiveEntryResult[] | null;
|
||||
readonly warnings: readonly string[];
|
||||
}
|
||||
|
|
@ -40,6 +40,8 @@ export const SUPPORTED_EXTENSIONS: ReadonlySet<string> = new Set([
|
|||
".xlsx",
|
||||
".pptx",
|
||||
".odt",
|
||||
// Archives
|
||||
".zip",
|
||||
]);
|
||||
|
||||
interface IsSupportedFileParams {
|
||||
|
|
|
|||
|
|
@ -27,6 +27,10 @@ export { middleTruncatePath } from "./path_truncation";
|
|||
export type { ExifError } from "./exif/exif_errors";
|
||||
export { formatExifError } from "./exif/exif_errors";
|
||||
export type { MetadataEntry, MetadataDocument } from "./exif/metadata_document";
|
||||
export type {
|
||||
ArchiveEntryResult,
|
||||
ArchiveEntryStatus,
|
||||
} from "./files/archive_entry";
|
||||
export type { SettingsError } from "./settings_errors";
|
||||
export { formatSettingsError } from "./settings_errors";
|
||||
export type { FolderError } from "./files/folder_errors";
|
||||
|
|
|
|||
|
|
@ -1,5 +1,6 @@
|
|||
import type { Result } from "../../common";
|
||||
import type {
|
||||
ArchiveEntryResult,
|
||||
ExifError,
|
||||
MetadataDocument,
|
||||
MetadataEntry,
|
||||
|
|
@ -7,6 +8,7 @@ import type {
|
|||
} from "../../domain";
|
||||
|
||||
export type { StripOptions };
|
||||
export type { ArchiveEntryResult, ArchiveEntryStatus } from "../../domain";
|
||||
|
||||
export interface StripResult {
|
||||
readonly bytes: Uint8Array;
|
||||
|
|
@ -24,6 +26,21 @@ export interface StripResult {
|
|||
// WasmProcessor (not the strategy) after a successful strip, via
|
||||
// ExifToolDiffStrategy. Strategies themselves return null here.
|
||||
readonly diffDocument: MetadataDocument | null;
|
||||
// Non-fatal per-file warnings emitted by the strategy. Surfaced as
|
||||
// an inline disclosure on the FileRow. Reserved for future use:
|
||||
// the originally-planned encrypted-entry passthrough messages
|
||||
// would have populated this, but v1 refuses encrypted archives
|
||||
// outright (spec §3) so no current strategy emits warnings. Kept
|
||||
// optional + threaded through so adding the byte-level walker
|
||||
// follow-up doesn't require touching every reducer + the WebApi
|
||||
// surface again.
|
||||
readonly warnings?: readonly string[];
|
||||
// Recursive tree of inner entries for archive formats. Currently only
|
||||
// ZipStrategy populates this. UI: src/web/components/file-list/ZipExpansion.tsx
|
||||
// renders the tree; per-leaf diffs are lazy-loaded via
|
||||
// WasmProcessor.buildArchiveLeafDiff. See
|
||||
// docs/superpowers/specs/2026-05-22-issue-184-zip-support-design.md.
|
||||
readonly archiveEntries?: readonly ArchiveEntryResult[];
|
||||
}
|
||||
|
||||
export interface FormatStrategy {
|
||||
|
|
|
|||
605
src/infrastructure/wasm/strategies/zip_strategy.ts
Normal file
|
|
@ -0,0 +1,605 @@
|
|||
import JSZip from "jszip";
|
||||
import type { Result } from "../../../common";
|
||||
import type { ExifError, MetadataEntry } from "../../../domain";
|
||||
import type {
|
||||
ArchiveEntryResult,
|
||||
ArchiveEntryStatus,
|
||||
FormatStrategy,
|
||||
StripOptions,
|
||||
StripResult,
|
||||
} from "../format_strategy";
|
||||
|
||||
// The inner-entry router is injected by strategy_registry.ts after both
|
||||
// modules have evaluated, via `setZipStrategyRouter(selectStrategy)`.
|
||||
// Avoids the static circular import (registry → ZipStrategy + ZipStrategy
|
||||
// → registry) that would otherwise observe an uninitialized export
|
||||
// binding. The slot is set during module-init side effects of the
|
||||
// registry, so any consumer that imports strategy_registry (the
|
||||
// production path; test files do this too) has it ready by call time.
|
||||
type InnerRouter = (args: {
|
||||
filename: string;
|
||||
bytes: Uint8Array;
|
||||
}) => FormatStrategy | null;
|
||||
|
||||
let injectedRouter: InnerRouter | null = null;
|
||||
// Module-level flag so the dev/test "router not injected" warning fires
|
||||
// at most once per session even if many strip() calls happen — avoids
|
||||
// drowning test output when a fixture battery runs through a directly-
|
||||
// constructed ZipStrategy. The first call surfaces the issue; subsequent
|
||||
// calls stay silent.
|
||||
let routerMissingWarned = false;
|
||||
|
||||
export function setZipStrategyRouter(router: InnerRouter): void {
|
||||
injectedRouter = router;
|
||||
}
|
||||
|
||||
// Privacy-invariant §6 canonical output values, applied uniformly to
|
||||
// every entry regardless of what the source archive carried:
|
||||
//
|
||||
// ZIP_EPOCH — DOS-time minimum for all timestamps.
|
||||
// Constructed in UTC at noon so JSZip's getUTC*
|
||||
// accessors land on 1980-01-01 in every timezone.
|
||||
// Local-time construction (e.g. new Date(1980, 0, 1))
|
||||
// wraps under negative UTC offsets where the UTC year
|
||||
// is 1979 → DOS year underflow → year 2108 on
|
||||
// read-back (confirmed via JSZip round-trip POC,
|
||||
// May 2026).
|
||||
//
|
||||
// unixPermissions — Canonical 0o644 (files) / 0o755 (dirs). Unix mode
|
||||
// bits in the ZIP central directory leak the
|
||||
// producer's umask; we normalise them like timestamps.
|
||||
//
|
||||
// dosPermissions — 0 (normal file). DOS attribute flags can carry
|
||||
// archive/hidden/system bits that also identify the
|
||||
// producing tool/OS; zeroed in the output.
|
||||
//
|
||||
// compression — DEFLATE via generateAsync. Pinned explicitly so the
|
||||
// output codec is a stated choice rather than an
|
||||
// accident of the JSZip default (STORE), which would
|
||||
// expand any entry that was DEFLATE-compressed in the
|
||||
// source. Uniform codec also avoids fingerprinting by
|
||||
// per-entry compression-method variance.
|
||||
const ZIP_EPOCH = new Date(Date.UTC(1980, 0, 1, 12, 0, 0));
|
||||
const UNIX_PERMS_FILE = 0o644;
|
||||
const UNIX_PERMS_DIR = 0o755;
|
||||
const DOS_PERMS_NORMAL = 0;
|
||||
|
||||
// Hard cap on ZIP nesting depth. A chain of ZIPs-in-ZIPs recurses
|
||||
// without bound — each level fully decompresses + re-zips into memory,
|
||||
// and classic ZIP-bomb ratios make this a real DoS surface. 10 levels
|
||||
// covers every legitimate real-world nesting pattern we've seen.
|
||||
const MAX_NESTING_DEPTH = 10;
|
||||
|
||||
// Hard cap on total decompressed bytes across ALL recursion levels in a
|
||||
// single top-level strip. The counter is threaded through every nested
|
||||
// stripAtDepth call via the shared `byteBudget` object so an adversarial
|
||||
// 10-level-nested archive can't allocate 10 × cap by resetting the
|
||||
// counter at each level (a real bug in the previous local-counter
|
||||
// implementation).
|
||||
const MAX_DECOMPRESSED_BYTES = 2 * 1024 * 1024 * 1024;
|
||||
|
||||
export class ZipStrategy implements FormatStrategy {
|
||||
readonly extensions: ReadonlySet<string> = new Set([".zip"]);
|
||||
|
||||
verifyMagicBytes({ bytes }: { bytes: Uint8Array }): boolean {
|
||||
if (bytes.length < 4) return false;
|
||||
if (bytes[0] !== 0x50 || bytes[1] !== 0x4b) return false;
|
||||
// PK\x03\x04 = local file header (any archive with entries);
|
||||
// PK\x05\x06 = end of central directory (empty archive only).
|
||||
return (
|
||||
(bytes[2] === 0x03 && bytes[3] === 0x04) ||
|
||||
(bytes[2] === 0x05 && bytes[3] === 0x06)
|
||||
);
|
||||
}
|
||||
|
||||
async strip({
|
||||
bytes,
|
||||
options,
|
||||
}: {
|
||||
bytes: Uint8Array;
|
||||
options: StripOptions;
|
||||
}): Promise<Result<StripResult, ExifError>> {
|
||||
// Fresh budget per top-level strip; threaded through recursive calls
|
||||
// so cumulative decompression across nested levels is bounded.
|
||||
return this.stripAtDepth({
|
||||
bytes,
|
||||
options,
|
||||
depth: 0,
|
||||
byteBudget: { used: 0 },
|
||||
});
|
||||
}
|
||||
|
||||
private async stripAtDepth({
|
||||
bytes,
|
||||
options,
|
||||
depth,
|
||||
byteBudget,
|
||||
}: {
|
||||
bytes: Uint8Array;
|
||||
options: StripOptions;
|
||||
depth: number;
|
||||
// Shared mutable counter across the entire recursive call graph for
|
||||
// one top-level strip. See MAX_DECOMPRESSED_BYTES comment.
|
||||
byteBudget: { used: number };
|
||||
}): Promise<Result<StripResult, ExifError>> {
|
||||
if (!this.verifyMagicBytes({ bytes })) {
|
||||
return {
|
||||
ok: false,
|
||||
error: {
|
||||
code: "invalid-file-format",
|
||||
detail: "Not a ZIP archive (magic bytes don't match)",
|
||||
},
|
||||
};
|
||||
}
|
||||
|
||||
if (depth >= MAX_NESTING_DEPTH) {
|
||||
return {
|
||||
ok: false,
|
||||
error: {
|
||||
code: "invalid-file-format",
|
||||
detail: `ZIP nesting limit (${MAX_NESTING_DEPTH}) exceeded — possible ZIP bomb or adversarially-nested archive`,
|
||||
},
|
||||
};
|
||||
}
|
||||
|
||||
if (injectedRouter === null && !routerMissingWarned) {
|
||||
// Dev/test invariant — production always imports strategy_registry.ts
|
||||
// which calls setZipStrategyRouter(). Warn once per session so test
|
||||
// mis-setup surfaces faster than "inner entries silently pass through
|
||||
// uncleaned" without spamming N copies of the message.
|
||||
routerMissingWarned = true;
|
||||
console.warn(
|
||||
"[ZipStrategy] inner router not injected — inner entries will not be " +
|
||||
"cleaned. Import strategy_registry.ts or call setZipStrategyRouter() " +
|
||||
"before use.",
|
||||
);
|
||||
}
|
||||
const selectStrategy: InnerRouter = injectedRouter ?? (() => null);
|
||||
|
||||
// Detect encrypted entries by scanning the central directory ourselves
|
||||
// (bit 0 = ZipCrypto, bit 6 = strong/AES, method 99 = WinZip AES).
|
||||
// JSZip's loadAsync throws on bit 0 only; bit 6 and method 99 entries
|
||||
// would silently pass through and emit garbled output. CDH scanning
|
||||
// avoids the LFH-scanner blind spots (ZIP64 size-overflow markers,
|
||||
// streaming-data-descriptor entries with size=0) because CDH records
|
||||
// always carry the real values for these fields. ZIP64 archives with
|
||||
// EOCD-level overflow markers are an explicit gap — we defer to
|
||||
// JSZip in that branch (bit 0 still caught; bit 6 documented as a
|
||||
// rare gap for ZIP64).
|
||||
if (archiveHasEncryptedEntries(bytes)) {
|
||||
return {
|
||||
ok: false,
|
||||
error: {
|
||||
code: "invalid-file-format",
|
||||
detail:
|
||||
"Encrypted ZIP archives aren't supported — use a dedicated tool (7-Zip, ExifTool standalone) that can decrypt to clean inner content.",
|
||||
},
|
||||
};
|
||||
}
|
||||
|
||||
let zip: JSZip;
|
||||
try {
|
||||
zip = await JSZip.loadAsync(bytes);
|
||||
} catch (err: unknown) {
|
||||
const msg = err instanceof Error ? err.message : String(err);
|
||||
// Pinned to JSZip's exact throw string — fragile to a JSZip wording
|
||||
// change, but bounded to the bit-0 fallback path that the CDH
|
||||
// scanner above would have caught anyway. If JSZip's wording ever
|
||||
// changes we'll surface parse-failed, not lose privacy.
|
||||
if (msg === "Encrypted zip are not supported") {
|
||||
return {
|
||||
ok: false,
|
||||
error: {
|
||||
code: "invalid-file-format",
|
||||
detail:
|
||||
"Encrypted ZIP archives aren't supported — use a dedicated tool (7-Zip, ExifTool standalone) that can decrypt to clean inner content.",
|
||||
},
|
||||
};
|
||||
}
|
||||
return {
|
||||
ok: false,
|
||||
error: {
|
||||
code: "parse-failed",
|
||||
raw: msg,
|
||||
},
|
||||
};
|
||||
}
|
||||
|
||||
const archiveEntries: ArchiveEntryResult[] = [];
|
||||
const warnings: string[] = [];
|
||||
|
||||
// Collect first — we rely on JSZip's loadAsync inserting entries in
|
||||
// central-directory order (a consequence of CD-parse order inside
|
||||
// loadAsync; not an explicitly documented guarantee, but a stable
|
||||
// implementation detail). The output's entry order mirrors the input's
|
||||
// as a result.
|
||||
const entries: Array<[string, JSZip.JSZipObject]> = [];
|
||||
zip.forEach((path, entry) => entries.push([path, entry]));
|
||||
|
||||
const outputZip = new JSZip();
|
||||
|
||||
// Pre-emit all parent-folder entries with canonical ZIP_EPOCH BEFORE
|
||||
// adding any file. Without this, JSZip's fileAdd auto-creates missing
|
||||
// parent directories via folderAdd, which falls back to
|
||||
// `o.date || new Date()` — leaking the current processing time into
|
||||
// the output central directory in violation of privacy invariant §6.
|
||||
// Info-ZIP and many archive tools omit explicit directory entries, so
|
||||
// this path fires on common real-world inputs, not just adversarial
|
||||
// ones.
|
||||
const allPaths = entries.map(([p]) => p);
|
||||
for (const folderPath of collectParentFolders(allPaths)) {
|
||||
outputZip.file(folderPath, "", {
|
||||
date: ZIP_EPOCH,
|
||||
comment: "",
|
||||
dir: true,
|
||||
unixPermissions: UNIX_PERMS_DIR,
|
||||
dosPermissions: DOS_PERMS_NORMAL,
|
||||
});
|
||||
}
|
||||
|
||||
for (const [path, entry] of entries) {
|
||||
if (entry.dir) {
|
||||
outputZip.file(path, "", {
|
||||
date: ZIP_EPOCH,
|
||||
comment: "",
|
||||
dir: true,
|
||||
unixPermissions: UNIX_PERMS_DIR,
|
||||
dosPermissions: DOS_PERMS_NORMAL,
|
||||
});
|
||||
archiveEntries.push({
|
||||
path,
|
||||
status: "directory",
|
||||
sourceBytes: null,
|
||||
strippedBytes: null,
|
||||
walkerEntries: [],
|
||||
entries: null,
|
||||
warnings: [],
|
||||
});
|
||||
continue;
|
||||
}
|
||||
|
||||
// Pre-check declared uncompressed size from JSZip's parsed CDH
|
||||
// BEFORE allocating the decompressed Uint8Array. Without this, a
|
||||
// single small-compressed huge-decompressed entry (classic ZIP-bomb
|
||||
// shape) would OOM the tab during entry.async() before the budget
|
||||
// check below ever runs. The post-check below remains as defence
|
||||
// in case the CDH lied about the size.
|
||||
const reportedSize = getReportedUncompressedSize(entry);
|
||||
if (
|
||||
reportedSize !== null &&
|
||||
byteBudget.used + reportedSize > MAX_DECOMPRESSED_BYTES
|
||||
) {
|
||||
return budgetExceededError();
|
||||
}
|
||||
|
||||
const inputEntryBytes = await entry.async("uint8array");
|
||||
|
||||
byteBudget.used += inputEntryBytes.byteLength;
|
||||
if (byteBudget.used > MAX_DECOMPRESSED_BYTES) {
|
||||
return budgetExceededError();
|
||||
}
|
||||
|
||||
const innerStrategy = selectStrategy({
|
||||
filename: path,
|
||||
bytes: inputEntryBytes,
|
||||
});
|
||||
|
||||
let outputEntryBytes = inputEntryBytes;
|
||||
let status: ArchiveEntryStatus = "passed-through-unsupported";
|
||||
let innerWalkerEntries: readonly MetadataEntry[] = [];
|
||||
let innerArchiveEntries: readonly ArchiveEntryResult[] | null = null;
|
||||
const entryWarnings: string[] = [];
|
||||
|
||||
if (innerStrategy !== null) {
|
||||
if (innerStrategy instanceof ZipStrategy) {
|
||||
const inner = await innerStrategy.stripAtDepth({
|
||||
bytes: inputEntryBytes,
|
||||
options,
|
||||
depth: depth + 1,
|
||||
byteBudget,
|
||||
});
|
||||
if (inner.ok) {
|
||||
outputEntryBytes = inner.value.bytes;
|
||||
innerArchiveEntries = inner.value.archiveEntries ?? null;
|
||||
// Nested-ZIP status: only "cleaned" if a deep entry actually
|
||||
// had metadata removed. We deliberately DON'T use byte-shrink
|
||||
// here because the inner ZIP gets re-encoded with our
|
||||
// canonical perms/timestamps/compression, which can shrink
|
||||
// bytes even when no inner-inner entry had any metadata
|
||||
// removed (false-positive 'cleaned' on already-clean nested
|
||||
// archives with non-canonical permissions).
|
||||
status =
|
||||
innerArchiveEntries !== null &&
|
||||
hasAnyCleanedInTree(innerArchiveEntries)
|
||||
? "cleaned"
|
||||
: "already-clean";
|
||||
for (const w of inner.value.warnings ?? []) {
|
||||
const prefixed = `${path}: ${w}`;
|
||||
warnings.push(prefixed);
|
||||
entryWarnings.push(prefixed);
|
||||
}
|
||||
} else {
|
||||
// Surface the refusal reason (encryption, depth limit, etc.)
|
||||
// as a warning so the user knows this nested ZIP was left
|
||||
// untouched rather than silently passing through. Asymmetric
|
||||
// with the outer level where the caller sees the error
|
||||
// directly; for inner ZIPs, warning-and-continue lets the
|
||||
// rest of the outer archive still be processed.
|
||||
const detail = innerErrorDetail(inner.error);
|
||||
const w = `${path}: nested ZIP left untouched — ${detail}`;
|
||||
warnings.push(w);
|
||||
entryWarnings.push(w);
|
||||
}
|
||||
} else {
|
||||
const inner = await innerStrategy.strip({
|
||||
bytes: inputEntryBytes,
|
||||
options,
|
||||
});
|
||||
if (inner.ok) {
|
||||
outputEntryBytes = inner.value.bytes;
|
||||
innerWalkerEntries = inner.value.walkerEntries;
|
||||
// "cleaned" iff the strategy's output bytes differ from the
|
||||
// input bytes in any way (length OR content), OR walker
|
||||
// entries were emitted. Length-only comparison was too
|
||||
// strict: Office, PDF, and ffmpeg-MP4 strategies routinely
|
||||
// produce same-length output with different content (ZIP
|
||||
// CDH timestamps normalized, PDF re-encoded, MP4 re-muxed
|
||||
// with stripped mvhd boxes) — those changes ARE the
|
||||
// cleaning and the diff confirms them, but the byte-length
|
||||
// check rendered them as "Already clean" misleadingly.
|
||||
// Byte-by-byte comparison correctly identifies these as
|
||||
// "cleaned" while still showing the no-op pass-through
|
||||
// case (JPEG/PNG with no removable metadata, where the
|
||||
// strategy emits its input verbatim) as "already-clean".
|
||||
status =
|
||||
innerWalkerEntries.length > 0 ||
|
||||
!bytesAreIdentical(outputEntryBytes, inputEntryBytes)
|
||||
? "cleaned"
|
||||
: "already-clean";
|
||||
for (const w of inner.value.warnings ?? []) {
|
||||
const prefixed = `${path}: ${w}`;
|
||||
warnings.push(prefixed);
|
||||
entryWarnings.push(prefixed);
|
||||
}
|
||||
}
|
||||
// inner.ok === false → magic-byte mismatch on a misnamed file;
|
||||
// left as passed-through-unsupported with no warning because a
|
||||
// misnamed file is an input-data fact, not a privacy event.
|
||||
}
|
||||
}
|
||||
|
||||
outputZip.file(path, outputEntryBytes, {
|
||||
date: ZIP_EPOCH,
|
||||
comment: "",
|
||||
unixPermissions: UNIX_PERMS_FILE,
|
||||
dosPermissions: DOS_PERMS_NORMAL,
|
||||
});
|
||||
|
||||
// Only stash bytes for actual LEAVES that the user can expand to
|
||||
// see a diff. Nested-ZIP entries (innerArchiveEntries !== null)
|
||||
// have their own leaves stashed during the recursive call, and the
|
||||
// UI never builds a top-level diff for the nested ZIP itself —
|
||||
// holding the full re-encoded bytes on the parent entry would be
|
||||
// pure memory bloat that lives in AppContext for the FileEntry's
|
||||
// lifetime.
|
||||
const isExpandableLeaf =
|
||||
(status === "cleaned" || status === "already-clean") &&
|
||||
innerArchiveEntries === null;
|
||||
archiveEntries.push({
|
||||
path,
|
||||
status,
|
||||
sourceBytes: isExpandableLeaf ? inputEntryBytes : null,
|
||||
strippedBytes: isExpandableLeaf ? outputEntryBytes : null,
|
||||
walkerEntries: innerWalkerEntries,
|
||||
entries: innerArchiveEntries,
|
||||
warnings: entryWarnings,
|
||||
});
|
||||
}
|
||||
|
||||
let outputBytes: Uint8Array;
|
||||
try {
|
||||
outputBytes = await outputZip.generateAsync({
|
||||
type: "uint8array",
|
||||
comment: "",
|
||||
// DEFLATE: see ZIP_EPOCH comment block above.
|
||||
compression: "DEFLATE",
|
||||
});
|
||||
} catch (err: unknown) {
|
||||
return {
|
||||
ok: false,
|
||||
error: {
|
||||
code: "file-io-error",
|
||||
detail: err instanceof Error ? err.message : String(err),
|
||||
},
|
||||
};
|
||||
}
|
||||
|
||||
return {
|
||||
ok: true,
|
||||
value: {
|
||||
bytes: outputBytes,
|
||||
walkerEntries: [],
|
||||
diffDocument: null,
|
||||
warnings,
|
||||
archiveEntries,
|
||||
},
|
||||
};
|
||||
}
|
||||
}
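The shared `byteBudget` counter threaded through `stripAtDepth` can be illustrated in isolation. The following is a minimal sketch under stated assumptions, not the project's code: `EntryNode`, `walk`, and `CAP` are hypothetical names, and the `size` field stands in for an entry's decompressed byte count.

```typescript
// Hypothetical tree of archive entries: a leaf has a size, a nested
// archive additionally has children.
type EntryNode = { size: number; children?: EntryNode[] };

const CAP = 100; // illustrative cap, standing in for MAX_DECOMPRESSED_BYTES

// Returns false as soon as the cumulative size across ALL levels exceeds CAP.
// The same `budget` object is passed down, so nesting never resets the counter.
function walk(nodes: EntryNode[], budget: { used: number }): boolean {
  for (const node of nodes) {
    budget.used += node.size;
    if (budget.used > CAP) return false;
    if (node.children && !walk(node.children, budget)) return false;
  }
  return true;
}

// Three levels of 40 bytes each: every level is under the cap on its own,
// but the shared counter reaches 120 and the walk is refused.
const nested: EntryNode[] = [
  { size: 40, children: [{ size: 40, children: [{ size: 40 }] }] },
];
const nestedAllowed = walk(nested, { used: 0 });

// Two flat entries total 80, which stays within the cap.
const flatAllowed = walk([{ size: 40 }, { size: 40 }], { used: 0 });
```

A per-level counter would accept the nested case because no single level exceeds the cap, which is the failure mode the comment on `MAX_DECOMPRESSED_BYTES` describes.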
|
||||
|
||||
// Byte-by-byte equality check on two Uint8Arrays. Used to decide whether a
|
||||
// strategy actually changed the entry's content (timestamps normalized, EXIF
|
||||
// stripped, re-muxed) vs. truly passed through verbatim. Compares 4 bytes at
|
||||
// a time via DataView for speed on larger entries.
|
||||
function bytesAreIdentical(a: Uint8Array, b: Uint8Array): boolean {
|
||||
if (a.byteLength !== b.byteLength) return false;
|
||||
const len = a.byteLength;
|
||||
const va = new DataView(a.buffer, a.byteOffset, len);
|
||||
const vb = new DataView(b.buffer, b.byteOffset, len);
|
||||
const chunks = len >>> 2;
|
||||
for (let i = 0; i < chunks; i++) {
|
||||
const offset = i << 2;
|
||||
if (va.getUint32(offset) !== vb.getUint32(offset)) return false;
|
||||
}
|
||||
for (let i = chunks << 2; i < len; i++) {
|
||||
if (a[i] !== b[i]) return false;
|
||||
}
|
||||
return true;
|
||||
}
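A short usage sketch for `bytesAreIdentical`, assuming the helper behaves as written above. The function body is repeated only so the snippet runs on its own; the sample arrays are made up for illustration. It shows the two cases the status logic depends on: same-length output with different content, and a view with a non-zero `byteOffset`.

```typescript
// Copy of the helper above so the snippet is self-contained.
function bytesAreIdentical(a: Uint8Array, b: Uint8Array): boolean {
  if (a.byteLength !== b.byteLength) return false;
  const len = a.byteLength;
  const va = new DataView(a.buffer, a.byteOffset, len);
  const vb = new DataView(b.buffer, b.byteOffset, len);
  const chunks = len >>> 2;
  // Compare 4 bytes at a time, then the 0 to 3 trailing bytes one by one.
  for (let i = 0; i < chunks; i++) {
    const offset = i << 2;
    if (va.getUint32(offset) !== vb.getUint32(offset)) return false;
  }
  for (let i = chunks << 2; i < len; i++) {
    if (a[i] !== b[i]) return false;
  }
  return true;
}

const original = new Uint8Array([1, 2, 3, 4, 5, 6, 7]);

// Same length, last byte changed: a length-only check would miss this.
const sameLengthEdited = new Uint8Array([1, 2, 3, 4, 5, 6, 9]);

// A view into a larger buffer, starting at byteOffset 2.
const backing = new Uint8Array([0, 0, 1, 2, 3, 4, 5, 6, 7]);
const view = backing.subarray(2);

const sameContent = bytesAreIdentical(original, view);
const tailDiffers = bytesAreIdentical(original, sameLengthEdited);
```

Passing `byteOffset` to the `DataView` constructor is what keeps the comparison correct for `subarray` views, which share a backing buffer with other data.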
|
||||
|
||||
// Recursively walks an archiveEntries tree and returns true if any entry has
|
||||
// status "cleaned". Used to promote a nested ZIP's own status from
|
||||
// "already-clean" to "cleaned" when at least one deep entry actually had
|
||||
// metadata removed — the outer ZIP shouldn't show "Already clean" just
|
||||
// because its byte size didn't shrink overall.
|
||||
function hasAnyCleanedInTree(entries: readonly ArchiveEntryResult[]): boolean {
|
||||
for (const entry of entries) {
|
||||
if (entry.status === "cleaned") return true;
|
||||
if (entry.entries !== null && hasAnyCleanedInTree(entry.entries))
|
||||
return true;
|
||||
}
|
||||
return false;
|
||||
}
|
||||
|
||||
// Extracts a human-readable detail string from an ExifError for use in
|
||||
// warning messages when an inner ZIP strip is refused.
|
||||
function innerErrorDetail(error: ExifError): string {
|
||||
if (
|
||||
error.code === "invalid-file-format" ||
|
||||
error.code === "file-io-error" ||
|
||||
error.code === "exiftool-error"
|
||||
) {
|
||||
return error.detail;
|
||||
}
|
||||
if (error.code === "parse-failed") {
|
||||
return error.raw;
|
||||
}
|
||||
return error.code;
|
||||
}
|
||||
|
||||
// Walks every entry path and returns the set of unique parent-folder paths
|
||||
// (with trailing slash). Used to pre-emit those folders into outputZip with
|
||||
// canonical ZIP_EPOCH timestamps, so JSZip's fileAdd never auto-creates a
|
||||
// parent with `new Date()` — privacy invariant §6 violation otherwise.
|
||||
function collectParentFolders(
|
||||
entryPaths: readonly string[],
|
||||
): readonly string[] {
|
||||
const parents = new Set<string>();
|
||||
for (const path of entryPaths) {
|
||||
const segments = path.split("/");
|
||||
for (let i = 1; i < segments.length; i++) {
|
||||
// Skip empty prefixes (paths starting with "/")
|
||||
if (segments[i - 1] === "") continue;
|
||||
const prefix = segments.slice(0, i).join("/") + "/";
|
||||
parents.add(prefix);
|
||||
}
|
||||
}
|
||||
return [...parents].sort();
|
||||
}
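To make the behaviour of `collectParentFolders` concrete, here is a small usage sketch. The function is restated compactly (using `Array.from` instead of spread) so the snippet runs standalone, and the sample paths are invented for illustration.

```typescript
// Compact restatement of the helper above.
function collectParentFolders(entryPaths: readonly string[]): readonly string[] {
  const parents = new Set<string>();
  for (const path of entryPaths) {
    const segments = path.split("/");
    for (let i = 1; i < segments.length; i++) {
      if (segments[i - 1] === "") continue; // skip empty prefixes
      parents.add(segments.slice(0, i).join("/") + "/");
    }
  }
  return Array.from(parents).sort();
}

// An archive without explicit directory entries for docs/ and docs/a/,
// plus one explicit directory entry (img/) and one top-level file.
const folders = collectParentFolders([
  "docs/a/b.txt",
  "docs/c.txt",
  "img/",
  "top.txt",
]);
// folders is ["docs/", "docs/a/", "img/"]
```

Every folder in the result is pre-emitted with `ZIP_EPOCH`, so JSZip never has to create a parent implicitly while files are added.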
|
||||
|
||||
// JSZipObject's parsed-CDH uncompressed size lives on the internal _data
|
||||
// CompressedObject. Reading it via a typed accessor lets us pre-check
|
||||
// against the byte budget BEFORE allocating the decompressed buffer.
|
||||
// Returns null when the field isn't present (older JSZip versions,
|
||||
// mocked instances) so the caller falls back to the post-allocation
|
||||
// check.
|
||||
function getReportedUncompressedSize(entry: JSZip.JSZipObject): number | null {
|
||||
const data = (entry as unknown as { _data?: unknown })._data;
|
||||
  if (
    data !== null &&
    typeof data === "object" &&
    "uncompressedSize" in data
  ) {
|
||||
const size = (data as { uncompressedSize: unknown }).uncompressedSize;
|
||||
if (typeof size === "number" && Number.isFinite(size) && size >= 0) {
|
||||
return size;
|
||||
}
|
||||
}
|
||||
return null;
|
||||
}
|
||||
|
||||
function budgetExceededError(): Result<StripResult, ExifError> {
|
||||
return {
|
||||
ok: false,
|
||||
error: {
|
||||
code: "file-io-error",
|
||||
detail: `ZIP archive expands to more than ${MAX_DECOMPRESSED_BYTES / 1024 ** 3} GB when decompressed`,
|
||||
},
|
||||
};
|
||||
}
|
||||
|
||||
// Scans the central directory for entries flagged as encrypted (general-
|
||||
// purpose-flag bit 0 = ZipCrypto, bit 6 = strong/AES) or using compression
|
||||
// method 99 (WinZip AE-1/AE-2 AES). Walks CDH records, NOT LFH records —
|
||||
// CDH carries the real values for these fields even when the entry uses
|
||||
// streaming data descriptors or ZIP64 size overflow at the LFH level,
|
||||
// avoiding the two blind spots that made the earlier LFH-based scanner
|
||||
// unreliable.
|
||||
//
|
||||
// ZIP64 caveat: if the standard EOCD record uses overflow markers
|
||||
// (0xFFFFFFFF / 0xFFFF) for CD location, we fall back to false and rely
|
||||
// on JSZip's loadAsync to throw on bit 0. Documented gap: bit-6 / method-99
|
||||
// encryption inside ZIP64 archives is not detected here.
|
||||
function archiveHasEncryptedEntries(bytes: Uint8Array): boolean {
|
||||
if (bytes.length < 22) return false;
|
||||
|
||||
// Find EOCD signature 0x06054b50 (PK\x05\x06) by scanning backwards.
|
||||
// ZIP archive comment max length is 65535, so EOCD is within the last
|
||||
// 65557 bytes (22-byte EOCD + comment).
|
||||
let eocdOffset = -1;
|
||||
const minStart = Math.max(0, bytes.length - 65557);
|
||||
for (let i = bytes.length - 22; i >= minStart; i--) {
|
||||
if (
|
||||
bytes[i] === 0x50 &&
|
||||
bytes[i + 1] === 0x4b &&
|
||||
bytes[i + 2] === 0x05 &&
|
||||
bytes[i + 3] === 0x06
|
||||
) {
|
||||
eocdOffset = i;
|
||||
break;
|
||||
}
|
||||
}
|
||||
if (eocdOffset < 0) return false;
|
||||
|
||||
const dv = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
|
||||
const totalEntries = dv.getUint16(eocdOffset + 10, true);
|
||||
const cdSize = dv.getUint32(eocdOffset + 12, true);
|
||||
const cdOffset = dv.getUint32(eocdOffset + 16, true);
|
||||
|
||||
// ZIP64 overflow markers — fall back to JSZip's bit-0 check.
|
||||
if (
|
||||
cdSize === 0xffffffff ||
|
||||
cdOffset === 0xffffffff ||
|
||||
totalEntries === 0xffff
|
||||
) {
|
||||
return false;
|
||||
}
|
||||
|
||||
let pos = cdOffset;
|
||||
let count = 0;
|
||||
while (count < totalEntries && pos + 46 <= bytes.length) {
|
||||
// CDH signature 0x02014b50 (PK\x01\x02)
|
||||
if (
|
||||
bytes[pos] !== 0x50 ||
|
||||
bytes[pos + 1] !== 0x4b ||
|
||||
bytes[pos + 2] !== 0x01 ||
|
||||
bytes[pos + 3] !== 0x02
|
||||
) {
|
||||
return false;
|
||||
}
|
||||
const gpFlag = dv.getUint16(pos + 8, true);
|
||||
const method = dv.getUint16(pos + 10, true);
|
||||
|
||||
if ((gpFlag & 0x0001) !== 0) return true; // ZipCrypto
|
||||
if ((gpFlag & 0x0040) !== 0) return true; // strong encryption
|
||||
if (method === 99) return true; // WinZip AE-1/AE-2 AES
|
||||
|
||||
const nameLen = dv.getUint16(pos + 28, true);
|
||||
const extraLen = dv.getUint16(pos + 30, true);
|
||||
const commentLen = dv.getUint16(pos + 32, true);
|
||||
pos += 46 + nameLen + extraLen + commentLen;
|
||||
count++;
|
||||
}
|
||||
return false;
|
||||
}
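The three per-record conditions checked by the scanner can be exercised against a hand-built central directory header. This is a sketch under stated assumptions: `buildCdh` and `cdhLooksEncrypted` are hypothetical helpers written for this example, and the header carries no filename, extra field, or comment. The field offsets (flags at 8, method at 10) are the ones the scanner above reads.

```typescript
// Hypothetical helper: a 46-byte central directory header with only the
// signature, general-purpose flags, and compression method filled in.
function buildCdh(gpFlag: number, method: number): Uint8Array {
  const bytes = new Uint8Array(46);
  const dv = new DataView(bytes.buffer);
  dv.setUint32(0, 0x02014b50, true); // stored on disk as PK\x01\x02
  dv.setUint16(8, gpFlag, true); // general-purpose bit flag
  dv.setUint16(10, method, true); // compression method
  return bytes;
}

// The same three conditions the scanner applies to each CDH record.
function cdhLooksEncrypted(cdh: Uint8Array): boolean {
  const dv = new DataView(cdh.buffer, cdh.byteOffset, cdh.byteLength);
  const gpFlag = dv.getUint16(8, true);
  const method = dv.getUint16(10, true);
  return (gpFlag & 0x0001) !== 0 || (gpFlag & 0x0040) !== 0 || method === 99;
}

const plainDeflate = cdhLooksEncrypted(buildCdh(0x0000, 8)); // no flags, DEFLATE
const zipCrypto = cdhLooksEncrypted(buildCdh(0x0001, 8)); // bit 0
const strong = cdhLooksEncrypted(buildCdh(0x0041, 8)); // bits 0 and 6
const winZipAes = cdhLooksEncrypted(buildCdh(0x0000, 99)); // method 99 alone
const utf8Names = cdhLooksEncrypted(buildCdh(0x0800, 8)); // bit 11 only
```

Only the first and last cases are treated as unencrypted; bit 11 marks UTF-8 filenames and is unrelated to encryption.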
|
||||
|
|
@ -3,6 +3,7 @@ import { VideoStrategy } from "./strategies/video_strategy";
|
|||
import { JpegStrategy } from "./strategies/jpeg_strategy";
|
||||
import { PngStrategy } from "./strategies/png_strategy";
|
||||
import { PdfStrategy } from "./strategies/pdf_strategy";
|
||||
import { ZipStrategy, setZipStrategyRouter } from "./strategies/zip_strategy";
|
||||
import { ExifToolFallbackStrategy } from "./strategies/exiftool_fallback_strategy";
|
||||
import { FfmpegFallbackStrategy } from "./strategies/ffmpeg_fallback_strategy";
|
||||
import type { FormatStrategy } from "./format_strategy";
|
||||
|
|
@ -33,6 +34,7 @@ const STRATEGIES: readonly FormatStrategy[] = [
|
|||
new JpegStrategy(),
|
||||
new PngStrategy(),
|
||||
new PdfStrategy(),
|
||||
new ZipStrategy(),
|
||||
// Registered last so selectStrategy() only routes to it for extensions no
|
||||
// hand-rolled walker claims. Claims nothing in chunk A; format coverage
|
||||
// expands in follow-up PRs against #174.
|
||||
|
|
@ -72,3 +74,8 @@ export function allHandledExtensions(): ReadonlySet<string> {
|
|||
}
|
||||
return all;
|
||||
}
|
||||
|
||||
// Inject the router into ZipStrategy so it can dispatch inner entries
|
||||
// through the same registry. See the note at the top of
|
||||
// strategies/zip_strategy.ts for why this side effect exists.
|
||||
setZipStrategyRouter(selectStrategy);
|
||||
@ -1,11 +1,13 @@
|
|||
import type {
|
||||
FileBytesPort,
|
||||
LeafDiffResult,
|
||||
MetadataProcessorPort,
|
||||
ProcessOutcome,
|
||||
} from "../../application";
|
||||
import type { Result } from "../../common";
|
||||
import { dispatchExifToolDiffLoading } from "../../common";
|
||||
import type {
|
||||
ArchiveEntryResult,
|
||||
ExifError,
|
||||
MetadataDocument,
|
||||
MetadataEntry,
|
||||
|
|
@ -51,10 +53,55 @@ interface PendingDiffInputs {
|
|||
readonly walkerEntries: readonly MetadataEntry[];
|
||||
}
|
||||
|
||||
// Pending per-leaf diff inputs for ZIP archives. Same shape as
|
||||
// PendingDiffInputs but keyed by `${entryId}\0${fullPath}` instead of
|
||||
// entryId alone: one entry per expandable leaf (cleaned or
// already-clean) in the archive tree.
|
||||
// Populated by stashArchiveLeaves() recursively after a ZipStrategy
|
||||
// strip resolves; drained by `buildArchiveLeafDiff` on first expand
|
||||
// of that leaf in ZipExpansion.
|
||||
//
|
||||
// `fullPath` is the leaf's path with each outer archive's path
|
||||
// prepended (separated by \0), so leaves inside nested ZIPs with
|
||||
// identical local names ("a.zip/photo.jpg" + "b.zip/photo.jpg")
|
||||
// don't collide. The NUL separator matches the pattern in
|
||||
// MetadataDiffTable.makeKey() — NUL is forbidden in every metadata
|
||||
// grammar we route, so the composite key can't collide with a
|
||||
// legitimate path that contains the separator.
|
||||
//
|
||||
// Unbounded for the batch lifetime — see same memory-bound note on
|
||||
// `pendingDiffs` above. The leak ceiling is "user's batch size *
|
||||
// number of cleaned leaves per archive"; documented as a risk in
|
||||
// docs/superpowers/specs/2026-05-22-issue-184-zip-support-design.md §10.
|
||||
type PendingLeafDiff = PendingDiffInputs;
|
||||
|
||||
function leafKey({
|
||||
entryId,
|
||||
fullPath,
|
||||
}: {
|
||||
entryId: string;
|
||||
fullPath: string;
|
||||
}): string {
|
||||
return `${entryId}\0${fullPath}`;
|
||||
}
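A brief sketch of how the NUL-separated key behaves, assuming the `leafKey` shape shown above (repeated here so the snippet is self-contained). The entry ids and paths are invented, and the `Map<string, number>` is a stand-in for the real stash. It also shows why prefix eviction in `clearLeafCacheForEntry` includes the trailing NUL.

```typescript
function leafKey({
  entryId,
  fullPath,
}: {
  entryId: string;
  fullPath: string;
}): string {
  return `${entryId}\0${fullPath}`;
}

// Two nested ZIPs with the same local entry name get distinct keys.
const keyA = leafKey({ entryId: "e1", fullPath: "a.zip\0photo.jpg" });
const keyB = leafKey({ entryId: "e1", fullPath: "b.zip\0photo.jpg" });
// A different entry whose id merely starts with "e1".
const keyOther = leafKey({ entryId: "e10", fullPath: "photo.jpg" });

const store = new Map<string, number>([
  [keyA, 1],
  [keyB, 2],
  [keyOther, 3],
]);

// Evict everything for entry "e1". The trailing NUL in the prefix keeps
// "e1" from also matching keys that belong to "e10".
const prefix = "e1\0";
for (const k of Array.from(store.keys())) {
  if (k.startsWith(prefix)) store.delete(k);
}
const remaining = Array.from(store.keys());
```

After eviction only the key for `e10` is left, so clearing one entry cannot drop another entry's leaves.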
|
||||
|
||||
export class WasmProcessor implements MetadataProcessorPort {
|
||||
private readonly fileBytes: FileBytesPort;
|
||||
private readonly diffStrategy: ExifToolDiffStrategy;
|
||||
private readonly pendingDiffs = new Map<string, PendingDiffInputs>();
|
||||
  // Per-leaf diff inputs for ZIP archives, keyed by `${entryId}\0${fullPath}`
  // (see leafKey above). Populated recursively by stashArchiveLeaves();
  // drained on first expand of each leaf in ZipExpansion via
  // buildArchiveLeafDiff.
|
||||
private readonly pendingLeafDiffs = new Map<string, PendingLeafDiff>();
|
||||
// Cache of in-flight or resolved leaf diff Promises, keyed identically to
|
||||
// pendingLeafDiffs. The value is a Promise wrapping a discriminated
|
||||
// `LeafDiffResult` so we can distinguish "diff returned nothing" (clean
|
||||
// file) from "diff failed" (ExifTool threw / parse error) — the prior
|
||||
// design cached a bare null for both, rendering a failure as "No
|
||||
// metadata" instead of "Diff failed". Caching the Promise (not the
|
||||
// resolved value) also makes concurrent buildArchiveLeafDiff calls for
|
||||
// the same key join the same await, eliminating the
|
||||
// drain-pending-before-cache-write race window.
|
||||
private readonly cachedLeafDocs = new Map<string, Promise<LeafDiffResult>>();
|
||||
|
||||
// Instance-level "have we fired the loading event yet?" flag. The
|
||||
// ExifToolDiffStrategy caches its ZeroPerl instance across calls, so
|
||||
|
|
@ -135,23 +182,43 @@ export class WasmProcessor implements MetadataProcessorPort {
|
|||
// the diff hasn't landed yet — the reducer flips a `diffPending`
|
||||
// flag to render the skeleton in MetadataDiffExpansion until the
|
||||
// async build dispatches `UPDATE_FILE_DIFF`.
|
||||
//
|
||||
// Archive containers (currently only ZipStrategy) take a different
|
||||
// path: the FileRow renders <ZipExpansion> instead of
|
||||
// <MetadataDiffExpansion>, so building a top-level diff for the
|
||||
// archive bytes themselves would be wasted ExifTool work. Skip
|
||||
// the per-entry stash and only walk archiveEntries for per-leaf
|
||||
// stashes.
|
||||
if (ENABLE_EXIFTOOL_DIFF) {
|
||||
-      this.pendingDiffs.set(entryId, {
-        sourceBytes,
-        strippedBytes: stripResult.value.bytes,
-        extension: extname(filename),
-        walkerEntries: stripResult.value.walkerEntries,
-      });
|
||||
if (stripResult.value.archiveEntries !== undefined) {
|
||||
this.stashArchiveLeaves({
|
||||
entryId,
|
||||
entries: stripResult.value.archiveEntries,
|
||||
});
|
||||
} else {
|
||||
this.pendingDiffs.set(entryId, {
|
||||
sourceBytes,
|
||||
strippedBytes: stripResult.value.bytes,
|
||||
extension: extname(filename),
|
||||
walkerEntries: stripResult.value.walkerEntries,
|
||||
});
|
||||
}
|
||||
}
|
||||
|
||||
-    return {
-      ok: true,
-      value: {
-        outputPath,
-        outputBytes: stripResult.value.bytes.byteLength,
-        diffDocument: null,
-      },
|
||||
// exactOptionalPropertyTypes: omit optional fields entirely when
|
||||
// the source field is undefined rather than assigning undefined.
|
||||
const outcome: ProcessOutcome = {
|
||||
outputPath,
|
||||
outputBytes: stripResult.value.bytes.byteLength,
|
||||
diffDocument: null,
|
||||
...(stripResult.value.warnings !== undefined && {
|
||||
warnings: stripResult.value.warnings,
|
||||
}),
|
||||
...(stripResult.value.archiveEntries !== undefined && {
|
||||
archiveEntries: stripResult.value.archiveEntries,
|
||||
}),
|
||||
};
|
||||
return { ok: true, value: outcome };
|
||||
}
|
||||
|
||||
// Out-of-band ExifTool read on source + stripped bytes for the given
|
||||
|
|
@ -208,6 +275,10 @@ export class WasmProcessor implements MetadataProcessorPort {
|
|||
// `T()` boot blocked one of the two reads long enough for the other
|
||||
// to finish, but once perl was warm the race fired on every
|
||||
// subsequent file. Serial avoids the race entirely.
|
||||
// Top-level (non-archive) diff path: returns null on failure so the
|
||||
// existing top-level UI's "no diff" rendering stays unchanged. Inner
|
||||
// archive leaves use runLeafDiff below, which threads ok/failed
|
||||
// through to the cache so the UI can distinguish those two states.
|
||||
private async runDiff(
|
||||
pending: PendingDiffInputs,
|
||||
): Promise<MetadataDocument | null> {
|
||||
|
|
@ -233,6 +304,158 @@ export class WasmProcessor implements MetadataProcessorPort {
|
|||
return null;
|
||||
}
|
||||
}
|
||||
|
||||
// Per-leaf diff with explicit ok/failed discrimination. Caller (the
|
||||
// cache) wraps this in a Promise; the Promise's resolved value tells
|
||||
// the UI which empty-state copy to render.
|
||||
private async runLeafDiff(
|
||||
pending: PendingDiffInputs,
|
||||
): Promise<LeafDiffResult> {
|
||||
try {
|
||||
const beforeResult = await this.diffStrategy.readDocument({
|
||||
bytes: pending.sourceBytes,
|
||||
extension: pending.extension,
|
||||
});
|
||||
const afterResult = await this.diffStrategy.readDocument({
|
||||
bytes: pending.strippedBytes,
|
||||
extension: pending.extension,
|
||||
});
|
||||
if (!beforeResult.ok || !afterResult.ok) {
|
||||
return { kind: "failed" };
|
||||
}
|
||||
const before = [...pending.walkerEntries, ...beforeResult.value];
|
||||
return { kind: "ok", doc: { before, after: afterResult.value } };
|
||||
} catch {
|
||||
return { kind: "failed" };
|
||||
}
|
||||
}
|
||||
|
||||
// Lazy per-leaf diff build for ZIP archives. Same shape +
|
||||
// drain-on-call semantics as buildDiffDocumentForEntry, but keyed
|
||||
// by NUL-separated `entryId\0fullPath` because one ZIP yields many
|
||||
// leaves and nested-ZIP siblings can share local entry names
|
||||
// (e.g. `a.zip/photo.jpg` + `b.zip/photo.jpg`). Called by
|
||||
// ZipExpansion on first expand of a cleaned-leaf row, with
|
||||
// `fullPath` composed from the outer-archive path prefix.
|
||||
//
|
||||
// Queues onto the same `diffChain` used by buildDiffDocumentForEntry
|
||||
// — `@uswriting/exiftool`'s parseMetadata uses module-level
|
||||
// singletons so any two readDocument calls (across entries OR
|
||||
// leaves) racing on the shared Perl/StringBuilder state corrupt the
|
||||
// readback. Per-leaf calls join the same chain so they serialize
|
||||
// against top-level diffs and against each other.
|
||||
async buildArchiveLeafDiff({
|
||||
entryId,
|
||||
path,
|
||||
}: {
|
||||
entryId: string;
|
||||
path: string;
|
||||
}): Promise<LeafDiffResult> {
|
||||
if (!ENABLE_EXIFTOOL_DIFF) {
|
||||
return { kind: "failed" };
|
||||
}
|
||||
const key = leafKey({ entryId, fullPath: path });
|
||||
|
||||
    // Cache the in-flight Promise (not just the resolved value) so:
    // (1) Re-opens after the bytes stash is drained return the same
    //     result. This fixes the collapse-parent-ZIP + reopen-and-re-expand
    //     case where the component-level cachedDoc is lost on unmount.
    // (2) Concurrent buildArchiveLeafDiff calls for the same key during
    //     a rapid close-while-loading + reopen cycle JOIN the same
    //     await, instead of one draining pendingLeafDiffs and the other
    //     finding neither a cache entry nor a pending stash and returning
    //     a spurious { kind: "failed" } that overrides the legitimate
    //     result.
|
||||
const cached = this.cachedLeafDocs.get(key);
|
||||
if (cached !== undefined) return cached;
|
||||
|
||||
const pending = this.pendingLeafDiffs.get(key);
|
||||
if (pending === undefined) return { kind: "failed" };
|
||||
this.pendingLeafDiffs.delete(key);
|
||||
|
||||
if (!this.diffWarmupSignalled) {
|
||||
this.diffWarmupSignalled = true;
|
||||
dispatchExifToolDiffLoading();
|
||||
}
|
||||
|
||||
// Queue onto the singleton diff chain for serialisation, then cache
|
||||
// the resulting Promise immediately so any concurrent caller sees it
|
||||
// and awaits the same instance.
|
||||
const computePromise: Promise<LeafDiffResult> = this.diffChain.then(() =>
|
||||
this.runLeafDiff(pending),
|
||||
);
|
||||
this.diffChain = computePromise.catch(() => null);
|
||||
this.cachedLeafDocs.set(key, computePromise);
|
||||
return computePromise;
|
||||
}
|
||||
|
||||
// Evicts all cached leaf state for a given entryId. Called by the web
|
||||
// API when a FileEntry is removed from the app state, so the parsed
|
||||
// metadata (potentially GPS, names, camera serials) doesn't linger on
|
||||
// the processor singleton for the rest of the tab session — per
|
||||
// privacy invariant §3 ("what we cannot claim to clean").
|
||||
clearLeafCacheForEntry({ entryId }: { entryId: string }): void {
|
||||
const prefix = `${entryId}\0`;
|
||||
for (const k of this.pendingLeafDiffs.keys()) {
|
||||
if (k.startsWith(prefix)) this.pendingLeafDiffs.delete(k);
|
||||
}
|
||||
for (const k of this.cachedLeafDocs.keys()) {
|
||||
if (k.startsWith(prefix)) this.cachedLeafDocs.delete(k);
|
||||
}
|
||||
this.pendingDiffs.delete(entryId);
|
||||
}
|
||||
|
||||
// Walk the archiveEntries tree and stash source + stripped bytes
|
||||
  // for each expandable leaf. Only ACTUAL LEAVES (entry.entries === null)
  // get stashed. ZipStrategy already nulls sourceBytes/strippedBytes on
  // nested-ZIP parent entries, and the UI's `isNestedZip` branch never
  // calls buildArchiveLeafDiff on them, so stashing a parent would only
  // leak bytes. The nested ZIP's actual leaves reach this function via
  // the recursive call below.
|
||||
//
|
||||
// `outerPath` is the parent archive's full path (with trailing \0
|
||||
// separator) used to compose the leaf key. Empty at the top level;
|
||||
// "a.zip\0" inside the first nested-ZIP, "a.zip\0b.zip\0" two
|
||||
// levels deep, etc. Without the prefix, sibling nested zips with
|
||||
// identical local entry names would collide on the same key.
|
||||
private stashArchiveLeaves({
|
||||
entryId,
|
||||
entries,
|
||||
outerPath = "",
|
||||
}: {
|
||||
entryId: string;
|
||||
entries: readonly ArchiveEntryResult[];
|
||||
outerPath?: string;
|
||||
}): void {
|
||||
for (const entry of entries) {
|
||||
const fullPath = outerPath + entry.path;
|
||||
const isLeaf = entry.entries === null;
|
||||
// Stash both "cleaned" and "already-clean" leaves — the user can
|
||||
// still expand an "Already clean" row to see ExifTool's diff
|
||||
// confirming what's in the file (or that nothing changed).
|
||||
const isExpandable =
|
||||
entry.status === "cleaned" || entry.status === "already-clean";
|
||||
if (
|
||||
isLeaf &&
|
||||
isExpandable &&
|
||||
entry.sourceBytes !== null &&
|
||||
entry.strippedBytes !== null
|
||||
) {
|
||||
this.pendingLeafDiffs.set(leafKey({ entryId, fullPath }), {
|
||||
sourceBytes: entry.sourceBytes,
|
||||
strippedBytes: entry.strippedBytes,
|
||||
extension: extname(entry.path),
|
||||
walkerEntries: entry.walkerEntries,
|
||||
});
|
||||
}
|
||||
if (entry.entries !== null) {
|
||||
this.stashArchiveLeaves({
|
||||
entryId,
|
||||
entries: entry.entries,
|
||||
outerPath: `${fullPath}\0`,
|
||||
});
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
function extname(filename: string): string {
|
||||
|
|
|
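The queue-then-cache step in `buildArchiveLeafDiff` above (chain the job onto `diffChain`, swallow rejections on the chain only, cache the Promise itself) can be sketched in isolation. This is a minimal illustration under stated assumptions, not the processor's actual code: `SerialCache`, `compute`, `log`, and `cache` are hypothetical names, and the cache-hit check at the top of `get` is assumed to mirror whatever precedes the visible part of the hunk.

```typescript
// Hypothetical stand-in for the processor's diffChain + cachedLeafDocs pair.
class SerialCache<T> {
  // Tail of the chain; each new job is appended behind it.
  private chain: Promise<unknown> = Promise.resolve();
  private readonly cached = new Map<string, Promise<T>>();
  private readonly compute: (key: string) => Promise<T>;

  constructor(compute: (key: string) => Promise<T>) {
    this.compute = compute;
  }

  get(key: string): Promise<T> {
    const hit = this.cached.get(key);
    if (hit !== undefined) return hit;
    // Queue the job behind whatever is already running.
    const job: Promise<T> = this.chain.then(() => this.compute(key));
    // Swallow rejections on the chain only, so one failed job
    // does not block the jobs queued after it.
    this.chain = job.catch(() => null);
    // Cache the Promise (not the resolved value) before returning, so a
    // concurrent caller for the same key awaits the same instance.
    this.cached.set(key, job);
    return job;
  }
}

// Demo wiring: a fake job that records when it starts and ends.
const log: string[] = [];
const cache = new SerialCache<string>(async (key) => {
  log.push(`start ${key}`);
  await Promise.resolve(); // yield once, as a real async read would
  log.push(`end ${key}`);
  return key.toUpperCase();
});
```

Caching the Promise rather than its result is what makes two near-simultaneous expands of the same row share one computation; the caller still sees a rejection on `job` even though the chain itself never rejects.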
|||
|
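The key scheme described in the `stashArchiveLeaves` comments (NUL-terminated outer paths, prefix eviction by `entryId`) can be checked with a small standalone sketch. The names `TreeEntry`, `collectLeafKeys`, and `evictEntry` are hypothetical, and the `leafKey` format shown here (`entryId`, NUL, full path) is an assumption inferred from the `${entryId}\0` prefix used in `clearLeafCacheForEntry`; the real helper is not part of this hunk.

```typescript
interface TreeEntry {
  readonly path: string;
  // null marks a leaf; an array marks a nested archive.
  readonly entries: readonly TreeEntry[] | null;
}

// Assumed key format: entryId, NUL, then the NUL-joined archive path.
function leafKey(entryId: string, fullPath: string): string {
  return `${entryId}\0${fullPath}`;
}

// Walk the tree and return one key per leaf, prefixing nested leaves
// with the path of every enclosing archive.
function collectLeafKeys(
  entryId: string,
  entries: readonly TreeEntry[],
  outerPath = "",
): string[] {
  const keys: string[] = [];
  for (const entry of entries) {
    const fullPath = outerPath + entry.path;
    if (entry.entries === null) {
      keys.push(leafKey(entryId, fullPath));
    } else {
      keys.push(...collectLeafKeys(entryId, entry.entries, `${fullPath}\0`));
    }
  }
  return keys;
}

// Drop every key that belongs to one entryId. The trailing NUL in the
// prefix keeps "e1" from also matching "e10".
function evictEntry(store: Map<string, unknown>, entryId: string): void {
  const prefix = `${entryId}\0`;
  for (const k of store.keys()) {
    if (k.startsWith(prefix)) store.delete(k);
  }
}
```

With this layout, two sibling archives that each contain `photo.jpg` yield distinct keys, which is the collision the `outerPath` comment warns about.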
|
@ -1,8 +1,10 @@
|
|||
import type {
|
||||
ArchiveEntryResult,
|
||||
MetadataDocument,
|
||||
Settings,
|
||||
I18nStringsDictionary,
|
||||
} from "../../domain";
|
||||
import type { LeafDiffResult } from "../../application";
|
||||
import {
|
||||
DEFAULT_SETTINGS,
|
||||
validateSettings,
|
||||
|
|
@ -87,6 +89,8 @@ export interface WasmApi {
|
|||
outputPath: string | null;
|
||||
outputBytes: number | null;
|
||||
diffDocument: MetadataDocument | null;
|
||||
warnings: readonly string[];
|
||||
archiveEntries: readonly ArchiveEntryResult[] | null;
|
||||
error: string | null;
|
||||
}>;
|
||||
|
||||
|
|
@ -97,6 +101,17 @@ export interface WasmApi {
|
|||
// the diff couldn't be built — caller falls back to the legacy delta
|
||||
// view (already MetadataDiffExpansion's default).
|
||||
buildDiffDocument(entryId: string): Promise<MetadataDocument | null>;
|
||||
|
||||
// Lazy per-leaf diff build for ZIP archives. Called by ZipExpansion
|
||||
// on first expand of a cleaned-leaf row. Returns a discriminated
|
||||
// LeafDiffResult so the UI can distinguish empty-but-successful
|
||||
// (render "Already clean") from failed (render "Diff failed").
|
||||
buildArchiveLeafDiff(entryId: string, path: string): Promise<LeafDiffResult>;
|
||||
|
||||
// Evicts cached leaf diffs + pending bytes for an entryId. Called from
|
||||
// AppContext when a FileEntry is removed so parsed metadata doesn't
|
||||
// linger on the processor singleton.
|
||||
clearLeafCacheForEntry(entryId: string): void;
|
||||
}
|
||||
|
||||
export interface WebApi {
|
||||
|
|
@ -287,6 +302,8 @@ export function makeWebApi(): WebApi {
|
|||
outputPath: null,
|
||||
outputBytes: null,
|
||||
diffDocument: null,
|
||||
warnings: [],
|
||||
archiveEntries: null,
|
||||
error: formatExifError(result.error),
|
||||
};
|
||||
}
|
||||
|
|
@ -295,11 +312,17 @@ export function makeWebApi(): WebApi {
|
|||
outputPath: result.value.outputPath,
|
||||
outputBytes: result.value.outputBytes,
|
||||
diffDocument: result.value.diffDocument,
|
||||
warnings: result.value.warnings ?? [],
|
||||
archiveEntries: result.value.archiveEntries ?? null,
|
||||
error: null,
|
||||
};
|
||||
},
|
||||
buildDiffDocument: async (entryId) =>
|
||||
processor.buildDiffDocumentForEntry({ entryId }),
|
||||
buildArchiveLeafDiff: async (entryId, path) =>
|
||||
processor.buildArchiveLeafDiff({ entryId, path }),
|
||||
clearLeafCacheForEntry: (entryId) =>
|
||||
processor.clearLeafCacheForEntry({ entryId }),
|
||||
},
|
||||
|
||||
folder: {
|
||||
|
|
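The `buildArchiveLeafDiff` contract above returns a discriminated result so the caller can tell "empty but successful" apart from "failed". A minimal sketch of consuming such a union follows. Only the `{ kind: "failed" }` variant is visible in this diff; the `"ok"` and `"empty"` variants, the `Doc` shape, and `describeLeaf` are assumptions made for illustration, and `assertNever` is re-declared locally rather than imported from the repo's `common` module.

```typescript
// Hypothetical document shape, reduced to tag-name lists.
interface Doc {
  readonly before: readonly string[];
  readonly after: readonly string[];
}

// Assumed variants; only "failed" is confirmed by the diff.
type LeafResult =
  | { kind: "ok"; document: Doc }
  | { kind: "empty" }
  | { kind: "failed" };

// Compile-time exhaustiveness guard: adding a new variant without
// handling it makes the `default` branch below a type error.
function assertNever(value: never): never {
  throw new Error(`Unexpected variant: ${JSON.stringify(value)}`);
}

// Map each variant to what the UI would show: a summary for a real
// diff, or an i18n key for the two message-only states.
function describeLeaf(result: LeafResult): string {
  switch (result.kind) {
    case "ok":
      return `${result.document.before.length} before, ${result.document.after.length} after`;
    case "empty":
      return "zipExpansion.alreadyClean";
    case "failed":
      return "zipExpansion.diffFailed";
    default:
      return assertNever(result);
  }
}
```

Returning a tagged union instead of `MetadataDocument | null` avoids overloading `null` to mean both "nothing to show" and "something broke".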
|
|||
|
|
@ -35,6 +35,17 @@ function AppContent(): React.JSX.Element {
|
|||
|
||||
const hasFiles = state.files.length > 0;
|
||||
|
||||
// "Clean more" clears the file list. Also evict each FileEntry's cached
|
||||
// leaf diffs from the WasmProcessor so parsed metadata (GPS, names,
|
||||
// camera serials etc.) doesn't linger on the processor singleton for
|
||||
// the rest of the tab session. Privacy invariant §3.
|
||||
const handleClearFiles = useCallback((): void => {
|
||||
for (const file of state.files) {
|
||||
window.api.wasm.clearLeafCacheForEntry(file.id);
|
||||
}
|
||||
dispatch({ type: "CLEAR_FILES" });
|
||||
}, [state.files, dispatch]);
|
||||
|
||||
const { cleanedCount, errorCount, totalCount, totalTagsRemoved, allDone } =
|
||||
useFileStats(state.files);
|
||||
|
||||
|
|
@ -75,9 +86,7 @@ function AppContent(): React.JSX.Element {
|
|||
totalTagsRemoved={hasFiles ? totalTagsRemoved : undefined}
|
||||
elapsedSeconds={hasFiles ? elapsedSeconds : undefined}
|
||||
errorCount={hasFiles ? errorCount : undefined}
|
||||
onCleanMore={
|
||||
hasFiles ? () => dispatch({ type: "CLEAR_FILES" }) : undefined
|
||||
}
|
||||
onCleanMore={hasFiles ? handleClearFiles : undefined}
|
||||
/>
|
||||
<OfflineIndicator />
|
||||
<SettingsDrawer isOpen={isSettingsOpen} onClose={handleClose} />
|
||||
|
|
|
|||
|
|
@ -1,8 +1,14 @@
|
|||
// Single file row with 6 columns: STATUS, NAME, TYPE, BEFORE, AFTER, RESULT.
|
||||
// BEFORE = file.size, AFTER = file.afterBytes (post-strip size, null until
|
||||
// the strategy returns), RESULT renders the textual pill ('Cleaned' /
|
||||
// 'Already clean') via ResultPill. Supports row expansion for error
|
||||
// details and the "no metadata found" notice.
|
||||
// 'Already clean') via ResultPill. Supports four expansion modes:
|
||||
// - Error details (ErrorExpansion)
|
||||
// - "No metadata found" notice
|
||||
// - Archive entry tree (ZipExpansion) — when archiveEntries is populated
|
||||
// - Metadata diff (MetadataDiffExpansion) — otherwise, when a diff
|
||||
// exists or is pending
|
||||
// ZipExpansion and MetadataDiffExpansion are mutually exclusive — see the
|
||||
// hasArchiveEntries gate below.
|
||||
|
||||
import { useRef } from "react";
|
||||
import type { FileEntry } from "../../contexts/AppContext";
|
||||
|
|
@ -14,6 +20,7 @@ import { ChevronIcon } from "../icons/ChevronIcon";
|
|||
import { ErrorExpansion } from "./ErrorExpansion";
|
||||
import { MetadataDiffExpansion } from "./MetadataDiffExpansion";
|
||||
import { ResultPill } from "./ResultPill";
|
||||
import { ZipExpansion } from "./ZipExpansion";
|
||||
import { formatFileSize } from "../../utils/format_file_size";
|
||||
import { useI18n } from "../../hooks/use_i18n";
|
||||
|
||||
|
|
@ -44,11 +51,13 @@ export function FileRow({
|
|||
const hasDiffDocument =
|
||||
file.diffDocument !== null && file.diffDocument.before.length > 0;
|
||||
const diffPending = file.diffPending === true;
|
||||
const hasArchiveEntries =
|
||||
file.archiveEntries !== undefined && file.archiveEntries.length > 0;
|
||||
const isExpandable =
|
||||
isError ||
|
||||
file.status === FileProcessingStatus.NoMetadataFound ||
|
||||
(file.status === FileProcessingStatus.Complete &&
|
||||
(hasDiffDocument || diffPending));
|
||||
(hasDiffDocument || diffPending || hasArchiveEntries));
|
||||
|
||||
const rowClasses = [
|
||||
"file-table__row",
|
||||
|
|
@ -171,15 +180,31 @@ export function FileRow({
|
|||
)}
|
||||
{isExpanded &&
|
||||
isComplete &&
|
||||
file.status === FileProcessingStatus.NoMetadataFound && (
|
||||
file.status === FileProcessingStatus.NoMetadataFound &&
|
||||
!hasArchiveEntries &&
|
||||
(file.diffDocument !== null || diffPending ? (
|
||||
// Diff available or in-flight: show the diff table / skeleton so
|
||||
// the user can see what was intentionally preserved (orientation,
|
||||
// color profile). Without this, "Already clean" files with
|
||||
// preserve-flags on would show a blank expansion even though
|
||||
// ExifTool found metadata that we chose to keep.
|
||||
<MetadataDiffExpansion
|
||||
diffDocument={file.diffDocument}
|
||||
diffPending={diffPending}
|
||||
/>
|
||||
) : (
|
||||
<div className="file-table__expansion">
|
||||
<span className="file-table__expansion-empty">
|
||||
{t("noMetadataFound")}
|
||||
</span>
|
||||
</div>
|
||||
)}
|
||||
))}
|
||||
{isExpanded && isComplete && hasArchiveEntries && (
|
||||
<ZipExpansion entryId={file.id} entries={file.archiveEntries!} />
|
||||
)}
|
||||
{isExpanded &&
|
||||
file.status === FileProcessingStatus.Complete &&
|
||||
!hasArchiveEntries &&
|
||||
(hasDiffDocument || diffPending) && (
|
||||
<MetadataDiffExpansion
|
||||
diffDocument={file.diffDocument}
|
||||
|
|
|
|||
|
|
@ -1,4 +1,8 @@
|
|||
// Expandable per-file metadata diff.
|
||||
// Expandable per-file metadata diff. Top-level wrapper that chooses
|
||||
// between the two-pane table and the loading skeleton based on the
|
||||
// diff's async state. Table + skeleton bodies live in MetadataDiffTable
|
||||
// so ZipExpansion can reuse them per-leaf without inheriting the outer
|
||||
// file-table expansion chrome.
|
||||
//
|
||||
// Two-pane render: ExifTool's full metadata dump from the source on the
|
||||
// left, dump from the stripped file on the right. Rows are aligned across
|
||||
|
|
@ -16,13 +20,10 @@
|
|||
// Skeleton mode: while the async ExifTool read is in flight (diffPending)
|
||||
// and no diffDocument is on the entry yet, render a wayfinding cue so the
|
||||
// expansion area isn't blank when the user opens the row early.
|
||||
//
|
||||
// `t()` is the live i18n hook. The diff keys carry a `{count}` placeholder
|
||||
// interpolated locally (mirrors the ErrorExpansion.tsx pattern), since the
|
||||
// live `t` signature is `(key: string) => string` and does not interpolate.
|
||||
|
||||
import { useI18n } from "../../hooks/use_i18n";
|
||||
import type { MetadataDocument, MetadataEntry } from "../../../domain";
|
||||
import type { MetadataDocument } from "../../../domain";
|
||||
import { MetadataDiffTable, DiffSkeleton } from "./MetadataDiffTable";
|
||||
|
||||
export function MetadataDiffExpansion({
|
||||
diffDocument,
|
||||
|
|
@ -38,8 +39,11 @@ export function MetadataDiffExpansion({
|
|||
}): React.JSX.Element | null {
|
||||
const { t } = useI18n();
|
||||
|
||||
if (diffDocument != null && diffDocument.before.length > 0) {
|
||||
return <TwoPaneView document={diffDocument} t={t} />;
|
||||
if (
|
||||
diffDocument != null &&
|
||||
(diffDocument.before.length > 0 || diffDocument.after.length > 0)
|
||||
) {
|
||||
return <MetadataDiffTable document={diffDocument} t={t} />;
|
||||
}
|
||||
|
||||
// Diff still in flight — skeleton wayfinding cue.
|
||||
|
|
@ -47,284 +51,18 @@ export function MetadataDiffExpansion({
|
|||
return <DiffSkeleton t={t} />;
|
||||
}
|
||||
|
||||
// Diff resolved but both sides are empty: the file was already clean.
|
||||
if (diffDocument != null) {
|
||||
return (
|
||||
<div className="file-table__expansion">
|
||||
<span className="file-table__expansion-empty">
|
||||
{t("zipExpansion.alreadyClean")}
|
||||
</span>
|
||||
</div>
|
||||
);
|
||||
}
|
||||
|
||||
// Nothing to show. The `isExpandable` gate in FileRow normally prevents
|
||||
// reaching this branch.
|
||||
return null;
|
||||
}
|
||||
|
||||
// =========================================================================
|
||||
// Two-pane view
|
||||
// =========================================================================
|
||||
|
||||
type DiffRowStatus = "removed" | "added" | "modified" | "kept";
|
||||
|
||||
interface DiffRow {
|
||||
readonly status: DiffRowStatus;
|
||||
readonly source: string;
|
||||
readonly name: string;
|
||||
readonly before: string | null;
|
||||
readonly after: string | null;
|
||||
}
|
||||
|
||||
function TwoPaneView({
|
||||
document,
|
||||
t,
|
||||
}: {
|
||||
document: MetadataDocument;
|
||||
t: (key: string) => string;
|
||||
}): React.JSX.Element {
|
||||
const rows = computeDiffRows(document);
|
||||
const grouped = groupRowsBySource(rows);
|
||||
|
||||
return (
|
||||
<div className="file-table__expansion file-table__diff file-table__diff--two-pane">
|
||||
<div className="file-table__diff-pane-header">
|
||||
<span className="file-table__diff-pane-label file-table__diff-pane-label--before">
|
||||
{t("diffPaneBefore")}
|
||||
</span>
|
||||
<span className="file-table__diff-pane-label file-table__diff-pane-label--after">
|
||||
{t("diffPaneAfter")}
|
||||
</span>
|
||||
</div>
|
||||
{grouped.map(({ source, rows: groupRows }) => (
|
||||
<section key={source} className="file-table__diff-group">
|
||||
<h4 className="file-table__diff-group-header">
|
||||
{source} {makePaneGroupSummary(groupRows, t)}
|
||||
</h4>
|
||||
<div className="file-table__diff-pane-list">
|
||||
{groupRows.map((row, idx) => (
|
||||
<PaneRow
|
||||
key={`${row.source}-${row.name}-${idx}`}
|
||||
row={row}
|
||||
t={t}
|
||||
/>
|
||||
))}
|
||||
</div>
|
||||
</section>
|
||||
))}
|
||||
</div>
|
||||
);
|
||||
}
|
||||
|
||||
// Wayfinding skeleton shown while the out-of-band ExifTool diff build is
|
||||
// in flight. Reuses existing diff-group classes so the geometry matches the
|
||||
// loaded view (avoids layout shift when the skeleton swaps for the real
|
||||
// two-pane on `diffPending: false`). Uses its own i18n key
|
||||
// (`diffSkeletonLoading`) rather than reusing `diffLoadingToast` — the toast
|
||||
// and skeleton are different render contexts and translators may want to
|
||||
// diverge (e.g., shorter form for the toast where space is constrained).
|
||||
function DiffSkeleton({
|
||||
t,
|
||||
}: {
|
||||
t: (key: string) => string;
|
||||
}): React.JSX.Element {
|
||||
return (
|
||||
<div
|
||||
className="file-table__expansion file-table__diff file-table__diff--skeleton"
|
||||
role="status"
|
||||
aria-live="polite"
|
||||
aria-busy="true"
|
||||
>
|
||||
<span className="file-table__diff-value file-table__diff-value--placeholder">
|
||||
{t("diffSkeletonLoading")}
|
||||
</span>
|
||||
</div>
|
||||
);
|
||||
}
|
||||
|
||||
// Count-shaped walker values like "3 files" / "1 attribute" / "5 items"
|
||||
// don't represent a single field value — they're aggregate summaries
|
||||
// from the Office walker's structural deletions (comments, embeddings,
|
||||
// rsids, etc.). The legacy single-pane diff (pre chunk B.1) rendered
|
||||
// these as a pill badge instead of strikethrough text; the two-pane
|
||||
// view preserves that affordance.
|
||||
//
|
||||
// Pattern: leading digit(s) + a single space + one word (singular or
|
||||
// plural noun). Matches "3 files", "1 attribute", "12 items" but not
|
||||
// "Apple iPhone 14" or strings containing spaces past the noun.
|
||||
const COUNT_VALUE_RE = /^\d+ \w+$/;
|
||||
function isCountValue(s: string): boolean {
|
||||
return COUNT_VALUE_RE.test(s);
|
||||
}
|
||||
|
||||
function PaneRow({
|
||||
row,
|
||||
t,
|
||||
}: {
|
||||
row: DiffRow;
|
||||
t: (key: string) => string;
|
||||
}): React.JSX.Element {
|
||||
const empty = t("diffEmptyValue");
|
||||
// Count-shaped removed values render as a badge (pill) instead of
|
||||
// strikethrough — visually distinct because "3 files" is an aggregate
|
||||
// removed count, not a value that was scrubbed.
|
||||
const beforeIsCount = row.before !== null && isCountValue(row.before);
|
||||
const beforeClass =
|
||||
row.status === "removed" && beforeIsCount
|
||||
? "file-table__diff-value--count-badge"
|
||||
: row.status === "removed" || row.status === "modified"
|
||||
? "file-table__diff-value file-table__diff-value--strike"
|
||||
: "file-table__diff-value";
|
||||
const afterIsCount = row.after !== null && isCountValue(row.after);
|
||||
const afterClass =
|
||||
row.status === "added" && afterIsCount
|
||||
? "file-table__diff-value--count-badge file-table__diff-value--added"
|
||||
: row.status === "added" || row.status === "modified"
|
||||
? "file-table__diff-value file-table__diff-value--added"
|
||||
: "file-table__diff-value";
|
||||
return (
|
||||
<div
|
||||
className={`file-table__diff-pair file-table__diff-pair--${row.status}`}
|
||||
>
|
||||
<div className="file-table__diff-name">{row.name}</div>
|
||||
<div className="file-table__diff-pane-cell file-table__diff-pane-cell--before">
|
||||
{row.before !== null ? (
|
||||
<span className={beforeClass} title={row.before}>
|
||||
{row.before}
|
||||
</span>
|
||||
) : (
|
||||
<span className="file-table__diff-value file-table__diff-value--placeholder">
|
||||
{empty}
|
||||
</span>
|
||||
)}
|
||||
</div>
|
||||
<div className="file-table__diff-pane-cell file-table__diff-pane-cell--after">
|
||||
{row.after !== null ? (
|
||||
<span className={afterClass} title={row.after}>
|
||||
{row.after}
|
||||
</span>
|
||||
) : (
|
||||
<span className="file-table__diff-value file-table__diff-value--placeholder">
|
||||
{empty}
|
||||
</span>
|
||||
)}
|
||||
</div>
|
||||
</div>
|
||||
);
|
||||
}
|
||||
|
||||
function computeDiffRows(document: MetadataDocument): readonly DiffRow[] {
|
||||
const afterByKey = new Map<string, MetadataEntry>();
|
||||
for (const entry of document.after) {
|
||||
afterByKey.set(makeKey(entry.source, entry.name), entry);
|
||||
}
|
||||
|
||||
const beforeKeys = new Set<string>();
|
||||
const rows: DiffRow[] = [];
|
||||
|
||||
for (const entry of document.before) {
|
||||
const key = makeKey(entry.source, entry.name);
|
||||
beforeKeys.add(key);
|
||||
const after = afterByKey.get(key);
|
||||
if (after === undefined) {
|
||||
rows.push({
|
||||
status: "removed",
|
||||
source: entry.source,
|
||||
name: entry.name,
|
||||
before: entry.value,
|
||||
after: null,
|
||||
});
|
||||
} else if (after.value === entry.value) {
|
||||
rows.push({
|
||||
status: "kept",
|
||||
source: entry.source,
|
||||
name: entry.name,
|
||||
before: entry.value,
|
||||
after: after.value,
|
||||
});
|
||||
} else {
|
||||
rows.push({
|
||||
status: "modified",
|
||||
source: entry.source,
|
||||
name: entry.name,
|
||||
before: entry.value,
|
||||
after: after.value,
|
||||
});
|
||||
}
|
||||
}
|
||||
|
||||
for (const entry of document.after) {
|
||||
const key = makeKey(entry.source, entry.name);
|
||||
if (!beforeKeys.has(key)) {
|
||||
rows.push({
|
||||
status: "added",
|
||||
source: entry.source,
|
||||
name: entry.name,
|
||||
before: null,
|
||||
after: entry.value,
|
||||
});
|
||||
}
|
||||
}
|
||||
|
||||
return rows;
|
||||
}
|
||||
|
||||
// NUL separator (not a space, not a colon) so the composed key can't
|
||||
// collide with a tag name that legitimately contains the separator.
|
||||
// Tag names from ExifTool -G1 are mostly ASCII identifiers, but spaces
|
||||
// have shown up in extended XMP namespaces. NUL is forbidden in every
|
||||
// metadata grammar we route, so it's a safe sentinel.
|
||||
function makeKey(source: string, name: string): string {
|
||||
return `${source}\0${name}`;
|
||||
}
|
||||
|
||||
function makePaneGroupSummary(
|
||||
rows: readonly DiffRow[],
|
||||
t: (key: string) => string,
|
||||
): string {
|
||||
let removed = 0;
|
||||
let modified = 0;
|
||||
let added = 0;
|
||||
let kept = 0;
|
||||
for (const r of rows) {
|
||||
switch (r.status) {
|
||||
case "removed":
|
||||
removed += 1;
|
||||
break;
|
||||
case "modified":
|
||||
modified += 1;
|
||||
break;
|
||||
case "added":
|
||||
added += 1;
|
||||
break;
|
||||
case "kept":
|
||||
kept += 1;
|
||||
break;
|
||||
}
|
||||
}
|
||||
const parts: string[] = [];
|
||||
if (removed > 0)
|
||||
parts.push(t("diffGroupRemoved").replace("{count}", String(removed)));
|
||||
if (modified > 0)
|
||||
parts.push(t("diffGroupModified").replace("{count}", String(modified)));
|
||||
if (added > 0)
|
||||
parts.push(t("diffGroupAdded").replace("{count}", String(added)));
|
||||
if (kept > 0) parts.push(t("diffGroupKept").replace("{count}", String(kept)));
|
||||
if (parts.length === 0) return "";
|
||||
return `· ${parts.join(t("diffGroupSeparator"))}`;
|
||||
}
|
||||
|
||||
interface SourceRowGroup {
|
||||
readonly source: string;
|
||||
readonly rows: readonly DiffRow[];
|
||||
}
|
||||
|
||||
function groupRowsBySource(
|
||||
rows: readonly DiffRow[],
|
||||
): readonly SourceRowGroup[] {
|
||||
const order: string[] = [];
|
||||
const byKey = new Map<string, DiffRow[]>();
|
||||
for (const row of rows) {
|
||||
const existing = byKey.get(row.source);
|
||||
if (existing === undefined) {
|
||||
order.push(row.source);
|
||||
byKey.set(row.source, [row]);
|
||||
} else {
|
||||
existing.push(row);
|
||||
}
|
||||
}
|
||||
return order.map((source) => ({
|
||||
source,
|
||||
rows: byKey.get(source) as DiffRow[],
|
||||
}));
|
||||
}
|
||||
|
|
|
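The count-badge pattern `COUNT_VALUE_RE` is easy to probe directly. The sketch below runs it over a few sample strings; the first four come from the comment above, while `"3 embedded files"`, `"50 mm"`, and `"2024 Q1"` are made-up inputs (not claims about what ExifTool or the Office walker actually emit) chosen to show that any "digits, one space, one word" value qualifies, whether or not it is a count.

```typescript
// Same pattern as in the diff: digits, a single space, one word.
const COUNT_VALUE_RE = /^\d+ \w+$/;

const samples: readonly string[] = [
  "3 files",
  "1 attribute",
  "12 items",
  "Apple iPhone 14",
  "3 embedded files", // hypothetical: two words after the number
  "50 mm", // hypothetical: a unit, not a count
  "2024 Q1", // hypothetical: a label, not a count
];

// Keep only the values that would be treated as count-shaped.
const countShaped = samples.filter((s) => COUNT_VALUE_RE.test(s));
// countShaped: ["3 files", "1 attribute", "12 items", "50 mm", "2024 Q1"]
```

In `PaneRow` the badge class is applied only when the row status is `"removed"` (or `"added"` for the after pane), so a unit-like value on a kept row would not be affected.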
|||
284
src/web/components/file-list/MetadataDiffTable.tsx
Normal file
|
|
@ -0,0 +1,284 @@
|
|||
// Two-pane metadata diff table — extracted from MetadataDiffExpansion
|
||||
// so ZipExpansion can reuse the table body inside per-leaf rows without
|
||||
// inheriting the outer expansion chrome (see
|
||||
// docs/superpowers/specs/2026-05-22-issue-184-zip-support-design.md §4.5).
|
||||
//
|
||||
// `wrapperClassName` lets the consumer swap the outer wrapper class:
|
||||
// - MetadataDiffExpansion uses the default
|
||||
// "file-table__expansion file-table__diff file-table__diff--two-pane".
|
||||
// - ZipExpansion leaf renders pass "zip-expansion__leaf-diff" to get a
|
||||
// slimmer wrapper without the file-table expansion padding.
|
||||
|
||||
import type { MetadataDocument, MetadataEntry } from "../../../domain";
|
||||
|
||||
type DiffRowStatus = "removed" | "added" | "modified" | "kept";
|
||||
|
||||
interface DiffRow {
|
||||
readonly status: DiffRowStatus;
|
||||
readonly source: string;
|
||||
readonly name: string;
|
||||
readonly before: string | null;
|
||||
readonly after: string | null;
|
||||
}
|
||||
|
||||
export function MetadataDiffTable({
|
||||
document,
|
||||
t,
|
||||
wrapperClassName = "file-table__expansion file-table__diff file-table__diff--two-pane",
|
||||
}: {
|
||||
document: MetadataDocument;
|
||||
t: (key: string) => string;
|
||||
wrapperClassName?: string;
|
||||
}): React.JSX.Element {
|
||||
const rows = computeDiffRows(document);
|
||||
const grouped = groupRowsBySource(rows);
|
||||
|
||||
return (
|
||||
<div className={wrapperClassName}>
|
||||
<div className="file-table__diff-pane-header">
|
||||
<span className="file-table__diff-pane-label file-table__diff-pane-label--before">
|
||||
{t("diffPaneBefore")}
|
||||
</span>
|
||||
<span className="file-table__diff-pane-label file-table__diff-pane-label--after">
|
||||
{t("diffPaneAfter")}
|
||||
</span>
|
||||
</div>
|
||||
{grouped.map(({ source, rows: groupRows }) => (
|
||||
<section key={source} className="file-table__diff-group">
|
||||
<h4 className="file-table__diff-group-header">
|
||||
{source} {makePaneGroupSummary(groupRows, t)}
|
||||
</h4>
|
||||
<div className="file-table__diff-pane-list">
|
||||
{groupRows.map((row, idx) => (
|
||||
<PaneRow
|
||||
key={`${row.source}-${row.name}-${idx}`}
|
||||
row={row}
|
||||
t={t}
|
||||
/>
|
||||
))}
|
||||
</div>
|
||||
</section>
|
||||
))}
|
||||
</div>
|
||||
);
|
||||
}
|
||||
|
||||
// Wayfinding skeleton shown while the out-of-band ExifTool diff build is
|
||||
// in flight. Reuses existing diff-group classes so the geometry matches
|
||||
// the loaded view (avoids layout shift when the skeleton swaps for the
|
||||
// real two-pane). Exported so ZipExpansion can render the same skeleton
|
||||
// while a leaf's diff is loading.
|
||||
export function DiffSkeleton({
|
||||
t,
|
||||
wrapperClassName = "file-table__expansion file-table__diff file-table__diff--skeleton",
|
||||
}: {
|
||||
t: (key: string) => string;
|
||||
wrapperClassName?: string;
|
||||
}): React.JSX.Element {
|
||||
return (
|
||||
<div
|
||||
className={wrapperClassName}
|
||||
role="status"
|
||||
aria-live="polite"
|
||||
aria-busy="true"
|
||||
>
|
||||
<span className="file-table__diff-value file-table__diff-value--placeholder">
|
||||
{t("diffSkeletonLoading")}
|
||||
</span>
|
||||
</div>
|
||||
);
|
||||
}
|
||||
|
||||
// Count-shaped walker values like "3 files" / "1 attribute" / "5 items"
|
||||
// don't represent a single field value — they're aggregate summaries
|
||||
// from the Office walker's structural deletions (comments, embeddings,
|
||||
// rsids, etc.). The legacy single-pane diff (pre chunk B.1) rendered
|
||||
// these as a pill badge instead of strikethrough text; the two-pane
|
||||
// view preserves that affordance.
|
||||
//
|
||||
// Pattern: leading digit(s) + a single space + one word (singular or
|
||||
// plural noun). Matches "3 files", "1 attribute", "12 items" but not
|
||||
// "Apple iPhone 14" or strings containing spaces past the noun.
|
||||
const COUNT_VALUE_RE = /^\d+ \w+$/;
|
||||
function isCountValue(s: string): boolean {
|
||||
return COUNT_VALUE_RE.test(s);
|
||||
}
|
||||
|
||||
function PaneRow({
|
||||
row,
|
||||
t,
|
||||
}: {
|
||||
row: DiffRow;
|
||||
t: (key: string) => string;
|
||||
}): React.JSX.Element {
|
||||
const empty = t("diffEmptyValue");
|
||||
const beforeIsCount = row.before !== null && isCountValue(row.before);
|
||||
const beforeClass =
|
||||
row.status === "removed" && beforeIsCount
|
||||
? "file-table__diff-value--count-badge"
|
||||
: row.status === "removed" || row.status === "modified"
|
||||
? "file-table__diff-value file-table__diff-value--strike"
|
||||
: "file-table__diff-value";
|
||||
const afterIsCount = row.after !== null && isCountValue(row.after);
|
||||
const afterClass =
|
||||
row.status === "added" && afterIsCount
|
||||
? "file-table__diff-value--count-badge file-table__diff-value--added"
|
||||
: row.status === "added" || row.status === "modified"
|
||||
? "file-table__diff-value file-table__diff-value--added"
|
||||
: "file-table__diff-value";
|
||||
return (
|
||||
<div
|
||||
className={`file-table__diff-pair file-table__diff-pair--${row.status}`}
|
||||
>
|
||||
<div className="file-table__diff-name">{row.name}</div>
|
||||
<div className="file-table__diff-pane-cell file-table__diff-pane-cell--before">
|
||||
{row.before !== null ? (
|
||||
<span className={beforeClass} title={row.before}>
|
||||
{row.before}
|
||||
</span>
|
||||
) : (
|
||||
<span className="file-table__diff-value file-table__diff-value--placeholder">
|
||||
{empty}
|
||||
</span>
|
||||
)}
|
||||
</div>
|
||||
<div className="file-table__diff-pane-cell file-table__diff-pane-cell--after">
|
||||
{row.after !== null ? (
|
||||
<span className={afterClass} title={row.after}>
|
||||
{row.after}
|
||||
</span>
|
||||
) : (
|
||||
<span className="file-table__diff-value file-table__diff-value--placeholder">
|
||||
{empty}
|
||||
</span>
|
||||
)}
|
||||
</div>
|
||||
</div>
|
||||
);
|
||||
}
|
||||
|
||||
function computeDiffRows(document: MetadataDocument): readonly DiffRow[] {
|
||||
const afterByKey = new Map<string, MetadataEntry>();
|
||||
for (const entry of document.after) {
|
||||
afterByKey.set(makeKey(entry.source, entry.name), entry);
|
||||
}
|
||||
|
||||
const beforeKeys = new Set<string>();
|
||||
const rows: DiffRow[] = [];
|
||||
|
||||
for (const entry of document.before) {
|
||||
const key = makeKey(entry.source, entry.name);
|
||||
beforeKeys.add(key);
|
||||
const after = afterByKey.get(key);
|
||||
if (after === undefined) {
|
||||
rows.push({
|
||||
status: "removed",
|
||||
source: entry.source,
|
||||
name: entry.name,
|
||||
before: entry.value,
|
||||
after: null,
|
||||
});
|
||||
} else if (after.value === entry.value) {
|
||||
rows.push({
|
||||
status: "kept",
|
||||
source: entry.source,
|
||||
name: entry.name,
|
||||
before: entry.value,
|
||||
after: after.value,
|
||||
});
|
||||
} else {
|
||||
rows.push({
|
||||
status: "modified",
|
||||
source: entry.source,
|
||||
name: entry.name,
|
||||
before: entry.value,
|
||||
after: after.value,
|
||||
});
|
||||
}
|
||||
}
|
||||
|
||||
for (const entry of document.after) {
|
||||
const key = makeKey(entry.source, entry.name);
|
||||
if (!beforeKeys.has(key)) {
|
||||
rows.push({
|
||||
status: "added",
|
||||
source: entry.source,
|
||||
name: entry.name,
|
||||
before: null,
|
||||
after: entry.value,
|
||||
});
|
||||
}
|
||||
}
|
||||
|
||||
return rows;
|
||||
}
|
||||
|
||||
// NUL separator (not a space, not a colon) so the composed key can't
|
||||
// collide with a tag name that legitimately contains the separator.
|
||||
// Tag names from ExifTool -G1 are mostly ASCII identifiers, but spaces
|
||||
// have shown up in extended XMP namespaces. NUL is forbidden in every
|
||||
// metadata grammar we route, so it's a safe sentinel.
|
||||
function makeKey(source: string, name: string): string {
|
||||
return `${source}\0${name}`;
|
||||
}
|
||||
|
||||
function makePaneGroupSummary(
|
||||
rows: readonly DiffRow[],
|
||||
t: (key: string) => string,
|
||||
): string {
|
||||
let removed = 0;
|
||||
let modified = 0;
|
||||
let added = 0;
|
||||
let kept = 0;
|
||||
for (const r of rows) {
|
||||
switch (r.status) {
|
||||
case "removed":
|
||||
removed += 1;
|
||||
break;
|
||||
case "modified":
|
||||
modified += 1;
|
||||
break;
|
||||
case "added":
|
||||
added += 1;
|
||||
break;
|
||||
case "kept":
|
||||
kept += 1;
|
||||
break;
|
||||
}
|
||||
}
|
||||
const parts: string[] = [];
|
||||
if (removed > 0)
|
||||
parts.push(t("diffGroupRemoved").replace("{count}", String(removed)));
|
||||
if (modified > 0)
|
||||
parts.push(t("diffGroupModified").replace("{count}", String(modified)));
|
||||
if (added > 0)
|
||||
parts.push(t("diffGroupAdded").replace("{count}", String(added)));
|
||||
if (kept > 0) parts.push(t("diffGroupKept").replace("{count}", String(kept)));
|
||||
if (parts.length === 0) return "";
|
||||
return `· ${parts.join(t("diffGroupSeparator"))}`;
|
||||
}
|
||||
|
||||
interface SourceRowGroup {
|
||||
readonly source: string;
|
||||
readonly rows: readonly DiffRow[];
|
||||
}
|
||||
|
||||
function groupRowsBySource(
|
||||
rows: readonly DiffRow[],
|
||||
): readonly SourceRowGroup[] {
|
||||
const order: string[] = [];
|
||||
const byKey = new Map<string, DiffRow[]>();
|
||||
for (const row of rows) {
|
||||
const existing = byKey.get(row.source);
|
||||
if (existing === undefined) {
|
||||
order.push(row.source);
|
||||
byKey.set(row.source, [row]);
|
||||
} else {
|
||||
existing.push(row);
|
||||
}
|
||||
}
|
||||
return order.map((source) => ({
|
||||
source,
|
||||
rows: byKey.get(source) as DiffRow[],
|
||||
}));
|
||||
}
|
||||
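The classification done by `computeDiffRows` (removed, kept, modified, added, keyed on source plus name with a NUL separator) can be exercised with a condensed sketch. `Tag`, `Status`, and `classify` are hypothetical names; the logic is a compact restatement of the function in this file that returns a status map instead of full rows, not a replacement for it.

```typescript
type Status = "removed" | "added" | "modified" | "kept";

interface Tag {
  readonly source: string;
  readonly name: string;
  readonly value: string;
}

// Classify every tag seen on either side of the strip.
function classify(
  before: readonly Tag[],
  after: readonly Tag[],
): Map<string, Status> {
  // NUL-joined key, as in makeKey, so "source name" pairs cannot collide.
  const key = (t: Tag): string => `${t.source}\0${t.name}`;

  const afterByKey = new Map<string, string>();
  for (const t of after) afterByKey.set(key(t), t.value);

  const result = new Map<string, Status>();
  // Pass 1: everything present before is removed, kept, or modified.
  for (const t of before) {
    const next = afterByKey.get(key(t));
    if (next === undefined) result.set(key(t), "removed");
    else if (next === t.value) result.set(key(t), "kept");
    else result.set(key(t), "modified");
  }
  // Pass 2: anything only present after the strip was added.
  for (const t of after) {
    if (!result.has(key(t))) result.set(key(t), "added");
  }
  return result;
}
```

Comparing against `undefined` rather than truthiness matters here: a tag whose value became the empty string counts as modified, not removed.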
345
src/web/components/file-list/ZipExpansion.tsx
Normal file
|
|
@ -0,0 +1,345 @@
|
|||
// Recursive tree view of inner ZIP entries. Renders the
|
||||
// `archiveEntries` from a ZipStrategy strip; each cleaned-leaf row
|
||||
// lazy-loads its per-leaf metadata diff on first expand via
|
||||
// window.api.wasm.buildArchiveLeafDiff.
|
||||
//
|
||||
// See docs/superpowers/specs/2026-05-22-issue-184-zip-support-design.md
|
||||
// §4.5 for the UI contract.
|
||||
|
||||
import { useCallback, useState } from "react";
|
||||
import type { LeafDiffResult } from "../../../application";
|
||||
import { assertNever } from "../../../common";
|
||||
import type { ArchiveEntryResult } from "../../../domain";
|
||||
import { useI18n } from "../../hooks/use_i18n";
|
||||
import { ChevronIcon } from "../icons/ChevronIcon";
|
||||
import { MetadataDiffTable, DiffSkeleton } from "./MetadataDiffTable";
|
||||
|
||||
// Pagination — render the first N entries eagerly; surface a button
|
||||
// to reveal the next N for archives larger than this.
|
||||
const VISIBLE_PAGE_SIZE = 100;
|
||||
// Visual indent caps at level 5 to avoid horizontal squeeze on mobile.
|
||||
// Recursion depth is capped separately by MAX_RENDER_DEPTH; this only affects padding-left.
|
||||
const INDENT_CAP = 5;
|
||||
// Render-time depth limit — beyond this we stop rendering further
|
||||
// recursion to prevent adversarial archives (zip quine, deeply nested
|
||||
// archive of archives) from hanging the tab. Strategy-side recursion
|
||||
// caps at MAX_NESTING_DEPTH (10) so a well-formed result never reaches
|
||||
// this UI cap; we keep it as a defense-in-depth ceiling in case the
|
||||
// strategy ever evolves to ship deeper trees.
|
||||
const MAX_RENDER_DEPTH = 20;
|
||||
|
||||
// Per-leaf expansion state. `cachedResult` on "closed" lets a re-opened
|
||||
// leaf within the same ZipExpansion mount skip the API call. Across
|
||||
// outer-ZIP collapse + reopen (which unmounts ZipExpansion entirely) the
|
||||
// WasmProcessor's cachedLeafDocs is the authoritative cache; both layers
|
||||
// coexist deliberately.
|
||||
type LeafState =
|
||||
| { kind: "closed"; cachedResult?: LeafDiffResult }
|
||||
| { kind: "loading" }
|
||||
| { kind: "loaded"; result: LeafDiffResult };
|
||||
|
||||
export function ZipExpansion({
|
||||
entryId,
|
||||
entries,
|
||||
depth = 0,
|
||||
pathPrefix = "",
|
||||
}: {
|
||||
entryId: string;
|
||||
entries: readonly ArchiveEntryResult[];
|
||||
depth?: number;
|
||||
// Composite full-path prefix from the outer archive(s), with
|
||||
// trailing NUL separator. Empty at the top level; for a nested
|
||||
// ZIP at path `a.zip` it's `a.zip\0`. The leaf row uses
|
||||
// `pathPrefix + entry.path` as the full path when calling
|
||||
// `buildArchiveLeafDiff` so the lookup matches the key composed
|
||||
// in WasmProcessor.stashArchiveLeaves. Without this prefix,
|
||||
// nested zips with same-named leaves would collide.
|
||||
pathPrefix?: string;
|
||||
}): React.JSX.Element {
|
||||
const { t } = useI18n();
|
||||
const [visible, setVisible] = useState(VISIBLE_PAGE_SIZE);
|
||||
const [leafStates, setLeafStates] = useState<Map<string, LeafState>>(
|
||||
new Map(),
|
||||
);
|
||||
|
||||
// Stable callbacks: depend only on setLeafStates' setter identity
|
||||
// (constant across renders), so the rows can rely on referential
|
||||
// identity and we avoid recreating fresh closures per parent render.
|
||||
// Relevant for ZIPs with many entries (perf finding #14). Stable props
// only prevent per-row re-renders if the row component is memoized.
|
||||
const setLeafState = useCallback(
|
||||
(stateKey: string, next: LeafState): void => {
|
||||
setLeafStates((prev) => {
|
||||
const out = new Map(prev);
|
||||
out.set(stateKey, next);
|
||||
return out;
|
||||
});
|
||||
},
|
||||
[],
|
||||
);
|
||||
|
||||
const setLeafStateIfStill = useCallback(
|
||||
(args: {
|
||||
stateKey: string;
|
||||
fromKind: LeafState["kind"];
|
||||
next: LeafState;
|
||||
}): void => {
|
||||
setLeafStates((prev) => {
|
||||
const current = prev.get(args.stateKey);
|
||||
if ((current ?? { kind: "closed" }).kind !== args.fromKind) {
|
||||
return prev;
|
||||
}
|
||||
const out = new Map(prev);
|
||||
out.set(args.stateKey, args.next);
|
||||
return out;
|
||||
});
|
||||
},
|
||||
[],
|
||||
);
|
||||
|
||||
if (depth >= MAX_RENDER_DEPTH) {
|
||||
return (
|
||||
<div className="zip-expansion__depth-limit">
|
||||
{t("zipExpansion.depthLimit")}
|
||||
</div>
|
||||
);
|
||||
}
|
||||
|
||||
const shown = entries.slice(0, visible);
|
||||
const remaining = Math.max(0, entries.length - visible);
|
||||
const indentLevel = Math.min(depth, INDENT_CAP);
|
||||
|
||||
return (
|
||||
<div className="zip-expansion" data-depth={indentLevel}>
|
||||
{shown.map((entry, idx) => {
|
||||
// Index prefix keeps the state key unique across duplicate
|
||||
// filenames in the same archive (legal per the ZIP spec).
|
||||
const stateKey = `${idx}\0${entry.path}`;
|
||||
const fullPath = pathPrefix + entry.path;
|
||||
return (
|
||||
<ZipExpansionRow
|
||||
key={stateKey}
|
||||
stateKey={stateKey}
|
||||
entryId={entryId}
|
||||
entry={entry}
|
||||
fullPath={fullPath}
|
||||
depth={depth}
|
||||
state={leafStates.get(stateKey) ?? { kind: "closed" }}
|
||||
setLeafState={setLeafState}
|
||||
setLeafStateIfStill={setLeafStateIfStill}
|
||||
t={t}
|
||||
/>
|
||||
);
|
||||
})}
|
||||
{remaining > 0 && (
|
||||
<button
|
||||
type="button"
|
||||
className="zip-expansion__show-more"
|
||||
onClick={() => setVisible((v) => v + VISIBLE_PAGE_SIZE)}
|
||||
>
|
||||
{t("zipExpansion.showMore").replace("{count}", String(remaining))}
|
||||
</button>
|
||||
)}
|
||||
</div>
|
||||
);
|
||||
}
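Review note: the `pathPrefix` contract described in the props comment can be sketched as a pure function. `leafKey` is a hypothetical helper introduced only for illustration; it mirrors `pathPrefix + entry.path`, where each enclosing nested archive contributes its path followed by a NUL separator.

```typescript
// Hypothetical helper: builds the composite lookup key for a leaf from the
// chain of nested archives that contain it (outermost first). The top-level
// dropped ZIP is not part of the chain; in the component it is identified
// by `entryId` instead.
function leafKey(nestedArchives: readonly string[], leafPath: string): string {
  const prefix = nestedArchives.map((p) => `${p}\0`).join("");
  return prefix + leafPath;
}

const inA = leafKey(["a.zip"], "photo.jpg");
const inB = leafKey(["b.zip"], "photo.jpg");
const topLevel = leafKey([], "photo.jpg");

console.log(inA === "a.zip\0photo.jpg"); // true
console.log(new Set([inA, inB, topLevel]).size); // 3
```

Three leaves that share the name `photo.jpg` get three distinct keys, which is the collision the prefix is meant to prevent.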
|
||||
|
||||
function ZipExpansionRow({
|
||||
stateKey,
|
||||
entryId,
|
||||
entry,
|
||||
fullPath,
|
||||
depth,
|
||||
state,
|
||||
setLeafState,
|
||||
setLeafStateIfStill,
|
||||
t,
|
||||
}: {
|
||||
stateKey: string;
|
||||
entryId: string;
|
||||
entry: ArchiveEntryResult;
|
||||
fullPath: string;
|
||||
depth: number;
|
||||
state: LeafState;
|
||||
setLeafState: (stateKey: string, next: LeafState) => void;
|
||||
setLeafStateIfStill: (args: {
|
||||
stateKey: string;
|
||||
fromKind: LeafState["kind"];
|
||||
next: LeafState;
|
||||
}) => void;
|
||||
t: (key: string) => string;
|
||||
}): React.JSX.Element {
|
||||
// Both "cleaned" and "already-clean" are processable (the bytes stash
|
||||
// was populated; the diff is meaningful). The distinction is only for
|
||||
// the displayed status label.
|
||||
const wasProcessed =
|
||||
entry.status === "cleaned" || entry.status === "already-clean";
|
||||
const isCleanedLeaf = wasProcessed && entry.entries === null;
|
||||
const isNestedZip = wasProcessed && entry.entries !== null;
|
||||
const isExpandable = isCleanedLeaf || isNestedZip;
|
||||
const isExpanded =
|
||||
(isCleanedLeaf && state.kind !== "closed") ||
|
||||
(isNestedZip && state.kind === "loaded");
|
||||
|
||||
async function handleToggle(): Promise<void> {
|
||||
if (!isExpandable) return;
|
||||
if (isCleanedLeaf) {
|
||||
if (state.kind === "closed") {
|
||||
// Re-open: if we already fetched the result earlier in this
|
||||
// mount, use the cached value. The processor-level cache also
|
||||
// covers this case across unmounts; both are correct, this
|
||||
// just saves an IPC round-trip.
|
||||
if (state.cachedResult !== undefined) {
|
||||
setLeafState(stateKey, {
|
||||
kind: "loaded",
|
||||
result: state.cachedResult,
|
||||
});
|
||||
return;
|
||||
}
|
||||
setLeafState(stateKey, { kind: "loading" });
|
||||
try {
|
||||
const result = await window.api.wasm.buildArchiveLeafDiff(
|
||||
entryId,
|
||||
fullPath,
|
||||
);
|
||||
setLeafStateIfStill({
|
||||
stateKey,
|
||||
fromKind: "loading",
|
||||
next: { kind: "loaded", result },
|
||||
});
|
||||
} catch {
|
||||
setLeafStateIfStill({
|
||||
stateKey,
|
||||
fromKind: "loading",
|
||||
next: { kind: "loaded", result: { kind: "failed" } },
|
||||
});
|
||||
}
|
||||
} else {
|
||||
// Close: carry the result so a subsequent open can skip the
|
||||
// API call.
|
||||
setLeafState(
|
||||
stateKey,
|
||||
state.kind === "loaded"
|
||||
? { kind: "closed", cachedResult: state.result }
|
||||
: { kind: "closed" },
|
||||
);
|
||||
}
|
||||
} else if (isNestedZip) {
|
||||
// Nested-zip rows toggle between closed and loaded; the "loaded"
|
||||
// payload is unused — the real content comes from the recursive
|
||||
// <ZipExpansion>. Using "failed" as the sentinel keeps the loaded
|
||||
// payload uniform (LeafDiffResult shape).
|
||||
setLeafState(
|
||||
stateKey,
|
||||
state.kind === "loaded"
|
||||
? { kind: "closed" }
|
||||
: { kind: "loaded", result: { kind: "failed" } },
|
||||
);
|
||||
}
|
||||
}
|
||||
|
||||
function handleKeyDown(e: React.KeyboardEvent): void {
|
||||
if (!isExpandable) return;
|
||||
if (e.key === "Enter" || e.key === " ") {
|
||||
e.preventDefault();
|
||||
void handleToggle();
|
||||
}
|
||||
}
|
||||
|
||||
const statusLabel = renderStatus(entry, t);
|
||||
|
||||
const rowClass = [
|
||||
"zip-expansion__row",
|
||||
`zip-expansion__row--${entry.status}`,
|
||||
isExpandable ? "zip-expansion__row--expandable" : "",
|
||||
]
|
||||
.filter(Boolean)
|
||||
.join(" ");
|
||||
|
||||
return (
|
||||
<div className="zip-expansion__entry">
|
||||
<div
|
||||
className={rowClass}
|
||||
role={isExpandable ? "button" : undefined}
|
||||
tabIndex={isExpandable ? 0 : -1}
|
||||
onClick={isExpandable ? () => void handleToggle() : undefined}
|
||||
onKeyDown={handleKeyDown}
|
||||
aria-expanded={isExpandable ? isExpanded : undefined}
|
||||
>
|
||||
<div className="zip-expansion__chevron">
|
||||
{isExpandable && <ChevronIcon expanded={isExpanded} />}
|
||||
</div>
|
||||
<div className="zip-expansion__path">{entry.path}</div>
|
||||
<div className="zip-expansion__status">{statusLabel}</div>
|
||||
</div>
|
||||
{isExpanded && isCleanedLeaf && state.kind === "loading" && (
|
||||
<DiffSkeleton
|
||||
t={t}
|
||||
wrapperClassName="zip-expansion__leaf-diff zip-expansion__leaf-diff--skeleton"
|
||||
/>
|
||||
)}
|
||||
{isExpanded &&
|
||||
isCleanedLeaf &&
|
||||
state.kind === "loaded" &&
|
||||
renderLeafBody(state.result, t)}
|
||||
{isExpanded && isNestedZip && entry.entries !== null && (
|
||||
<ZipExpansion
|
||||
entryId={entryId}
|
||||
entries={entry.entries}
|
||||
depth={depth + 1}
|
||||
pathPrefix={`${fullPath}\0`}
|
||||
/>
|
||||
)}
|
||||
</div>
|
||||
);
|
||||
}
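Review note: the guard in `setLeafStateIfStill` is a compare-and-set on the leaf's `kind`. The sketch below restates it as a pure function over a `Map`, so the race it protects against (row collapsed while the diff request is in flight) can be exercised without React. The `"ok"` variant of `LeafDiffResult` is a placeholder assumption; only `"failed"` and a `doc`-carrying variant are visible in this diff.

```typescript
// Assumed shape: only the "failed" variant's name is confirmed by the diff.
type LeafDiffResult = { kind: "failed" } | { kind: "ok"; doc: unknown };

type LeafState =
  | { kind: "closed"; cachedResult?: LeafDiffResult }
  | { kind: "loading" }
  | { kind: "loaded"; result: LeafDiffResult };

// Applies `next` only if the leaf is still in `fromKind`; otherwise returns
// the same map instance so a state setter can bail out of the update.
function setIfStill(
  prev: Map<string, LeafState>,
  stateKey: string,
  fromKind: LeafState["kind"],
  next: LeafState,
): Map<string, LeafState> {
  const current: LeafState = prev.get(stateKey) ?? { kind: "closed" };
  if (current.kind !== fromKind) return prev;
  const out = new Map(prev);
  out.set(stateKey, next);
  return out;
}

const rowKey = "0\0photo.jpg";
const states = new Map<string, LeafState>();
states.set(rowKey, { kind: "loading" }); // user expands the row
states.set(rowKey, { kind: "closed" }); // user collapses before the result lands

// The late result must not reopen the row.
const afterLateResult = setIfStill(states, rowKey, "loading", {
  kind: "loaded",
  result: { kind: "failed" },
});
console.log(afterLateResult === states); // true
```

Returning the previous map unchanged keeps the update a no-op when the row has moved on, instead of letting a stale response overwrite newer state.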
|
||||
|
||||
// Maps a LeafDiffResult to the rendered body — either the two-pane diff,
|
||||
// an "already clean" message, or a "diff failed" error. The discriminated
|
||||
// shape makes the failed-vs-empty distinction explicit so an internal
|
||||
// error is never rendered as "Already clean".
|
||||
function renderLeafBody(
|
||||
result: LeafDiffResult,
|
||||
t: (key: string) => string,
|
||||
): React.JSX.Element {
|
||||
if (result.kind === "failed") {
|
||||
return (
|
||||
<div className="zip-expansion__leaf-empty">
|
||||
{t("zipExpansion.diffFailed")}
|
||||
</div>
|
||||
);
|
||||
}
|
||||
const { doc } = result;
|
||||
if (doc.before.length > 0 || doc.after.length > 0) {
|
||||
return (
|
||||
<MetadataDiffTable
|
||||
document={doc}
|
||||
t={t}
|
||||
wrapperClassName="zip-expansion__leaf-diff file-table__diff--two-pane"
|
||||
/>
|
||||
);
|
||||
}
|
||||
return (
|
||||
<div className="zip-expansion__leaf-empty">
|
||||
{t("zipExpansion.alreadyClean")}
|
||||
</div>
|
||||
);
|
||||
}
|
||||
|
||||
function renderStatus(
|
||||
entry: ArchiveEntryResult,
|
||||
t: (key: string) => string,
|
||||
): string {
|
||||
switch (entry.status) {
|
||||
case "cleaned":
|
||||
return t("zipExpansion.statusCleaned");
|
||||
case "already-clean":
|
||||
return t("zipExpansion.statusAlreadyClean");
|
||||
case "passed-through-unsupported":
|
||||
return t("zipExpansion.statusUnsupported");
|
||||
case "directory":
|
||||
return t("zipExpansion.statusDirectory");
|
||||
default:
|
||||
return assertNever({ value: entry.status });
|
||||
}
|
||||
}
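Review note: `renderStatus` relies on an exhaustive `switch` with `assertNever` in the `default` branch. The sketch below shows the pattern in isolation. The `assertNever` body is an assumption (only its call shape, `assertNever({ value })`, is visible in the diff), and `statusKey` is a hypothetical stand-in that returns the i18n key rather than the translated string.

```typescript
type EntryStatus =
  | "cleaned"
  | "already-clean"
  | "passed-through-unsupported"
  | "directory";

// Assumed implementation; the real helper lives in src/common.
function assertNever(args: { value: never }): never {
  throw new Error(`Unexpected value: ${String(args.value)}`);
}

function statusKey(status: EntryStatus): string {
  switch (status) {
    case "cleaned":
      return "zipExpansion.statusCleaned";
    case "already-clean":
      return "zipExpansion.statusAlreadyClean";
    case "passed-through-unsupported":
      return "zipExpansion.statusUnsupported";
    case "directory":
      return "zipExpansion.statusDirectory";
    default:
      // Compiles only while every member of EntryStatus is handled above.
      return assertNever({ value: status });
  }
}

console.log(statusKey("directory")); // prints "zipExpansion.statusDirectory"
```

If a fifth status is added to the union without a matching `case`, `status` no longer narrows to `never` in the `default` branch and the call stops type-checking.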
|
||||
|
|
@ -1,7 +1,7 @@
|
|||
import { createContext, useContext, useReducer } from "react";
|
||||
import type { Dispatch, ReactNode } from "react";
|
||||
import { FileProcessingStatus } from "../../domain";
|
||||
- import type { MetadataDocument } from "../../domain";
|
||||
import type { ArchiveEntryResult, MetadataDocument } from "../../domain";
|
||||
import { assertNever } from "../../common/types";
|
||||
|
||||
export type FolderDiscoveryStatus =
|
||||
|
|
@ -34,6 +34,13 @@ export interface FileEntry {
|
|||
// a loading skeleton when this is true. Optional so existing entry
|
||||
// initializers don't need to set it; treated as `false` when absent.
|
||||
diffPending?: boolean;
|
||||
// Strategy-emitted non-fatal warnings (currently only ZipStrategy).
|
||||
// Surfaced as an inline disclosure on the FileRow.
|
||||
warnings?: readonly string[];
|
||||
// Recursive tree of inner archive entries (currently only ZipStrategy
|
||||
// populates). When non-null and non-empty, FileRow's expansion area
|
||||
// renders <ZipExpansion> instead of <MetadataDiffExpansion>.
|
||||
archiveEntries?: readonly ArchiveEntryResult[];
|
||||
}
|
||||
|
||||
export interface AppState {
|
||||
|
|
@ -53,6 +60,8 @@ export type AppAction =
|
|||
afterBytes: number;
|
||||
diffDocument: MetadataDocument | null;
|
||||
diffPending: boolean;
|
||||
warnings: readonly string[];
|
||||
archiveEntries: readonly ArchiveEntryResult[] | null;
|
||||
}
|
||||
| {
|
||||
type: "UPDATE_FILE_DIFF";
|
||||
|
|
@ -100,6 +109,11 @@ export function appReducer(state: AppState, action: AppAction): AppState {
|
|||
afterBytes: action.afterBytes,
|
||||
diffDocument: action.diffDocument,
|
||||
diffPending: action.diffPending,
|
||||
warnings: action.warnings,
|
||||
// exactOptionalPropertyTypes: omit when null.
|
||||
...(action.archiveEntries !== null && {
|
||||
archiveEntries: action.archiveEntries,
|
||||
}),
|
||||
}
|
||||
: file,
|
||||
),
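Review note: the conditional spread in the reducer omits the optional key instead of assigning `undefined`, which is what `exactOptionalPropertyTypes` requires. A minimal sketch with a hypothetical `SketchEntry` type and `withArchive` helper:

```typescript
// Hypothetical reduced entry type; `archiveEntries` is optional, not nullable.
interface SketchEntry {
  id: string;
  archiveEntries?: readonly string[];
}

// Adds `archiveEntries` only when a non-null value is supplied. Spreading
// `false` contributes no properties, so the key is absent in the null case.
function withArchive(
  base: SketchEntry,
  archiveEntries: readonly string[] | null,
): SketchEntry {
  return {
    ...base,
    ...(archiveEntries !== null && { archiveEntries }),
  };
}

const plainEntry = withArchive({ id: "a" }, null);
const zipEntry = withArchive({ id: "b" }, ["photo.jpg"]);

console.log("archiveEntries" in plainEntry); // false
console.log(zipEntry.archiveEntries?.length); // 1
```

One consequence worth keeping in mind: because the key is omitted rather than overwritten, a value already present on `base` survives a `null` update.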
|
||||
|
|
|
|||
|
|
@ -4,6 +4,7 @@ import type { FileEntry, AppAction } from "../contexts/AppContext";
|
|||
import { useAppContext } from "../contexts/AppContext";
|
||||
import { useI18n } from "./use_i18n";
|
||||
import { FileProcessingStatus, exceedsCap } from "../../domain";
|
||||
import type { ArchiveEntryResult } from "../../domain";
|
||||
import { getCurrentSizeCap } from "../utils/get_size_cap";
|
||||
import { formatFileSize } from "../utils/format_file_size";
|
||||
|
||||
|
|
@ -179,14 +180,17 @@ async function processViaWasm({
|
|||
}
|
||||
|
||||
const outputBytes = result.outputBytes ?? 0;
|
||||
- // A file is "cleaned" if the output is smaller than the input. Post-B.1
- // the per-strategy MetadataItem[] enumeration is gone (ExifTool's read
- // is the diff source); bytesReduced is the only synchronous signal we
- // have at this point. If the file had no removable metadata, output ≥
- // input and we tag NoMetadataFound. The async diff (when it arrives via
- // UPDATE_FILE_DIFF) provides the per-row detail, but doesn't drive the
- // status pill.
|
||||
// A file is "cleaned" if the output is smaller than the input, OR if
|
||||
// it's an archive and at least one inner entry was cleaned. The byte-
|
||||
// size comparison alone is wrong for ZIPs: even with DEFLATE re-
|
||||
// encoding, a ZIP containing only binary-compressed files (JPEG, PNG)
|
||||
// won't shrink because those entries are already incompressible. Using
|
||||
// cleaned-entry count as the authoritative signal for archives gives
|
||||
// correct Complete/NoMetadataFound status independent of compression.
|
||||
const bytesReduced = outputBytes > 0 && outputBytes < entry.size;
|
||||
const archiveEntries = result.archiveEntries ?? null;
|
||||
const hasCleanedArchiveEntry =
|
||||
archiveEntries !== null && hasAnyCleanedEntry(archiveEntries);
|
||||
// `result.diffDocument` is null when the flag is on (the build will fire
|
||||
// async, see below) or undefined when the API surface doesn't include it
|
||||
// (defensive — older shape, older mocks). Treat both as "no diff yet".
|
||||
|
|
@ -201,13 +205,19 @@ async function processViaWasm({
|
|||
// fired below and dispatches UPDATE_FILE_DIFF when it lands. While
|
||||
// pending, MetadataDiffExpansion shows a skeleton.
|
||||
diffPending: ENABLE_EXIFTOOL_DIFF && !hasDiffNow,
|
||||
// Strategy-emitted fields (currently ZipStrategy only). `result`
|
||||
// is the WasmApi return shape which defaults warnings to [] and
|
||||
// archiveEntries to null when the underlying outcome omitted them.
|
||||
warnings: result.warnings ?? [],
|
||||
archiveEntries: result.archiveEntries ?? null,
|
||||
});
|
||||
dispatch({
|
||||
type: "UPDATE_FILE_STATUS",
|
||||
id: entry.id,
|
||||
- status: bytesReduced
- ? FileProcessingStatus.Complete
- : FileProcessingStatus.NoMetadataFound,
|
||||
status:
|
||||
bytesReduced || hasCleanedArchiveEntry
|
||||
? FileProcessingStatus.Complete
|
||||
: FileProcessingStatus.NoMetadataFound,
|
||||
});
|
||||
window.api.files.notifyFileProcessed();
|
||||
|
||||
|
|
@ -215,7 +225,15 @@ async function processViaWasm({
|
|||
// inline — zeroperl.wasm runs on the main thread; a "background"
|
||||
// diff fired here would steal CPU from the next strip iteration.
|
||||
// See the comment on processFileEntries for the full rationale.
|
||||
- if (ENABLE_EXIFTOOL_DIFF && !hasDiffNow) {
|
||||
//
|
||||
// Archive rows (currently only ZIP) don't have a top-level pending
|
||||
// diff — WasmProcessor stashes per-leaf inputs instead and the UI
|
||||
// builds those lazily via buildArchiveLeafDiff. Enqueuing them here
|
||||
// would cause buildDiffDocumentForEntry to return null for every ZIP
|
||||
// and dispatch a wasted UPDATE_FILE_DIFF(null) — 100 ZIPs would issue
|
||||
// 100 useless IPC round-trips. Skip them.
|
||||
const isArchive = archiveEntries !== null;
|
||||
if (ENABLE_EXIFTOOL_DIFF && !hasDiffNow && !isArchive) {
|
||||
diffEntries.push(entry);
|
||||
}
|
||||
}
|
||||
|
|
@ -262,19 +280,49 @@ async function buildDiffInBackground({
|
|||
}
|
||||
}
|
||||
|
||||
// Recursively checks whether any entry in an archive tree has status
|
||||
// "cleaned" (NOT "already-clean"). Used to determine Complete vs
|
||||
// NoMetadataFound for ZIPs whose inner files are binary-compressed
|
||||
// (JPEG, PNG) — those don't shrink the outer ZIP's byte count even when
|
||||
// their EXIF is stripped, so byte-size comparison alone misclassifies
|
||||
// them as "Already clean". "already-clean" entries are explicitly excluded:
|
||||
// the whole point of that status is that nothing was actually removed.
|
||||
function hasAnyCleanedEntry(entries: readonly ArchiveEntryResult[]): boolean {
|
||||
for (const entry of entries) {
|
||||
if (entry.status === "cleaned") return true;
|
||||
if (entry.entries !== null && hasAnyCleanedEntry(entry.entries))
|
||||
return true;
|
||||
}
|
||||
return false;
|
||||
}
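Review note: a standalone sketch of the recursive check, run against a small contrived tree. The `ArchiveEntryResult` shape is reduced to the three fields this function reads, and the sample data (including the parent archive's own status) is illustrative rather than taken from a real ZipStrategy result.

```typescript
type EntryStatus =
  | "cleaned"
  | "already-clean"
  | "passed-through-unsupported"
  | "directory";

// Reduced to the fields the check uses; the real domain type may have more.
interface ArchiveEntryResult {
  readonly path: string;
  readonly status: EntryStatus;
  readonly entries: readonly ArchiveEntryResult[] | null;
}

// True when any entry at any nesting level has status "cleaned".
function hasAnyCleanedEntry(entries: readonly ArchiveEntryResult[]): boolean {
  for (const entry of entries) {
    if (entry.status === "cleaned") return true;
    if (entry.entries !== null && hasAnyCleanedEntry(entry.entries)) {
      return true;
    }
  }
  return false;
}

const withNestedCleanedLeaf: ArchiveEntryResult[] = [
  { path: "docs/", status: "directory", entries: null },
  {
    path: "inner.zip",
    status: "already-clean", // contrived so that only recursion finds the leaf
    entries: [{ path: "photo.jpg", status: "cleaned", entries: null }],
  },
];

const nothingRemoved: ArchiveEntryResult[] = [
  { path: "notes.txt", status: "passed-through-unsupported", entries: null },
  { path: "clean.png", status: "already-clean", entries: null },
];

console.log(hasAnyCleanedEntry(withNestedCleanedLeaf)); // true
console.log(hasAnyCleanedEntry(nothingRemoved)); // false
```

The second tree maps to NoMetadataFound under the new status rule unless the byte-size comparison says otherwise.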
|
||||
|
||||
// A diff "has changes" when any entry is removed (present in `before`, absent
|
||||
// from `after`), modified (same source+name, different value), or added (only
|
||||
// in `after`). Matches the classification done at render time in
|
||||
// MetadataDiffExpansion's computeDiffRows.
|
||||
//
|
||||
// Uses a multiset comparison (count of each source+name+value tuple) so
|
||||
// duplicate entries are handled correctly. A naive Map-based check
|
||||
// collapsed duplicates and produced false negatives — e.g. before=[(A,X),
|
||||
// (A,X)] vs after=[(A,X),(B,Y)] would report "no changes" because the
|
||||
// duplicate match consumed both before entries while the (B,Y) addition
|
||||
// was never inspected.
|
||||
function diffHasChanges(doc: {
|
||||
before: readonly { source: string; name: string; value: string }[];
|
||||
after: readonly { source: string; name: string; value: string }[];
|
||||
}): boolean {
|
||||
if (doc.before.length !== doc.after.length) return true;
|
||||
- const afterByKey = new Map<string, string>();
- for (const e of doc.after) afterByKey.set(`${e.source}\0${e.name}`, e.value);
|
||||
const key = (e: { source: string; name: string; value: string }): string =>
|
||||
`${e.source}\0${e.name}\0${e.value}`;
|
||||
const counts = new Map<string, number>();
|
||||
for (const e of doc.before) {
|
||||
- if (afterByKey.get(`${e.source}\0${e.name}`) !== e.value) return true;
|
||||
counts.set(key(e), (counts.get(key(e)) ?? 0) + 1);
|
||||
}
|
||||
for (const e of doc.after) {
|
||||
const k = key(e);
|
||||
const c = counts.get(k);
|
||||
if (c === undefined || c === 0) return true;
|
||||
counts.set(k, c - 1);
|
||||
}
|
||||
return false;
|
||||
}
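Review note: a self-contained sketch of the multiset comparison described in the comment, assuming the `counts`-based lines are the post-change body of `diffHasChanges`. `MetaItem` is a hypothetical name for the inline `{ source, name, value }` shape. The sample reproduces the duplicate-entry case from the comment.

```typescript
interface MetaItem {
  source: string;
  name: string;
  value: string;
}

// Compares `before` and `after` as multisets of (source, name, value) tuples.
function diffHasChanges(doc: {
  before: readonly MetaItem[];
  after: readonly MetaItem[];
}): boolean {
  if (doc.before.length !== doc.after.length) return true;
  const key = (e: MetaItem): string => `${e.source}\0${e.name}\0${e.value}`;
  const counts = new Map<string, number>();
  for (const e of doc.before) {
    counts.set(key(e), (counts.get(key(e)) ?? 0) + 1);
  }
  // Lengths are equal, so consuming one count per `after` item without
  // running out proves the two multisets are identical.
  for (const e of doc.after) {
    const k = key(e);
    const c = counts.get(k);
    if (c === undefined || c === 0) return true;
    counts.set(k, c - 1);
  }
  return false;
}

const artist: MetaItem = { source: "EXIF", name: "Artist", value: "x" };
const creator: MetaItem = { source: "XMP", name: "Creator", value: "y" };

// Duplicate in `before`, one copy replaced by a new entry in `after`.
console.log(diffHasChanges({ before: [artist, artist], after: [artist, creator] })); // true
// Same entries in a different order count as unchanged.
console.log(diffHasChanges({ before: [artist, creator], after: [creator, artist] })); // false
```

A lookup keyed only on source and name would match both `before` copies against the single surviving `after` entry and miss the addition; counting full tuples catches it.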
|
||||
|
|
|
|||
|
|
@ -11,6 +11,7 @@ import "./styles/file_browse_button.css";
|
|||
import "./styles/error_boundary.css";
|
||||
import "./styles/file_list.css";
|
||||
import "./styles/file_table.css";
|
||||
import "./styles/zip-expansion.css";
|
||||
import "./styles/folder_row.css";
|
||||
import "./styles/status_bar.css";
|
||||
import "./styles/status_icon.css";
|
||||
|
|
|
|||
119
src/web/styles/zip-expansion.css
Normal file
|
|
@ -0,0 +1,119 @@
|
|||
/* Tree view for ZIP inner-entry diffs. See ZipExpansion.tsx. */
|
||||
|
||||
.zip-expansion {
|
||||
display: flex;
|
||||
flex-direction: column;
|
||||
gap: 4px;
|
||||
padding: 8px 0;
|
||||
}
|
||||
|
||||
.zip-expansion[data-depth="0"] {
|
||||
padding-left: 0;
|
||||
}
|
||||
.zip-expansion[data-depth="1"] {
|
||||
padding-left: 16px;
|
||||
}
|
||||
.zip-expansion[data-depth="2"] {
|
||||
padding-left: 32px;
|
||||
}
|
||||
.zip-expansion[data-depth="3"] {
|
||||
padding-left: 48px;
|
||||
}
|
||||
.zip-expansion[data-depth="4"] {
|
||||
padding-left: 64px;
|
||||
}
|
||||
.zip-expansion[data-depth="5"] {
|
||||
padding-left: 80px;
|
||||
}
|
||||
|
||||
.zip-expansion__entry {
|
||||
display: flex;
|
||||
flex-direction: column;
|
||||
}
|
||||
|
||||
.zip-expansion__row {
|
||||
display: grid;
|
||||
grid-template-columns: 32px 1fr auto;
|
||||
align-items: center;
|
||||
gap: 8px;
|
||||
padding: 6px 12px;
|
||||
border-radius: 4px;
|
||||
background: var(--surface-2, rgba(127, 127, 127, 0.05));
|
||||
}
|
||||
|
||||
.zip-expansion__row--expandable {
|
||||
cursor: pointer;
|
||||
}
|
||||
|
||||
.zip-expansion__row--expandable:hover,
|
||||
.zip-expansion__row--expandable:focus {
|
||||
background: var(--surface-3, rgba(127, 127, 127, 0.1));
|
||||
outline: none;
|
||||
}
|
||||
|
||||
.zip-expansion__chevron {
|
||||
min-width: 24px;
|
||||
min-height: 24px;
|
||||
display: flex;
|
||||
align-items: center;
|
||||
justify-content: center;
|
||||
}
|
||||
|
||||
.zip-expansion__path {
|
||||
font-family: var(--font-mono, ui-monospace, monospace);
|
||||
font-size: 0.9em;
|
||||
overflow: hidden;
|
||||
text-overflow: ellipsis;
|
||||
white-space: nowrap;
|
||||
}
|
||||
|
||||
.zip-expansion__status {
|
||||
font-size: 0.85em;
|
||||
opacity: 0.75;
|
||||
}
|
||||
|
||||
.zip-expansion__row--passed-through-unsupported,
|
||||
.zip-expansion__row--directory {
|
||||
opacity: 0.7;
|
||||
}
|
||||
|
||||
.zip-expansion__leaf-diff {
|
||||
padding: 8px 12px 8px 44px;
|
||||
}
|
||||
|
||||
.zip-expansion__leaf-diff--skeleton {
|
||||
padding: 12px 12px 12px 44px;
|
||||
font-style: italic;
|
||||
opacity: 0.7;
|
||||
}
|
||||
|
||||
.zip-expansion__leaf-empty {
|
||||
padding: 6px 12px 6px 44px;
|
||||
font-style: italic;
|
||||
opacity: 0.65;
|
||||
font-size: 0.9em;
|
||||
}
|
||||
|
||||
.zip-expansion__show-more {
|
||||
align-self: flex-start;
|
||||
margin-top: 6px;
|
||||
margin-left: 12px;
|
||||
background: transparent;
|
||||
border: 1px solid var(--border-1, rgba(127, 127, 127, 0.3));
|
||||
border-radius: 4px;
|
||||
padding: 4px 10px;
|
||||
font-size: 0.85em;
|
||||
cursor: pointer;
|
||||
color: inherit;
|
||||
}
|
||||
|
||||
.zip-expansion__show-more:hover {
|
||||
background: var(--surface-3, rgba(127, 127, 127, 0.1));
|
||||
}
|
||||
|
||||
.zip-expansion__depth-limit {
|
||||
padding: 6px 12px;
|
||||
font-style: italic;
|
||||
opacity: 0.6;
|
||||
font-size: 0.85em;
|
||||
}
|
||||
BIN
tests/e2e/fixtures/sample-zip.zip
Normal file
Binary file not shown.
122
tests/e2e/web/zip-archive.spec.ts
Normal file
|
|
@ -0,0 +1,122 @@
|
|||
// ZIP archive support — Web (issue #184)
|
||||
//
|
||||
// End-to-end coverage for the ZipStrategy + ZipExpansion tree UI.
|
||||
// Drops a fixture .zip containing an EXIF-tagged JPEG and a directory
|
||||
// entry. Asserts:
|
||||
// - The row becomes expandable on completion.
|
||||
// - The expansion area renders the ZipExpansion tree (NOT the
|
||||
// MetadataDiffExpansion two-pane — those are mutually exclusive
|
||||
// per the FileRow gate).
|
||||
// - Inner JPEG row is rendered with a chevron; directory row is not.
|
||||
// - Clicking the inner JPEG row triggers the lazy diff load — a
|
||||
// DiffSkeleton appears, then swaps for the MetadataDiffTable.
|
||||
//
|
||||
// Encrypted-archive refusal is covered by the unit suite in
|
||||
// tests/infrastructure/wasm/zip_strategy.test.ts; building an encrypted
|
||||
// fixture for e2e adds complexity without surfacing new UI behavior.
|
||||
|
||||
import { test, expect } from "@playwright/test";
|
||||
import { launchPage } from "./helpers/page_launcher";
|
||||
import { fixturePath } from "./helpers/fixture_loader";
|
||||
|
||||
test.describe("ZIP archive — tree expansion + lazy per-leaf diff", () => {
|
||||
test.beforeEach(async ({ page }) => {
|
||||
await launchPage(page);
|
||||
});
|
||||
|
||||
test("expandable row routes to ZipExpansion (not MetadataDiffExpansion)", async ({
|
||||
page,
|
||||
isMobile,
|
||||
browserName,
|
||||
}) => {
|
||||
// Same WebKit caveat as metadata_diff.spec.ts: ExifTool diff WASM
|
||||
// load is unreliable on Playwright's WebKit driver; the static
|
||||
// tree render is exercised by ZipExpansion unit tests on every
|
||||
// project.
|
||||
test.skip(
|
||||
browserName === "webkit",
|
||||
"WebKit driver — zeroperl WASM load is unreliable on this driver.",
|
||||
);
|
||||
test.setTimeout(45_000);
|
||||
|
||||
const input = page.locator(".file-browse-button__input").first();
|
||||
await input.setInputFiles([fixturePath("sample-zip.zip")], { force: true });
|
||||
|
||||
const row = page.locator(".file-table__row--complete").first();
|
||||
await expect(row).toBeVisible({ timeout: 30_000 });
|
||||
await expect(row).toHaveClass(/file-table__row--expandable/);
|
||||
|
||||
if (isMobile) {
|
||||
await row.tap();
|
||||
} else {
|
||||
await row.click();
|
||||
}
|
||||
|
||||
// ZipExpansion tree is rendered; MetadataDiffExpansion two-pane is NOT.
|
||||
const tree = page.locator(".zip-expansion");
|
||||
await expect(tree).toBeVisible();
|
||||
await expect(page.locator(".file-table__diff--two-pane")).toHaveCount(0);
|
||||
|
||||
// Inner JPEG row visible (cleaned status, with chevron).
|
||||
const jpegRow = tree.locator(".zip-expansion__row--cleaned", {
|
||||
hasText: "photo.jpg",
|
||||
});
|
||||
await expect(jpegRow).toBeVisible();
|
||||
await expect(jpegRow).toHaveClass(/zip-expansion__row--expandable/);
|
||||
|
||||
// Directory row visible, NOT expandable.
|
||||
const dirRow = tree.locator(".zip-expansion__row--directory");
|
||||
await expect(dirRow).toBeVisible();
|
||||
await expect(dirRow).not.toHaveClass(/zip-expansion__row--expandable/);
|
||||
});
|
||||
|
||||
test("clicking an inner-JPEG row loads its diff lazily (skeleton → table)", async ({
|
||||
page,
|
||||
isMobile,
|
||||
browserName,
|
||||
}) => {
|
||||
test.skip(
|
||||
browserName === "webkit",
|
||||
"WebKit driver — zeroperl WASM load is unreliable on this driver.",
|
||||
);
|
||||
test.setTimeout(60_000);
|
||||
|
||||
const input = page.locator(".file-browse-button__input").first();
|
||||
await input.setInputFiles([fixturePath("sample-zip.zip")], { force: true });
|
||||
|
||||
const row = page.locator(".file-table__row--complete").first();
|
||||
await expect(row).toBeVisible({ timeout: 30_000 });
|
||||
|
||||
if (isMobile) {
|
||||
await row.tap();
|
||||
} else {
|
||||
await row.click();
|
||||
}
|
||||
|
||||
const jpegRow = page
|
||||
.locator(".zip-expansion__row--cleaned", { hasText: "photo.jpg" })
|
||||
.first();
|
||||
await expect(jpegRow).toBeVisible();
|
||||
|
||||
if (isMobile) {
|
||||
await jpegRow.tap();
|
||||
} else {
|
||||
await jpegRow.click();
|
||||
}
|
||||
|
||||
// On a cold session the skeleton appears first; on a warm session
|
||||
// (subsequent leaf in the same test) the table appears directly.
|
||||
// Either way, the table eventually shows. Allow up to 30s for the
|
||||
// first-of-session WASM warm-up.
|
||||
const table = page.locator(".zip-expansion__leaf-diff").first();
|
||||
await expect(table).toBeVisible({ timeout: 30_000 });
|
||||
|
||||
// The two-pane diff content uses the same .file-table__diff-pair
|
||||
// classes as the top-level diff. At least one row should classify
|
||||
// as "removed" because the sample-zip.zip fixture's inner JPEG has an EXIF
|
||||
// Artist sentinel that the JPEG strategy drops.
|
||||
await expect(
|
||||
page.locator(".file-table__diff-pair--removed").first(),
|
||||
).toBeVisible({ timeout: 30_000 });
|
||||
});
|
||||
});
|
||||
BIN
tests/infrastructure/wasm/zip_strategy.test.ts
Normal file
Binary file not shown.
126
tests/web/components/file_row.test.tsx
Normal file
|
|
@ -0,0 +1,126 @@
|
|||
import { describe, it, expect } from "vitest";
|
||||
import { renderToStaticMarkup } from "react-dom/server";
|
||||
import { I18nContext } from "../../../src/web/contexts/I18nContext";
|
||||
import { FileRow } from "../../../src/web/components/file-list/FileRow";
|
||||
import { FileProcessingStatus } from "../../../src/domain";
|
||||
import type { FileEntry } from "../../../src/web/contexts/AppContext";
|
||||
import type { MetadataDocument } from "../../../src/domain";
|
||||
|
||||
// Minimal i18n stub; FileRow only looks up a small set of keys for the
|
||||
// expansion area, none of which are interpolated.
|
||||
function wrap(children: React.ReactNode): React.JSX.Element {
|
||||
return (
|
||||
<I18nContext.Provider
|
||||
value={{
|
||||
t: (key: string) => key,
|
||||
locale: "en",
|
||||
isLoading: false,
|
||||
}}
|
||||
>
|
||||
{children}
|
||||
</I18nContext.Provider>
|
||||
);
|
||||
}
|
||||
|
||||
function makeEntry(overrides: Partial<FileEntry>): FileEntry {
|
||||
return {
|
||||
id: "test-id",
|
||||
path: "/test.jpg",
|
||||
name: "test.jpg",
|
||||
extension: ".jpg",
|
||||
size: 100,
|
||||
folder: null,
|
||||
relativePath: null,
|
||||
status: FileProcessingStatus.NoMetadataFound,
|
||||
afterBytes: 100,
|
||||
error: null,
|
||||
diffDocument: null,
|
||||
...overrides,
|
||||
};
|
||||
}
|
||||
|
||||
function render(file: FileEntry): string {
|
||||
// FileRow reads `.current` synchronously during render to check
|
||||
// shouldAnimateCheck — must be initialised to a Set, not null.
|
||||
const animatedCheckRef: React.RefObject<Set<string>> = {
|
||||
current: new Set<string>(),
|
||||
};
|
||||
return renderToStaticMarkup(
|
||||
wrap(
|
||||
<FileRow
|
||||
file={file}
|
||||
isExpanded={true}
|
||||
onToggleExpand={() => {}}
|
||||
staggerIndex={0}
|
||||
animatedCheckRef={animatedCheckRef}
|
||||
onCopyToast={() => {}}
|
||||
/>,
|
||||
),
|
||||
);
|
||||
}
|
||||
|
||||
describe("FileRow — NoMetadataFound expansion routing", () => {
|
||||
it("renders generic 'noMetadataFound' text when no diff is available or pending", () => {
|
||||
const html = render(
|
||||
makeEntry({
|
||||
status: FileProcessingStatus.NoMetadataFound,
|
||||
diffDocument: null,
|
||||
diffPending: false,
|
||||
}),
|
||||
);
|
||||
// The static-text branch uses the file-table__expansion-empty class.
|
||||
expect(html).toContain("file-table__expansion-empty");
|
||||
expect(html).toContain("noMetadataFound");
|
||||
});
|
||||
|
||||
it("renders MetadataDiffExpansion (skeleton) when diffPending is true", () => {
|
||||
// Skeleton path: diffPending true and diffDocument null.
|
||||
const html = render(
|
||||
makeEntry({
|
||||
status: FileProcessingStatus.NoMetadataFound,
|
||||
diffDocument: null,
|
||||
diffPending: true,
|
||||
}),
|
||||
);
|
||||
// Skeleton renders a known marker class.
|
||||
expect(html).toContain("file-table__diff--skeleton");
|
||||
});
|
||||
|
||||
it("renders MetadataDiffExpansion (diff table) when diffDocument has content", () => {
|
||||
// File preserved orientation: diff has 'Orientation: Rotate 90 CW' (EXIF value 6) as a kept row.
|
||||
const diffDocument: MetadataDocument = {
|
||||
before: [{ source: "EXIF", name: "Orientation", value: "Rotate 90 CW" }],
|
||||
after: [{ source: "EXIF", name: "Orientation", value: "Rotate 90 CW" }],
|
||||
};
|
||||
const html = render(
|
||||
makeEntry({
|
||||
status: FileProcessingStatus.NoMetadataFound,
|
||||
diffDocument,
|
||||
diffPending: false,
|
||||
}),
|
||||
);
|
||||
// Two-pane table marker class — confirms MetadataDiffExpansion rendered.
|
||||
expect(html).toContain("file-table__diff--two-pane");
|
||||
// Must NOT fall through to the generic text branch.
|
||||
expect(html).not.toContain("file-table__expansion-empty");
|
||||
});
|
||||
|
||||
it("renders MetadataDiffExpansion (already-clean message) when diff has empty before+after", () => {
|
||||
// File was truly clean: ExifTool returned empty arrays.
|
||||
const diffDocument: MetadataDocument = { before: [], after: [] };
|
||||
const html = render(
|
||||
makeEntry({
|
||||
status: FileProcessingStatus.NoMetadataFound,
|
||||
diffDocument,
|
||||
diffPending: false,
|
||||
}),
|
||||
);
|
||||
// MetadataDiffExpansion renders the already-clean message in its own
|
||||
// expansion-empty span (different from FileRow's generic noMetadataFound).
|
||||
expect(html).toContain("file-table__expansion-empty");
|
||||
expect(html).toContain("zipExpansion.alreadyClean");
|
||||
// The generic noMetadataFound key from FileRow's fallback branch
|
||||
// must NOT appear — routing should send us to MetadataDiffExpansion.
|
||||
expect(html).not.toContain("noMetadataFound");
|
||||
});
|
||||
});
|
||||
|
|
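The four cases exercised by the `FileRow` tests above amount to a small routing decision over `diffDocument` and `diffPending`. A minimal sketch of that decision, assuming a hypothetical `chooseExpansion` helper and simplified stand-in types; the real `FileRow` and `MetadataDiffExpansion` sources are not part of this excerpt and may be structured differently:

```typescript
// Hypothetical stand-ins for the types referenced in the tests.
interface MetadataRow {
  source: string;
  name: string;
  value: string;
}

interface MetadataDocument {
  before: MetadataRow[];
  after: MetadataRow[];
}

type ExpansionKind = "generic-empty" | "skeleton" | "already-clean" | "diff-table";

// Assumed routing for a NoMetadataFound row, inferred only from the test
// expectations; this is not the component's actual code.
function chooseExpansion(
  diffDocument: MetadataDocument | null,
  diffPending: boolean,
): ExpansionKind {
  if (diffDocument === null) {
    // No diff yet: skeleton while one is pending, otherwise the static fallback text.
    return diffPending ? "skeleton" : "generic-empty";
  }
  const isEmpty =
    diffDocument.before.length === 0 && diffDocument.after.length === 0;
  // Empty on both sides means the file was already clean.
  return isEmpty ? "already-clean" : "diff-table";
}
```

Keeping the decision in a pure function like this would let the routing be unit-tested without rendering, though the tests in this commit check it through the rendered markup instead.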
@@ -151,9 +151,11 @@ describe("MetadataDiffExpansion — two-pane diff", () => {
     expect(html).toBe("");
   });

-  it("renders nothing when diffDocument has empty before", () => {
+  it("renders the alreadyClean message when diffDocument has empty before and after", () => {
     const html = renderTwoPane({ before: [], after: [] });
-    expect(html).toBe("");
+    // Both sides empty means the file was already clean — show a message
+    // rather than a blank expansion area (avoids ambiguity with diff failure).
+    expect(html).toContain("file-table__expansion-empty");
   });

   it("renders a skeleton while diffPending is true and diffDocument is null", () => {

179
tests/web/components/zip_expansion.test.tsx
Normal file
@@ -0,0 +1,179 @@
import { describe, it, expect } from "vitest";
import { renderToStaticMarkup } from "react-dom/server";
import { I18nContext } from "../../../src/web/contexts/I18nContext";
import { ZipExpansion } from "../../../src/web/components/file-list/ZipExpansion";
import type { ArchiveEntryResult } from "../../../src/domain";

const DICT: Record<string, string> = {
  "zipExpansion.statusCleaned": "Cleaned",
  "zipExpansion.statusUnsupported": "Unsupported — passed through",
  "zipExpansion.statusDirectory": "Directory",
  "zipExpansion.showMore": "Show {count} more entries",
  "zipExpansion.depthLimit": "Depth limit reached — drop the inner file directly",
  "zipExpansion.noMetadata": "No metadata detected",
  "zipExpansion.diffFailed": "Couldn't load diff — internal error",
  diffPaneBefore: "Before",
  diffPaneAfter: "After",
  diffSkeletonLoading: "Loading metadata reader…",
};

function wrap(children: React.ReactNode): React.JSX.Element {
  return (
    <I18nContext.Provider
      value={{
        t: (key: string) => DICT[key] ?? key,
        locale: "en",
        isLoading: false,
      }}
    >
      {children}
    </I18nContext.Provider>
  );
}

function makeLeaf(
  path: string,
  overrides: Partial<ArchiveEntryResult> = {},
): ArchiveEntryResult {
  return {
    path,
    status: "cleaned",
    sourceBytes: new Uint8Array([1, 2, 3]),
    strippedBytes: new Uint8Array([1, 2]),
    walkerEntries: [],
    entries: null,
    warnings: [],
    ...overrides,
  };
}

function render(entries: readonly ArchiveEntryResult[], depth = 0): string {
  return renderToStaticMarkup(
    wrap(<ZipExpansion entryId="entry-1" entries={entries} depth={depth} />),
  );
}

describe("ZipExpansion — entry rendering", () => {
  it("renders one row per entry with the correct path", () => {
    const html = render([
      makeLeaf("photo.jpg"),
      makeLeaf("folder/doc.pdf"),
    ]);
    expect(html).toContain("photo.jpg");
    expect(html).toContain("folder/doc.pdf");
  });

  it("renders cleaned-leaf rows with a chevron and 'Cleaned' status", () => {
    const html = render([makeLeaf("photo.jpg")]);
    expect(html).toContain("zip-expansion__row--cleaned");
    expect(html).toContain("Cleaned");
    // Chevron icon should be present (SVG path inside the chevron slot)
    expect(html).toContain("zip-expansion__chevron");
  });

  // Encrypted archives are refused upfront in v1 (see spec §3 +
  // gap-analysis encrypted-entry row), so no ArchiveEntryResult is
  // ever produced with that status. The "passed-through-encrypted"
  // variant is intentionally absent from ArchiveEntryStatus — a
  // future byte-level walker would re-add both the variant and the
  // corresponding render branch.

  it("renders unsupported-entry rows with no chevron", () => {
    const html = render([
      makeLeaf("data.bin", {
        status: "passed-through-unsupported",
        sourceBytes: null,
        strippedBytes: null,
      }),
    ]);
    expect(html).toContain("zip-expansion__row--passed-through-unsupported");
    expect(html).toContain("Unsupported — passed through");
    expect(html).not.toContain("zip-expansion__row--expandable");
  });

  it("renders directory-entry rows with no chevron", () => {
    const html = render([
      makeLeaf("folder/", {
        status: "directory",
        sourceBytes: null,
        strippedBytes: null,
      }),
    ]);
    expect(html).toContain("zip-expansion__row--directory");
    expect(html).toContain("Directory");
    expect(html).not.toContain("zip-expansion__row--expandable");
  });
});

describe("ZipExpansion — pagination", () => {
  it("renders the first 100 entries eagerly with no show-more button when ≤100", () => {
    const entries = Array.from({ length: 50 }, (_, i) =>
      makeLeaf(`file-${i}.txt`),
    );
    const html = render(entries);
    expect(html).not.toContain("zip-expansion__show-more");
  });

  it("emits a 'Show N more entries' button when the archive has > 100 entries", () => {
    const entries = Array.from({ length: 150 }, (_, i) =>
      makeLeaf(`file-${i}.txt`),
    );
    const html = render(entries);
    expect(html).toContain("zip-expansion__show-more");
    // 150 - 100 = 50 remaining; interpolated into the label.
    expect(html).toContain("Show 50 more entries");
    // First 100 entries are rendered eagerly; entry index 99 is the
    // last visible, index 100 is hidden.
    expect(html).toContain("file-99.txt");
    expect(html).not.toContain(">file-100.txt<");
  });
});

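The pagination tests above pin down two behaviours: only the first 100 entries render eagerly, and the remainder is reported through the `{count}` placeholder in the `zipExpansion.showMore` string. A minimal sketch of both, assuming hypothetical `paginate` and `interpolateCount` helpers; `ZipExpansion` itself is not shown in this excerpt, so the names and the `PAGE_SIZE` constant are illustrative only:

```typescript
// Assumed page size, read off the "first 100 entries" wording in the tests.
const PAGE_SIZE = 100;

interface Page<T> {
  visible: readonly T[];
  hiddenCount: number;
}

// Hypothetical helper: split entries into the eagerly rendered slice and
// the number of entries left behind the "show more" button.
function paginate<T>(entries: readonly T[], pageSize: number = PAGE_SIZE): Page<T> {
  const visible = entries.slice(0, pageSize);
  return { visible, hiddenCount: entries.length - visible.length };
}

// Hypothetical helper: fill the {count} placeholder of a translated label.
function interpolateCount(template: string, count: number): string {
  return template.replace("{count}", String(count));
}
```

With 150 entries this yields 100 visible rows and a hidden count of 50, which interpolates to "Show 50 more entries". Because the test's `t` stub returns the raw template, the placeholder is presumably filled somewhere inside the component rather than by the i18n layer.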
describe("ZipExpansion — depth limit", () => {
  it("renders the depth-limit message instead of rows at depth >= 20", () => {
    const html = render([makeLeaf("photo.jpg")], 20);
    expect(html).toContain("zip-expansion__depth-limit");
    expect(html).toContain(
      "Depth limit reached — drop the inner file directly",
    );
    expect(html).not.toContain("photo.jpg");
  });

  it("renders entries normally at depth 19", () => {
    const html = render([makeLeaf("photo.jpg")], 19);
    expect(html).not.toContain("zip-expansion__depth-limit");
    expect(html).toContain("photo.jpg");
  });
});

describe("ZipExpansion — nested ZIP entries", () => {
  it("renders nested-zip rows with chevron (no diff load on render)", () => {
    const html = render([
      makeLeaf("inner.zip", {
        status: "cleaned",
        entries: [makeLeaf("inner/photo.jpg")],
      }),
    ]);
    expect(html).toContain("inner.zip");
    // Nested zip rows are expandable.
    expect(html).toContain("zip-expansion__row--expandable");
    // The inner row is NOT rendered until the parent is expanded
    // (state.kind starts as "closed"). In a static render, "inner/photo.jpg"
    // stays hidden.
    expect(html).not.toContain("inner/photo.jpg");
  });
});

describe("ZipExpansion — indent depth", () => {
  it("caps the indent at level 5 even for deeper trees", () => {
    const html6 = render([makeLeaf("photo.jpg")], 6);
    // data-depth attribute caps at INDENT_CAP=5.
    expect(html6).toContain('data-depth="5"');
  });

  it("uses depth as data-depth for levels 0..5", () => {
    expect(render([makeLeaf("a")], 0)).toContain('data-depth="0"');
    expect(render([makeLeaf("a")], 3)).toContain('data-depth="3"');
    expect(render([makeLeaf("a")], 5)).toContain('data-depth="5"');
  });
});

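The depth-limit and indent tests above describe two independent thresholds: indentation stops growing at level 5, and rows are replaced by a message at depth 20 or more. A minimal sketch of the two checks, assuming hypothetical `indentLevel` and `isDepthLimited` helpers. `INDENT_CAP` is named in the test comment; `MAX_DEPTH` is an invented name for the limit of 20 implied by the test titles:

```typescript
// Both values are assumptions read off the test expectations.
const INDENT_CAP = 5;
const MAX_DEPTH = 20; // hypothetical name

// Depth used for indentation (the data-depth attribute in the tests).
function indentLevel(depth: number): number {
  return Math.min(depth, INDENT_CAP);
}

// True once the depth-limit message should replace the entry rows.
function isDepthLimited(depth: number): boolean {
  return depth >= MAX_DEPTH;
}
```

Separating the visual cap from the hard limit means a tree nested between 6 and 19 levels still renders its entries, just without further indentation.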
@@ -162,6 +162,10 @@ describe("processFileEntries", () => {
       // hook marks pending=true so the row will render a skeleton until
       // UPDATE_FILE_DIFF lands.
       diffPending: true,
+      // New strategy-emitted fields default to [] / null when the
+      // mocked WasmApi return omits them.
+      warnings: [],
+      archiveEntries: null,
     });
   });

1340
tools/forensic/zip.ts
Normal file
File diff suppressed because it is too large