feat(diff): ExifTool-sourced two-pane before/after metadata diff (chunk B) #177
No reviewers
Labels
No labels
bug
documentation
duplicate
e-copy
e-features
e-mobile
enhancement
f-coverage
f-forensic
f-perf
f-privacy
forensic
good first issue
help wanted
infra
invalid
phase-a
phase-b
phase-c
phase-d
phase-e
phase-f
phase-g
phase-h
priority-1
priority-2
priority-3
privacy
question
v5
v6
video-hardening
wontfix
No milestone
No project
No assignees
1 participant
Notifications
Due date
No due date set.
Dependencies
No dependencies set.
Reference: forgejo_admin/exifcleaner-web#177
Loading…
Add table
Reference in a new issue
No description provided.
Delete branch "feat/exiftool-diff-strategy"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Summary
Chunk B of #174. Pivots the diff view (issue #22) from per-strategy delta surfaces to a unified ExifTool dump on both source + stripped bytes, displayed in a diffchecker-style two-pane layout. Spec at #176 (
docs/superpowers/specs/2026-05-21-issue-22-diff-pivot-design.md).Same strip behaviour, much richer evidence — ExifTool's
-j -G1 -a -structreturns hundreds of decoded tags per file (MakerNote vendor sub-tags, full XMP schemas, ICC profile details, PDF/Office/MP4 fields) where the existing per-strategy delta walkers only saw what each format's walker was hand-rolled to surface.Refs #22, #174 (chunks C–F still ahead — does not close).
What landed
src/domain/exif/metadata_document.tsMetadataEntry+MetadataDocumenttypes (full snapshot pair).src/infrastructure/wasm/strategies/exiftool_diff_strategy.tsFormatStrategy, not in the registry).parseMetadata({ args: ["-j", "-G1", "-a", "-struct", "-q", "-q"] })+ ExifTool group-name → canonical source-label mapping (IFD0/ExifIFD/InteropIFD/IFD1 → EXIF; XMP-* → XMP; ICC* → ICC; QuickTime/ItemList/UserData/Track* → MP4; etc.). Drops File:/ExifTool:/System: filesystem-level groups.src/infrastructure/wasm/strategies/exiftool_wasm_fetch.tsredirectWasmFetchhelper extracted fromExifToolFallbackStrategyso both strategies use the same?url-import + dynamic-import-inside-typeof-window pattern.src/infrastructure/wasm/wasm_processor.tsparseMetadatacalls in parallel viaPromise.allon source + stripped bytes after each successful strip. Errors fall back todiffDocument: null(UI degrades to legacy delta view). Gated byVITE_ENABLE_EXIFTOOL_DIFF(default on). Fires the one-time loading event on first invocation per session.src/infrastructure/wasm/format_strategy.ts+ each strategyStripResultextended withdiffDocument: MetadataDocument | null. Every strategy returns null; the document is built byWasmProcessor, not the strategy.src/application/ports/metadata_processor_port.ts+src/infrastructure/web/web_api.tsProcessOutcome+WasmApiextended to forwarddiffDocument.src/web/contexts/AppContext.tsx+src/web/hooks/use_process_files.tsFileEntry.diffDocumentadded;UPDATE_FILE_METADATAaction carries it; hook dispatches it.src/web/components/file-list/MetadataDiffExpansion.tsxdiffDocument.before.length > 0, legacy grouped-delta render otherwise. NewcomputeDiffRowsaligns entries on(source, name)and classifies each row as removed / added / modified / kept.src/web/components/file-list/FileRow.tsxisExpandablenow considersdiffDocumenttoo.src/web/styles/file_table.css.file-table__diff--two-pane+.file-table__diff-pair--*classes. Desktop ≥ 720 px: 3-column grid (name | before | after). Mobile < 720 px: stacked with "Before" / "After" pseudo-element labels per cell.src/common/exiftool_diff_events.tsmetascrub:exiftool-diff-loadingevent;FileTablesubscribes and shows a passive toast.vite.config.web.standalone.tsstandaloneWasmInlinePlugin— finds the emittedassets/zeroperl-*.wasm, Base64-encodes it, rewrites the JS reference to adata:application/wasm;base64,…URL, deletes the sibling. Standalone build is now genuinely one file (33.8 MB). PWA / APK builds still get the sibling asset..resources/strings.jsondiffPaneBefore,diffPaneAfter,diffGroupAdded,diffEmptyValue,diffLoadingToast.tests/infrastructure/wasm/exiftool_diff_strategy.test.tstests/web/components/metadata_diff_expansion.test.tsxWhy this is the right shape for the standalone target
Sibling
zeroperl.wasmwould silently break under Chromium'sfile://CORS — Chrome / Edge / Brave block cross-filefetch()fromfile://by default. That's the same restrictionviteSingleFilewas built around in the first place. See spec §8.1 for the full reasoning.Forensic + privacy invariants
Strip behaviour unchanged — diff is a display-side feature. zeroperl.wasm declares zero
sock_*imports in its WASI capability set (verified in #171 POC); the diff reads still satisfy the no-network invariant. The 25 MB WASM is the same artifact #175 already shipped for the fallback strategy.Verification
What this PR deliberately leaves to follow-ups
metadataItemsfromStripResult, the per-strategy enumeration layers (~150 LOC each in JPEG/PNG, smaller in PDF/Office/Video),ifd_value_formatters.ts, and most ofifd_tag_data.ts(52.7 KB). The IFD walker itself stays — it's the strip. Soak window first.-G1 -a -structgroup-mapping coverage gaps — currently catches the common groups (EXIF / GPS / XMP / ICC / PDF / Office / MP4 / Photoshop / IPTC / MakerNotes / JFIF / RIFF). Unknown groups surface verbatim (better than dropping) so they show up in test coverage if new ones appear.Test plan
VITE_ENABLE_EXIFTOOL_DIFF=falseand confirms the legacy delta view still works🤖 Generated with Claude Code
Update: CI green, screenshots, e2e clarification
CI fixes (commit
54efacf→ new HEAD)Two e2e jobs failed on the initial push. Both were downstream of chunk B's eager-diff change:
metadata_diff.spec.ts— asserted on legacy.file-table__diff-row--*classes. The two-pane view uses.file-table__diff-pair--*. Test updated + relaxed the PDF assertion (pdf-lib's re-serialize surfacesmodified/added/keptrows forsample.pdf, noremoved).no-network.spec.ts— caught the same-originzeroperl.wasmasset fetch as an "outbound leak". The privacy invariant is off-origin requests; same-origin asset fetches don't violate it. Test now filters to off-origin before asserting empty.Full e2e locally now:
Did this PR add e2e tests?
Honest answer: not new files, but the existing
tests/e2e/web/metadata_diff.spec.tsis now updated to assert against the two-pane structure. It runs across 5 Playwright projects (web-desktop / web-mobile-ios / web-mobile-android / standalone-desktop / standalone-mobile) × 2 tests = 10 e2e runs of the diff per CI cycle. So the diff view is genuinely covered end-to-end through both the dev-server PWA path and the file:// standalone path. I should have flagged this in the original PR body — the "deferred to chunk B.1" line was wrong.Screenshots
Desktop (1280×800, standalone HTML, sample.jpg dropped, chevron clicked):
Note the source groups (
JFIF · 4 REMOVED,EXIF · 12 REMOVED,GPS · 3 REMOVED,COMPOSITE · 1 REMOVED, 2 KEPT), per-row alignment between the BEFORE and AFTER columns, strike-through values on the left, em-dash placeholders on the right for removed rows, and the passive "Loading metadata reader…" toast at the bottom (this was the first diff click — the toast self-hides after 4 s).Mobile (iPhone 14 emulation, standalone HTML):
Two-pane collapses to a single column at < 720 px viewport with explicit
BEFORE/AFTERlabels per cell. No horizontal scroll on the expansion area (asserted by the e2e). Same toast appears.Capture harness
Added
scripts/capture-diff-screenshots.ts— a one-off Playwright harness that drives the standalone build through the diff flow at desktop + mobile breakpoints. Run on demand:Not hooked into CI (purely a dev tool for review evidence), but committed so future contributors don't have to re-discover the locale-seeding + Playwright wiring.
Self-review (threshold > 50)
Going through the diff with the calibrated threshold from memory. Six findings worth flagging, ordered by score.
🟠 [75] Eager diff blocks the "Cleaned" status signal
WasmProcessor.processsynchronouslyawaitsbuildDiffDocumentbefore returningProcessOutcome(wasm_processor.ts:71-77). The spec at #176 §12.1 explicitly said "user seesCleanedstatus as soon as strip completes; the diff resolves to a skeleton in the expansion area untildiffDocumentarrives". The implementation contradicts this — each file's "Cleaned" appearance is now blocked on twoparseMetadatacalls (~100-300ms each after WASM is warm; the first file in a session adds ~3-5s for WASM instantiation).For a 50-file batch this is a noticeable regression: each file accumulates ~200ms of diff-build latency before showing Complete.
Fix shape: populate
diffDocumentasync via a follow-up dispatch (UPDATE_FILE_DIFFaction), or accept the regression and document it. Either is fine, but the current behavior doesn't match the spec.🟠 [75] Silent-failure mode in the standalone WASM inline plugin
vite.config.web.standalone.ts:197-204tries three URL candidate shapes for substitution. If Vite changes its asset URL convention (or our code switches to animport.meta.urlresolution pattern), all three miss → plugin silently no-ops → ship a broken standalone HTML that fails on first WebP drop with no build-time signal.Fix: assert at least one substitution happened per WASM file. Pseudo:
🟠 [70] Composite group surfaced as "metadata"
The desktop screenshot shows
COMPOSITE · 1 REMOVED, 2 KEPTwithImageSize: 1×1 → 1×1listed as KEPT. ExifTool'sCompositegroup is derived — these tags are computed from the file structure (ImageSize from ImageWidth×ImageHeight), not file-embedded metadata. Showing them in the diff:Fix: drop
CompositefromEXACT_GROUP_MAPand add it toDROP_GROUPSinexiftool_diff_strategy.ts.🟡 [55] WasmProcessor has no test coverage for the diff path
tests/infrastructure/wasm/wasm_processor.test.tswas not updated. The existing 4 tests cover strip + write only; nothing assertsProcessOutcome.diffDocumentis populated. Given thatprocess()now has a meaningful new branch (the parallel ExifTool reads, the warmup-flag dispatch, the null-fallback on error), there should be coverage of:diffDocumentis non-null on a successful strip withENABLE_EXIFTOOL_DIFF=truediffDocumentis null whenVITE_ENABLE_EXIFTOOL_DIFF="false"is setdiffDocumentis null when the underlyingreadDocumentreturns an error (mocked)🟡 [50] Module-level
diffWarmupSignalledflag is testing-hostilewasm_processor.ts:16declares mutable module state with no reset path. Vitest workers persist module state across tests in the same file. If anyone later wants to test the toast-firing behavior:Fix shape: either (a) expose a
__resetForTests()symbol, (b) move the flag to aWasmProcessorinstance field, or (c) accept it and document that the toast is a one-shot per session.🟡 [50]
makeKeyuses space separatorMetadataDiffExpansion.tsx:231-233:Tag names with spaces would collide: source
XMP+ nameCreator Toolcollides with sourceXMP Creator+ nameTool. ExifTool tag names are conventionally camelcase so the risk is vanishingly low in practice, but${source}\0${name}or${source}::${name}is more defensive.Below threshold but worth a glance
1.01 → —REMOVED in the desktop screenshot. Accurate (we drop the JFIF segment), but it's signal-noise — users don't care about JFIFVersion privacy-wise. Future polish: a small allowlist of "structural fields not worth diffing".formatValueJSON-stringifies struct values (exiftool_diff_strategy.ts:171-175). User sees a compact JSON blob with no formatting. Rare in practice for the formats we route, but could be ugly when it appears.extname()/basename()reimplementations inwasm_processor.ts. Could share with existing path helpers if any exist (none I found).TwoPaneViewuses parameter namedocumentwhich shadows the global DOMdocument. Cosmetic.Recommendation
The 3 orange findings (75/75/70) are worth addressing before merge — they're either a spec-violation regression (eager block), a silent-failure landmine (Vite URL substitution), or a visible UX bug (Composite). The two yellow findings (55/50) are test-debt that can land in a follow-up if you want to keep the PR scope tight; the makeKey one is a 5-second defensive change.
Marking the PR as needing-revision pending the three orange items.
All six self-review findings resolved
Three parallel agents landed fixes; here's the round-up.
Commits (on top of the existing PR HEAD)
3afe59fb2543f707085f0Per-finding resolution
#1 [75] Eager diff blocks "Cleaned" signal — fixed.
WasmProcessor.process()now returnsProcessOutcomewithdiffDocument: nullimmediately after write, stashing{sourceBytes, strippedBytes, extension}keyed byfilePathin an instance Map. New methodbuildDiffDocumentForFile({filePath})drains the stash and runs the ExifTool reads out-of-band.use_process_filesdispatchesUPDATE_FILE_METADATA(withdiffPending: true) +UPDATE_FILE_STATUS: Completesynchronously, then fireswasm.buildDiffDocumentfire-and-forget. NewUPDATE_FILE_DIFFreducer action merges the result without touching status.MetadataDiffExpansionshows a wayfinding skeleton (reusingdiffLoadingToasti18n) whilediffPending && !diffDocument. Per-file "Cleaned" status now lands ~100-300ms (warm) or 3-5s (cold) earlier; bulk batches no longer accumulate diff-build latency.#2 [75] Silent-failure mode in standalone WASM inline plugin — fixed. Per-WASM-file
replacedflag; throws with a clear "Vite asset URL format may have changed" message if no candidate matches. Added a build-log line per successful substitution. Caveat the agent flagged: on the current build path the WASM doesn't actually land as a sibling (something upstream already inlines it), so the earlywasmFiles.length === 0return triggers and the fail-loud branch hasn't been exercised yet — but it's correctly scoped inside the per-file loop and will activate the moment Vite ever does emit a sibling whose URL shape we miss.#3 [70] Composite group surfaced as "metadata" — fixed.
Compositeremoved fromEXACT_GROUP_MAPand added toDROP_GROUPSinexiftool_diff_strategy.ts. Updated the existing test to assertsources.has("Composite") === false. Refreshed screenshot below confirms the COMPOSITE row is gone — JFIF / EXIF / GPS are now the only source groups forsample.jpg.#4 [55] No WasmProcessor test coverage for diff path — fixed. 5 new tests in
tests/infrastructure/wasm/wasm_processor.test.ts:diffDocumentviabuildDiffDocumentForFileVITE_ENABLE_EXIFTOOL_DIFF="false"gating verified viavi.stubEnv + vi.resetModulesPlus 4 new reducer tests in
app_context.test.tsforUPDATE_FILE_DIFF+diffPendingmerge semantics.#5 [50] Module-level
diffWarmupSignalledtesting-hostile — fixed. Now a per-instance field with a documented testability rationale. Tests can construct a freshWasmProcessorto start with a clean warm-up flag.#6 [50]
makeKeyspace separator — fixed. Now\0(NUL byte). One small fishhook the agent flagged: a literal NUL had inadvertently flipped the file to git binary mode; the proper\0escape preserves the byte semantics and keeps the file text.Refreshed screenshots (post-fixes)
Desktop — Composite group is gone; JFIF + EXIF + GPS are the only source groups:
Mobile:
Verification
B.1 cleanup tracker filed
#178 captures the combined B.0.5 + B.1 plan: prune the JPEG/PNG/MP4/fallback
MetadataItem[]walker code (ExifTool covers them richer), but keep + integrate Office and PDF walker entries into the diff via a newwalkerEntries: MetadataEntry[]channel onStripResult. The Office walker surfaces ZIP entries / comments / embeddings / customXml that ExifTool's docProps-only read can't see; the PDF walker surfaces annotations / AcroForm fields / embedded files that ExifTool's Info+XMP read can't see. Net-negative LOC overall (~−2400). Soak prereq is landing #177 first.PR ready for re-review.
Both yellow review findings fixed
Pushed
entryId-keyed pending diffs + bounded Map size.[55] Race on
filePathkey — fixed. Stash now keyed byFileEntry.id(the UUID generated in DropZone / FileBrowseButton). Each drop gets its own stash entry; re-drops of the same path can't overwrite each other.[55] Memory leak if hook never drains — fixed.
PENDING_DIFFS_MAX = 50cap with insertion-order eviction on insert. Worst-case retention is 50 × (source + stripped) byte buffers; older entries are silently dropped (the hook layer treats "no pending entry" the same as "ExifTool failed" → legacy delta view fallback, which is correct degradation).API rename:
MetadataProcessorPort.buildDiffDocumentForFile({ filePath })→buildDiffDocumentForEntry({ entryId }).WasmApi.process(entryId, filePath, options),WasmApi.buildDiffDocument(entryId).use_process_filespassesentry.idthrough.Verification (all green):
yarn typecheck/yarn lint/yarn check:depscleanyarn test --run: 549 passedyarn build:web:standalone: 33.8 MB single fileyarn test:e2e:web: 120 passed, 21 skipped (no regressions)Commit: latest HEAD on
feat/exiftool-diff-strategy.