Adds FfmpegFallbackStrategy as a peer to ExifToolFallbackStrategy, routing MP4/MOV/M4V (Phase 1) and MKV/WebM (Phase 2) through @ffmpeg/core. On by default for all three distributions (standalone HTML, Capacitor APK, PWA self-host); VITE_ENABLE_FFMPEG_FALLBACK=false opts out. Takes priority over VideoStrategy for the MP4 family; VideoStrategy stays registered as the opt-out fallback until a subsequent PR deletes it.

Closes #182. Closes #43.

Resolves the four documented walker KNOWN_GAPS categorically: handler-name leak (#38), compressor-name leak (#39), mvhd.next_track_id leak (#111), GPMF/GPS coordinates leak (#42). On gopro-fusion.mp4 (5.1 MB GPMF + tmcd + fdsc) and dji-phantom4.mov (236 MB UserData GPS log) the forensic battery reports zero device-fingerprint survival across every recovery technique.

Key architectural choices:

- **Main-thread @ffmpeg/core, not @ffmpeg/ffmpeg wrapper.** The wrapper hardcodes type:"module" Workers from Blob URLs, which fail silently under null-origin file:// in Chromium, so the standalone build hung forever on every video strip. @ffmpeg/ffmpeg dropped from package.json.
- **Stream mapping -map 0 -map -0:d? -map -0:s? -map -0:t?**. Preserves input track order while dropping data, subtitle, and attachment streams (t selects attachments in ffmpeg's stream-specifier syntax; tmcd timecode tracks are data streams, so -0:d? is what removes them). Avoids the eng→und reorder bug of -map 0:v?/-map 0:a?, and sidesteps mat2's exit-234 on action-cam files (GoPro Fusion has tmcd/fdsc).
- **Post-strip pass rewrites the udta box type to 'free'** (ISO/IEC 14496-12 §8.1.2 padding) to neutralise ffmpeg's hardcoded HandlerType:Metadata + HandlerVendorID:Apple stub. Length-preserving so stco/co64 offsets stay valid. Handles both regular and largesize headers via headerStart+4.
- **mdhd.language left as ffmpeg's 'und'.** Zeroing was considered but reverted: 0x0000 is an invalid ISO 639-2/T code, and ffprobe falls back to displaying '(eng)' for invalid codes, which actively misleads downstream tools.
- **Diff race fix.** @uswriting/exiftool's parseMetadata uses module-level singletons (Perl, MemoryFS, stdout/stderr StringBuilders). WasmProcessor now serializes all diff builds across the processor's lifetime via a Promise chain. This guarantees no two parseMetadata calls overlap, whether within an entry or across the fire-and-forget chunk-drained queue.
- **ExifTool family-1 group names surfaced verbatim** (IFD0, ExifIFD, XMP-dc, Track1, etc.). Refuses to collapse to umbrella labels like 'EXIF' because the collapse caused (source, name) key collisions across sub-groups (Track1:HandlerType vs Track2:HandlerType produced spurious diffs on multi-track MP4).
- **Standalone HTML stays single-file.** Two-asset Vite plugin gzip+base64-inlines ffmpeg-core.js + ffmpeg-core.wasm into <script type=text/plain> tags, mirroring the zeroperl pattern. With tree-shaking via __WITH_STANDALONE_INLINE__ the standalone HTML went 116MB → 24MB.

Forensic verification: docs/forensic/ffmpeg-fallback.md + tools/forensic/ffmpeg-fallback.ts cover synthetic-mp4/mkv/webm + phone-baseline (2.7MB Android) + gopro-fusion (5MB action-cam) + dji-phantom4 (236MB drone) with zero sentinel/fingerprint survival across the recovery battery. Gap analyses for all three formats at docs/gap-analysis/mp4-ffmpeg.md, mkv.md, webm.md. POC at docs/poc/ffmpeg-wasm.md.

Production deps go from 5 → 6: @ffmpeg/core@0.12.10 (GPL-2.0-or-later; combined distributable inherits, MIT codebase unchanged, source pointer in README per GPL compliance).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
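The length-preserving udta rewrite described in the commit notes can be illustrated with a short sketch. This is not the project's code: `BoxHeader`, `readBoxHeader`, `findChild`, and `neutraliseUdta` are hypothetical names, and the sketch assumes the `udta` box of interest sits directly under `moov`. It relies only on the ISO BMFF header layout, in which the four type bytes sit at `headerStart + 4` for both the regular and the largesize header form.

```typescript
// Sketch only: all names here are hypothetical, not MetaScrub identifiers.
interface BoxHeader {
	type: string;
	headerStart: number; // offset of the 32-bit size field
	payloadStart: number; // first byte after the 8- or 16-byte header
	end: number; // offset one past the last byte of the box
}

function readBoxHeader(view: DataView, offset: number, limit: number): BoxHeader | null {
	if (offset + 8 > limit) return null;
	let size = view.getUint32(offset);
	let headerLength = 8;
	// The four type bytes follow the 32-bit size in both header forms.
	const type = String.fromCharCode(
		view.getUint8(offset + 4),
		view.getUint8(offset + 5),
		view.getUint8(offset + 6),
		view.getUint8(offset + 7)
	);
	if (size === 1) {
		// largesize form: a 64-bit size follows the type field.
		if (offset + 16 > limit) return null;
		size = view.getUint32(offset + 8) * 4294967296 + view.getUint32(offset + 12);
		headerLength = 16;
	} else if (size === 0) {
		// size 0 means the box runs to the end of its container.
		size = limit - offset;
	}
	if (size < headerLength || offset + size > limit) return null;
	return { type, headerStart: offset, payloadStart: offset + headerLength, end: offset + size };
}

function findChild(view: DataView, start: number, limit: number, type: string): BoxHeader | null {
	let offset = start;
	while (offset < limit) {
		const box = readBoxHeader(view, offset, limit);
		if (box === null) return null; // malformed input: stop rather than guess
		if (box.type === type) return box;
		offset = box.end;
	}
	return null;
}

// Overwrites the type of moov/udta with 'free' in place. No bytes are added or
// removed, so chunk offsets elsewhere in the file stay valid.
function neutraliseUdta(bytes: Uint8Array): boolean {
	const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
	const moov = findChild(view, 0, bytes.byteLength, "moov");
	if (moov === null) return false;
	const udta = findChild(view, moov.payloadStart, moov.end, "udta");
	if (udta === null) return false;
	bytes.set([0x66, 0x72, 0x65, 0x65], udta.headerStart + 4); // ASCII "free"
	return true;
}
```

Because only four bytes change, a reader that skips `free` boxes never sees the old payload as user data, while the bytes themselves remain in the file as padding.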
MetaScrub
Privacy-focused metadata stripper. Primary distributions: desktop offline standalone HTML + Android APK (Capacitor wrapper). The deployed web PWA still builds and is self-hostable, but it's a secondary target. iOS in any form is out of scope. No Perl runtime, no server-side calls, no Electron shell. MIT license. Forked from szTheory/exifcleaner and rebranded in v5; lineage notes in the README.
Tech Stack
- Runtime: Modern browsers (Chrome, Firefox, Edge) on desktop; Capacitor WebView (Chromium) inside the Android APK. Safari is supported best-effort but not a target (iOS is out of scope).
- Renderer: React 19 SPA built by Vite, with BEM CSS (no framework)
- Language: TypeScript 5.7 with `strict: true` + `verbatimModuleSyntax: true` (type-check only; Vite/esbuild compile)
- Build: `vite` 7.x. `vite.config.web.standalone.ts` produces the primary desktop output (`dist/web-standalone/index.html`, single-file inlined). `vite.config.web.ts` produces `dist/web/`, used as the source for the Android APK (via Capacitor `cap sync android`) and for the self-host PWA path. `dist/web/` is not a primary user-facing distribution by itself.
- Processing engine: hand-rolled WASM/pure-TS `FormatStrategy` implementations registered in `src/infrastructure/wasm/strategy_registry.ts`. The registry is the sole authority for what is supported.
- Production deps (6): `@ffmpeg/core` (FfmpegFallbackStrategy strip engine for MP4/MOV/MKV/WebM), `@uswriting/exiftool` (ExifToolFallbackStrategy strip + ExifToolDiffStrategy read engine), `jszip` (Office), `pdf-lib` (PDF), `react` + `react-dom` (UI).
- Performance is sacred: the app should process hundreds of files in seconds. Never add sync I/O in the loop or heavy DOM operations per row.
Commands
# Dev
yarn dev # vite dev server on :5173 (renderer HMR)
# Build
yarn build # vite build → dist/web/
yarn build:web # alias for `yarn build`
yarn build:web:standalone # one-file inlined HTML → dist/web-standalone/index.html
yarn preview # vite preview on :4173
# Quality gates
yarn typecheck # tsc --noEmit
yarn lint # prettier --check
yarn format # prettier --write
yarn test # vitest run
yarn test:watch
yarn test:e2e # All Playwright projects
yarn test:e2e:web # web-desktop + web-mobile-ios + web-mobile-android
yarn test:e2e:web:desktop
yarn test:e2e:standalone # build:web:standalone + Playwright standalone project
yarn check:deps # madge --circular
Architecture
Single-process web app. The whole UI runs in the browser: there is no main process, no preload, no IPC. Format strategies run in-page; output is downloaded via <a download> or bundled into a zip for folder/multi-file batches.
For the narrative tour (analogies for backend devs, sequence diagrams, end-to-end trace, React primer), see docs/architecture.md. The sections below are reference-level.
Web UI (src/web/)
The entire renderer lives under src/web/ — entry, HTML, App shell, components, contexts, hooks, styles, and utils. Prior to the src/renderer/ → src/web/ consolidation, the React tree lived under src/renderer/; that naming was Electron-process terminology and was retired along with the shell.
- `src/web/main.tsx`: entry. Mounts React, sets `window.api = makeWebApi()`, imports all CSS files.
- `src/web/index.html`: HTML root with relative asset paths.
- `src/web/App.tsx`: top-level app shell.
- `src/web/components/`: React tree organised by area: `file-list/`, `ui/`, `icons/`, `settings/`.
- `src/web/contexts/`: `AppContext`, `I18nContext`, `ThemeContext`.
- `src/web/hooks/`: `use_process_files`, `use_i18n`, `use_elapsed_time`.
- `src/web/styles/`: BEM CSS files (one per component area).
- `src/web/utils/`: `get_file_extension`, `format_file_size`.
- `src/web/env.d.ts`: typed `window.api: WebApi` global.
Common (src/common/)
Shared helpers (must remain pure for use in the UI).
- `types.ts`: `assertNever`, `getOrThrow`.
- `result.ts`: the `Result<T, E>` discriminated-union type.
- `log_error.ts`: small console-logging helper.
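As a rough illustration of how a `Result<T, E>` union and `assertNever` usually work together, here is a minimal sketch. The field names (`ok`, `value`, `error`), the `ExtensionError` type, and the `checkExtension` and `describeError` functions are assumptions made for the example; the actual shapes in `result.ts` and `types.ts` may differ, and `getOrThrow` is not sketched because its signature is not described here.

```typescript
// Assumed shape: the real Result type in result.ts may use different field names.
type Result<T, E> = { ok: true; value: T } | { ok: false; error: E };

// Throws at runtime, and fails to type-check if `value` can still be something.
function assertNever(value: never): never {
	throw new Error(`Unexpected value: ${JSON.stringify(value)}`);
}

// Hypothetical error union used only for this example.
type ExtensionError = { kind: "missing" } | { kind: "unsupported"; extension: string };

function checkExtension(fileName: string, supported: readonly string[]): Result<string, ExtensionError> {
	const dot = fileName.lastIndexOf(".");
	if (dot === -1 || dot === fileName.length - 1) {
		return { ok: false, error: { kind: "missing" } };
	}
	const extension = fileName.slice(dot + 1).toLowerCase();
	if (supported.indexOf(extension) === -1) {
		return { ok: false, error: { kind: "unsupported", extension } };
	}
	return { ok: true, value: extension };
}

function describeError(error: ExtensionError): string {
	switch (error.kind) {
		case "missing":
			return "File has no extension";
		case "unsupported":
			return `Unsupported extension: .${error.extension}`;
		default:
			// Exhaustiveness check: adding a new `kind` without a case above
			// makes this call stop type-checking.
			return assertNever(error);
	}
}
```

Callers branch on `ok` before touching `value` or `error`, so the compiler rejects code that reads the success value on a failure path.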
Domain (src/domain/)
Pure business logic, no I/O, no side effects.
- `files/{file_status,file_types,cleaned_path,folder_errors}.ts`: file-state types, supported-extension list, output-path generator.
- `i18n/{i18n_lookup,language_names}.ts`: locale fallback chain + language metadata.
- `exif/exif_errors.ts`: `formatExifError` (used to render WASM errors).
- `settings_schema.ts`, `settings_errors.ts`: settings shape + validation.
- `accent_color.ts`, `path_truncation.ts`, `strip_options.ts`, `window_state_errors.ts`: value-object types.
Application (src/application/)
Ports (interfaces) the infrastructure layer satisfies.
- `ports/{file_bytes_port,metadata_processor_port}.ts`: port definitions.
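To make the port/adapter split concrete, the sketch below shows what a port and an in-memory adapter could look like. The method names `read` and `write`, the `InMemoryFileBytes` class, and the `copyFile` function are hypothetical; the real `FileBytesPort` in `file_bytes_port.ts` is not reproduced here and may expose a different surface.

```typescript
// Hypothetical port: the application layer depends only on this interface.
interface FileBytesPort {
	read(path: string): Promise<Uint8Array>;
	write(path: string, bytes: Uint8Array): Promise<void>;
}

// Hypothetical adapter backed by a Map, the kind of stand-in a unit test
// might use in place of the browser File adapter.
class InMemoryFileBytes implements FileBytesPort {
	private readonly files = new Map<string, Uint8Array>();

	async read(path: string): Promise<Uint8Array> {
		const bytes = this.files.get(path);
		if (bytes === undefined) throw new Error(`No such file: ${path}`);
		return bytes;
	}

	async write(path: string, bytes: Uint8Array): Promise<void> {
		// Store a copy so later mutation of the caller's buffer is not observed.
		this.files.set(path, bytes.slice());
	}
}

// Application-level code sees only the port, never the concrete adapter.
async function copyFile(port: FileBytesPort, from: string, to: string): Promise<number> {
	const bytes = await port.read(from);
	await port.write(to, bytes);
	return bytes.byteLength;
}
```

The design choice this illustrates is that swapping the browser-backed adapter for an in-memory one requires no change to code written against the port.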
Infrastructure (src/infrastructure/)
Adapters wrapping browser APIs and the format strategies.
- `wasm/wasm_processor.ts`: orchestrator: picks a `FormatStrategy` from the registry, runs strip, writes output.
- `wasm/strategy_registry.ts`: the canonical "what we own" list. `selectStrategy()` does extension + magic-byte routing.
- `wasm/strategies/{jpeg,png,pdf,office,video}_strategy.ts`: per-format walkers. JPEG and PNG are hand-rolled marker/chunk walkers; PDF uses pdf-lib; Office uses JSZip; video uses an mp4box-style box-tree rewrite.
- `wasm/format_strategy.ts`: the strategy interface.
- `web/file_registry.ts`: browser File registry (maps virtual paths to `File` objects).
- `web/browser_file_bytes.ts`: `FileBytesPort` adapter wrapping browser File reads + downloads.
- `web/batch_output.ts`: accumulates zip entries for multi-file/folder batches.
- `web/web_api.ts`: produces the `window.api` surface (`makeWebApi()` plus the `WebApi` interface).
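Extension plus magic-byte routing, as mentioned for `selectStrategy()`, can be sketched as follows. This is an illustration under assumptions, not the registry's code: the `FormatStrategy` fields shown, the `startsWith` helper, the three registered entries, and the rule that extension and magic bytes must both match are all hypothetical. The magic numbers themselves are the standard JPEG, PNG, and PDF signatures.

```typescript
// Hypothetical, trimmed-down strategy shape for routing purposes only.
interface FormatStrategy {
	readonly name: string;
	readonly extensions: readonly string[];
	matchesMagic(head: Uint8Array): boolean;
}

function startsWith(head: Uint8Array, magic: readonly number[]): boolean {
	if (head.length < magic.length) return false;
	return magic.every((byte, i) => head[i] === byte);
}

// Registration order doubles as priority order in this sketch.
const registry: readonly FormatStrategy[] = [
	{
		name: "jpeg",
		extensions: ["jpg", "jpeg"],
		matchesMagic: (head) => startsWith(head, [0xff, 0xd8, 0xff]),
	},
	{
		name: "png",
		extensions: ["png"],
		matchesMagic: (head) => startsWith(head, [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]),
	},
	{
		name: "pdf",
		extensions: ["pdf"],
		matchesMagic: (head) => startsWith(head, [0x25, 0x50, 0x44, 0x46]), // "%PDF"
	},
];

// Returns the first strategy whose extension list and magic check both accept
// the file, or null when nothing in the registry claims it.
function selectStrategy(fileName: string, head: Uint8Array): FormatStrategy | null {
	const dot = fileName.lastIndexOf(".");
	const extension = dot === -1 ? "" : fileName.slice(dot + 1).toLowerCase();
	for (const strategy of registry) {
		if (strategy.extensions.indexOf(extension) !== -1 && strategy.matchesMagic(head)) {
			return strategy;
		}
	}
	return null;
}
```

Requiring both checks means a renamed file (for example JPEG bytes with a `.png` name) is rejected instead of being handed to a walker that would misparse it; whether the real registry is this strict is not stated here.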
Resources (.resources/)
- `strings.json`: i18n dictionary (25 languages, ~30 KB). Bundled into the JS bundle by Vite at build time.
- `icon.png` (15 KB), `check.png` (133 B): app + checkmark icons.
Directory reference
.resources/            Runtime resources (i18n dictionary + icons; bundled into the JS bundle)
docs/                  Project documentation (gap-analysis, forensic, poc, superpowers, deploying, standalone-html, android-apk)
public/                Vite public dir for the web build (manifest, icons, _headers)
static/                Source vector assets (icon.svg)
src/
  web/                 Entry (main.tsx, index.html) + React component tree, contexts, hooks, BEM CSS, utils, env.d.ts
  common/              Shared helpers (Result type, assertNever, log_error)
  domain/              Pure value objects + business logic (files, i18n, exif errors, settings, etc.)
  application/         Ports (interfaces)
  infrastructure/      Adapters (wasm strategies, web adapters)
tests/
  domain/, application/, infrastructure/, web/   Vitest unit tests
  e2e/web/             Playwright e2e
tools/
  forensic/            Reproducible recovery-battery scripts (one per format)
.github/workflows/     CI (ci.yml), web deploy (deploy-web.yml)
Root configs: .prettierrc (tabs), .gitattributes (* text=auto eol=lf), vite.config.web.ts (renderer + PWA + CSP injection), vite.config.web.standalone.ts (single-file build), playwright.config.ts, tsconfig.json (strict, verbatimModuleSyntax, ES2021, bundler resolution), Dockerfile + nginx.conf (web self-host).
Build & Release
Dev workflow
yarn dev runs vite --config vite.config.web.ts on :5173 with full HMR. Open the URL in any modern browser.
Build outputs
- `dist/web/`: index.html + assets/ + sw.js (the deployable PWA; ~440 kB JS gzipped to ~180 kB; ~30 kB CSS).
- `dist/web-standalone/index.html`: single-file inlined build for offline hand-off (see `docs/standalone-html.md`).
Web deploy
docs/deploying.md covers the deployment paths: Cloudflare Pages (via .github/workflows/deploy-web.yml), self-hosted Docker (Dockerfile + nginx.conf), or static-file hosting behind any web server. CSP connect-src 'self' is enforced in both public/_headers (CF Pages) and nginx.conf (Docker).
Code patterns
- Exports: named exports only; no default exports anywhere.
- Async: `async/await` throughout.
- File processing: `processFileEntries()` in `src/web/hooks/use_process_files.ts` runs files sequentially through `window.api.wasm.process(...)`. Settings are fetched once per non-empty batch (not per file). Empty batches are a no-op.
- DOM: React functional components + hooks. No direct DOM manipulation outside refs.
- TypeScript: `strict: true` + `verbatimModuleSyntax: true` (strong null checks, no implicit any, type-only imports enforced).
- CSS: BEM (`.file-list__row--processing` style) with custom properties in `src/web/styles/tokens.css`. Dark mode via `prefers-color-scheme` media query. Pure-CSS animations (no JS animation libs); animations gated behind `@media (prefers-reduced-motion: no-preference)`.
- i18n: `.resources/strings.json` (~25 languages) bundled by Vite at build time, looked up via the pure `i18nLookup()` from `src/domain/i18n/`. Locale fallback: regional → base → English.
- Errors: throw `Error` objects (not strings); see `.claude/rules/typescript-conventions.md` for the full Result-type and error-handling pattern.
- Forensic verification: any `FormatStrategy` change must run a sentinel-based recovery battery; see `.claude/rules/format-strategy-workflow.md`.
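The regional → base → English locale fallback named in the i18n pattern above can be sketched as a pure lookup. Everything here is an assumption made for illustration: the `Dictionary` shape (key, then locale, then text), the `fallbackChain` and `lookup` names, and the choice to return the key itself when no translation exists. The real `i18nLookup()` and the layout of `strings.json` may differ.

```typescript
// Assumed dictionary shape: message key -> locale code -> translated text.
type Dictionary = Record<string, Record<string, string>>;

// Builds the ordered list of locales to try, without duplicates.
function fallbackChain(locale: string): string[] {
	const normalised = locale.replace("_", "-");
	const base = normalised.split("-")[0] ?? normalised;
	const chain = [normalised, base, "en"];
	return chain.filter((candidate, index) => chain.indexOf(candidate) === index);
}

// Pure lookup: no I/O and no globals, so it can live in the domain layer.
function lookup(dictionary: Dictionary, key: string, locale: string): string {
	const entry = dictionary[key];
	if (entry === undefined) return key; // assumption: unknown keys echo back
	for (const candidate of fallbackChain(locale)) {
		const text = entry[candidate];
		if (text !== undefined) return text;
	}
	return key;
}
```

With this shape, a request for `pt-BR` tries `pt-BR`, then `pt`, then `en`, so a partially translated locale degrades to English per string instead of per page.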
Dependencies
Production (6)
| Package | Purpose |
|---|---|
| `@ffmpeg/core` | Single-threaded ffmpeg-wasm; the FfmpegFallbackStrategy's strip engine for MP4/MOV/MKV/WebM (#182). GPL-2.0-or-later; combined distributable inherits. |
| `@uswriting/exiftool` | WebPerl-ExifTool wrapper; the ExifToolFallbackStrategy strip engine for WebP/GIF/AVIF + the ExifToolDiffStrategy read engine for the before/after diff feature. |
| `jszip` | Office archive read/write (DOCX/XLSX/PPTX/ODT) + batch zip output. |
| `pdf-lib` | PDF metadata stripping. |
| `react`, `react-dom` | UI. |
Dev
| Package | Version | Purpose |
|---|---|---|
| `vite` | ^7 | Module bundler / dev server |
| `vite-plugin-pwa` | ^1.3 | PWA manifest + service worker |
| `vite-plugin-singlefile` | ^2.3 | Inline-everything build for the standalone HTML target |
| `typescript` | ~5.7 | Language compiler (strict + verbatimModuleSyntax) |
| `@types/node` | ^22 | Node typings (for build scripts) |
| `prettier` | ^3 | Formatter |
| `vitest` | 3.2.4 | Unit tests |
| `@playwright/test` | ^1.58 | E2E |
| `madge` | ^8 | Circular-dep check |
Code conventions
- Formatting: Prettier with tabs, configured in `.prettierrc`.
- No JS frameworks at the renderer level beyond React. No animation libraries; pure CSS only.
- Module style: ESM throughout (`"type": "module"`).
- Naming: snake_case for filenames, camelCase for functions/variables, PascalCase for React components.
- CSS: BEM (mandated for all new CSS).
- Fonts: system stack only (`system-ui, -apple-system, BlinkMacSystemFont, ...`). No web font downloads, no bundled fonts.
- Dependencies: prefer hand-rolling. Current count is 6 production deps; new deps need explicit justification.
- Error handling: throw `Error` objects; surface errors via `Result<T, E>` shapes (see typescript-conventions.md).
- i18n: add translations to `.resources/strings.json`.
- Performance is sacred: see Tech Stack. Batch operations should feel instant.
Branching
- Integration branch is `master`, not `main`. Open PRs against `master`. CI workflows trigger on `master`. The harness's auto-detected "main branch" gitStatus line is wrong for this repo; trust this file.
- Worktree / feature branch prefixes in use: `feat/…`, `fix/…`, `docs/…`, `chore/…`, `worktree-…`.
Safety rules
- NEVER auto-publish. The web deploy workflow runs on push to `master`; that's the only automated path. Do not run `gh release create` or anything that uploads artifacts.
- NEVER enable auto-update / telemetry / analytics. Privacy-conscious users notice; see `.claude/rules/privacy-invariants.md` §5.
Security
- CSP via meta tag in `vite.config.web.ts`. Strict `connect-src 'self'`; no remote origins. The deploy also enforces CSP at the server level via `public/_headers` (CF Pages) and `nginx.conf` (Docker).
- No outbound network traffic in production. No analytics SDK, no error reporting service, no auto-update check, no font/icon CDN. See `.claude/rules/privacy-invariants.md`.
- Forensic verification scripts use system `exiftool` (`libimage-exiftool-perl` on Debian/Ubuntu, `brew install exiftool` on macOS) via the standalone runners under `tools/forensic/`.
- Historical XSS fix in v3.6.0; modern code uses React's auto-escaping.
CI
.github/workflows/ci.yml — runs on PR + push to master:
- `test` job: `yarn lint`, `yarn typecheck`, `yarn test`, `yarn check:deps`
- `e2e-web` job: `yarn test:e2e:web` (Chromium + WebKit via Playwright cache)
- `e2e-standalone` job: `yarn test:e2e:standalone` (single-file build smoke test)
.github/workflows/deploy-web.yml — Cloudflare Pages deploy.
Current state (as of May 2026)
- Phase G shipped 2026-05-14 (#80): Electron shell retired. Deleted `src/main/`, `src/preload/`, `src/infrastructure/electron/`, the xattr command + service, `src/common/{ipc_channels,platform}.ts`, electron-vite + electron-builder dependencies and configs, and ~2000 lines of yarn.lock. macOS xattr scrubbing is documented as a gap in `PRIVACY_GAPS.md`.
- Phase D shipped 2026-05-10: WASM-by-default routing, single processing engine across Electron and web.
- Phases A, B, C earlier: Office + fragmented MP4 strategies (A); deployable webapp with Vite + PWA + Docker (B); web e2e coverage + parallel CI (C).
- Active backlog: HEIC strategy (#48), the highest-priority gap (default format on modern Android Pixel/Samsung cameras). PNG/WebP/GIF/BMP/TIFF hand-rolled walkers. Office Phase 2 hardening (#62). Per-format size caps with mobile-friendly messaging (#63). `ExifToolFallbackStrategy` + `ExifToolDiffStrategy` design phase (`docs/poc/webperl-exiftool.md`): opt-in RAW/HEIC/AVIF/long-tail coverage on standalone + APK targets.
- See `.claude/rules/modernization-roadmap.md` for the full phase status and `docs/superpowers/specs/` for the migration design + audit specs.