exifcleaner-web/CLAUDE.md
forgejo_admin a5546afa71
feat(wasm): FfmpegFallbackStrategy for MP4/MOV/M4V/MKV/WebM (#183)
Adds FfmpegFallbackStrategy as a peer to ExifToolFallbackStrategy, routing MP4/MOV/M4V (Phase 1) and MKV/WebM (Phase 2) through @ffmpeg/core. On by default for all three distributions (standalone HTML, Capacitor APK, PWA self-host); VITE_ENABLE_FFMPEG_FALLBACK=false opts out. Takes priority over VideoStrategy for the MP4 family; VideoStrategy stays registered as the opt-out fallback until a subsequent PR deletes it.

Closes #182. Closes #43.

Resolves the four documented walker KNOWN_GAPS categorically: handler-name leak (#38), compressor-name leak (#39), mvhd.next_track_id leak (#111), GPMF/GPS coordinates leak (#42). On gopro-fusion.mp4 (5.1 MB GPMF + tmcd + fdsc) and dji-phantom4.mov (236 MB UserData GPS log) the forensic battery reports zero device-fingerprint survival across every recovery technique.

Key architectural choices:

- **Main-thread @ffmpeg/core, not @ffmpeg/ffmpeg wrapper.** The wrapper hardcodes type:"module" Workers from Blob URLs, which fail silently under null-origin file:// in Chromium — the standalone build hung forever on every video strip. @ffmpeg/ffmpeg dropped from package.json.
- **Stream mapping -map 0 -map -0:d? -map -0:s? -map -0:t?**. Preserves input track order while dropping data/subtitle/timecode streams. Avoids the eng→und reorder bug of -map 0:v?/-map 0:a?, and sidesteps mat2's exit-234 on action-cam files (GoPro Fusion has tmcd/fdsc).
- **Post-strip pass rewrites the udta box type to 'free'** (ISO/IEC 14496-12 §8.1.2 padding) to neutralise ffmpeg's hardcoded HandlerType:Metadata + HandlerVendorID:Apple stub. Length-preserving so stco/co64 offsets stay valid. Handles both regular and largesize headers via headerStart+4.
- **mdhd.language left as ffmpeg's 'und'** — considered zeroing but reverted: 0x0000 is an invalid ISO 639-2/T code, ffprobe falls back to displaying '(eng)' for invalid codes (actively misleading downstream tools).
- **Diff race fix.** @uswriting/exiftool's parseMetadata uses module-level singletons (Perl, MemoryFS, stdout/stderr StringBuilders). WasmProcessor now serializes all diff builds across the processor's lifetime via a Promise chain — guarantees no two parseMetadata calls overlap, whether within an entry or across the fire-and-forget chunk-drained queue.
- **ExifTool family-1 group names surfaced verbatim** — IFD0, ExifIFD, XMP-dc, Track1, etc. Refuses to collapse to umbrella labels like 'EXIF' because the collapse caused (source, name) key collisions across sub-groups (Track1:HandlerType vs Track2:HandlerType produced spurious diffs on multi-track MP4).
- **Standalone HTML stays single-file.** Two-asset Vite plugin gzip+base64-inlines ffmpeg-core.js + ffmpeg-core.wasm into <script type=text/plain> tags, mirroring the zeroperl pattern. With tree-shaking via __WITH_STANDALONE_INLINE__ the standalone HTML went 116MB → 24MB.
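
The Promise-chain serialization described under "Diff race fix" can be sketched as follows. This is a minimal illustration and not the WasmProcessor code: the `SerialQueue` class and its `run` method are hypothetical names. It shows only the general technique of chaining each task onto the tail of the previous one so that no two tasks overlap.

```typescript
// Hypothetical helper; names are illustrative and not taken from the codebase.
class SerialQueue {
	// Tail of the chain. It never rejects, so later tasks always get to run.
	private tail: Promise<void> = Promise.resolve();

	run<T>(task: () => Promise<T>): Promise<T> {
		// Start this task only after the previous one has settled.
		const result = this.tail.then(() => task());
		// Swallow the outcome on the chain itself; callers still see it via `result`.
		this.tail = result.then(
			() => undefined,
			() => undefined,
		);
		return result;
	}
}
```

A caller would wrap every call that touches the shared singletons, for example `queue.run(() => buildDiff(file))` with a hypothetical `buildDiff`. A rejected task is reported to its own caller but does not block the tasks queued behind it.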

Forensic verification: docs/forensic/ffmpeg-fallback.md + tools/forensic/ffmpeg-fallback.ts cover synthetic-mp4/mkv/webm + phone-baseline (2.7MB Android) + gopro-fusion (5MB action-cam) + dji-phantom4 (236MB drone) with zero sentinel/fingerprint survival across the recovery battery. Gap analyses for all three formats at docs/gap-analysis/mp4-ffmpeg.md, mkv.md, webm.md. POC at docs/poc/ffmpeg-wasm.md.

Production deps go from 5 → 6: @ffmpeg/core@0.12.10 (GPL-2.0-or-later; combined distributable inherits, MIT codebase unchanged, source pointer in README per GPL compliance).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 15:04:04 +04:00

MetaScrub

Privacy-focused metadata stripper. Primary distributions: desktop offline standalone HTML + Android APK (Capacitor wrapper). The deployed web PWA still builds and is self-hostable, but it's a secondary target. iOS in any form is out of scope. No Perl runtime, no server-side calls, no Electron shell. MIT license. Forked from szTheory/exifcleaner and rebranded in v5; lineage notes in the README.

Tech Stack

  • Runtime: Modern browsers (Chrome, Firefox, Edge) on desktop; Capacitor WebView (Chromium) inside the Android APK. Safari is supported best-effort but not a target (iOS is out of scope).
  • Renderer: React 19 SPA built by Vite, with BEM CSS (no framework)
  • Language: TypeScript 5.7 with strict: true + verbatimModuleSyntax: true (type-check only; Vite/esbuild compile)
  • Build: vite 7.x — vite.config.web.standalone.ts produces the primary desktop output (dist/web-standalone/index.html, single-file inlined). vite.config.web.ts produces dist/web/, used as the source for the Android APK (via Capacitor cap sync android) and for the self-host PWA path. dist/web/ is not a primary user-facing distribution by itself.
  • Processing engine: hand-rolled WASM/pure-TS FormatStrategy implementations registered in src/infrastructure/wasm/strategy_registry.ts. The registry is the sole authority for what is supported.
  • Production deps (6): @ffmpeg/core (FfmpegFallbackStrategy strip engine for MP4/MOV/MKV/WebM), @uswriting/exiftool (ExifToolFallbackStrategy strip + ExifToolDiffStrategy read engine), jszip (Office), pdf-lib (PDF), react + react-dom (UI).
  • Performance is sacred: the app should process hundreds of files in seconds. Never add sync I/O in the loop or heavy DOM operations per row.

Commands

# Dev
yarn dev              # vite dev server on :5173 (renderer HMR)

# Build
yarn build              # vite build → dist/web/
yarn build:web          # alias for `yarn build`
yarn build:web:standalone  # one-file inlined HTML → dist/web-standalone/index.html
yarn preview            # vite preview on :4173

# Quality gates
yarn typecheck        # tsc --noEmit
yarn lint             # prettier --check
yarn format           # prettier --write
yarn test             # vitest run
yarn test:watch
yarn test:e2e         # All Playwright projects
yarn test:e2e:web     # web-desktop + web-mobile-ios + web-mobile-android
yarn test:e2e:web:desktop
yarn test:e2e:standalone  # build:web:standalone + Playwright standalone project
yarn check:deps       # madge --circular

Architecture

Single-process web app. The whole UI runs in the browser: there is no main process, no preload, no IPC. Format strategies run in-page; output is downloaded via <a download> or bundled into a zip for folder/multi-file batches.

For the narrative tour (analogies for backend devs, sequence diagrams, end-to-end trace, React primer), see docs/architecture.md. The sections below are reference-level.

Web UI (src/web/)

The entire renderer lives under src/web/ — entry, HTML, App shell, components, contexts, hooks, styles, and utils. Prior to the src/renderer/ → src/web/ consolidation, the React tree lived under src/renderer/; that naming was Electron-process terminology and was retired along with the shell.

  • src/web/main.tsx — entry. Mounts React, sets window.api = makeWebApi(), imports all CSS files.
  • src/web/index.html — HTML root with relative asset paths.
  • src/web/App.tsx — top-level app shell.
  • src/web/components/ — React tree organised by area: file-list/, ui/, icons/, settings/.
  • src/web/contexts/ — AppContext, I18nContext, ThemeContext.
  • src/web/hooks/ — use_process_files, use_i18n, use_elapsed_time.
  • src/web/styles/ — BEM CSS files (one per component area).
  • src/web/utils/ — get_file_extension, format_file_size.
  • src/web/env.d.ts — typed window.api: WebApi global.

Common (src/common/)

Shared helpers (must remain pure for use in the UI).

  • types.ts — assertNever, getOrThrow.
  • result.ts — the Result<T, E> discriminated-union type.
  • log_error.ts — small console-logging helper.
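
As a rough illustration of how a Result<T, E> discriminated union and assertNever typically work together, here is a minimal sketch. The exact shapes in result.ts and types.ts are not shown in this file, so the `ok` discriminant, the `ParseError` type, and the `parsePort` / `describeError` functions are assumptions made for the example.

```typescript
// Assumed shape; the real definition lives in src/common/result.ts.
type Result<T, E> = { ok: true; value: T } | { ok: false; error: E };

// Common exhaustiveness helper: compiles only if every case was handled.
function assertNever(value: never): never {
	throw new Error(`Unexpected value: ${JSON.stringify(value)}`);
}

// Hypothetical error union, used only for this example.
type ParseError = { kind: "empty" } | { kind: "not-a-number"; input: string };

function parsePort(input: string): Result<number, ParseError> {
	if (input.length === 0) return { ok: false, error: { kind: "empty" } };
	const n = Number(input);
	if (!Number.isInteger(n)) return { ok: false, error: { kind: "not-a-number", input } };
	return { ok: true, value: n };
}

function describeError(e: ParseError): string {
	switch (e.kind) {
		case "empty":
			return "no input";
		case "not-a-number":
			return `not a number: ${e.input}`;
		default:
			// Adding a new ParseError variant makes this line a type error.
			return assertNever(e);
	}
}
```

Callers check `result.ok` before touching `value` or `error`, so the compiler enforces that failures are handled.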

Domain (src/domain/)

Pure business logic, no I/O, no side effects.

  • files/{file_status,file_types,cleaned_path,folder_errors}.ts — file-state types, supported-extension list, output-path generator.
  • i18n/{i18n_lookup,language_names}.ts — locale fallback chain + language metadata.
  • exif/exif_errors.ts — formatExifError (used to render WASM errors).
  • settings_schema.ts, settings_errors.ts — settings shape + validation.
  • accent_color.ts, path_truncation.ts, strip_options.ts, window_state_errors.ts — value-object types.
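
The locale fallback chain (regional → base → English) might look roughly like the sketch below. The dictionary layout and the function names `fallbackChain` and `lookup` are assumptions for illustration; the real i18nLookup() signature and the structure of strings.json may differ.

```typescript
// Hypothetical dictionary shape: key → locale → translated string.
type Dictionary = {
	[key: string]: { [locale: string]: string | undefined } | undefined;
};

// "pt-BR" → ["pt-BR", "pt", "en"]; duplicates are removed, order is kept.
function fallbackChain(locale: string): string[] {
	const dash = locale.indexOf("-");
	const base = dash === -1 ? locale : locale.slice(0, dash);
	return [locale, base, "en"].filter((c, i, all) => all.indexOf(c) === i);
}

function lookup(dict: Dictionary, key: string, locale: string): string {
	const entry = dict[key];
	if (entry !== undefined) {
		for (const candidate of fallbackChain(locale)) {
			const text = entry[candidate];
			if (text !== undefined) return text;
		}
	}
	// Assumption: an unknown key falls back to the key itself.
	return key;
}
```

Keeping the function pure (dictionary passed in, no globals) is what lets it live in the domain layer.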

Application (src/application/)

Ports (interfaces) the infrastructure layer satisfies.

  • ports/{file_bytes_port,metadata_processor_port}.ts — port definitions.

Infrastructure (src/infrastructure/)

Adapters wrapping browser APIs and the format strategies.

  • wasm/wasm_processor.ts — orchestrator: picks a FormatStrategy from the registry, runs strip, writes output.
  • wasm/strategy_registry.ts — the canonical "what we own" list. selectStrategy() does extension + magic-byte routing.
  • wasm/strategies/{jpeg,png,pdf,office,video}_strategy.ts — per-format walkers. JPEG and PNG are hand-rolled marker/chunk walkers; PDF uses pdf-lib; Office uses JSZip; video uses an mp4box-style box-tree rewrite.
  • wasm/format_strategy.ts — the strategy interface.
  • web/file_registry.ts — browser File registry (maps virtual paths to File objects).
  • web/browser_file_bytes.ts — FileBytesPort adapter wrapping browser File reads + downloads.
  • web/batch_output.ts — accumulates zip entries for multi-file/folder batches.
  • web/web_api.ts — produces the window.api surface (makeWebApi() plus the WebApi interface).
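
To make the "extension + magic-byte routing" idea concrete, here is a minimal sketch of a registry lookup. The `FormatStrategy` fields shown here, the two sample strategies, and the rule that both signals must agree are assumptions for illustration; the real interface in wasm/format_strategy.ts and the real selectStrategy() may be shaped differently.

```typescript
// Hypothetical, trimmed-down strategy interface.
interface FormatStrategy {
	readonly name: string;
	readonly extensions: readonly string[];
	// True when the leading bytes look like this format.
	matchesMagic(head: Uint8Array): boolean;
}

function startsWith(head: Uint8Array, magic: readonly number[]): boolean {
	return magic.length <= head.length && magic.every((byte, i) => head[i] === byte);
}

const jpegStrategy: FormatStrategy = {
	name: "jpeg",
	extensions: ["jpg", "jpeg"],
	matchesMagic: (head) => startsWith(head, [0xff, 0xd8, 0xff]),
};

const pngStrategy: FormatStrategy = {
	name: "png",
	extensions: ["png"],
	matchesMagic: (head) => startsWith(head, [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]),
};

const registry: readonly FormatStrategy[] = [jpegStrategy, pngStrategy];

function selectStrategy(fileName: string, head: Uint8Array): FormatStrategy | undefined {
	const dot = fileName.lastIndexOf(".");
	const ext = dot === -1 ? "" : fileName.slice(dot + 1).toLowerCase();
	for (const strategy of registry) {
		// Assumption: extension and magic bytes must both match.
		if (strategy.extensions.indexOf(ext) !== -1 && strategy.matchesMagic(head)) {
			return strategy;
		}
	}
	return undefined;
}
```

Checking the magic bytes as well as the extension means a renamed file is rejected rather than handed to a walker that would misparse it.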

Resources (.resources/)

  • strings.json — i18n dictionary (25 languages, ~30 KB). Bundled into the JS bundle by Vite at build time.
  • icon.png (15 KB), check.png (133 B) — app + checkmark icons.

Directory reference

.resources/             Runtime resources (i18n dictionary + icons; bundled into the JS bundle)
docs/                   Project documentation (gap-analysis, forensic, poc, superpowers, deploying, standalone-html, android-apk)
public/                 Vite public dir for the web build (manifest, icons, _headers)
static/                 Source vector assets (icon.svg)
src/
  web/                  Entry (main.tsx, index.html) + React component tree, contexts, hooks, BEM CSS, utils, env.d.ts
  common/               Shared helpers (Result type, assertNever, log_error)
  domain/               Pure value objects + business logic (files, i18n, exif errors, settings, etc.)
  application/          Ports (interfaces)
  infrastructure/       Adapters (wasm strategies, web adapters)
tests/
  domain/, application/, infrastructure/, web/        Vitest unit tests
  e2e/web/                                            Playwright e2e
tools/
  forensic/             Reproducible recovery-battery scripts (one per format)
.github/workflows/      CI (ci.yml), web deploy (deploy-web.yml)

Root configs: .prettierrc (tabs), .gitattributes (* text=auto eol=lf), vite.config.web.ts (renderer + PWA + CSP injection), vite.config.web.standalone.ts (single-file build), playwright.config.ts, tsconfig.json (strict, verbatimModuleSyntax, ES2021, bundler resolution), Dockerfile + nginx.conf (web self-host).

Build & Release

Dev workflow

yarn dev runs vite --config vite.config.web.ts on :5173 with full HMR. Open the URL in any modern browser.

Build outputs

  • dist/web/ — index.html + assets/ + sw.js (the deployable PWA; ~440 kB JS gzipped to ~180 kB; ~30 kB CSS).
  • dist/web-standalone/index.html — single-file inlined build for offline hand-off (see docs/standalone-html.md).

Web deploy

docs/deploying.md covers the deployment paths: Cloudflare Pages (via .github/workflows/deploy-web.yml), self-hosted Docker (Dockerfile + nginx.conf), or static-file hosting behind any web server. CSP connect-src 'self' is enforced in both public/_headers (CF Pages) and nginx.conf (Docker).

Code patterns

  • Exports: named exports only — no default exports anywhere.
  • Async: async/await throughout.
  • File processing: processFileEntries() in src/web/hooks/use_process_files.ts runs files sequentially through window.api.wasm.process(...). Settings are fetched once per non-empty batch (not per file). Empty batches are a no-op.
  • DOM: React functional components + hooks. No direct DOM manipulation outside refs.
  • TypeScript: strict: true + verbatimModuleSyntax: true — strong null checks, no implicit any, type-only imports enforced.
  • CSS: BEM (.file-list__row--processing style) with custom properties in src/web/styles/tokens.css. Dark mode via prefers-color-scheme media query. Pure-CSS animations (no JS animation libs); animations gated behind @media (prefers-reduced-motion: no-preference).
  • i18n: .resources/strings.json (~25 languages) bundled by Vite at build time, looked up via the pure i18nLookup() from src/domain/i18n/. Locale fallback: regional → base → English.
  • Errors: throw Error objects (not strings); see .claude/rules/typescript-conventions.md for the full Result-type and error-handling pattern.
  • Forensic verification: any FormatStrategy change must run a sentinel-based recovery battery — see .claude/rules/format-strategy-workflow.md.
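
The file-processing rule above (sequential processing, settings fetched once per non-empty batch, empty batches as a no-op) can be sketched like this. The `Settings` and `ProcessApi` shapes and the `processSequentially` name are hypothetical stand-ins; the real processFileEntries() and window.api surface are not reproduced here.

```typescript
// Hypothetical shapes, for illustration only.
interface Settings {
	preserveOrientation: boolean;
}

interface ProcessApi {
	getSettings(): Promise<Settings>;
	process(path: string, settings: Settings): Promise<void>;
}

async function processSequentially(api: ProcessApi, paths: readonly string[]): Promise<number> {
	// Empty batch: no settings fetch, no work.
	if (paths.length === 0) return 0;
	// Fetched once per batch rather than once per file.
	const settings = await api.getSettings();
	let processed = 0;
	for (const path of paths) {
		// The next file starts only after this one has finished.
		await api.process(path, settings);
		processed += 1;
	}
	return processed;
}
```

Awaiting inside the loop, rather than using Promise.all, is what keeps the files strictly ordered.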

Dependencies

Production (6)

| Package | Purpose |
| --- | --- |
| @ffmpeg/core | Single-threaded ffmpeg-wasm; the FfmpegFallbackStrategy's strip engine for MP4/MOV/MKV/WebM (#182). GPL-2.0; combined distributable inherits. |
| @uswriting/exiftool | WebPerl-ExifTool wrapper; the ExifToolFallbackStrategy strip engine for WebP/GIF/AVIF + the ExifToolDiffStrategy read engine for the before/after diff feature. |
| jszip | Office archive read/write (DOCX/XLSX/PPTX/ODT) + batch zip output. |
| pdf-lib | PDF metadata stripping. |
| react, react-dom | UI. |

Dev

| Package | Version | Purpose |
| --- | --- | --- |
| vite | ^7 | Module bundler / dev server |
| vite-plugin-pwa | ^1.3 | PWA manifest + service worker |
| vite-plugin-singlefile | ^2.3 | Inline-everything build for the standalone HTML target |
| typescript | ~5.7 | Language compiler (strict + verbatimModuleSyntax) |
| @types/node | ^22 | Node typings (for build scripts) |
| prettier | ^3 | Formatter |
| vitest | 3.2.4 | Unit tests |
| @playwright/test | ^1.58 | E2E |
| madge | ^8 | Circular-dep check |

Code conventions

  • Formatting: Prettier with tabs, configured in .prettierrc.
  • No JS frameworks at the renderer level beyond React. No animation libraries — pure CSS only.
  • Module style: ESM throughout ("type": "module").
  • Naming: snake_case for filenames, camelCase for functions/variables, PascalCase for React components.
  • CSS: BEM (mandated for all new CSS).
  • Fonts: system stack only (system-ui, -apple-system, BlinkMacSystemFont, ...). No web font downloads, no bundled fonts.
  • Dependencies: prefer hand-rolling. Current count is 6 production deps; new deps need explicit justification.
  • Error handling: throw Error objects; surface errors via Result<T, E> shapes (see typescript-conventions.md).
  • i18n: add translations to .resources/strings.json.
  • Performance is sacred: see Tech Stack. Batch operations should feel instant.

Branching

  • Integration branch is master, not main. Open PRs against master. CI workflows trigger on master. The harness's auto-detected "main branch" gitStatus line is wrong for this repo; trust this file.
  • Worktree / feature branch prefixes in use: feat/…, fix/…, docs/…, chore/…, worktree-….

Safety rules

  • NEVER auto-publish. The web deploy workflow runs on push to master; that's the only automated path. Do not run gh release create or anything that uploads artifacts.
  • NEVER enable auto-update / telemetry / analytics. Privacy-conscious users notice — see .claude/rules/privacy-invariants.md §5.

Security

  • CSP via meta tag in vite.config.web.ts. Strict connect-src 'self'; no remote origins. The deploy also enforces CSP at the server level via public/_headers (CF Pages) and nginx.conf (Docker).
  • No outbound network traffic in production. No analytics SDK, no error reporting service, no auto-update check, no font/icon CDN. See .claude/rules/privacy-invariants.md.
  • Forensic verification scripts use system exiftool (libimage-exiftool-perl on Debian/Ubuntu, brew install exiftool on macOS) via the standalone runners under tools/forensic/.
  • Historical XSS fix in v3.6.0; modern code uses React's auto-escaping.

CI

.github/workflows/ci.yml — runs on PR + push to master:

  • test job: yarn lint, yarn typecheck, yarn test, yarn check:deps
  • e2e-web job: yarn test:e2e:web (Chromium + WebKit via Playwright cache)
  • e2e-standalone job: yarn test:e2e:standalone (single-file build smoke test)

.github/workflows/deploy-web.yml — Cloudflare Pages deploy.

Current state (as of May 2026)

  • Phase G shipped 2026-05-14 (#80): Electron shell retired. Deleted src/main/, src/preload/, src/infrastructure/electron/, the xattr command + service, src/common/{ipc_channels,platform}.ts, electron-vite + electron-builder dependencies and configs, and ~2000 lines of yarn.lock. macOS xattr scrubbing is documented as a gap in PRIVACY_GAPS.md.
  • Phase D shipped 2026-05-10: WASM-by-default routing, single processing engine across Electron and web.
  • Phases A, B, C earlier: Office + fragmented MP4 strategies (A); deployable webapp with Vite + PWA + Docker (B); web e2e coverage + parallel CI (C).
  • Active backlog: HEIC strategy (#48) — highest-priority gap (default format on modern Android Pixel/Samsung cameras). PNG/WebP/GIF/BMP/TIFF hand-rolled walkers. Office Phase 2 hardening (#62). Per-format size caps with mobile-friendly messaging (#63). ExifToolFallbackStrategy + ExifToolDiffStrategy design phase (docs/poc/webperl-exiftool.md) — opt-in RAW/HEIC/AVIF/long-tail coverage on standalone + APK targets.
  • See .claude/rules/modernization-roadmap.md for the full phase status and docs/superpowers/specs/ for the migration design + audit specs.