exifcleaner-web/docs/architecture.md
forgejo_admin 6e52fd894f
docs(direction): standalone HTML + Android APK are the primary targets; drop iOS (#172)
## Summary

Bring all direction-flavoured docs into sync with the May 2026 state of the project.

**Primary distribution targets are now:**
1. **Desktop offline standalone HTML** (`dist/web-standalone/index.html`) — produced by `yarn build:web:standalone`
2. **Android APK** (Capacitor wrapper) — produced by `.github/workflows/build-android.yml` or `scripts/build-apk-local.sh`

**Demoted to secondary:**
- The deployed web PWA (`dist/web/`) is still buildable and self-hostable via the included Docker + Cloudflare Pages paths, but is no longer the recommended user-facing distribution.

**Out of scope:**
- iOS in any form (App Store, PWA via Safari, Add to Home Screen).

## What changed

| File | Change |
|---|---|
| `.claude/rules/project-direction.md` | Main rewrite: "One code path, two distribution targets"; "Mobile = Android APK, not iOS"; "What's NOT in scope" updated; Phase E.1 issue list recast |
| `.claude/rules/modernization-roadmap.md` | Phase E.1 table: #50 (iOS Photos picker) and #52 (PWA install prompt UX) marked out-of-scope; #23 recast from Web Share Target PWA to Android Intent filter; "PWA is sole channel" claim brought current; key constraints updated |
| `.claude/rules/privacy-invariants.md` | §2 expanded to cover all three distribution paths (standalone inlines everything, APK uses Capacitor's localhost interceptor, self-host PWA caches via service worker) |
| `CLAUDE.md` | Top of file + Tech Stack section reflect dual primary distribution |
| `README.md` | Features list + Project Direction section reflect dual primary, iOS dropped, standalone-on-Android note updated to point at the APK |
| `docs/android-apk.md` | Line 21 note flipped (APK is now primary, not "personal-distribution / not official"); comparison table relabelled; AAR conclusion updated |
| `docs/deploying.md` | Reframed as the self-host PWA doc; iOS Safari install instructions removed; intro note clarifies this is secondary distribution |
| `docs/architecture.md` | History note brought current — mentions APK #156 + standalone HTML as primaries |
| `docs/PRIVACY_GAPS.md` | Android filesystem-isolation note updated to recommend the APK path |
| `docs/standalone-html.md` | "No PWA install" trade-off bullet now points at the Android APK |

10 files changed, 81 insertions, 58 deletions.

## Phase E issues to close as out-of-scope

This direction change makes two open issues out-of-scope. Suggested follow-up — close (not in this PR, since closing is a separate decision):

- **#50** iOS Photos picker UX note — iOS dropped entirely
- **#52** PWA install prompt UX — deployed PWA demoted to self-host only

#23 (Web Share Target PWA) is recast rather than closed — the underlying "let users share files into MetaScrub from the Android gallery" feature is still wanted, but the implementation switches from the Web Share Target API in the manifest to a Capacitor `@capacitor/share` / native Intent filter.

## What I deliberately did NOT touch

- **`CHANGELOG.md`** — historical release notes. The Phase G entry says "PWA is the sole distribution channel" which was accurate at the time; changelogs are snapshots, not living documents.
- **`docs/superpowers/plans/2026-05-14-phase-g-rollout.md` + `docs/superpowers/specs/2026-05-14-phase-g-electron-retirement-design.md`** — historical phase plans/specs describing the work as it was at the time of execution.
- **Playwright e2e mobile-iOS configs** — these test responsive layouts under an iOS Safari user agent, useful coverage independent of iOS being a shipping target. Removing them is a separate test-strategy decision, not a direction-doc concern.
- **Source code** — only two iOS references in `src/`, both in comments describing the mobile-browser landscape generally (not iOS-gated code paths). No code change needed for the direction shift.

## Test plan
- [x] All edits are documentation-only; no source code touched
- [x] No `*.md` linting in CI; prettier-check only targets `src/**/*.{ts,tsx}`
- [ ] Reviewer reads `.claude/rules/project-direction.md` first; satellites mirror its language
- [ ] Decide whether to close #50 + #52 as out-of-scope as a follow-up

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Randa <obuvuyoviz26@gmail.com>
Reviewed-on: http://forgejo.localhost:3000/forgejo_admin/exifcleaner-web/pulls/172
2026-05-21 02:27:41 +04:00


Architecture Guide

A walkthrough of how MetaScrub is wired together, written for a backend developer who's new to React. CLAUDE.md is the reference (LLM-optimised for fast symbol lookup); this document is the narrative — analogies, sequence diagrams, end-to-end traces.

If you've worked on a microservice backend with a DDD-flavoured domain layer, almost everything in this codebase has a familiar analogue. The places where it diverges (React reconciliation, the strategy-registry pattern) are called out explicitly.

History note. This project began life as ExifCleaner, an Electron desktop app wrapping Perl ExifTool. Phase D (2026-05-10) consolidated everything onto one WASM/pure-TS engine that ran in both Electron and web. Phase G (2026-05-14, issue #80) retired the Electron shell entirely. v5 rebranded the fork as MetaScrub to reflect the broader format coverage (PDF, Office, MP4 — not just EXIF). The Capacitor APK (#156, May 2026) added the Android distribution. Today (2026-05-21) the primary distributions are desktop standalone HTML + Android APK; the deployed web PWA is a self-host option, not the recommended user-facing path; iOS is out of scope. This document reflects that state.



Mental model in one page

MetaScrub is a React SPA that strips metadata from files in the user's browser. There is no server, no Electron shell, no native code. Files are read via the File API, processed by hand-rolled FormatStrategy walkers, and handed back to the user via <a download> (or bundled into a zip when the batch contains multiple files / folder structure).

┌─────────────────────────────────────────────────────────────┐
│  Browser tab                                                │
│                                                             │
│   src/web/main.tsx                                          │
│       │                                                     │
│       ├─ window.api = makeWebApi()    (one global surface)  │
│       │                                                     │
│       └─ createRoot(<App />)                                │
│              │                                              │
│              ├─ DropZone / FileBrowseButton                 │
│              │       │                                      │
│              │       └─ File picked → FileRegistry          │
│              │                                              │
│              ├─ FileTable                                   │
│              │       │                                      │
│              │       └─ useProcessFiles()                   │
│              │              │                               │
│              │              └─ window.api.wasm.process()    │
│              │                     │                        │
│              │                     ├─ BrowserFileBytes.read │
│              │                     ├─ selectStrategy()      │
│              │                     ├─ strategy.strip()      │
│              │                     └─ BrowserFileBytes.write│
│              │                            │                 │
│              │                            ├─ batchOutput?   │
│              │                            │   yes → zip     │
│              │                            │   no  → <a download>
│              │                            ▼                 │
│              │                       user's Downloads dir   │
│              │                                              │
│              └─ SettingsDrawer / Toast / ResultPill         │
│                                                             │
└─────────────────────────────────────────────────────────────┘

The boundary that used to be the contextBridge (between Electron's main + preload + renderer) is gone. What used to be an IPC call is now an in-process function call into the strategy registry.


Single-process web app

One JavaScript bundle. One execution context. No IPC, no preload, no main process.

window.api surface

The renderer talks to "infrastructure" through a single window.api object built by makeWebApi():

// src/infrastructure/web/web_api.ts
export interface WebApi {
  i18n: I18nApi;
  files: FilesApi;     // registry registration + batch-notification stubs
  theme: ThemeApi;     // localStorage + prefers-color-scheme
  settings: SettingsApi; // localStorage-backed
  platform: PlatformApi; // { isMac } — used for OS-specific cosmetic touches
  reveal: RevealApi;   // stub that returns "not supported in browser"
  folder: FolderApi;   // classify (no-op) + expand (stub)
  wasm: WasmApi;       // the only one that does real work
}

The surface exists for one reason: the renderer was written against window.api.* when it was talking to Electron's preload, and keeping that shape after Phase G means none of the renderer code needed to change. New code can import infrastructure directly if it's tidier.

Strategy registry

src/infrastructure/wasm/strategy_registry.ts lists every supported format and the strategy that handles it. selectStrategy({ filename, bytes }) performs extension + magic-byte routing and returns the matching FormatStrategy or null. The registry is the canonical source of truth for "what is supported."
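
As a rough illustration of extension + magic-byte routing, the sketch below shows one way a registry lookup could work. The `FormatStrategy` shape, the `REGISTRY` contents, and the rule that both the extension and the leading bytes must agree are assumptions made for this example, not a copy of the real registry.

```typescript
// Hypothetical strategy shape; the real FormatStrategy interface may differ.
interface FormatStrategy {
  readonly name: string;
  readonly extensions: readonly string[];
  // Leading bytes that identify the format.
  readonly magic: readonly number[];
}

// Illustrative two-entry registry, not the project's actual format list.
const REGISTRY: readonly FormatStrategy[] = [
  { name: "jpeg", extensions: ["jpg", "jpeg"], magic: [0xff, 0xd8, 0xff] },
  { name: "png", extensions: ["png"], magic: [0x89, 0x50, 0x4e, 0x47] },
];

function hasMagic(bytes: Uint8Array, magic: readonly number[]): boolean {
  return bytes.length >= magic.length && magic.every((b, i) => bytes[i] === b);
}

// Assumption: a strategy matches only when extension AND magic bytes agree.
function selectStrategy(input: {
  filename: string;
  bytes: Uint8Array;
}): FormatStrategy | null {
  const dot = input.filename.lastIndexOf(".");
  const ext = dot >= 0 ? input.filename.slice(dot + 1).toLowerCase() : "";
  for (const strategy of REGISTRY) {
    if (strategy.extensions.includes(ext) && hasMagic(input.bytes, strategy.magic)) {
      return strategy;
    }
  }
  return null;
}
```

Requiring both signals means a renamed file (JPEG bytes with a `.png` name) is rejected rather than routed to the wrong walker. Whether the real registry is that strict is not stated here.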

Privacy boundary

The privacy boundary is connect-src 'self' enforced by CSP at both build-time (meta tag in vite.config.web.ts) and serve-time (public/_headers + nginx.conf). Combined with the strategy registry running entirely in the browser, no file bytes ever leave the device. See .claude/rules/privacy-invariants.md.


Build pipeline

The main web build uses one config, vite.config.web.ts, and produces one output directory, dist/web/.

yarn build → vite build --config vite.config.web.ts → dist/web/
                                                       ├─ index.html
                                                       ├─ assets/
                                                       │   ├─ index-<hash>.js   (~440 kB; ~180 kB gzipped)
                                                       │   └─ index-<hash>.css  (~30 kB)
                                                       ├─ sw.js                 (service worker)
                                                       ├─ workbox-<hash>.js
                                                       ├─ manifest.webmanifest
                                                       └─ icons / static assets

yarn build:web:standalone runs a second config (vite.config.web.standalone.ts) that produces dist/web-standalone/index.html — a single file with the whole bundle inlined, for offline hand-off (USB sticks, attachments). See docs/standalone-html.md.

Vite plugins in use

  • @vitejs/plugin-react — Fast Refresh + JSX transform.
  • vite-plugin-pwa — service worker generation, manifest, install prompt.
  • vite-plugin-singlefile — used only by the standalone config.
  • An in-tree plugin that injects the CSP meta tag into index.html.
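
For readers who have not written a Vite plugin, the in-tree CSP plugin can be pictured roughly as below. This is a minimal sketch: `cspMetaPlugin`, the plugin name, and the policy string are hypothetical, and the structural `HtmlPlugin` type stands in for Vite's `Plugin` type so the example has no dependencies. Vite does expose a `transformIndexHtml` hook that receives the HTML string.

```typescript
// Minimal structural type so the sketch compiles without importing Vite.
interface HtmlPlugin {
  name: string;
  transformIndexHtml(html: string): string;
}

// Hypothetical helper; the real plugin's name and policy are not shown in this guide.
function cspMetaPlugin(policy: string): HtmlPlugin {
  const tag = `<meta http-equiv="Content-Security-Policy" content="${policy}">`;
  return {
    name: "inject-csp-meta",
    transformIndexHtml(html: string): string {
      // Insert right after <head> so the policy precedes every other resource.
      // Assumes a bare <head> tag with no attributes.
      return html.replace(/<head>/i, (head) => `${head}\n    ${tag}`);
    },
  };
}

const plugin = cspMetaPlugin("default-src 'self'; connect-src 'self'");
const transformed = plugin.transformIndexHtml(
  "<html><head><title>MetaScrub</title></head><body></body></html>",
);
```

A meta-tag policy is what makes the standalone single-file build self-protecting, since that file is opened without any server to send headers.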

Dev workflow

yarn dev runs the Vite dev server on :5173 with full HMR. Open the URL in any modern browser; React Fast Refresh applies edits without a reload.


End-to-end trace: dropping a JPEG

You drop photo.jpg onto the empty state. The processing pipeline runs entirely within the browser tab.

  1. DropZone (in src/web/components/ui/) handles ondrop. It pulls the File objects out of the DataTransfer, calls window.api.files.getPathForFile(file) to register each one in the FileRegistry, and dispatches ADD_FILES to the React reducer.
  2. FileTable re-renders with a new FileEntry row in state Pending.
  3. useProcessFiles (the hook subscribed to dispatched ADD_FILES actions) runs processFileEntries():
    • Fetches settings once via window.api.settings.get().
    • For each entry, calls window.api.wasm.process(entry.path, options).
  4. window.api.wasm.process in makeWebApi() delegates to a WasmProcessor instance:
    • BrowserFileBytes.read({ path }) → looks up the File in FileRegistry and returns the bytes.
    • selectStrategy({ filename, bytes }) → routes to JpegStrategy.
    • JpegStrategy.strip({ bytes, options }) → walks markers, drops APP segments, returns cleaned bytes.
    • BrowserFileBytes.write({ path, bytes }):
      • If a batch is active (folder pick / multi-file), appends to the zip accumulator.
      • Otherwise, creates a File with lastModified: 0 (privacy invariant §6), wraps it in a blob URL, and triggers <a download>.
  5. Result propagates back as { ok: true, outputBytes, metadataRemoved }. The hook dispatches UPDATE_FILE_STATUS to Complete. The row re-renders showing the Cleaned pill, before/after sizes, and a reveal icon (a stub in the web build).

No network, no IPC, no main process. The same trace ran inside Electron's renderer pre-Phase-G — the wasm:process IPC call has been collapsed into a direct method call.
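
To make the "walks markers, drops APP segments" step concrete, here is a simplified sketch of a JPEG marker walk. It is not the project's `JpegStrategy`. It assumes well-formed input with no fill bytes between segments, and it drops every APPn segment (0xE0 to 0xEF), whereas a production walker would likely keep some of them (for example colour-related data). The name `stripAppSegments` is hypothetical.

```typescript
// Simplified sketch: assumes well-formed input, no fill bytes between segments.
function stripAppSegments(input: Uint8Array): Uint8Array {
  if (input.length < 2 || input[0] !== 0xff || input[1] !== 0xd8) {
    throw new Error("not a JPEG: missing SOI marker");
  }
  const kept: Uint8Array[] = [input.subarray(0, 2)]; // keep SOI
  let pos = 2;
  while (pos + 4 <= input.length) {
    if (input[pos] !== 0xff) throw new Error(`expected marker at offset ${pos}`);
    const marker = input[pos + 1];
    if (marker === 0xda) {
      // SOS: entropy-coded image data follows, so copy the rest verbatim.
      kept.push(input.subarray(pos));
      pos = input.length;
      break;
    }
    // Segment length is big-endian and includes its own two bytes.
    const length = (input[pos + 2] << 8) | input[pos + 3];
    const end = pos + 2 + length;
    if (length < 2 || end > input.length) throw new Error("truncated segment");
    const isApp = marker >= 0xe0 && marker <= 0xef;
    if (!isApp) kept.push(input.subarray(pos, end));
    pos = end;
  }
  // Keep any trailing bytes (for example a lone EOI marker).
  if (pos < input.length) kept.push(input.subarray(pos));

  const total = kept.reduce((sum, part) => sum + part.length, 0);
  const out = new Uint8Array(total);
  let offset = 0;
  for (const part of kept) {
    out.set(part, offset);
    offset += part.length;
  }
  return out;
}
```

The walk never decodes pixels. It only copies or skips whole segments, which is why stripping leaves the image data byte-identical.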


DDD layers

The codebase is organised by layer with madge --circular enforcement (yarn check:deps).

domain   ← pure (no I/O, no React, no DOM)
   ▲
application ← ports (interfaces)
   ▲
infrastructure ← adapters (WASM strategies, web file APIs)
   ▲
web ← UI (React)

Domain (src/domain/)

Pure business logic. Zero dependencies on anything outside the domain.

  • files/ — FileProcessingStatus enum, supported-extension list, generateCleanedPath (legacy helper; no longer wired to save-as-copy since Phase G).
  • i18n/ — locale fallback chain, language metadata.
  • exif/ — error variants + formatExifError.
  • settings_schema.ts — Settings type, DEFAULT_SETTINGS, validateSettings, migrateSettings, isSettingsFile.
  • strip_options.ts — the runtime option flags strategies receive.
  • accent_color.ts, path_truncation.ts, etc. — value-object helpers.

Application (src/application/)

Ports — interfaces the infrastructure layer satisfies.

  • ports/file_bytes_port.ts — the abstraction over filesystem reads/writes used by WasmProcessor. Implemented by BrowserFileBytes.
  • ports/metadata_processor_port.ts — the abstraction over "strip a file." Implemented by WasmProcessor.

After Phase G, the application layer is much thinner than it used to be — XattrCommand, ExpandFolderCommand, LoggerPort, SettingsPort were all deleted as the Electron-only consumers went away.
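
If the ports-and-adapters idea is new, the sketch below shows the dependency direction in miniature. The interface shapes, `SketchProcessor`, and `InMemoryFileBytes` are hypothetical stand-ins for this illustration. The real signatures in src/application/ports/ may differ.

```typescript
// Hypothetical port shapes; the real interfaces may differ.
interface FileBytesPort {
  read(args: { path: string }): Promise<Uint8Array>;
  write(args: { path: string; bytes: Uint8Array }): Promise<void>;
}

type ProcessResult =
  | { ok: true; outputBytes: number }
  | { ok: false; error: string };

interface MetadataProcessorPort {
  process(path: string): Promise<ProcessResult>;
}

// The orchestrator depends only on the port, never on browser APIs.
class SketchProcessor implements MetadataProcessorPort {
  private readonly files: FileBytesPort;
  private readonly strip: (bytes: Uint8Array) => Uint8Array;

  constructor(files: FileBytesPort, strip: (bytes: Uint8Array) => Uint8Array) {
    this.files = files;
    this.strip = strip;
  }

  async process(path: string): Promise<ProcessResult> {
    try {
      const input = await this.files.read({ path });
      const cleaned = this.strip(input);
      await this.files.write({ path, bytes: cleaned });
      return { ok: true, outputBytes: cleaned.length };
    } catch (err) {
      return { ok: false, error: err instanceof Error ? err.message : String(err) };
    }
  }
}

// In-memory adapter: the kind of fake a unit test would inject.
class InMemoryFileBytes implements FileBytesPort {
  readonly written = new Map<string, Uint8Array>();
  private readonly source: Map<string, Uint8Array>;

  constructor(source: Map<string, Uint8Array>) {
    this.source = source;
  }

  async read({ path }: { path: string }): Promise<Uint8Array> {
    const bytes = this.source.get(path);
    if (!bytes) throw new Error(`unknown path: ${path}`);
    return bytes;
  }

  async write({ path, bytes }: { path: string; bytes: Uint8Array }): Promise<void> {
    this.written.set(path, bytes);
  }
}
```

In backend terms, the port is a repository interface and the browser adapter is its production implementation. Swapping in the in-memory fake is what keeps the unit tests free of DOM dependencies.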

Infrastructure (src/infrastructure/)

Adapters wrapping browser APIs and the format strategies.

  • wasm/ — strategy registry + per-format strategies + WasmProcessor orchestrator.
  • web/ — FileRegistry, BrowserFileBytes, BatchOutputController, makeWebApi().

Web UI (src/web/)

The React tree. src/web/main.tsx is the entry; everything else under src/web/ is React components, contexts, hooks, BEM CSS. (Prior to consolidation, the React tree lived under src/renderer/ — Electron-process naming that was retired alongside the shell.)


State management in the renderer

Two layers:

  1. Reducer (src/web/contexts/AppContext.tsx) — global app state: file list, folder groups, expansion state, selected row. Actions: ADD_FILES, UPDATE_FILE_STATUS, UPDATE_FILE_METADATA, etc.
  2. Component-local useState — small UI state that no other component cares about (toast visibility, accordion expanded flags).

Persistent settings live in localStorage (see loadSettingsFromStorage / saveSettingsToStorage in web_api.ts). Theme follows prefers-color-scheme by default with an override in settings.

The reducer pattern is borrowed straight from Redux but without Redux: useReducer + a Context to thread dispatch down. Components that need to dispatch import useAppContext(); components that need state subscribe to the slice they care about.
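
A reducer in this style is a pure function from (state, action) to the next state. The sketch below covers two of the action names mentioned above. The state shape, payload fields, and status values are assumptions for illustration, not the actual contents of AppContext.tsx.

```typescript
// Hypothetical state and payload shapes for illustration only.
type FileStatus = "Pending" | "Processing" | "Complete" | "Error";

interface FileEntry {
  id: string;
  path: string;
  status: FileStatus;
}

interface AppState {
  files: readonly FileEntry[];
}

type Action =
  | { type: "ADD_FILES"; files: readonly FileEntry[] }
  | { type: "UPDATE_FILE_STATUS"; id: string; status: FileStatus };

function appReducer(state: AppState, action: Action): AppState {
  switch (action.type) {
    case "ADD_FILES":
      // New array, never push onto the existing one.
      return { ...state, files: [...state.files, ...action.files] };
    case "UPDATE_FILE_STATUS":
      // Replace only the matching entry; other entries keep their identity.
      return {
        ...state,
        files: state.files.map((file) =>
          file.id === action.id ? { ...file, status: action.status } : file,
        ),
      };
  }
}
```

In a component this would be wired up with `useReducer(appReducer, initialState)`. Because the function has no React dependency, it can be unit-tested like any other domain function.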


React primer for backend devs

If your mental model is "React is a templating system," the productive reframe is:

React turns state into UI as a pure function, then re-runs that function whenever state changes and updates the DOM with whatever's different.

The DOM is a side effect of the state, not the source of truth. You don't write element.style.opacity = 0; you write <div className={isHidden ? "hidden" : ""}>. The runtime ("reconciler") diffs the new tree against the old one and applies the minimum set of DOM operations.

Key concepts in this codebase:

  • Functional components — every component is just a function (props) => JSX. No classes, no lifecycle methods. Side effects go in hooks.
  • useState — local mutable cell; calling its setter re-renders the component.
  • useEffect — escape hatch for imperative side effects (event listeners, timers, subscriptions). Runs after render. The dependency array tells React when to re-run.
  • useReducer — for state that has multiple update paths (more like a state machine).
  • useContext — read a value that some ancestor Provider set. We use it for app state, i18n, theme.
  • useRef — a mutable cell that doesn't trigger re-renders. Used for DOM refs, mutable accumulators across renders.
  • Keys — when rendering a list, give each item a stable key. React uses it to track which item is which across renders.

Common pitfalls:

  • Putting expensive computation in a component body — it re-runs on every render. Memoise with useMemo or move into a useEffect.
  • Stale closures in useEffect / event handlers — if a callback references state, capture the latest value via the dep array or a ref.
  • Mutating state in place. state.files.push(...) doesn't work because React compares references and never sees the change. Always return new objects from reducers.

Where to add things — recipes

A new format strategy

See .claude/rules/format-strategy-workflow.md for the full analyse → implement → verify flow. The skeletal version:

  1. Write a gap analysis under docs/gap-analysis/<format>.md.
  2. Create src/infrastructure/wasm/strategies/<format>_strategy.ts implementing FormatStrategy.
  3. Register it in src/infrastructure/wasm/strategy_registry.ts.
  4. Tests at tests/infrastructure/wasm/<format>_strategy.test.ts.
  5. Forensic verification under docs/forensic/<format>.md + a runner at tools/forensic/<format>.ts.

A new setting

  1. Add the field to Settings in src/domain/settings_schema.ts (plus DEFAULT_SETTINGS, validateSettings, isSettingsFile).
  2. Add a UI toggle in src/web/components/settings/SettingsDrawer.tsx.
  3. If the setting affects processing, plumb it through StripOptions and the relevant strategy.
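
As a sketch of step 1, the snippet below adds a hypothetical `preserveOrientation` flag. The field name, the reduced `Settings` shape, and the default-filling behaviour of `validateSettings` are assumptions. The real schema in src/domain/settings_schema.ts has its own fields and validation rules.

```typescript
// Reduced, hypothetical schema; the real Settings type has different fields.
interface Settings {
  theme: "system" | "light" | "dark";
  preserveOrientation: boolean; // hypothetical new field
}

const DEFAULT_SETTINGS: Settings = {
  theme: "system",
  preserveOrientation: false,
};

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null;
}

// Assumption: invalid or missing fields fall back to their defaults.
function validateSettings(raw: unknown): Settings {
  const record = isRecord(raw) ? raw : {};
  const theme = record["theme"];
  const preserve = record["preserveOrientation"];
  return {
    theme:
      theme === "system" || theme === "light" || theme === "dark"
        ? theme
        : DEFAULT_SETTINGS.theme,
    preserveOrientation:
      typeof preserve === "boolean" ? preserve : DEFAULT_SETTINGS.preserveOrientation,
  };
}
```

Filling missing fields from defaults means settings saved to localStorage before the field existed still load cleanly after an upgrade.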

A new translation key

Add the key to every locale block in .resources/strings.json. Refer to it from the renderer via the useI18n() hook (t("yourKey")). Locale fallback (regional → base → English) is automatic.
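
The regional → base → English fallback can be sketched as a small lookup. The `STRINGS` table, `fallbackChain`, and `translate` below are hypothetical names with made-up sample data. They illustrate the lookup order only, not the real helper behind `useI18n()`.

```typescript
type Dictionary = Record<string, string>;

// Made-up sample data; the real strings live in .resources/strings.json.
const STRINGS: Record<string, Dictionary> = {
  en: { addFiles: "Add files", cleaned: "Cleaned" },
  pt: { addFiles: "Adicionar arquivos" },
  "pt-BR": {},
};

// Build the lookup order: regional, then base language, then English.
function fallbackChain(locale: string): string[] {
  const base = locale.split("-")[0];
  const chain: string[] = [locale];
  if (base && base !== locale) chain.push(base);
  if (!chain.includes("en")) chain.push("en");
  return chain;
}

// Returns undefined when no locale in the chain defines the key.
function translate(locale: string, key: string): string | undefined {
  for (const candidate of fallbackChain(locale)) {
    const value = STRINGS[candidate]?.[key];
    if (value !== undefined) return value;
  }
  return undefined;
}
```

With this data, `translate("pt-BR", "addFiles")` resolves from the base `pt` block, while `translate("pt-BR", "cleaned")` falls through to English.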

A new e2e test

Add a .spec.ts under tests/e2e/web/. The Playwright config defines projects for desktop, mobile-iOS, mobile-android, and the standalone single-file build. Use the existing specs as templates.


Testing

Two test runners:

  • Vitest (yarn test) — unit tests for domain, application, infrastructure, renderer. Fast (~1s for the whole suite). Tests live under tests/, mirroring the src/ tree.
  • Playwright (yarn test:e2e) — browser e2e. Multiple projects:
    • web-desktop — Chromium against the built dist/web/.
    • web-mobile-ios / web-mobile-android — emulated mobile viewports.
    • standalone — single-file build smoke test.

yarn test:e2e:web runs the desktop + both mobile projects. yarn test:e2e:standalone builds the single-file variant first and then exercises it.

CI runs all of the above in parallel jobs against every PR.


Common gotchas

  • yarn typecheck is the source of truth, not the editor LSP. The LSP regularly reports phantom diagnostics (missing globals, wrong types). Always verify with tsc --noEmit before treating an LSP error as real. See .claude/rules/typescript-conventions.md.
  • Don't mutate React state. Reducers return new objects; setX([...prev, newItem]) not prev.push(newItem).
  • Hooks must run in the same order every render. Don't put a hook inside a conditional or a loop. The react-hooks/rules-of-hooks lint catches most cases.
  • Stable callback references. When a function is in a useEffect dep array or passed to a memoised child, wrap it in useCallback so the dep doesn't churn every render.
  • i18n keys must exist in every locale. Missing keys fall back to English; a missing key in every locale means runtime undefined. Use the dictionary lookup helper, not bracket access.
  • The web build is fully offline. Don't fetch() any external endpoint. CSP connect-src 'self' would block it anyway, but the privacy invariant is the binding rule.
  • System fonts only. No web font downloads, no bundled fonts. See .claude/rules/privacy-invariants.md §5.
  • yarn check:deps is the circular-dependency gate. If it fails, the new edge is structurally wrong — find the right layer for the code instead of suppressing it.

Further reading