Migrated from github.com/obuvuyoviz26-lab/exifcleaner-web
Randa 1b0896f264
docs(security): mark CSP + deps remediation items resolved (PR #196, #197)
Updates the 2026-05-22 audit to reflect the three shipped fixes:
- #2 yarn audit CI gate (PR #196)
- #3 devDependency vulnerabilities 42 → 0 (PR #196)
- #4 style-src 'unsafe-inline' removed from all three CSP layers (PR #197)

New score: 72/82 (was 66/82). Sole remaining HIGH: .env keystore password.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-23 00:09:07 +04:00
.claude docs(direction): standalone HTML + Android APK are the primary targets; drop iOS (#172) 2026-05-21 02:27:41 +04:00
.github/workflows security: upgrade deps + add audit CI gate (#191 #192) (#196) 2026-05-22 22:48:43 +04:00
.resources feat(diff): copy to clipboard markdown (#187) (#195) 2026-05-22 23:54:59 +04:00
android fix(android): deliver cleaned files on Android APK (#186) (#189) 2026-05-22 22:19:11 +04:00
docker/android-builder feat(android-local): vendored Docker build for APK with no host SDK (#163) 2026-05-20 19:10:27 +04:00
docs feat(diff): copy to clipboard markdown (#187) (#195) 2026-05-22 23:54:59 +04:00
public security: remove style-src 'unsafe-inline' from all CSP policies (#197) 2026-05-23 00:06:28 +04:00
scripts feat(zip): generic ZIP support with recursive inner-file cleaning (#184) (#188) 2026-05-22 20:32:03 +04:00
src security: remove style-src 'unsafe-inline' from all CSP policies (#197) 2026-05-23 00:06:28 +04:00
static docs(readme): web-first framing + standalone HTML build path (#90) 2026-05-14 07:06:19 +04:00
test/fixtures ExifCleaner v4.0: Full modernization (#270) 2026-03-31 12:43:34 -04:00
tests feat(diff): copy to clipboard markdown (#187) (#195) 2026-05-22 23:54:59 +04:00
tools/forensic feat(zip): generic ZIP support with recursive inner-file cleaning (#184) (#188) 2026-05-22 20:32:03 +04:00
.env.sample feat(android): assembleRelease + env-var signing config (#165) (#185) 2026-05-22 12:14:01 +04:00
.gitattributes generator-electron 2019-12-05 00:07:43 -05:00
.gitignore feat(android): assembleRelease + env-var signing config (#165) (#185) 2026-05-22 12:14:01 +04:00
.npmrc ExifCleaner v4.0: Full modernization (#270) 2026-03-31 12:43:34 -04:00
.prettierrc fix: Made Prettier use tabs instead of spaces to format files (#177) 2022-01-24 20:01:25 +00:00
capacitor.config.ts feat(android): Capacitor APK wrapper + on-demand CI build (#156) 2026-05-17 16:15:22 +04:00
CHANGELOG.md Rebrand to MetaScrub + upstream attribution (#95) 2026-05-14 10:39:19 +04:00
CLAUDE.md feat(wasm): FfmpegFallbackStrategy for MP4/MOV/M4V/MKV/WebM (#183) 2026-05-22 15:04:04 +04:00
Dockerfile B: Deployable webapp — Vite build, web adapters, JPEG/PDF strategies, PWA, Docker, CI 2026-05-07 17:56:50 +04:00
LICENSE upgrade prettier dev dep to 2.0.5 2020-07-11 09:38:11 -04:00
metascrub-security-audit.md docs(security): mark CSP + deps remediation items resolved (PR #196, #197) 2026-05-23 00:09:07 +04:00
nginx.conf security: remove style-src 'unsafe-inline' from all CSP policies (#197) 2026-05-23 00:06:28 +04:00
package.json security: upgrade deps + add audit CI gate (#191 #192) (#196) 2026-05-22 22:48:43 +04:00
playwright.config.ts feat(diff): detailed before/after metadata diff — surface + JPEG (#22) (#119) 2026-05-15 15:52:14 +04:00
README.md feat(zip): generic ZIP support with recursive inner-file cleaning (#184) (#188) 2026-05-22 20:32:03 +04:00
SIGNING-SETUP.md ExifCleaner v4.0: Full modernization (#270) 2026-03-31 12:43:34 -04:00
tsconfig.json v4.0.0 release: v4.1 quality polish + release prep (#285) 2026-04-02 17:08:17 -04:00
vite.config.web.standalone.ts feat(wasm): FfmpegFallbackStrategy for MP4/MOV/M4V/MKV/WebM (#183) 2026-05-22 15:04:04 +04:00
vite.config.web.ts security: remove style-src 'unsafe-inline' from all CSP policies (#197) 2026-05-23 00:06:28 +04:00
vitest.config.ts feat(diff): detailed before/after metadata diff — surface + JPEG (#22) (#119) 2026-05-15 15:52:14 +04:00
yarn.lock security: upgrade deps + add audit CI gate (#191 #192) (#196) 2026-05-22 22:48:43 +04:00

MetaScrub

Strip metadata from images, videos, PDFs, and Office documents. Runs entirely in your browser — no uploads, no server.

Forked from szTheory/exifcleaner (MIT). Substantially rewritten across v4 (modernization) and v5 (Phases A–G: WASM strategy registry, web-only build, Electron retirement). Rebranded from ExifCleaner to MetaScrub in v5 to reflect coverage beyond EXIF (PDF, Office, MP4). See Credits.

Features

  • Two ways to run, both fully offline: open the single self-contained HTML file in any desktop browser, or install the Android APK (sideload — no Play Store account, no Internet required at install or runtime)
  • Drag and drop files or folders; recursive folder processing
  • Fully offline — files never leave your device
  • Privacy controls: preserve orientation, preserve color profile, timestamps written to epoch
  • 25 translated locales with in-app language switching (contributions welcome in .resources/strings.json)
  • Dark mode (follows OS preference)
  • Free and open source (MIT)
  • No automatic updates, no telemetry, no phone-home

See the CHANGELOG for release history.

Project Direction

MetaScrub ships two deliverables from one codebase: a standalone offline HTML file (dist/web-standalone/index.html) for desktop use, and an Android APK (Capacitor wrapper) for offline Android use. Both consume the same WASM and pure-TypeScript format strategies registered in src/infrastructure/wasm/strategy_registry.ts, so the processing path is identical on both targets.

Phase D (shipped 2026-05-10) collapsed the previous two-engine architecture onto one strategy registry; Phase G (shipped 2026-05-14, issue #80) retired the Electron shell; the Capacitor APK (shipped May 2026, #156) replaced "install the PWA on Android" as the mobile path. The deployed web PWA produced by yarn build:web is still self-hostable via Docker / Cloudflare Pages, but is no longer a primary distribution target. iOS in any form is out of scope.

Hand-rolled pure-TypeScript marker and chunk walkers cover documented containers (JPEG, PNG today, WebP/GIF/BMP/TIFF in flight). For ISOBMFF-based formats (HEIC, AVIF, MP4), the existing video-strategy box walker provides a foundation; a targeted Rust→WASM module is the second-line option only if a hand-rolled approach proves insufficient. Library evaluations under docs/poc/ showed the hand-rolled approach is smaller, more transparent, and more thorough than the maintained alternatives we tried (little_exif and exiv2-wasm both leave significant metadata behind on JPEG/PNG).
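
To make the "chunk walker" idea concrete, here is a minimal sketch of a PNG walker, assuming only the public PNG chunk layout (length, type, data, CRC). The function name stripPngChunks and the four-entry drop list are illustrative assumptions; the drop policy MetaScrub actually applies is the one locked in under docs/gap-analysis/png.md.

```typescript
// Hypothetical sketch, not MetaScrub's actual walker.
const PNG_SIGNATURE = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a];

// Illustrative drop list: the textual and EXIF chunk types named in this README.
const DROPPED_CHUNKS: ReadonlySet<string> = new Set(["tEXt", "zTXt", "iTXt", "eXIf"]);

function concatParts(parts: readonly Uint8Array[]): Uint8Array {
	const out = new Uint8Array(parts.reduce((n, p) => n + p.length, 0));
	let pos = 0;
	for (const part of parts) {
		out.set(part, pos);
		pos += part.length;
	}
	return out;
}

function stripPngChunks(bytes: Uint8Array): Uint8Array {
	if (bytes.length < 8 || PNG_SIGNATURE.some((b, i) => bytes[i] !== b)) {
		throw new Error("not a PNG file");
	}
	const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
	const kept: Uint8Array[] = [bytes.subarray(0, 8)];
	let offset = 8;
	while (offset + 12 <= bytes.length) {
		// Chunk layout: 4-byte big-endian data length, 4-byte type, data, 4-byte CRC.
		const length = view.getUint32(offset);
		const end = offset + 12 + length;
		if (end > bytes.length) throw new Error("truncated PNG chunk");
		let type = "";
		for (let i = 4; i < 8; i++) type += String.fromCharCode(view.getUint8(offset + i));
		// Kept chunks are copied verbatim, so their CRCs stay valid.
		if (!DROPPED_CHUNKS.has(type)) kept.push(bytes.subarray(offset, end));
		offset = end;
		if (type === "IEND") break; // anything after IEND is not copied
	}
	return concatParts(kept);
}
```

Because whole chunks are either copied or skipped, no CRC has to be recomputed and the image data is never decoded.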

Server-side processing is explicitly out of scope. Uploading user files to a server, even as a "last resort fallback", would invalidate the privacy guarantee that defines this app — and "last resort" tends to drift to "default". Per-format size caps with explicit messaging (issue #63) cover the large-file edge case without ever reaching for a remote endpoint.
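
As an illustration of a per-format size cap with explicit messaging, a check could look roughly like the sketch below. The cap values, the checkSizeCap name, and the message wording are all placeholders invented for this example; the real limits and copy are tracked in issue #63.

```typescript
// Hypothetical sketch: cap values and names are placeholders, not MetaScrub's real limits.
const MIB = 1024 * 1024;

const SIZE_CAPS_BYTES = new Map<string, number>([
	[".jpg", 200 * MIB],
	[".pdf", 500 * MIB],
	[".mp4", 2048 * MIB],
]);

type SizeCheck = { ok: true } | { ok: false; message: string };

function checkSizeCap(args: { extension: string; sizeBytes: number }): SizeCheck {
	const cap = SIZE_CAPS_BYTES.get(args.extension.toLowerCase());
	// Formats without a configured cap are let through in this sketch.
	if (cap === undefined || args.sizeBytes <= cap) return { ok: true };
	const capMiB = Math.floor(cap / MIB);
	return {
		ok: false,
		message: `This ${args.extension} file exceeds the ${capMiB} MiB limit for in-browser cleaning. Nothing was uploaded.`,
	};
}
```

The failure branch only produces a message; there is deliberately no code path that reaches for a remote endpoint.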

RAW formats are unsupported from v5 forward. ExifTool's RAW support represents roughly two decades of reverse-engineering on proprietary containers (CR2, CR3, NEF, ARW, RAF, ORF, DNG, and dozens of vendor variants), and no production-ready WASM library covers that surface. RAW workflows belong on dedicated tools — see docs/PRIVACY_GAPS.md for context and alternatives.

Format Support Matrix

A fast lookup of where each format stands in v5. The previous v3.6 desktop release (last shipped 2021) used a bundled Perl ExifTool that supported additional formats, including RAW; v5 drops that.

For "what's partially cleaned even when supported", see docs/PRIVACY_GAPS.md.

| Format | v5 Coverage |
| --- | --- |
| JPG, JPEG | Full¹ (hand-rolled walker) |
| PNG | Full¹ (hand-rolled walker) |
| GIF, WebP, BMP, TIFF | Unsupported² (hand-rolled walkers in flight) |
| HEIC, HEIF | Unsupported² (issue #48 in flight; highest-priority deferred format) |
| AVIF | Unsupported² |
| PDF | Best-effort³ |
| DOCX, XLSX, PPTX, ODT | Partial⁴ (WASM strategy) |
| MP4, MOV, M4V, 3GP, 3G2 | Partial⁵ (WASM strategy) |
| ZIP | Full⁷ (recursive inner-file cleaning) |
| MKV | Unsupported (issue #43, deferred to v6) |
| RAW (CR2/CR3/NEF/ARW/RAF/ORF/DNG/...) | Unsupported⁶ |
| SVG, JXL, JPEG 2000, AVI | Unsupported |

Footnotes:

  1. JPEG and PNG: hand-rolled walkers. JPEG drops APP0–APP15 (except APP14 Adobe DCT) and the COM marker; PNG drops tEXt/zTXt/iTXt/eXIf chunks and other metadata-bearing ancillary chunks. Both preserve image data verbatim and mirror ExifTool's -all= policy. Per-format tables: docs/gap-analysis/jpeg.md, docs/gap-analysis/png.md. Forensic verification: docs/forensic/jpeg.md, docs/forensic/png.md.
  2. Formats listed as Unsupported fall through with an explicit "unsupported" error in the UI. Hand-rolled marker/chunk walkers are the planned path; see docs/poc/ for the investigations that ruled out WASM library alternatives.
  3. PDF: the strategy clears the Info dictionary (Title, Author, Subject, Keywords, Producer, Creator, CreationDate, ModDate), drops the catalog /Metadata XMP stream and its indirect object, scrubs annotation author/comment/timestamp keys, and removes catalog-level fingerprints (/Lang, /PageLabels, /OutputIntents) plus per-page /Metadata and /Thumb. Embedded files and AcroForm data are not touched (they may carry legitimate document content). The strip is structurally cleaner than ExifTool's PDF behaviour, which uses incremental updates and leaves the original metadata recoverable in the file body. Full analysis: docs/gap-analysis/pdf.md. Forensic verification: docs/forensic/pdf.md.
  4. Office: clears docProps/{core,app,custom}.xml and a thumbnail. Known partial coverage of tracked changes/comments, RSIDs, embedded media EXIF, customXml/ parts, and file paths in *.rels — tracked under issue #62 (Office Phase 2 hardening). See docs/PRIVACY_GAPS.md for the user-facing summary.
  5. MP4/MOV: drops udta, meta, and Xtra containers via mp4box.js box-tree rewrite (no re-encoding, lossless). Known gaps in timed-metadata tracks, hdlr names, compressorname, mdat orphans, and sidecar files — see docs/PRIVACY_GAPS.md for the user-facing summary.
  6. RAW: removed in v5 (decided 2026-05-09, shipped 2026-05-10). No production-ready WASM library covers proprietary RAW. RAW workflows should use ExifTool standalone or a dedicated RAW tool — see docs/PRIVACY_GAPS.md#raw-unsupported.
  7. ZIP: per-entry timestamps normalized to DOS epoch (1980-01-01); per-entry comments and extra fields scrubbed; archive comment scrubbed. Each supported inner file is re-dispatched through selectStrategy() and cleaned with its native walker (JPEG/PNG/PDF/Office/MP4/etc.); nested .zip entries recurse. UI shows a per-entry tree with lazy on-expand diff loads. Encrypted archives are refused with a clear message directing users to a decryption-capable tool (see docs/PRIVACY_GAPS.md). Full analysis: docs/gap-analysis/zip.md. Forensic verification: docs/forensic/zip.md.
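
The JPEG policy in footnote 1 can be sketched as a small segment walker. This is a simplified illustration under stated assumptions, not the shipped strategy: the name stripJpegSegments is hypothetical, fill bytes between segments are not handled, and everything from the first SOS marker onward (including any data after EOI) is copied verbatim.

```typescript
// Hypothetical sketch of the marker policy in footnote 1; not MetaScrub's actual walker.
function concatSegments(parts: readonly Uint8Array[]): Uint8Array {
	const out = new Uint8Array(parts.reduce((n, p) => n + p.length, 0));
	let pos = 0;
	for (const part of parts) {
		out.set(part, pos);
		pos += part.length;
	}
	return out;
}

function stripJpegSegments(bytes: Uint8Array): Uint8Array {
	const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
	if (bytes.length < 4 || view.getUint16(0) !== 0xffd8) throw new Error("not a JPEG file");
	const kept: Uint8Array[] = [bytes.subarray(0, 2)]; // SOI
	let offset = 2;
	while (offset + 4 <= bytes.length) {
		if (view.getUint8(offset) !== 0xff) throw new Error("expected a marker");
		const marker = view.getUint8(offset + 1);
		if (marker === 0xda) {
			// SOS: entropy-coded image data follows, so keep the rest untouched.
			kept.push(bytes.subarray(offset));
			return concatSegments(kept);
		}
		// The 2-byte big-endian segment length counts itself but not the marker.
		const length = view.getUint16(offset + 2);
		const end = offset + 2 + length;
		if (length < 2 || end > bytes.length) throw new Error("truncated segment");
		const isApp = marker >= 0xe0 && marker <= 0xef; // APP0–APP15
		const drop = (isApp && marker !== 0xee) || marker === 0xfe; // keep APP14, drop COM
		if (!drop) kept.push(bytes.subarray(offset, end));
		offset = end;
	}
	throw new Error("no SOS marker found");
}
```

Tables the decoder needs (DQT, DHT, SOF) fall through the keep path unchanged, which is how a walker of this shape leaves image data byte-identical.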

Running the web app locally

MetaScrub runs entirely in your browser — no server-side processing, no file uploads.

Option 1: Single-file HTML (no server, no install)

yarn build:web:standalone

Produces dist/web-standalone/index.html — one self-contained ~830 KB HTML file with the entire app inlined (JS, CSS, dependencies). Open it directly in any modern desktop browser from disk; no server, no install, no internet. Email it, share it on a USB stick — recipients double-click it and it runs.

This is a desktop deliverable (Chrome, Brave, Edge, Firefox). Chrome on Android doesn't support opening local HTML files — for mobile, use the Android APK instead. Trade-offs (no service worker, no lazy-loading, no PWA install) and the full rationale live in docs/standalone-html.md.

Option 2: Docker

# Build the image
docker build -t metascrub .

# Run on http://localhost:8080
docker run -p 8080:80 metascrub

Open http://localhost:8080. Drag and drop files to clean metadata.

Option 3: Node dev server

Requires Node 22 and yarn:

yarn install
yarn dev

Open http://localhost:5173.

Option 4: Build and preview

yarn build
yarn preview

Open http://localhost:4173. This serves the same optimised bundle as production.

Documentation

The docs/ tree is organized around an analyse → implement → verify pattern. New strategies for non-trivial formats are expected to follow it.

  • docs/architecture.md — narrative architecture guide for new contributors: the build pipeline, an end-to-end trace of a file drop, the DDD layers, state management in the renderer, and a React primer aimed at backend devs. Start here if the codebase is new to you.
  • docs/gap-analysis/ — per-format coverage analysis written before implementation. Each writeup compares current state vs reference implementations (typically ExifTool) vs what's theoretically possible, and locks in the marker/chunk policy. Currently: jpeg.md, pdf.md, png.md.
  • docs/poc/ — library evaluation writeups for approaches considered and ruled out, with bundle sizes and coverage tables. Currently: little-exif-wasm.md, exiv2-wasm.md.
  • docs/forensic/ — adversarial recovery tests run after implementation lands, with reproducible runners under tools/forensic/. Tests embed sentinel strings, strip multiple ways, and compare survivors across recovery techniques. Currently: jpeg.md, pdf.md, png.md. Office and Video forensic writeups are in flight (issues #64, #65).
  • docs/PRIVACY_GAPS.md — the inverse of forensic/: known cases where the privacy guarantee bends (RAW unsupported, MP4 timed-metadata tracks, sidecar files, etc.). Required reading for anyone touching format strategies.
  • docs/deploying.md — deployment guide for the web app: Cloudflare Pages, self-hosted Docker (with Cloudflare Tunnel, VPS + nginx/Caddy, or Tailscale Funnel), and PWA install on Android/iOS.

Development

Built with React 19 and TypeScript 5.7 (strict mode), bundled by Vite 7. The WASM strategy registry under src/infrastructure/wasm/ is the single processing engine. No bundled binaries, no Perl runtime, no Electron shell.

Run the app in dev mode

yarn install
yarn dev

Open http://localhost:5173. Vite's dev server gives full HMR; edits to the React tree appear instantly without a reload.

To run the browser build standalone or behind a server, see Running the web app locally.

Running tests

yarn test          # Unit tests (Vitest, ~1s)
yarn test:e2e      # E2E tests (Playwright, ~30s)
yarn lint          # Prettier formatting check
yarn typecheck     # TypeScript strict mode check

Contributing a Format Strategy

A FormatStrategy is an object that declares the file extensions it handles and exposes a pure strip function: file bytes in, cleaned bytes out. Strategies are the single processing pipeline MetaScrub ships, with no Perl ExifTool dependency.

The interface lives at src/infrastructure/wasm/format_strategy.ts:

export interface FormatStrategy {
	/**
	 * Returns the lowercase set of file extensions this strategy handles
	 * (each starting with a dot, e.g. ".docx").
	 */
	readonly extensions: ReadonlySet<string>;

	/**
	 * Optional magic-byte check to confirm the file content matches the
	 * declared extension. Returns true if confirmed, false to decline.
	 * If absent, extension match alone is sufficient.
	 */
	readonly verifyMagicBytes?: (args: { bytes: Uint8Array }) => boolean;

	/**
	 * Strips metadata from the file bytes and returns the cleaned bytes.
	 * Pure function — no I/O, no globals.
	 */
	strip(args: {
		bytes: Uint8Array;
		options: StripOptions;
	}): Promise<Result<StripResult, ExifError>>;
}
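
For the optional verifyMagicBytes hook, a check can be as small as a prefix comparison. The sketch below uses the well-known leading bytes of JPEG (FF D8 FF) and PDF ("%PDF-"); the helper and constant names are illustrative and are not taken from the MetaScrub source.

```typescript
// Hypothetical helpers matching the verifyMagicBytes signature above.
function startsWith(bytes: Uint8Array, signature: readonly number[]): boolean {
	return bytes.length >= signature.length && signature.every((b, i) => bytes[i] === b);
}

const JPEG_PREFIX = [0xff, 0xd8, 0xff]; // SOI marker plus the first byte of the next marker
const PDF_PREFIX = [0x25, 0x50, 0x44, 0x46, 0x2d]; // "%PDF-"

const verifyJpegMagic = (args: { bytes: Uint8Array }): boolean =>
	startsWith(args.bytes, JPEG_PREFIX);

const verifyPdfMagic = (args: { bytes: Uint8Array }): boolean =>
	startsWith(args.bytes, PDF_PREFIX);
```

Returning false declines the file, so a renamed file whose content does not match its extension never reaches strip.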

To add a new strategy:

  1. Create src/infrastructure/wasm/strategies/<name>_strategy.ts and implement FormatStrategy. Keep it pure — accept bytes, return bytes.
  2. Register it in src/infrastructure/wasm/strategy_registry.ts by adding an instance to the STRATEGIES array. The registry is the sole authority for what the renderer routes through WASM.
  3. Add tests at tests/infrastructure/wasm/<name>_strategy.test.ts — see the existing jpeg_strategy.test.ts, pdf_strategy.test.ts, office_strategy.test.ts, and video_strategy.test.ts for the established patterns (round-trip fixtures, magic-byte rejection, malformed input handling).
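
The steps above can be sketched end to end with a toy format. Everything here is an assumption made for illustration: the ".demo" format (4-byte "DEMO" magic, one metadata-length byte, metadata, payload) does not exist, the Result/StripOptions/StripResult/ExifError shapes are stand-ins for the real types in the source tree, and the selectStrategy signature is a guess at how registry lookup might work.

```typescript
// Stand-in types: the real definitions live in the MetaScrub source; these shapes are assumptions.
type Result<T, E> = { ok: true; value: T } | { ok: false; error: E };
interface StripOptions { readonly preserveOrientation: boolean }
interface StripResult { readonly bytes: Uint8Array }
interface ExifError { readonly message: string }

interface FormatStrategy {
	readonly extensions: ReadonlySet<string>;
	readonly verifyMagicBytes?: (args: { bytes: Uint8Array }) => boolean;
	strip(args: { bytes: Uint8Array; options: StripOptions }): Promise<Result<StripResult, ExifError>>;
}

const DEMO_MAGIC = [0x44, 0x45, 0x4d, 0x4f]; // "DEMO" (imaginary format)

function hasDemoMagic(bytes: Uint8Array): boolean {
	return bytes.length >= 5 && DEMO_MAGIC.every((b, i) => bytes[i] === b);
}

// Step 1: implement FormatStrategy as a pure bytes-in, bytes-out object.
const demoStrategy: FormatStrategy = {
	extensions: new Set([".demo"]),
	verifyMagicBytes: ({ bytes }) => hasDemoMagic(bytes),
	async strip({ bytes }) {
		if (!hasDemoMagic(bytes)) return { ok: false, error: { message: "not a .demo file" } };
		const metaLength = bytes[4] ?? 0;
		const payload = bytes.subarray(5 + metaLength);
		const cleaned = new Uint8Array(5 + payload.length);
		cleaned.set(bytes.subarray(0, 4), 0); // magic; byte 4 (metadata length) stays 0
		cleaned.set(payload, 5);
		return { ok: true, value: { bytes: cleaned } };
	},
};

// Step 2: register the instance. The lookup below is a guessed signature.
const STRATEGIES: readonly FormatStrategy[] = [demoStrategy];

function selectStrategy(args: { fileName: string; bytes: Uint8Array }): FormatStrategy | undefined {
	const dot = args.fileName.lastIndexOf(".");
	const extension = dot < 0 ? "" : args.fileName.slice(dot).toLowerCase();
	return STRATEGIES.find(
		(s) => s.extensions.has(extension) && (s.verifyMagicBytes?.({ bytes: args.bytes }) ?? true),
	);
}
```

A test for step 3 would feed a fixture through selectStrategy and strip, then assert both the cleaned bytes and the rejection of a file whose magic bytes do not match.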

For broader context on the analysis-then-implementation pattern, the existing strategies all have writeups under docs/gap-analysis/ (current state vs reference vs theoretical) and forensic recovery tests under docs/forensic/. New strategies for non-trivial formats are expected to follow the same pattern.

Deploying

See docs/deploying.md for the deployment paths (Cloudflare Pages, Docker, static hosting).

Credits

MetaScrub began as ExifCleaner by szTheory and contributors — first released in 2019 as an Electron desktop wrapper around ExifTool, with translations and platform support contributed by the community over five years. v3.6.0 (May 2021) was the last upstream release before this fork.

The codebase has been substantially rewritten since:

  • v4 (2025–2026): Electron 11 → 35, vanilla DOM → React 19, build system from electron-webpack → electron-vite, DDD layering introduced, zero tests → 300+ unit + e2e tests.
  • v5 Phases A–C: hand-rolled WASM/pure-TS strategy registry for JPEG, PNG, PDF, Office, MP4, replacing the bundled Perl ExifTool. Deployable web build + PWA + Docker.
  • v5 Phase D (2026-05-10): single processing engine across Electron and web.
  • v5 Phase G (2026-05-14): Electron retired entirely. Project rebranded ExifCleaner → MetaScrub to reflect coverage beyond EXIF.

All upstream contributors are credited in the original ExifCleaner README. MIT license preserved throughout.

Third-party engines and license notices

MetaScrub bundles two upstream WebAssembly engines as fallback strategies that can be toggled at build time. Both default to on for the standalone HTML and Android APK distributions; set the corresponding env var to false at build time to omit the engine.

| Engine | Used for | Build flag (env) | License | Source |
| --- | --- | --- | --- | --- |
| ffmpeg-wasm (@ffmpeg/core) | MP4 / MOV / M4V / MKV / WebM strip via FfmpegFallbackStrategy (#182) | VITE_ENABLE_FFMPEG_FALLBACK | @ffmpeg/core: GPL-2.0-or-later (the WASM build includes GPL components from upstream ffmpeg). Loaded directly on the main thread, with no @ffmpeg/ffmpeg wrapper. | https://github.com/ffmpegwasm/ffmpeg.wasm |
| WebPerl ExifTool (@uswriting/exiftool + @6over3/zeroperl-ts) | WebP / GIF / AVIF strip via ExifToolFallbackStrategy (#174); diff via ExifToolDiffStrategy (#177) | VITE_ENABLE_EXIFTOOL_FALLBACK | Apache-2.0 (wrappers). ExifTool itself is GPL or Artistic. | https://github.com/6over3/zeroperl-ts · https://exiftool.org/ |

GPL-2.0 implication for ffmpeg: distributions of MetaScrub that include ffmpeg-core.wasm (the default for standalone HTML + APK builds) are subject to GPL-2.0 for the combined work. Our codebase remains MIT (no GPL source is copied into our source tree), but the combined binary distribution must comply with GPL-2.0's source-availability requirement. That requirement is met by linking to https://github.com/ffmpegwasm/ffmpeg.wasm — the upstream is fully open and we pin specific versions in package.json (recoverable from git log plus the lockfile).

Builds with VITE_ENABLE_FFMPEG_FALLBACK=false omit the ffmpeg engine from the strategy chain (VideoStrategy handles MP4/MOV/M4V; MKV/WebM become unsupported). The combined binary in that mode contains no GPL-licensed code.
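
A rough sketch of how the two build flags might gate the strategy chain is shown below. It is an assumption-laden illustration: strategies are represented by name only, the ordering is a guess, and the env object is passed in as a plain record so the snippet runs outside Vite (in a Vite build the values would typically come from import.meta.env).

```typescript
// Hypothetical sketch: the real registry wires strategy instances, not strings.
type BuildEnv = Readonly<Record<string, string | undefined>>;

function isEnabled(env: BuildEnv, flag: string): boolean {
	// Both engines default to on; only the literal string "false" omits one.
	return env[flag] !== "false";
}

function buildStrategyChain(env: BuildEnv): string[] {
	const chain = ["VideoStrategy"]; // native strategies come first in this sketch
	if (isEnabled(env, "VITE_ENABLE_EXIFTOOL_FALLBACK")) chain.push("ExifToolFallbackStrategy");
	if (isEnabled(env, "VITE_ENABLE_FFMPEG_FALLBACK")) chain.push("FfmpegFallbackStrategy");
	return chain;
}
```

With VITE_ENABLE_FFMPEG_FALLBACK=false the chain contains no ffmpeg entry, which corresponds to the GPL-free build described above.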