exifcleaner-web/vite.config.web.standalone.ts
forgejo_admin a5546afa71
feat(wasm): FfmpegFallbackStrategy for MP4/MOV/M4V/MKV/WebM (#183)
Adds FfmpegFallbackStrategy as a peer to ExifToolFallbackStrategy, routing MP4/MOV/M4V (Phase 1) and MKV/WebM (Phase 2) through @ffmpeg/core. On by default for all three distributions (standalone HTML, Capacitor APK, PWA self-host); VITE_ENABLE_FFMPEG_FALLBACK=false opts out. Takes priority over VideoStrategy for the MP4 family; VideoStrategy stays registered as the opt-out fallback until a subsequent PR deletes it.
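
The routing described above can be pictured with a small sketch. Everything here except the two strategy names and the `VITE_ENABLE_FFMPEG_FALLBACK` flag is hypothetical: `pickVideoStrategy`, the extension sets and the return type are illustrative assumptions, not the registry API this commit adds. The sketch also does not model what handles MKV/WebM when the flag is off, since the message does not say.

```typescript
// Hypothetical sketch: only the strategy names and the flag come from the
// commit message. The function, sets and return type are assumptions.
type VideoStrategyName = "FfmpegFallbackStrategy" | "VideoStrategy";

const MP4_FAMILY = new Set(["mp4", "mov", "m4v"]); // Phase 1
const MATROSKA_FAMILY = new Set(["mkv", "webm"]); // Phase 2

// On by default: only the literal string "false" opts out.
function isFfmpegFallbackEnabled(flag: string | undefined): boolean {
  return flag !== "false";
}

function pickVideoStrategy(
  extension: string,
  flag: string | undefined,
): VideoStrategyName | undefined {
  const ext = extension.toLowerCase();
  const ffmpegEnabled = isFfmpegFallbackEnabled(flag);
  if (MP4_FAMILY.has(ext)) {
    // ffmpeg takes priority; VideoStrategy stays as the opt-out fallback.
    return ffmpegEnabled ? "FfmpegFallbackStrategy" : "VideoStrategy";
  }
  if (MATROSKA_FAMILY.has(ext)) {
    // Not modelled here: whatever handles MKV/WebM when ffmpeg is opted out.
    return ffmpegEnabled ? "FfmpegFallbackStrategy" : undefined;
  }
  return undefined; // not a format this sketch covers
}
```

In a Vite build the flag would typically be read from `import.meta.env.VITE_ENABLE_FFMPEG_FALLBACK`; it is passed as a plain argument here so the sketch stays self-contained.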

Closes #182. Closes #43.

Resolves the four documented walker KNOWN_GAPS categorically: handler-name leak (#38), compressor-name leak (#39), mvhd.next_track_id leak (#111), GPMF/GPS coordinates leak (#42). On gopro-fusion.mp4 (5.1 MB GPMF + tmcd + fdsc) and dji-phantom4.mov (236 MB UserData GPS log) the forensic battery reports zero device-fingerprint survival across every recovery technique.

Key architectural choices:

- **Main-thread @ffmpeg/core, not @ffmpeg/ffmpeg wrapper.** The wrapper hardcodes type:"module" Workers from Blob URLs, which fail silently under null-origin file:// in Chromium — the standalone build hung forever on every video strip. @ffmpeg/ffmpeg dropped from package.json.
- **Stream mapping -map 0 -map -0:d? -map -0:s? -map -0:t?**. Preserves input track order while dropping data/subtitle/timecode streams. Avoids the eng→und reorder bug of -map 0:v?/-map 0:a?, and sidesteps mat2's exit-234 on action-cam files (GoPro Fusion has tmcd/fdsc).
- **Post-strip pass rewrites the udta box type to 'free'** (ISO/IEC 14496-12 §8.1.2 padding) to neutralise ffmpeg's hardcoded HandlerType:Metadata + HandlerVendorID:Apple stub. Length-preserving so stco/co64 offsets stay valid. Handles both regular and largesize headers via headerStart+4.
- **mdhd.language left as ffmpeg's 'und'** — considered zeroing but reverted: 0x0000 is an invalid ISO 639-2/T code, ffprobe falls back to displaying '(eng)' for invalid codes (actively misleading downstream tools).
- **Diff race fix.** @uswriting/exiftool's parseMetadata uses module-level singletons (Perl, MemoryFS, stdout/stderr StringBuilders). WasmProcessor now serializes all diff builds across the processor's lifetime via a Promise chain — guarantees no two parseMetadata calls overlap, whether within an entry or across the fire-and-forget chunk-drained queue.
- **ExifTool family-1 group names surfaced verbatim** — IFD0, ExifIFD, XMP-dc, Track1, etc. Refuses to collapse to umbrella labels like 'EXIF' because the collapse caused (source, name) key collisions across sub-groups (Track1:HandlerType vs Track2:HandlerType produced spurious diffs on multi-track MP4).
- **Standalone HTML stays single-file.** Two-asset Vite plugin gzip+base64-inlines ffmpeg-core.js + ffmpeg-core.wasm into <script type=text/plain> tags, mirroring the zeroperl pattern. With tree-shaking via __WITH_STANDALONE_INLINE__ the standalone HTML went 116MB → 24MB.
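
The post-strip `udta` rewrite from the list above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in this commit: `neutraliseUdtaBoxes` and the choice to descend only into `moov` and `trak` are hypothetical. The header layout is the standard ISO BMFF one (32-bit big-endian size, then the four-character type, with a 64-bit largesize after the type when size is 1), so the type always sits at `headerStart + 4`.

```typescript
// Hypothetical sketch: function name and container list are assumptions.
const TYPE_UDTA = 0x75647461; // "udta"
const TYPE_FREE = 0x66726565; // "free"
const CONTAINER_TYPES = new Set([0x6d6f6f76, 0x7472616b]); // "moov", "trak"

// Walks boxes in [start, end) and overwrites each udta type with "free".
// Only the 4 type bytes change, so every box keeps its length and the
// stco/co64 chunk offsets stay valid. Returns the number of boxes rewritten.
function neutraliseUdtaBoxes(
  buf: Uint8Array,
  start: number = 0,
  end: number = buf.length,
): number {
  const view = new DataView(buf.buffer, buf.byteOffset, buf.byteLength);
  let rewritten = 0;
  let headerStart = start;
  while (headerStart + 8 <= end) {
    let size = view.getUint32(headerStart); // DataView reads big-endian by default
    let headerLen = 8;
    const type = view.getUint32(headerStart + 4);
    if (size === 1) {
      // largesize header: the 64-bit size follows the type field.
      if (headerStart + 16 > end) break;
      size =
        view.getUint32(headerStart + 8) * 2 ** 32 +
        view.getUint32(headerStart + 12);
      headerLen = 16;
    } else if (size === 0) {
      // Box runs to the end of the enclosing range.
      size = end - headerStart;
    }
    if (size < headerLen || headerStart + size > end) break; // malformed: stop
    if (type === TYPE_UDTA) {
      view.setUint32(headerStart + 4, TYPE_FREE);
      rewritten += 1;
    } else if (CONTAINER_TYPES.has(type)) {
      rewritten += neutraliseUdtaBoxes(
        buf,
        headerStart + headerLen,
        headerStart + size,
      );
    }
    headerStart += size;
  }
  return rewritten;
}
```

Because the payload is left in place under a `free` type, parsers that follow the spec skip it as padding. Whether the payload bytes should also be zeroed is a separate decision this sketch does not make.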

Forensic verification: docs/forensic/ffmpeg-fallback.md + tools/forensic/ffmpeg-fallback.ts cover synthetic-mp4/mkv/webm + phone-baseline (2.7MB Android) + gopro-fusion (5MB action-cam) + dji-phantom4 (236MB drone) with zero sentinel/fingerprint survival across the recovery battery. Gap analyses for all three formats at docs/gap-analysis/mp4-ffmpeg.md, mkv.md, webm.md. POC at docs/poc/ffmpeg-wasm.md.

Production deps go from 5 → 6: @ffmpeg/core@0.12.10 (GPL-2.0-or-later; the combined distributable inherits that license, the MIT codebase itself is unchanged, and the README carries a source pointer for GPL compliance).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 15:04:04 +04:00

404 lines
17 KiB
TypeScript

import { defineConfig } from "vite";
import react from "@vitejs/plugin-react";
import { viteSingleFile } from "vite-plugin-singlefile";
import {
readFileSync,
writeFileSync,
readdirSync,
rmdirSync,
existsSync,
} from "node:fs";
import { gzipSync } from "node:zlib";
import { resolve } from "node:path";
import type { Plugin } from "vite";
type ManifestIcon = {
src: string;
sizes: string;
type: string;
purpose?: string;
};
type Manifest = {
icons: ManifestIcon[];
[key: string]: unknown;
};
// The standalone build is meant to be opened directly via `file://`. Two
// things in `vite.config.web.ts` make that impossible and need to be undone
// here:
//
// 1. CSP meta tag — `script-src 'self'` doesn't match anything useful under
// a `null` origin (file://), and the singlefile inlining also requires
// 'unsafe-inline' anyway. We just drop the tag; there are no remote
// origins in a self-contained bundle for it to gate.
//
// 2. ES module loading — Chromium browsers (Chrome, Brave, Edge) refuse to
// load `<script type="module" src="./...">` from a file:// origin with a
// CORS error ("Cross origin requests are only supported for protocol
// schemes: http, https, …"). vite-plugin-singlefile fixes this by
// inlining the entire bundle into a single inline `<script>` (the script
// stays `type="module"` — see the block below for why), which CORS
// doesn't apply to because there's no fetch.
//
// The manifest and favicon links are rewritten to inline data URLs, and the
// service-worker registration is stripped: external manifest and favicon
// fetches hit the same CORS rule under file://, and service workers reject
// the `null` origin.
//
// The inlined script MUST stay `type="module"` — the bundle uses
// `import.meta` references that only resolve in module context (classic
// scripts throw "Cannot use 'import.meta' outside a module"). Module scripts
// are only subject to CORS for external fetches via `src=`; an inline module
// script with no src and no remaining `import` statements never fetches and
// works fine from a `file://` origin. The `crossorigin` attribute is
// meaningless for inline scripts (CORS only applies to fetches) so stripping
// it is purely cosmetic.
// transformIndexHtml runs before viteSingleFile inlines chunks (regardless of
// plugin array order — Vite has its own hook ordering), so the inlined
// `<script type="module" crossorigin>` is invisible to a transformIndexHtml
// hook. closeBundle reads the final on-disk file and rewrites it, which
// avoids the ordering surprise entirely.
function standaloneHtmlFixupPlugin(): Plugin {
const outputPath = resolve(__dirname, "dist/web-standalone/index.html");
const publicDir = resolve(__dirname, "public");
return {
name: "standalone-html-fixup",
closeBundle() {
let html = readFileSync(outputPath, "utf8");
// Inline the manifest as a data URL. External manifest fetches fail
// under file:// (null origin), so replace the href rather than strip
// the tag. Icons inside the manifest are also embedded as base64 data
// URLs — they would 404 under file://, and having them self-contained
// means the manifest works if the file is later served over HTTP and
// the user triggers a PWA install prompt.
const manifest: Manifest = JSON.parse(
readFileSync(resolve(publicDir, "manifest.webmanifest"), "utf8"),
);
manifest.icons = manifest.icons.map((icon) => {
const filename = icon.src.replace(/^\.\//, "");
const bytes = readFileSync(resolve(publicDir, filename));
return { ...icon, src: `data:${icon.type};base64,${bytes.toString("base64")}` };
});
const manifestDataUrl = `data:application/manifest+json,${encodeURIComponent(JSON.stringify(manifest))}`;
html = html.replace(
/<link\s+rel="manifest"[^>]*>/g,
`<link rel="manifest" href="${manifestDataUrl}">`,
);
// Inline the favicon — same reason: file:// can't fetch external hrefs.
const faviconBytes = readFileSync(resolve(publicDir, "icon-192.png"));
const faviconDataUrl = `data:image/png;base64,${faviconBytes.toString("base64")}`;
html = html.replace(
/<link\s+rel="icon"[^>]*>/g,
`<link rel="icon" type="image/png" href="${faviconDataUrl}">`,
);
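// Strip the PWA service-worker registration (service workers won't
// register from a null file:// origin) and drop the `crossorigin`
// attribute from the inlined module script, so the tag is exactly
// `<script type="module">`, the marker standaloneInlineWasmsPlugin
// searches for.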
html = html
.replace(
/<script\s+id="vite-plugin-pwa:register-sw"[^>]*><\/script>\s*/g,
"",
)
.replace(
/<script\s+type="module"\s+crossorigin>/g,
'<script type="module">',
);
writeFileSync(outputPath, html);
},
};
}
export default defineConfig({
root: resolve(__dirname, "src/web"),
// publicDir disabled: copying manifest/icon/_headers from public/ would
// pollute dist/web-standalone/ with extra files that a single-file file://
// deliverable cannot load. standaloneHtmlFixupPlugin reads the manifest and
// icons straight from public/ and inlines them as data URLs instead.
publicDir: false,
base: "./",
// Build-time flag consumed by ffmpeg_wasm_fetch.ts to tree-shake the
// bare `import("@ffmpeg/core")` PWA branch. Without this, Vite statically
// bundles the ~110 KB factory + its ~43 MB data: URL wasm fallback into
// the single-file HTML even though readInlinedCore() returns first at
// runtime. See the comment block above resolveCore() for the rationale.
define: {
__WITH_STANDALONE_INLINE__: "true",
},
build: {
outDir: resolve(__dirname, "dist/web-standalone"),
emptyOutDir: true,
// Inline any asset imported from JS/CSS up to this size as a data: URI.
// Note this does NOT affect public/ files (the favicon, manifest, etc.);
// those bypass Vite's asset pipeline and are handled by `publicDir: false`
// above plus the data-URL inlining in standaloneHtmlFixupPlugin. The
// ceiling is generous because the only point of this build is a
// one-file deliverable: better to inline a 200 KB asset than ship a
// second file alongside the HTML.
assetsInlineLimit: 256 * 1024,
// Singlefile already inlines every chunk, so there are no preload
// targets. Disabling stops Vite from emitting its modulepreload
// polyfill IIFE — saves bytes and avoids a stray <script type="module"
// crossorigin> tag in the output.
modulePreload: false,
rollupOptions: {
output: {
// Required by viteSingleFile: dynamic imports become static so
// everything lands in the same bundle. Loses lazy-loading of the
// PDF chunk (~430 KB), which is acceptable for hand-off use.
inlineDynamicImports: true,
},
},
},
// Plugin order:
// - standaloneWasmStubPlugin: runs BEFORE Vite's asset pipeline; replaces
// the `?url` import for zeroperl.wasm with a sentinel string. This
// stops Vite from inlining the 25 MB WASM as a data URL in the JS
// bundle (which it does under inlineDynamicImports: true regardless
// of assetsInlineLimit). Without this step, V8 has to allocate the
// 33 MB Base64 string as a module-scope literal at page-load time
// — the slowness chunk B introduced.
// - standaloneFfmpegStubPlugin: same sentinel substitution for the
// `@ffmpeg/core?url` and `@ffmpeg/core/wasm?url` imports, so neither
// ffmpeg-core.js nor ffmpeg-core.wasm is bundled or emitted by Vite.
// - viteSingleFile: inlines JS/CSS into the HTML.
// - standaloneHtmlFixupPlugin: rewrites the inlined script tag's
// attributes (singlefile preserves `type="module"`).
// - standaloneInlineWasmsPlugin: injects zeroperl.wasm AND ffmpeg-core
// (.js + .wasm) as `<script type="text/plain">` tags in a single
// read+write of the HTML. Merged into one plugin so the injection
// sequence is explicit and not dependent on Rollup's
// hookParallel(closeBundle) semantics — two plugins each doing
// read+write of the same file would race the moment either hook
// body grows an `await`.
plugins: [
react(),
standaloneWasmStubPlugin(),
standaloneFfmpegStubPlugin(),
viteSingleFile(),
standaloneHtmlFixupPlugin(),
standaloneInlineWasmsPlugin(),
],
});
// Intercepts the `?url` import for zeroperl.wasm and replaces its resolved
// value with a sentinel string. Vite normally resolves `?url` to either
// a sibling asset URL OR a data URL (the latter when inlineDynamicImports
// is on and the build is single-chunk). For standalone we want neither —
// the WASM lives in the HTML's `<script type="text/plain">` tag instead,
// and `redirectWasmFetch` reads from there. The sentinel
// "inline:zeroperl-wasm" is just something the helper can pattern-match.
function standaloneWasmStubPlugin(): Plugin {
const VIRTUAL_ID = "\0virtual:standalone-zeroperl-wasm-url";
return {
name: "standalone-wasm-stub",
enforce: "pre",
resolveId(id) {
if (id === "@6over3/zeroperl-ts/zeroperl.wasm?url") {
return VIRTUAL_ID;
}
return undefined;
},
load(id) {
if (id === VIRTUAL_ID) {
return `export default "inline:zeroperl-wasm";`;
}
return undefined;
},
};
}
// Same pattern as standaloneWasmStubPlugin, but for ffmpeg-core (JS + WASM).
// Worker JS is NOT inlined because we run ffmpeg-core in the main thread —
// the @ffmpeg/ffmpeg wrapper would spawn a type:"module" Web Worker from a
// Blob URL, which fails silently when the page origin is `null` (the
// standalone HTML's file:// case). See ffmpeg_fallback_strategy.ts for the
// architectural rationale.
function standaloneFfmpegStubPlugin(): Plugin {
const STUBS = new Map<string, string>([
["@ffmpeg/core?url", "\0virtual:standalone-ffmpeg-core-js-url"],
["@ffmpeg/core/wasm?url", "\0virtual:standalone-ffmpeg-core-wasm-url"],
]);
const SENTINELS: Record<string, string> = {
"\0virtual:standalone-ffmpeg-core-js-url": "inline:ffmpeg-core-js",
"\0virtual:standalone-ffmpeg-core-wasm-url": "inline:ffmpeg-core-wasm",
};
return {
name: "standalone-ffmpeg-stub",
enforce: "pre",
resolveId(id) {
const virtual = STUBS.get(id);
return virtual ?? undefined;
},
load(id) {
const sentinel = SENTINELS[id];
if (sentinel === undefined) return undefined;
return `export default "${sentinel}";`;
},
};
}
// Merged inline plugin for ALL WASM assets that need to be stashed in the
// standalone HTML. Previously this was two separate plugins
// (standaloneWasmInlinePlugin + standaloneFfmpegInlinePlugin), each with its
// own closeBundle hook doing read+write of the same dist/web-standalone/
// index.html. That worked only because both hook bodies were fully
// synchronous (sync fs calls) — Rollup invokes closeBundle hooks via
// hookParallel(), and any future `await` inside either hook would let the
// second plugin read a stale HTML mid-mutation and clobber the first
// plugin's injection. Merging into one closeBundle removes the ordering
// hazard and reduces three reads + two writes of index.html to one each.
//
// What lands in the HTML, in order:
//
// 1. zeroperl.wasm
// The ExifTool fallback / diff strategies load zeroperl.wasm via a
// `?url` import. viteSingleFile only inlines JS/CSS chunks; large
// asset files like .wasm get emitted as siblings even when
// assetsInlineLimit is high.
//
// Why this matters for the standalone target specifically: the
// standalone HTML is opened via `file://`. Chromium browsers block
// cross-file `fetch()` from `file://` origins by default. A sibling
// .wasm would silently fail to load on Chrome/Edge/Brave under
// file://, breaking every diff view and the .webp/.gif/.avif strip
// paths.
//
// What we used to do (chunk B / B.1 early):
// Substitute every `./assets/zeroperl-<hash>.wasm` URL in the
// inlined JS with a `data:application/wasm;base64,…` URL.
// Problem: the resulting Base64 string is a MODULE-SCOPE STRING
// LITERAL in the JS bundle. V8 allocates it eagerly during module
// parse — ~500-1500ms blocking the page load before first paint
// on a 33 MB Base64 payload. That's the regression the user
// reported.
//
// What we do now:
// Inject the Base64 as a
// `<script type="text/plain" id="zeroperl-wasm-base64">…</script>`
// tag in the HTML body BEFORE the module script. The HTML parser
// stores the textContent in the DOM but does NOT parse the
// contents as JavaScript, so V8's module-parse cost drops from
// 33 MB to ~150 KB (the wrapper code). On first WASM request, the
// wrapper's `redirectWasmFetch` helper reads the textContent and
// decodes it via `fetch(data:URL)` (browser-native Base64 path).
// Same total disk I/O, ~500-1500ms shaved off time-to-interactive.
//
// PWA / APK builds keep the sibling asset (they don't hit the
// file:// CORS constraints and `runtimeCaching` handles repeat
// loads). See
// docs/superpowers/specs/2026-05-21-issue-22-diff-pivot-design.md
// §8.1 for the original tradeoff discussion.
//
// 2. ffmpeg-core.js + ffmpeg-core.wasm
// ffmpeg_wasm_fetch.ts reads from the same DOM IDs at runtime (see
// readInlinedCore there) and feeds the WASM bytes directly to
// createFFmpegCore({wasmBinary}). Without this:
// - ffmpeg-core.wasm (30.7 MB) gets emitted as
// `dist/web-standalone/assets/ffmpeg-core-<hash>.wasm` —
// defeats the single-file deliverable and 404s under file://
// CORS rules.
// - ffmpeg-core.js similarly emits as a sibling.
//
// Gzip cuts each payload roughly 3×, and base64 then adds back about a
// third (wasm compresses well: lots of repeated LEB128 patterns + symbol
// tables; ffmpeg's 30.7 MB wasm → ~10 MB gz → ~13 MB base64). The runtime
// decoders in
// exiftool_wasm_fetch.ts / ffmpeg_wasm_fetch.ts pipe the bytes through
// DecompressionStream("gzip") at first use. HTML-parse cost at page
// load scales with text-node size, so shrinking the inlined string is
// the start-time win.
//
// Base64 alphabet (A-Z a-z 0-9 + / =) contains no HTML-special
// characters, so direct embedding in a <script> body is safe without
// escaping.
function standaloneInlineWasmsPlugin(): Plugin {
const outDir = resolve(__dirname, "dist/web-standalone");
const htmlPath = resolve(outDir, "index.html");
const assetsDir = resolve(outDir, "assets");
// Source assets directly from node_modules: the stub plugins intercept
// the `?url` imports and replace them with sentinels, so Vite never
// sees these assets and never emits them. We read the bytes here at
// closeBundle time and stash them as <script type="text/plain"> tags.
// Sourced from each package's ESM export path (matches what the `?url`
// imports would have resolved to).
const INLINE_ASSETS: ReadonlyArray<{
label: string;
domId: string;
source: string;
}> = [
{
label: "zeroperl.wasm",
domId: "zeroperl-wasm-base64",
source: resolve(
__dirname,
"node_modules/@6over3/zeroperl-ts/dist/esm/zeroperl.wasm",
),
},
{
label: "ffmpeg-core.js",
domId: "ffmpeg-core-js-base64",
source: resolve(
__dirname,
"node_modules/@ffmpeg/core/dist/esm/ffmpeg-core.js",
),
},
{
label: "ffmpeg-core.wasm",
domId: "ffmpeg-core-wasm-base64",
source: resolve(
__dirname,
"node_modules/@ffmpeg/core/dist/esm/ffmpeg-core.wasm",
),
},
];
return {
name: "standalone-inline-wasms",
closeBundle() {
// 1. Read HTML once.
let html = readFileSync(htmlPath, "utf8");
const moduleScriptMarker = '<script type="module">';
if (!html.includes(moduleScriptMarker)) {
throw new Error(
`standaloneInlineWasmsPlugin: could not find <script type="module"> ` +
`in HTML. viteSingleFile may have changed its inline-script ` +
`shape; the inline-tag injection point needs updating.`,
);
}
// 2. Read + gzip + base64 each asset; accumulate inline tags.
let injected = "";
const summaryLines: string[] = [];
for (const asset of INLINE_ASSETS) {
if (!existsSync(asset.source)) {
throw new Error(
`standaloneInlineWasmsPlugin: ${asset.label} not found at ` +
`${asset.source}. Check the corresponding dependency install.`,
);
}
const bytes = readFileSync(asset.source);
const gzipped = gzipSync(bytes, { level: 9 });
const base64 = gzipped.toString("base64");
injected += `<script type="text/plain" id="${asset.domId}">${base64}</script>\n`;
summaryLines.push(
` ${asset.label}: ${bytes.length} → ${gzipped.length} bytes gzipped ` +
`(base64 ${base64.length} bytes)`,
);
}
// 3. Write HTML once with all injections, then log a single summary.
html = html.replace(
moduleScriptMarker,
`${injected}${moduleScriptMarker}`,
);
writeFileSync(htmlPath, html);
console.log(
`standaloneInlineWasmsPlugin: stashed ${INLINE_ASSETS.length} assets in ` +
`<script type="text/plain"> tags\n${summaryLines.join("\n")}`,
);
// 4. Defensive: if an empty assets/ directory was left behind, remove it
// so the standalone output is exactly one file. A non-empty assets/ is
// left untouched.
if (existsSync(assetsDir) && readdirSync(assetsDir).length === 0) {
rmdirSync(assetsDir);
}
},
};
}