Adds FfmpegFallbackStrategy as a peer to ExifToolFallbackStrategy, routing MP4/MOV/M4V (Phase 1) and MKV/WebM (Phase 2) through @ffmpeg/core. On by default for all three distributions (standalone HTML, Capacitor APK, PWA self-host); VITE_ENABLE_FFMPEG_FALLBACK=false opts out. Takes priority over VideoStrategy for the MP4 family; VideoStrategy stays registered as the opt-out fallback until a subsequent PR deletes it.

Closes #182. Closes #43.

Resolves the four documented walker KNOWN_GAPS categorically: handler-name leak (#38), compressor-name leak (#39), mvhd.next_track_id leak (#111), and GPMF/GPS coordinates leak (#42). On gopro-fusion.mp4 (5.1 MB GPMF + tmcd + fdsc) and dji-phantom4.mov (236 MB UserData GPS log), the forensic battery reports zero device-fingerprint survival across every recovery technique.

Key architectural choices:

- **Main-thread @ffmpeg/core, not the @ffmpeg/ffmpeg wrapper.** The wrapper hardcodes type:"module" Workers from Blob URLs, which fail silently under the null origin of file:// in Chromium; the standalone build hung forever on every video strip. @ffmpeg/ffmpeg is dropped from package.json.
- **Stream mapping -map 0 -map -0:d? -map -0:s? -map -0:t?.** Preserves input track order while dropping data (which is how ffmpeg exposes tmcd/fdsc tracks), subtitle, and attachment streams. This avoids the eng→und reorder bug of -map 0:v?/-map 0:a?, and sidesteps mat2's exit-234 on action-cam files (GoPro Fusion has tmcd/fdsc).
- **Post-strip pass rewrites the udta box type to 'free'** (ISO/IEC 14496-12 §8.1.2 padding) to neutralise ffmpeg's hardcoded HandlerType:Metadata + HandlerVendorID:Apple stub. The rewrite is length-preserving, so stco/co64 offsets stay valid. It handles both regular and largesize headers, since the type field sits at headerStart+4 in both.
- **mdhd.language left as ffmpeg's 'und'.** Zeroing was considered but reverted: 0x0000 is an invalid ISO 639-2/T code, and ffprobe falls back to displaying '(eng)' for invalid codes, which actively misleads downstream tools.
- **Diff race fix.** @uswriting/exiftool's parseMetadata uses module-level singletons (Perl, MemoryFS, stdout/stderr StringBuilders). WasmProcessor now serializes all diff builds across the processor's lifetime via a Promise chain, which guarantees that no two parseMetadata calls overlap, whether within an entry or across the fire-and-forget chunk-drained queue.
- **ExifTool family-1 group names surfaced verbatim** (IFD0, ExifIFD, XMP-dc, Track1, etc.). Collapsing them to umbrella labels like 'EXIF' was rejected because it caused (source, name) key collisions across sub-groups: Track1:HandlerType vs Track2:HandlerType produced spurious diffs on multi-track MP4.
- **Standalone HTML stays single-file.** A Vite plugin gzip+base64-inlines the two new assets, ffmpeg-core.js and ffmpeg-core.wasm, into `<script type="text/plain">` tags, mirroring the zeroperl pattern. With tree-shaking via __WITH_STANDALONE_INLINE__, the standalone HTML went from 116MB to 24MB.

Forensic verification: docs/forensic/ffmpeg-fallback.md + tools/forensic/ffmpeg-fallback.ts cover synthetic-mp4/mkv/webm + phone-baseline (2.7MB Android) + gopro-fusion (5MB action-cam) + dji-phantom4 (236MB drone), with zero sentinel/fingerprint survival across the recovery battery. Gap analyses for all three formats are at docs/gap-analysis/mp4-ffmpeg.md, mkv.md, and webm.md. POC at docs/poc/ffmpeg-wasm.md.

Production deps go from 5 → 6: @ffmpeg/core@0.12.10 (GPL-2.0-or-later; the combined distributable inherits the GPL, the MIT codebase is unchanged, and the README carries a source pointer for GPL compliance).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
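The post-strip udta → free pass is the one piece of binary surgery in this PR, so a small illustration may help reviewers. The sketch below is a hypothetical helper, not the PR's code: it assumes the whole file is buffered in a `Uint8Array`, assumes udta can appear directly under `moov` or under a `trak` (which levels the real pass visits is not stated here), and simply stops at the first malformed box. The names `rewriteUdtaToFree` and `boxType` are illustrative.

```typescript
// Hypothetical sketch of a length-preserving udta -> free rewrite; not the
// PR's implementation. Only the 4-byte type field is overwritten, so no box
// size changes and stco/co64 chunk offsets remain valid.
const CONTAINERS = new Set(["moov", "trak"]); // assumption: where udta may live
const FREE = [0x66, 0x72, 0x65, 0x65]; // ASCII "free"

function boxType(buf: Uint8Array, at: number): string {
  return String.fromCharCode(buf[at], buf[at + 1], buf[at + 2], buf[at + 3]);
}

function rewriteUdtaToFree(
  buf: Uint8Array,
  start = 0,
  end = buf.length,
): number {
  const view = new DataView(buf.buffer, buf.byteOffset, buf.byteLength);
  let rewritten = 0;
  let headerStart = start;
  while (headerStart + 8 <= end) {
    // ISO BMFF integers are big-endian, which is DataView's default.
    let size = view.getUint32(headerStart);
    let headerLen = 8;
    if (size === 1) {
      // largesize: a 64-bit size follows the 4-byte type field.
      if (headerStart + 16 > end) break;
      size = Number(view.getBigUint64(headerStart + 8));
      headerLen = 16;
    } else if (size === 0) {
      // size 0: the box extends to the end of its enclosing range.
      size = end - headerStart;
    }
    if (size < headerLen || headerStart + size > end) break; // malformed: stop
    // The type field sits at headerStart+4 for both header forms.
    const type = boxType(buf, headerStart + 4);
    if (type === "udta") {
      buf.set(FREE, headerStart + 4);
      rewritten++;
    } else if (CONTAINERS.has(type)) {
      rewritten += rewriteUdtaToFree(
        buf,
        headerStart + headerLen,
        headerStart + size,
      );
    }
    headerStart += size;
  }
  return rewritten;
}
```

Because only the type bytes change, parsers that honour §8.1.2 skip the retyped box as padding, while every byte offset in the file stays where the muxer put it.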
404 lines
17 KiB
TypeScript
import { defineConfig } from "vite";
import react from "@vitejs/plugin-react";
import { viteSingleFile } from "vite-plugin-singlefile";
import {
  readFileSync,
  writeFileSync,
  readdirSync,
  rmdirSync,
  existsSync,
} from "node:fs";
import { gzipSync } from "node:zlib";
import { resolve } from "node:path";
import type { Plugin } from "vite";

type ManifestIcon = {
  src: string;
  sizes: string;
  type: string;
  purpose?: string;
};

type Manifest = {
  icons: ManifestIcon[];
  [key: string]: unknown;
};

// The standalone build is meant to be opened directly via `file://`. Two
// things in `vite.config.web.ts` make that impossible and need to be undone
// here:
//
// 1. CSP meta tag — `script-src 'self'` doesn't match anything useful under
//    a `null` origin (file://), and the singlefile inlining also requires
//    'unsafe-inline' anyway. We just drop the tag; there are no remote
//    origins in a self-contained bundle for it to gate.
//
// 2. ES module loading — Chromium browsers (Chrome, Brave, Edge) refuse to
//    load `<script type="module" src="./...">` from a file:// origin with a
//    CORS error ("Cross origin requests are only supported for protocol
//    schemes: http, https, …"). vite-plugin-singlefile fixes this by
//    inlining the entire bundle into a single inline `<script>` (the script
//    stays `type="module"` — see the block below for why), which CORS
//    doesn't apply to because there's no fetch.
//
// The manifest and favicon <link>s are rewritten to self-contained data:
// URLs, and the service-worker registration is stripped, because the
// originals reference external files that fail under file:// (manifest and
// favicon fetches hit the same CORS rule; service workers reject the `null`
// origin).
//
// The inlined script MUST stay `type="module"` — the bundle uses
// `import.meta` references that only resolve in module context (classic
// scripts throw "Cannot use 'import.meta' outside a module"). Module scripts
// are only subject to CORS for external fetches via `src=`; an inline module
// script with no src and no remaining `import` statements never fetches and
// works fine from a `file://` origin. The `crossorigin` attribute is
// meaningless for inline scripts (CORS only applies to fetches), so
// stripping it changes nothing at runtime. It does matter at build time:
// standaloneInlineWasmsPlugin injects its tags before the bare
// `<script type="module">` marker and throws if it can't find one.

// transformIndexHtml runs before viteSingleFile inlines chunks (regardless of
// plugin array order — Vite has its own hook ordering), so the inlined
// `<script type="module" crossorigin>` is invisible to a transformIndexHtml
// hook. closeBundle reads the final on-disk file and rewrites it, which
// avoids the ordering surprise entirely.
function standaloneHtmlFixupPlugin(): Plugin {
  const outputPath = resolve(__dirname, "dist/web-standalone/index.html");
  const publicDir = resolve(__dirname, "public");
  return {
    name: "standalone-html-fixup",
    closeBundle() {
      let html = readFileSync(outputPath, "utf8");

      // Inline the manifest as a data URL. External manifest fetches fail
      // under file:// (null origin), so replace the href rather than strip
      // the tag. Icons inside the manifest are also embedded as base64 data
      // URLs — they would 404 under file://, and having them self-contained
      // means the manifest works if the file is later served over HTTP and
      // the user triggers a PWA install prompt.
      const manifest: Manifest = JSON.parse(
        readFileSync(resolve(publicDir, "manifest.webmanifest"), "utf8"),
      );
      manifest.icons = manifest.icons.map((icon) => {
        const filename = icon.src.replace(/^\.\//, "");
        const bytes = readFileSync(resolve(publicDir, filename));
        return { ...icon, src: `data:${icon.type};base64,${bytes.toString("base64")}` };
      });
      const manifestDataUrl = `data:application/manifest+json,${encodeURIComponent(JSON.stringify(manifest))}`;
      html = html.replace(
        /<link\s+rel="manifest"[^>]*>/g,
        `<link rel="manifest" href="${manifestDataUrl}">`,
      );

      // Inline the favicon — same reason: file:// can't fetch external hrefs.
      const faviconBytes = readFileSync(resolve(publicDir, "icon-192.png"));
      const faviconDataUrl = `data:image/png;base64,${faviconBytes.toString("base64")}`;
      html = html.replace(
        /<link\s+rel="icon"[^>]*>/g,
        `<link rel="icon" type="image/png" href="${faviconDataUrl}">`,
      );

      // Strip the PWA service-worker registration — service workers won't
      // register from a null (file://) origin. Also drop the no-op
      // `crossorigin` attribute from the inlined module script (see the
      // header comment above).
      html = html
        .replace(
          /<script\s+id="vite-plugin-pwa:register-sw"[^>]*><\/script>\s*/g,
          "",
        )
        .replace(
          /<script\s+type="module"\s+crossorigin>/g,
          '<script type="module">',
        );
      writeFileSync(outputPath, html);
    },
  };
}

export default defineConfig({
  root: resolve(__dirname, "src/web"),
  // publicDir disabled: copying public/ (manifest, icons, _headers) as
  // sibling files is useless for a single-file file:// deliverable and would
  // pollute dist/web-standalone/. standaloneHtmlFixupPlugin reads the
  // manifest and favicon straight from public/ and rewrites the
  // corresponding <link> tags to data: URLs instead.
  publicDir: false,
  base: "./",
  // Build-time flag consumed by ffmpeg_wasm_fetch.ts to tree-shake the
  // bare `import("@ffmpeg/core")` PWA branch. Without this, Vite statically
  // bundles the ~110 KB factory + its ~43 MB data: URL wasm fallback into
  // the single-file HTML even though readInlinedCore() returns first at
  // runtime. See the comment block above resolveCore() for the rationale.
  define: {
    __WITH_STANDALONE_INLINE__: "true",
  },
  build: {
    outDir: resolve(__dirname, "dist/web-standalone"),
    emptyOutDir: true,
    // Inline any asset imported from JS/CSS up to this size as a data: URI.
    // Note this does NOT affect public/ files (the favicon, manifest, etc.):
    // those bypass Vite's asset pipeline and are handled by `publicDir: false`
    // above plus the <link> rewriting in standaloneHtmlFixupPlugin. The
    // ceiling is generous because the only point of this build is a
    // one-file deliverable: better to inline a 200 KB asset than ship a
    // second file alongside the HTML.
    assetsInlineLimit: 256 * 1024,
    // Singlefile already inlines every chunk, so there are no preload
    // targets. Disabling stops Vite from emitting its modulepreload
    // polyfill IIFE — saves bytes and avoids a stray <script type="module"
    // crossorigin> tag in the output.
    modulePreload: false,
    rollupOptions: {
      output: {
        // Required by viteSingleFile: dynamic imports become static so
        // everything lands in the same bundle. Loses lazy-loading of the
        // PDF chunk (~430 KB), which is acceptable for hand-off use.
        inlineDynamicImports: true,
      },
    },
  },
  // Plugin order:
  // - standaloneWasmStubPlugin: runs BEFORE Vite's asset pipeline; replaces
  //   the `?url` import for zeroperl.wasm with a sentinel string. This
  //   stops Vite from inlining the 25 MB WASM as a data URL in the JS
  //   bundle (which it does under inlineDynamicImports: true regardless
  //   of assetsInlineLimit). Without this step, V8 has to allocate the
  //   33 MB Base64 string as a module-scope literal at page-load time
  //   — the slowness chunk B introduced.
  // - standaloneFfmpegStubPlugin: the same sentinel substitution for the
  //   `@ffmpeg/core?url` and `@ffmpeg/core/wasm?url` imports.
  // - viteSingleFile: inlines JS/CSS into the HTML.
  // - standaloneHtmlFixupPlugin: rewrites the manifest/favicon links to
  //   data: URLs, strips the service-worker registration, and drops
  //   `crossorigin` from the inlined script tag (singlefile preserves
  //   `type="module"`).
  // - standaloneInlineWasmsPlugin: injects zeroperl.wasm AND ffmpeg-core
  //   (.js + .wasm) as `<script type="text/plain">` tags in a single
  //   read+write of the HTML. Merged into one plugin so the injection
  //   sequence is explicit and not dependent on Rollup's
  //   hookParallel(closeBundle) semantics — two plugins each doing
  //   read+write of the same file would race the moment either hook
  //   body grows an `await`. It still depends on standaloneHtmlFixupPlugin
  //   having run first (it injects before the bare `<script type="module">`
  //   marker the fixup produces), so those two closeBundle bodies must
  //   likewise stay synchronous.
  plugins: [
    react(),
    standaloneWasmStubPlugin(),
    standaloneFfmpegStubPlugin(),
    viteSingleFile(),
    standaloneHtmlFixupPlugin(),
    standaloneInlineWasmsPlugin(),
  ],
});

// Intercepts the `?url` import for zeroperl.wasm and replaces its resolved
// value with a sentinel string. Vite normally resolves `?url` to either
// a sibling asset URL OR a data URL (the latter when inlineDynamicImports
// is on and the build is single-chunk). For standalone we want neither —
// the WASM lives in the HTML's `<script type="text/plain">` tag instead,
// and `redirectWasmFetch` reads from there. The sentinel
// "inline:zeroperl-wasm" is just something the helper can pattern-match.
function standaloneWasmStubPlugin(): Plugin {
  const VIRTUAL_ID = "\0virtual:standalone-zeroperl-wasm-url";
  return {
    name: "standalone-wasm-stub",
    enforce: "pre",
    resolveId(id) {
      if (id === "@6over3/zeroperl-ts/zeroperl.wasm?url") {
        return VIRTUAL_ID;
      }
      return undefined;
    },
    load(id) {
      if (id === VIRTUAL_ID) {
        return `export default "inline:zeroperl-wasm";`;
      }
      return undefined;
    },
  };
}

// Same pattern as standaloneWasmStubPlugin, but for ffmpeg-core (JS + WASM).
// Worker JS is NOT inlined because we run ffmpeg-core in the main thread —
// the @ffmpeg/ffmpeg wrapper would spawn a type:"module" Web Worker from a
// Blob URL, which fails silently when the page origin is `null` (the
// standalone HTML's file:// case). See ffmpeg_fallback_strategy.ts for the
// architectural rationale.
function standaloneFfmpegStubPlugin(): Plugin {
  const STUBS = new Map<string, string>([
    ["@ffmpeg/core?url", "\0virtual:standalone-ffmpeg-core-js-url"],
    ["@ffmpeg/core/wasm?url", "\0virtual:standalone-ffmpeg-core-wasm-url"],
  ]);
  const SENTINELS: Record<string, string> = {
    "\0virtual:standalone-ffmpeg-core-js-url": "inline:ffmpeg-core-js",
    "\0virtual:standalone-ffmpeg-core-wasm-url": "inline:ffmpeg-core-wasm",
  };
  return {
    name: "standalone-ffmpeg-stub",
    enforce: "pre",
    resolveId(id) {
      const virtual = STUBS.get(id);
      return virtual ?? undefined;
    },
    load(id) {
      const sentinel = SENTINELS[id];
      if (sentinel === undefined) return undefined;
      return `export default "${sentinel}";`;
    },
  };
}

// Merged inline plugin for ALL assets that need to be stashed in the
// standalone HTML (the two .wasm binaries plus ffmpeg-core.js). Previously
// this was two separate plugins
// (standaloneWasmInlinePlugin + standaloneFfmpegInlinePlugin), each with its
// own closeBundle hook doing read+write of the same dist/web-standalone/
// index.html. That worked only because both hook bodies were fully
// synchronous (sync fs calls) — Rollup invokes closeBundle hooks via
// hookParallel(), and any future `await` inside either hook would let the
// second plugin read a stale HTML mid-mutation and clobber the first
// plugin's injection. Merging into one closeBundle removes the ordering
// hazard and reduces three reads + two writes of index.html to one each.
//
// What lands in the HTML, in order:
//
// 1. zeroperl.wasm
//    The ExifTool fallback / diff strategies load zeroperl.wasm via a
//    `?url` import. viteSingleFile only inlines JS/CSS chunks; large
//    asset files like .wasm get emitted as siblings even when
//    assetsInlineLimit is raised.
//
//    Why this matters for the standalone target specifically: the
//    standalone HTML is opened via `file://`. Chromium browsers block
//    cross-file `fetch()` from `file://` origins by default. A sibling
//    .wasm would silently fail to load on Chrome/Edge/Brave under
//    file://, breaking every diff view and the .webp/.gif/.avif strip
//    paths.
//
//    What we used to do (chunk B / B.1 early):
//      Substitute every `./assets/zeroperl-<hash>.wasm` URL in the
//      inlined JS with a `data:application/wasm;base64,…` URL.
//      Problem: the resulting Base64 string is a MODULE-SCOPE STRING
//      LITERAL in the JS bundle. V8 allocates it eagerly during module
//      parse — ~500-1500ms blocking the page load before first paint
//      on a 33 MB Base64 payload. That's the regression the user
//      reported.
//
//    What we do now:
//      Inject the Base64 as a
//      `<script type="text/plain" id="zeroperl-wasm-base64">…</script>`
//      tag immediately BEFORE the module script. The HTML parser
//      stores the textContent in the DOM but does NOT parse the
//      contents as JavaScript, so V8's module-parse cost drops from
//      33 MB to ~150 KB (the wrapper code). On first WASM request, the
//      wrapper's `redirectWasmFetch` helper reads the textContent and
//      decodes it via `fetch(data:URL)` (browser-native Base64 path).
//      Same total disk I/O, ~500-1500ms shaved off time-to-interactive.
//
//    PWA / APK builds keep the sibling asset (they don't hit the
//    file:// CORS constraints and `runtimeCaching` handles repeat
//    loads). See
//    docs/superpowers/specs/2026-05-21-issue-22-diff-pivot-design.md
//    §8.1 for the original tradeoff discussion.
//
// 2. ffmpeg-core.js + ffmpeg-core.wasm
//    ffmpeg_wasm_fetch.ts reads the matching DOM IDs
//    (ffmpeg-core-js-base64, ffmpeg-core-wasm-base64) at runtime (see
//    readInlinedCore there) and feeds the WASM bytes directly to
//    createFFmpegCore({wasmBinary}). Without this:
//      - ffmpeg-core.wasm (30.7 MB) gets emitted as
//        `dist/web-standalone/assets/ffmpeg-core-<hash>.wasm`, which
//        defeats the single-file deliverable and fails to load under
//        file:// CORS rules.
//      - ffmpeg-core.js similarly emits as a sibling.
//
// Gzip + base64 shrinks each inlined payload roughly 3× compared with plain
// base64 (wasm compresses well: lots of repeated LEB128 patterns + symbol
// tables; ffmpeg's 30.7 MB wasm → ~10 MB gz → ~13 MB base64, versus ~41 MB
// as plain base64). The runtime decoders in
// exiftool_wasm_fetch.ts / ffmpeg_wasm_fetch.ts pipe the bytes through
// DecompressionStream("gzip") at first use. HTML-parse cost at page
// load scales with text-node size, so shrinking the inlined string is
// the start-time win.
//
// Base64 alphabet (A-Z a-z 0-9 + / =) contains no HTML-special
// characters, so direct embedding in a <script> body is safe without
// escaping.
function standaloneInlineWasmsPlugin(): Plugin {
  const outDir = resolve(__dirname, "dist/web-standalone");
  const htmlPath = resolve(outDir, "index.html");
  const assetsDir = resolve(outDir, "assets");
  // Source assets directly from node_modules: the stub plugins intercept
  // the `?url` imports and replace them with sentinels, so Vite never
  // sees these assets and never emits them. We read the bytes here at
  // closeBundle time and stash them as <script type="text/plain"> tags.
  // Sourced from each package's ESM export path (matches what the `?url`
  // imports would have resolved to).
  const INLINE_ASSETS: ReadonlyArray<{
    label: string;
    domId: string;
    source: string;
  }> = [
    {
      label: "zeroperl.wasm",
      domId: "zeroperl-wasm-base64",
      source: resolve(
        __dirname,
        "node_modules/@6over3/zeroperl-ts/dist/esm/zeroperl.wasm",
      ),
    },
    {
      label: "ffmpeg-core.js",
      domId: "ffmpeg-core-js-base64",
      source: resolve(
        __dirname,
        "node_modules/@ffmpeg/core/dist/esm/ffmpeg-core.js",
      ),
    },
    {
      label: "ffmpeg-core.wasm",
      domId: "ffmpeg-core-wasm-base64",
      source: resolve(
        __dirname,
        "node_modules/@ffmpeg/core/dist/esm/ffmpeg-core.wasm",
      ),
    },
  ];
  return {
    name: "standalone-inline-wasms",
    closeBundle() {
      // 1. Read HTML once.
      let html = readFileSync(htmlPath, "utf8");
      const moduleScriptMarker = '<script type="module">';
      if (!html.includes(moduleScriptMarker)) {
        throw new Error(
          `standaloneInlineWasmsPlugin: could not find <script type="module"> ` +
            `in HTML. viteSingleFile may have changed its inline-script ` +
            `shape; the inline-tag injection point needs updating.`,
        );
      }

      // 2. Read + gzip + base64 each asset; accumulate inline tags.
      let injected = "";
      const summaryLines: string[] = [];
      for (const asset of INLINE_ASSETS) {
        if (!existsSync(asset.source)) {
          throw new Error(
            `standaloneInlineWasmsPlugin: ${asset.label} not found at ` +
              `${asset.source}. Check the corresponding dependency install.`,
          );
        }
        const bytes = readFileSync(asset.source);
        const gzipped = gzipSync(bytes, { level: 9 });
        const base64 = gzipped.toString("base64");
        injected += `<script type="text/plain" id="${asset.domId}">${base64}</script>\n`;
        summaryLines.push(
          ` ${asset.label}: ${bytes.length} → ${gzipped.length} bytes gzipped ` +
            `(base64 ${base64.length} bytes)`,
        );
      }

      // 3. Write HTML once with all injections, then log a single summary.
      html = html.replace(
        moduleScriptMarker,
        `${injected}${moduleScriptMarker}`,
      );
      writeFileSync(htmlPath, html);
      console.log(
        `standaloneInlineWasmsPlugin: stashed ${INLINE_ASSETS.length} assets in ` +
          `<script type="text/plain"> tags\n${summaryLines.join("\n")}`,
      );

      // 4. Defensive: if Vite leaves an empty assets/ directory behind,
      //    remove it so the standalone output stays exactly one file. A
      //    non-empty assets/ is left in place (rmdirSync only removes empty
      //    directories), so any unexpected sibling files remain visible.
      if (existsSync(assetsDir) && readdirSync(assetsDir).length === 0) {
        rmdirSync(assetsDir);
      }
    },
  };
}
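For review purposes, the runtime half of this encoding can be sketched as follows. The comments above say the decoders in exiftool_wasm_fetch.ts / ffmpeg_wasm_fetch.ts read the `<script type="text/plain">` text, decode the Base64 via a `data:` URL fetch, and gunzip with `DecompressionStream("gzip")`. This is a minimal sketch under those assumptions, not the project's `readInlinedCore()`: the names `decodeGzipBase64` and `readInlinedAsset` are hypothetical, and it assumes an environment with `fetch` (including `data:` URL support), `DecompressionStream`, and `Response` (modern browsers, Node 18+).

```typescript
// Hypothetical runtime counterpart to standaloneInlineWasmsPlugin's
// gzip + base64 encoding; not the project's readInlinedCore().
async function decodeGzipBase64(base64: string): Promise<Uint8Array> {
  // Let the platform decode Base64 by fetching a data: URL.
  const res = await fetch(`data:application/octet-stream;base64,${base64}`);
  if (res.body === null) throw new Error("data: URL fetch returned no body");
  // Gunzip as a stream, then collect the output into a single buffer.
  const inflated = res.body.pipeThrough(new DecompressionStream("gzip"));
  return new Uint8Array(await new Response(inflated).arrayBuffer());
}

// Looks up one of the injected tags (e.g. "ffmpeg-core-wasm-base64").
// Returns null when the tag is absent, i.e. outside the standalone build.
async function readInlinedAsset(domId: string): Promise<Uint8Array | null> {
  const el = document.getElementById(domId);
  if (el === null) return null;
  return decodeGzipBase64((el.textContent ?? "").trim());
}
```

A caller in the standalone build might try `readInlinedAsset("ffmpeg-core-wasm-base64")` first and fall back to the network-loaded asset when it returns null; the decoded bytes are the kind of input the comments describe passing to `createFFmpegCore({wasmBinary})`. Decoding lazily on first use keeps the Base64 text out of the JavaScript parse path at page load, which is the start-time win the plugin comments describe.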