Five fixes prompted by post-merge review of the PNG strategy:

1. PNG 3rd Edition (W3C, 2024) HDR signaling chunks are now kept:
   - cICP: color primaries / transfer / matrix coefficients (4 bytes)
   - mDCv: mastering display chromaticity + luminance (24 bytes)
   - cLLi: content light level info (8 bytes)

   Dropping them silently breaks HDR rendering on aware viewers (gamut clipping, wrong tone curve) for zero privacy benefit; they're fixed-size numerics with no user-attributable surface.
2. sTER (stereo indicator) is now kept. It is 1 byte indicating cross-fused vs diverging viewing mode. Same reasoning as tRNS / APNG: dropping it changes how a stereo pair is rendered with no privacy benefit (the stereo nature is already evident from the dimensions).
3. Duplicate IHDR is now rejected as `invalid-file-format`. Per PNG §5.6 IHDR appears exactly once; the first-chunk-must-be-IHDR guard already existed, the symmetric duplicate guard didn't.
4. The strategy now claims `.apng` in addition to `.png`. APNG files (registered MIME `image/apng`) frequently use the .apng extension; same byte structure, same chunk policy. Added to WASM_HANDLED_EXTENSIONS so the Electron build routes them too.
5. Top-of-file policy comment updated to match, and to drop the "mirrors ExifTool's `-all=` behaviour" framing, which the forensic test contradicted (we are strictly more aggressive).

Updated docs/gap-analysis/png.md with the new policy rows + a section on why the new keep-list entries don't violate privacy.

docs/forensic/png.md re-verified: same 66-byte output, same zero sentinel survival. The new chunks have no string-attributable surface so they are forensic non-events.

Tests: +5 (3 parametrized HDR keep, 1 sTER keep, 1 duplicate-IHDR reject). 423/423 pass.

Quality gates: `tsc --noEmit` clean, `prettier --check` clean, forensic runner reproduces the documented zero-survival result.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PNG metadata-stripping gap analysis
Date: 2026-05-07. Updated 2026-05-08 to cover PNG 3rd Edition (W3C, 2024) HDR signaling chunks (cICP, mDCv, cLLi), the stereo indicator (sTER), the duplicate-IHDR rejection, and the .apng extension.
Goal: Document the gap between the WASM library options previously surveyed for image metadata removal and ExifTool's -all= PNG strip, and the rationale for the hand-rolled PNG chunk walker that ships in Phase B8.
Methodology
Read:
- W3C / ISO 15948 PNG specification (2nd Ed., 2003) §11 (chunk types) and §12 (private/ancillary chunk semantics).
- W3C PNG 3rd Edition (2024) for new chunks: `cICP`, `mDCv`, `cLLi`. https://www.w3.org/TR/png-3/.
- libpng documentation on chunk handling and which ancillaries are display-affecting vs informational.
- ExifTool documentation at https://exiftool.org/TagNames/PNG.html for the chunks the `-all=` strip affects.
- The existing `JpegStrategy` precedent (`src/infrastructure/wasm/strategies/jpeg_strategy.ts`) and its gap analysis (`docs/gap-analysis/jpeg.md`).
Reused empirically (from POCs already written for JPEG):
- `little_exif` (Rust → WASM, ~330 KB raw / 111 KB gzip), surveyed in `docs/poc/little-exif-wasm.md`. Leaves PNG `tEXt` chunks untouched. Ruled out.
- `exiv2-wasm` (~2.3 MB raw / 925 KB gzip), surveyed in `docs/poc/exiv2-wasm.md`. Same `writeString(buf, key, "")` no-erase pattern. Ruled out.
- `@uswriting/exiftool-wasm`: full Perl ExifTool compiled to WASM. ~5 MB lazy-load. Considered for the original B3 plan; replaced by per-format hand-rolled walkers because the bundle weight is unjustifiable when each strategy is ~150 lines.
Per-chunk policy
PNG file structure: 8-byte signature 89 50 4E 47 0D 0A 1A 0A followed by a sequence of chunks. Each chunk is [length:u32-be][type:4 ASCII][data:length bytes][crc:u32-be]. The CRC covers type + data. Chunk type's case carries semantics (PNG §3.3):
- Bit 5 of byte 0 (Ancillary bit): uppercase = critical, lowercase = ancillary (decoder may skip).
- Bit 5 of byte 1 (Private bit): uppercase = public/registered, lowercase = private/unregistered.
- Bit 5 of byte 2 (Reserved): must be uppercase.
- Bit 5 of byte 3 (Safe-to-copy bit): uppercase = unsafe (depends on image data), lowercase = safe.
Critical chunks define the image; dropping any of them breaks the file. Public ancillary chunks have well-defined privacy implications. Private chunks are vendor-specific and almost always carry identifying metadata.
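The case rules above can be read directly off bit 5 (`0x20`) of each type byte. The following is a minimal sketch of that decoding; `chunkTypeFlags` and the `ChunkTypeFlags` field names are hypothetical and are not taken from `png_strategy.ts`.

```typescript
// Hypothetical helper: decodes the four property bits of a PNG chunk type.
interface ChunkTypeFlags {
  isAncillary: boolean; // byte 0 lowercase
  isPrivate: boolean; // byte 1 lowercase
  reservedValid: boolean; // byte 2 uppercase, as the spec requires
  isSafeToCopy: boolean; // byte 3 lowercase
}

function chunkTypeFlags(type: string): ChunkTypeFlags {
  if (!/^[A-Za-z]{4}$/.test(type)) {
    throw new Error(`not a valid chunk type: ${JSON.stringify(type)}`);
  }
  // In ASCII, bit 5 (0x20) is set for lowercase letters and clear for uppercase.
  const isLower = (i: number): boolean => (type.charCodeAt(i) & 0x20) !== 0;
  return {
    isAncillary: isLower(0),
    isPrivate: isLower(1),
    reservedValid: !isLower(2),
    isSafeToCopy: isLower(3),
  };
}

// Usage: tEXt is ancillary, public, and safe to copy; IHDR is critical.
const textFlags = chunkTypeFlags("tEXt");
const headerFlags = chunkTypeFlags("IHDR");
```

A policy built on these flags alone would be too coarse (for example, `tRNS` and `tEXt` are both public ancillary chunks), which is why the table that follows lists chunks by name.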
| Chunk | Critical? | Source of leak | ExifTool `-all=` | Phase B8 walker |
|---|---|---|---|---|
| `IHDR` | yes | n/a (image header) | keep | keep |
| `PLTE` | yes | n/a (palette) | keep | keep |
| `IDAT` | yes | n/a (image data) | keep | keep |
| `IEND` | yes | n/a (end marker) | keep | keep |
| `tRNS` | no (ancillary, unsafe-to-copy) | n/a (transparency, display-affecting) | keep | keep |
| `tEXt` | no | latin-1 key=value text (Author, Copyright, Software, …) | drop | drop |
| `iTXt` | no | UTF-8 international text (XMP commonly stored here) | drop | drop |
| `zTXt` | no | compressed text (same content as `tEXt`) | drop | drop |
| `eXIf` | no | full EXIF block embedded in PNG | drop | drop |
| `tIME` | no | last-modification timestamp | drop | drop |
| `pHYs` | no | physical pixel dimensions / aspect ratio | drop | drop |
| `sPLT` | no | suggested palette (rare, but author-set) | drop | drop |
| `hIST` | no | image histogram (encoder fingerprint) | drop | drop |
| `bKGD` | no | suggested background color | drop | drop |
| `sBIT` | no | significant bits per channel (encoder hint) | drop | drop |
| `iCCP` | no | embedded ICC color profile (creator strings, dateTime) | drop | drop by default; kept when `preserveColorProfile: true` |
| `cHRM` | no | primary chromaticities (color-space metadata) | drop | drop by default; kept when `preserveColorProfile: true` |
| `gAMA` | no | image gamma | drop | drop by default; kept when `preserveColorProfile: true` |
| `sRGB` | no | sRGB rendering intent | drop | drop by default; kept when `preserveColorProfile: true` |
| `acTL`, `fcTL`, `fdAT` | no (animation control/data) | n/a (APNG animation parameters and frame pixels, no user identity) | keep | keep; dropping them silently turns APNG into a static PNG |
| `cICP` | no (PNG 3rd Ed., HDR signaling) | n/a (4 bytes of color-primaries / transfer / matrix coefficients, no user identity) | keep | keep; dropping causes HDR PNGs to render with wrong gamut on HDR-aware viewers |
| `mDCv` | no (PNG 3rd Ed., HDR mastering display) | n/a (24 bytes of fixed-point chromaticity + luminance for the mastering display) | keep | keep; paired with `cICP` for accurate HDR rendering |
| `cLLi` | no (PNG 3rd Ed., HDR luminance) | n/a (8 bytes of MaxCLL / MaxFALL luminance stats for tone mapping) | keep | keep; paired with `cICP` for accurate HDR rendering |
| `sTER` | no (stereo indicator) | n/a (1 byte of cross-fused vs diverging viewing-mode flag) | keep | keep; dropping silently changes how a side-by-side stereo pair is rendered, with no privacy benefit (the dimensions already disclose stereo) |
| Private ancillary (lowercase second byte) | no | vendor-specific | drop | drop |
| Unknown public ancillary | no | unspecified | drop | drop (conservative) |
Critical chunks are passed through verbatim — the walker never re-encodes image data. CRCs on kept chunks remain valid because the chunk bytes are copied unmodified. The walker validates CRCs on input to reject corrupted PNGs early; it does not recompute CRCs on output (no chunks are synthesized).
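The walker column of the table reduces to a small lookup. Below is a minimal sketch of that decision, assuming the keep-list is held in sets; `ALWAYS_KEEP` is the name this document uses, while `CRITICAL`, `COLOR_PROFILE`, and `decideChunk` are hypothetical names chosen for illustration.

```typescript
type Decision = "keep" | "drop";

// Critical chunks from the policy table.
const CRITICAL = new Set(["IHDR", "PLTE", "IDAT", "IEND"]);

// Display-essential, non-identifying ancillary chunks.
const ALWAYS_KEEP = new Set([
  "tRNS",
  "acTL", "fcTL", "fdAT", // APNG animation
  "cICP", "mDCv", "cLLi", // PNG 3rd Edition HDR signaling
  "sTER", // stereo indicator
]);

// Kept only when the caller opts in.
const COLOR_PROFILE = new Set(["iCCP", "cHRM", "gAMA", "sRGB"]);

function decideChunk(
  type: string,
  options: { preserveColorProfile?: boolean } = {},
): Decision {
  if (CRITICAL.has(type) || ALWAYS_KEEP.has(type)) return "keep";
  if (COLOR_PROFILE.has(type)) {
    return options.preserveColorProfile ? "keep" : "drop";
  }
  // Text, EXIF, timestamps, private chunks, and unknown public ancillary
  // chunks all land here. Assumption: the table does not cover unknown
  // critical chunks, so this sketch drops them too.
  return "drop";
}
```

Because anything not explicitly listed falls through to `"drop"`, a chunk type added by a future spec revision is removed until someone reviews it and adds it to a keep-list.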
Honest gap summary
Library options vs ExifTool: little_exif doesn't touch PNG text chunks; exiv2-wasm's erase semantics are too weak; @uswriting/exiftool-wasm works but is 30× the size of a hand-rolled walker.
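ExifTool -all= vs Phase B8 walker: on paper the policy table is identical for every chunk listed, although the forensic run described under "Forensic verification" found ExifTool keeping prVt, bKGD, and sBIT, so the walker is stricter in practice. Differences at the edges: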
- ExifTool will rewrite a `pHYs` chunk if the user passes `-PNG:PixelsPerUnitX`; the walker only strips. Granular tag-level operations are out of scope.
- ExifTool computes an output hash for the file; not applicable here (we're emitting raw bytes, no hash).
- ExifTool can convert between PNG and APNG semantics; out of scope.
- ExifTool's `-all=` does not currently surface PNG 3rd Edition HDR chunks (it predates the spec); on a PNG built with `cICP`/`mDCv`/`cLLi`, ExifTool passes them through verbatim. The walker matches that behaviour by keeping them.
Forensic verification: the original B8 forensic battery (see docs/forensic/png.md) showed that the walker is strictly more aggressive than exiftool -all= on private/unknown chunks: ExifTool keeps prVt, bKGD, and sBIT, all of which the walker drops. That divergence is intentional and remains true after the new keep-list additions — the new chunks (cICP, mDCv, cLLi, sTER) are display-essential numerics, not user-identifying strings, and contain no surface for sentinel survival.
Recommendation
Hand-rolled chunk walker. Reasoning:
- PNG chunk structure is fully specified, simple to parse, and well-documented (~120 lines of clean TypeScript including CRC32 lookup table).
- Both lightweight WASM library options (`little_exif`, `exiv2-wasm`) were ruled out by the same POCs that ruled them out for JPEG.
- Zero production dependencies: ships ~111 KB less than `little_exif` would have cost (and ~5 MB less than WASM ExifTool).
- We control the chunk policy directly: the policy table is auditable and there are no library defaults to fight.
Phase B8 implementation
Lives at src/infrastructure/wasm/strategies/png_strategy.ts. Key invariants:
- Signature check: rejects anything that doesn't start with the 8-byte PNG signature `89 50 4E 47 0D 0A 1A 0A`. Same `verifyMagicBytes` pattern as JPEG.
- Extensions: claims `.png` and `.apng`. APNG files (registered MIME `image/apng`) often use the `.apng` extension; same byte structure, same chunk policy, no reason to reject.
- CRC32 validation on input: every chunk's stored CRC must match a recomputed CRC over `type + data`. A mismatch surfaces as `invalid-file-format` rather than silently emitting a fixed-up file.
- Verbatim copy of kept chunks: the walker copies chunk bytes (length + type + data + crc) byte-for-byte. No re-encoding, no CRC recomputation.
- First chunk must be IHDR: rejected as `invalid-file-format` if not.
- Exactly one IHDR: a second IHDR anywhere in the stream is rejected. Per spec, IHDR appears exactly once; tolerating duplicates would let a malformed file ship with two image headers, which downstream decoders interpret inconsistently. Symmetric with the CRC validation philosophy: fail loud on structural violations.
- Truncation behaviour: missing or short final chunk surfaces as `invalid-file-format`. Specifically, if `IEND` is never reached the walker fails loud.
- `metadataRemoved`: counts dropped ancillary chunks. A clean input that needed no changes returns `0`, not `1`; callers must not treat `0` as a failure signal.
- `preserveOrientation`: PNG has no orientation chunk (orientation is a TIFF/EXIF concept). The flag is silently ignored; honoring it is a no-op rather than an error.
- `preserveColorProfile: true`: keeps `iCCP`, `cHRM`, `gAMA`, `sRGB`. ICC profiles include `cmmId`, profile creator, and description strings, the same fingerprint surface as the JPEG case.
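To make the structural invariants above concrete, here is a minimal sketch of a chunk walker that enforces them. It is an illustration under stated assumptions, not the contents of `png_strategy.ts`: `stripPng`, `crc32`, `fail`, and the `keep` callback are hypothetical names, errors are plain `Error` objects whose message starts with `invalid-file-format`, and bytes after `IEND` are ignored (the document does not say how the real walker treats them).

```typescript
const PNG_SIGNATURE = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a];

// Standard CRC-32 table (reflected polynomial 0xEDB88320), the CRC PNG uses.
const CRC_TABLE = new Uint32Array(256);
for (let n = 0; n < 256; n++) {
  let c = n;
  for (let k = 0; k < 8; k++) c = c & 1 ? 0xedb88320 ^ (c >>> 1) : c >>> 1;
  CRC_TABLE[n] = c >>> 0;
}

function crc32(bytes: Uint8Array): number {
  let c = 0xffffffff;
  for (let i = 0; i < bytes.length; i++) {
    c = CRC_TABLE[(c ^ bytes[i]) & 0xff] ^ (c >>> 8);
  }
  return (c ^ 0xffffffff) >>> 0;
}

function fail(reason: string): never {
  throw new Error(`invalid-file-format: ${reason}`);
}

function stripPng(
  input: Uint8Array,
  keep: (type: string) => boolean,
): { output: Uint8Array; metadataRemoved: number } {
  if (input.length < 8 || PNG_SIGNATURE.some((b, i) => input[i] !== b)) {
    fail("bad signature");
  }
  const view = new DataView(input.buffer, input.byteOffset, input.byteLength);
  const kept: Uint8Array[] = [input.subarray(0, 8)];
  let offset = 8;
  let index = 0;
  let metadataRemoved = 0;
  let sawIend = false;

  while (offset < input.length && !sawIend) {
    // Each chunk is length(4) + type(4) + data(length) + crc(4).
    if (offset + 12 > input.length) fail("truncated chunk header");
    const length = view.getUint32(offset); // big-endian by default
    const end = offset + 12 + length;
    if (end > input.length) fail("truncated chunk body");
    const type = String.fromCharCode(
      input[offset + 4], input[offset + 5], input[offset + 6], input[offset + 7],
    );
    // The stored CRC covers type + data.
    const stored = view.getUint32(end - 4);
    if (crc32(input.subarray(offset + 4, end - 4)) !== stored) fail("CRC mismatch");
    if (index === 0 && type !== "IHDR") fail("first chunk is not IHDR");
    if (index > 0 && type === "IHDR") fail("duplicate IHDR");

    // Kept chunks are copied verbatim, so their CRCs stay valid.
    if (keep(type)) kept.push(input.subarray(offset, end));
    else metadataRemoved++;
    sawIend = type === "IEND";
    offset = end;
    index++;
  }
  if (!sawIend) fail("IEND never reached");

  const total = kept.reduce((n, part) => n + part.length, 0);
  const output = new Uint8Array(total);
  let pos = 0;
  for (const part of kept) {
    output.set(part, pos);
    pos += part.length;
  }
  return { output, metadataRemoved };
}
```

Here `keep` is any predicate that implements the per-chunk policy table. Running the walker a second time on its own output returns `metadataRemoved: 0`, which is the "clean input" case that callers must not read as a failure.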
Privacy note: HDR and stereo chunks are kept unconditionally
cICP, mDCv, cLLi, and sTER are display-essential and non-identifying: 4-, 24-, 8-, and 1-byte payloads respectively, all encoding numeric color/luminance/viewing-mode parameters with no user-attributable strings. Dropping them silently breaks rendering on HDR-aware or stereo-aware viewers (HDR PNG: wrong gamut and tone curve; stereo PNG: cross-fused/diverging confusion) for zero privacy benefit. They live in ALWAYS_KEEP alongside tRNS and the APNG animation chunks for the same reason: dropping them silently corrupts the image without removing any identifying information.
Privacy note: ICC profile preservation
Same trade-off as JPEG. Default-off errs toward privacy; users who need accurate color reproduction across devices can opt in.
Deferred to Phase 2 (if needed)
- Granular ICC scrubbing — write back the ICC profile with the identity-revealing fields zeroed instead of all-or-nothing.
- Comparison-corpus test against `exiftool -all=` on a diverse PNG fixture set (camera output, screenshot tools, Photoshop/GIMP exports, browser save-as) to expose any vendor-specific surprises.
- Sub-error-codes so callers can distinguish "not a PNG" from "truncated PNG" from "corrupted CRC".
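One possible shape for the deferred sub-error-codes is a discriminated `reason` field carried alongside the existing `invalid-file-format` code. The sketch below is purely illustrative: `PngFailureReason`, `PngFailure`, `pngFailure`, and `describeFailure` are hypothetical names, and nothing like them is confirmed to exist in the codebase.

```typescript
// Hypothetical sub-codes matching the three cases named in the list above.
type PngFailureReason = "not-a-png" | "truncated" | "crc-mismatch";

interface PngFailure {
  code: "invalid-file-format"; // existing top-level code stays unchanged
  reason: PngFailureReason;
}

function pngFailure(reason: PngFailureReason): PngFailure {
  return { code: "invalid-file-format", reason };
}

// Callers can branch on `reason` without parsing message strings.
function describeFailure(failure: PngFailure): string {
  switch (failure.reason) {
    case "not-a-png":
      return "The file does not start with the PNG signature.";
    case "truncated":
      return "The file ends before IEND.";
    case "crc-mismatch":
      return "A chunk's stored CRC does not match its contents.";
  }
}
```

Keeping `code` fixed means existing callers that only check for `invalid-file-format` would continue to work if such a field were added.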