exifcleaner-web/docs/gap-analysis/png.md
obuvuyoviz26-lab bf95926c80
B8 follow-up: PNG HDR + stereo chunks, multi-IHDR guard, .apng (#58)
Five fixes prompted by post-merge review of the PNG strategy:

1. PNG 3rd Edition (W3C, 2024) HDR signaling chunks are now kept:
   - cICP — color primaries / transfer / matrix coefficients (4 bytes)
   - mDCv — mastering display chromaticity + luminance (24 bytes)
   - cLLi — content light level info (8 bytes)
   Dropping them silently breaks HDR rendering on aware viewers
   (gamut clipping, wrong tone curve) for zero privacy benefit —
   they're fixed-size numerics with no user-attributable surface.

2. sTER (stereo indicator) now kept. 1 byte indicating cross-fused
   vs diverging viewing mode. Same reasoning as tRNS / APNG: dropping
   it changes how a stereo pair is rendered with no privacy benefit
   (the stereo nature is already evident from the dimensions).

3. Duplicate IHDR is now rejected as `invalid-file-format`. Per PNG
   §5.6 IHDR appears exactly once; the first-chunk-must-be-IHDR
   guard already existed, the symmetric duplicate guard didn't.

4. The strategy now claims `.apng` in addition to `.png`. APNG files
   (registered MIME `image/apng`) frequently use the .apng extension;
   same byte structure, same chunk policy. Added to
   WASM_HANDLED_EXTENSIONS so the Electron build routes them too.

5. Top-of-file policy comment updated to match — and to drop the
   "mirrors ExifTool's `-all=` behaviour" framing, which the
   forensic test contradicted (we are strictly more aggressive).

Updated docs/gap-analysis/png.md with the new policy rows + a
section on why the new keep-list entries don't violate privacy.
docs/forensic/png.md re-verified — same 66-byte output, same zero
sentinel survival. The new chunks have no string-attributable
surface so they are forensic non-events.

Tests: +5 (3 parametrized HDR keep, 1 sTER keep, 1 duplicate-IHDR
reject). 423/423 pass.

Quality gates: `tsc --noEmit` clean, `prettier --check` clean,
forensic runner reproduces the documented zero-survival result.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 15:38:06 +04:00


PNG metadata-stripping gap analysis

Date: 2026-05-07. Updated 2026-05-08 to cover PNG 3rd Edition (W3C, 2024) HDR signaling chunks (cICP, mDCv, cLLi), the stereo indicator (sTER), the duplicate-IHDR rejection, and the .apng extension.

Goal: Document the gap between the WASM library options previously surveyed for image metadata removal and ExifTool's -all= PNG strip, and the rationale for the hand-rolled PNG chunk walker that ships in Phase B8.

Methodology

Read:

  • W3C / ISO/IEC 15948 PNG specification (2nd Ed., 2003) §11 (chunk types) and §12 (private/ancillary chunk semantics).
  • W3C PNG 3rd Edition (2024) for new chunks: cICP, mDCv, cLLi. https://www.w3.org/TR/png-3/.
  • libpng documentation on chunk handling and which ancillaries are display-affecting vs informational.
  • ExifTool documentation at https://exiftool.org/TagNames/PNG.html for the chunks the -all= strip affects.
  • The existing JpegStrategy precedent (src/infrastructure/wasm/strategies/jpeg_strategy.ts) and its gap analysis (docs/gap-analysis/jpeg.md).

Reused empirically (from POCs already written for JPEG):

  • little_exif (Rust → WASM, ~330 KB raw / 111 KB gzip) — surveyed in docs/poc/little-exif-wasm.md. Leaves PNG tEXt chunks untouched. Ruled out.
  • exiv2-wasm (~2.3 MB raw / 925 KB gzip) — surveyed in docs/poc/exiv2-wasm.md. Same writeString(buf, key, "") no-erase pattern. Ruled out.
  • @uswriting/exiftool-wasm — full Perl ExifTool compiled to WASM. ~5 MB lazy-load. Considered for the original B3 plan; replaced by per-format hand-rolled walkers because the bundle weight is unjustifiable when each strategy is ~150 lines.

Per-chunk policy

PNG file structure: 8-byte signature 89 50 4E 47 0D 0A 1A 0A followed by a sequence of chunks. Each chunk is [length:u32-be][type:4 ASCII][data:length bytes][crc:u32-be]. The CRC covers type + data. Chunk type's case carries semantics (PNG §3.3):

  • Bit 5 of byte 0 (Ancillary bit): uppercase = critical, lowercase = ancillary (decoder may skip).
  • Bit 5 of byte 1 (Private bit): uppercase = public/registered, lowercase = private/unregistered.
  • Bit 5 of byte 2 (Reserved): must be uppercase.
  • Bit 5 of byte 3 (Safe-to-copy bit): uppercase = unsafe (depends on image data), lowercase = safe.

Critical chunks define the image; dropping any of them breaks the file. Public ancillary chunks have well-defined privacy implications. Private chunks are vendor-specific and almost always carry identifying metadata.
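
The four case bits described above can be read straight off the chunk type. The following is a minimal sketch, not code taken from png_strategy.ts; `describeChunkType` and `ChunkTypeFlags` are hypothetical names used only for illustration.

```typescript
// Hypothetical helper: decodes the property bits a PNG chunk type carries
// in the letter case of its four ASCII characters.
interface ChunkTypeFlags {
  ancillary: boolean; // byte 0 lowercase: decoder may skip the chunk
  isPrivate: boolean; // byte 1 lowercase: private/unregistered
  reservedOk: boolean; // byte 2 uppercase, as the spec requires
  safeToCopy: boolean; // byte 3 lowercase: does not depend on image data
}

function describeChunkType(type: string): ChunkTypeFlags {
  if (!/^[A-Za-z]{4}$/.test(type)) {
    throw new Error(`not a valid chunk type: ${JSON.stringify(type)}`);
  }
  // Bit 5 (0x20) is the only difference between an ASCII letter's two cases.
  const bit5 = (i: number): boolean => (type.charCodeAt(i) & 0x20) !== 0;
  return {
    ancillary: bit5(0),
    isPrivate: bit5(1),
    reservedOk: !bit5(2),
    safeToCopy: bit5(3),
  };
}
```

For instance, `describeChunkType("tRNS")` reports an ancillary, public chunk that is unsafe to copy, while `describeChunkType("prVt")` reports an ancillary, private one.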

| Chunk | Critical? | Source of leak | ExifTool -all= | Phase B8 walker |
| --- | --- | --- | --- | --- |
| IHDR | yes | n/a (image header) | keep | keep |
| PLTE | yes | n/a (palette) | keep | keep |
| IDAT | yes | n/a (image data) | keep | keep |
| IEND | yes | n/a (end marker) | keep | keep |
| tRNS | no (ancillary) | n/a (transparency, display-affecting) | keep | keep |
| tEXt | no | latin-1 key=value text (Author, Copyright, Software, …) | drop | drop |
| iTXt | no | UTF-8 international text (XMP commonly stored here) | drop | drop |
| zTXt | no | compressed text (same content as tEXt) | drop | drop |
| eXIf | no | full EXIF block embedded in PNG | drop | drop |
| tIME | no | last-modification timestamp | drop | drop |
| pHYs | no | physical pixel dimensions / aspect ratio | drop | drop |
| sPLT | no | suggested palette (rare, but author-set) | drop | drop |
| hIST | no | image histogram (encoder fingerprint) | drop | drop |
| bKGD | no | suggested background color | keep (observed in the forensic run) | drop |
| sBIT | no | significant bits per channel (encoder hint) | keep (observed in the forensic run) | drop |
| iCCP | no | embedded ICC color profile (creator strings, dateTime) | drop | drop by default; kept when preserveColorProfile: true |
| cHRM | no | primary chromaticities (color-space metadata) | drop | drop by default; kept when preserveColorProfile: true |
| gAMA | no | image gamma | drop | drop by default; kept when preserveColorProfile: true |
| sRGB | no | sRGB rendering intent | drop | drop by default; kept when preserveColorProfile: true |
| acTL, fcTL, fdAT | no (animation control/data) | n/a (APNG animation parameters and frame pixels, no user identity) | keep | keep; dropping them silently turns APNG into a static PNG |
| cICP | no (PNG 3rd Ed., HDR signaling) | n/a (4 bytes of color-primaries / transfer / matrix coefficients, no user identity) | keep | keep; dropping causes HDR PNGs to render with wrong gamut on HDR-aware viewers |
| mDCv | no (PNG 3rd Ed., HDR mastering display) | n/a (24 bytes of fixed-point chromaticity + luminance for the mastering display) | keep | keep; paired with cICP for accurate HDR rendering |
| cLLi | no (PNG 3rd Ed., HDR luminance) | n/a (8 bytes of MaxCLL / MaxFALL luminance stats for tone mapping) | keep | keep; paired with cICP for accurate HDR rendering |
| sTER | no (stereo indicator) | n/a (1 byte of cross-fused vs diverging viewing-mode flag) | keep | keep; dropping silently changes how a side-by-side stereo pair is rendered, with no privacy benefit (the dimensions already disclose stereo) |
| Private ancillary (lowercase second byte) | no | vendor-specific | keep (prVt observed in the forensic run) | drop |
| Unknown public ancillary | no | unspecified | drop | drop (conservative) |

Critical chunks are passed through verbatim — the walker never re-encodes image data. CRCs on kept chunks remain valid because the chunk bytes are copied unmodified. The walker validates CRCs on input to reject corrupted PNGs early; it does not recompute CRCs on output (no chunks are synthesized).
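
The input-side CRC check can be sketched as follows. This is an illustrative version, assuming a table-driven CRC-32 with the reflected polynomial 0xEDB88320 that PNG uses; `crc32` and `chunkCrcMatches` are hypothetical names and may not match the helpers in png_strategy.ts.

```typescript
// Lookup table for CRC-32 (reflected polynomial 0xEDB88320), built once.
const CRC_TABLE: Uint32Array = (() => {
  const table = new Uint32Array(256);
  for (let n = 0; n < 256; n++) {
    let c = n;
    for (let k = 0; k < 8; k++) {
      c = c & 1 ? 0xedb88320 ^ (c >>> 1) : c >>> 1;
    }
    table[n] = c >>> 0;
  }
  return table;
})();

function crc32(bytes: Uint8Array): number {
  let c = 0xffffffff;
  for (let i = 0; i < bytes.length; i++) {
    c = CRC_TABLE[(c ^ bytes[i]) & 0xff] ^ (c >>> 8);
  }
  return (c ^ 0xffffffff) >>> 0; // force an unsigned 32-bit result
}

// Hypothetical helper: true when the chunk starting at `offset` carries a
// stored CRC equal to the CRC recomputed over type + data.
function chunkCrcMatches(png: Uint8Array, offset: number): boolean {
  if (offset + 12 > png.length) return false; // no room for length+type+crc
  const view = new DataView(png.buffer, png.byteOffset, png.byteLength);
  const length = view.getUint32(offset); // DataView reads big-endian by default
  const crcOffset = offset + 8 + length;
  if (crcOffset + 4 > png.length) return false; // truncated chunk
  const stored = view.getUint32(crcOffset);
  // The length field itself is not covered by the CRC.
  return crc32(png.subarray(offset + 4, crcOffset)) === stored;
}
```

Because kept chunks are copied with their stored CRC, a check like this on input is enough to guarantee valid CRCs on output.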

Honest gap summary

Library options vs ExifTool: little_exif doesn't touch PNG text chunks; exiv2-wasm's erase semantics are too weak; @uswriting/exiftool-wasm works but is 30× the size of a hand-rolled walker.

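ExifTool -all= vs Phase B8 walker: the two policies agree on most chunks listed, but not all. The forensic battery summarized below found that ExifTool keeps bKGD, sBIT, and private chunks such as prVt, which the walker drops. Other differences at the edges: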

  • ExifTool will rewrite a pHYs chunk if the user passes -PNG:PixelsPerUnitX; the walker only strips. Granular tag-level operations are out of scope.
  • ExifTool computes an output hash for the file; not applicable here (we're emitting raw bytes, no hash).
  • ExifTool can convert between PNG and APNG semantics; out of scope.
  • ExifTool's -all= does not currently surface PNG 3rd Edition HDR chunks (it predates the spec); on a PNG built with cICP/mDCv/cLLi, ExifTool passes them through verbatim. The walker matches that behaviour by keeping them.

Forensic verification: the original B8 forensic battery (see docs/forensic/png.md) showed that the walker is strictly more aggressive than exiftool -all= on private/unknown chunks: ExifTool keeps prVt, bKGD, and sBIT, all of which the walker drops. That divergence is intentional and remains true after the new keep-list additions — the new chunks (cICP, mDCv, cLLi, sTER) are display-essential numerics, not user-identifying strings, and contain no surface for sentinel survival.

Recommendation

Hand-rolled chunk walker. Reasoning:

  • PNG chunk structure is fully specified, simple to parse, and well-documented (~120 lines of clean TypeScript including CRC32 lookup table).
  • WASM library options were both ruled out by the same POCs that ruled them out for JPEG.
  • Zero production dependencies — ships ~111 KB less than little_exif would have cost (and ~5 MB less than WASM ExifTool).
  • We control the chunk policy directly: the policy table is auditable and there are no library defaults to fight.

Phase B8 implementation

Lives at src/infrastructure/wasm/strategies/png_strategy.ts. Key invariants:

  • Signature check: rejects anything that doesn't start with the 8-byte PNG signature 89 50 4E 47 0D 0A 1A 0A. Same verifyMagicBytes pattern as JPEG.
  • Extensions: claims .png and .apng. APNG files (registered MIME image/apng) often use the .apng extension; same byte structure, same chunk policy, no reason to reject.
  • CRC32 validation on input: every chunk's stored CRC must match a recomputed CRC over type + data. A mismatch surfaces as invalid-file-format rather than silently emitting a fixed-up file.
  • Verbatim copy of kept chunks: the walker copies chunk bytes (length + type + data + crc) byte-for-byte. No re-encoding, no CRC recomputation.
  • First chunk must be IHDR: rejected as invalid-file-format if not.
  • Exactly one IHDR: a second IHDR anywhere in the stream is rejected. Per spec, IHDR appears exactly once; tolerating duplicates would let a malformed file ship with two image headers, which downstream decoders interpret inconsistently. Symmetric with the CRC validation philosophy — fail loud on structural violations.
  • Truncation behaviour: missing or short final chunk surfaces as invalid-file-format. Specifically, if IEND is never reached the walker fails loud.
  • metadataRemoved: counts dropped ancillary chunks. A clean input that needed no changes returns 0, not 1 — callers must not treat 0 as a failure signal.
  • preserveOrientation: PNG has no orientation chunk (orientation is a TIFF/EXIF concept). The flag is silently ignored; honoring it is a no-op rather than an error.
  • preserveColorProfile: true: keeps iCCP, cHRM, gAMA, sRGB. ICC profiles include cmmId, profile creator, and description strings — same fingerprint surface as the JPEG case.
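
The structural invariants in the list above can be sketched in one loop. This is a simplified illustration, not the shipped walker: every name is hypothetical, the keep-list contains only the critical chunks, and stopping at IEND (so that trailing bytes are not copied) is an assumption the text does not confirm.

```typescript
// Sketch only: all names are hypothetical, KEEP lists just the critical chunks.
const PNG_SIGNATURE = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a];
const KEEP = new Set(["IHDR", "PLTE", "IDAT", "IEND"]);

class InvalidFileFormatError extends Error {}

// Bitwise CRC-32 (reflected polynomial 0xEDB88320); slower than a table, but short.
function crc32(bytes: Uint8Array): number {
  let c = 0xffffffff;
  for (let i = 0; i < bytes.length; i++) {
    c ^= bytes[i];
    for (let k = 0; k < 8; k++) c = c & 1 ? 0xedb88320 ^ (c >>> 1) : c >>> 1;
  }
  return (c ^ 0xffffffff) >>> 0;
}

interface StripResult {
  output: Uint8Array;
  metadataRemoved: number;
}

function stripPng(input: Uint8Array): StripResult {
  if (input.length < 8 || PNG_SIGNATURE.some((b, i) => input[i] !== b)) {
    throw new InvalidFileFormatError("bad signature");
  }
  const view = new DataView(input.buffer, input.byteOffset, input.byteLength);
  const kept: Uint8Array[] = [input.subarray(0, 8)];
  let offset = 8;
  let seenIhdr = false;
  let sawIend = false;
  let metadataRemoved = 0;

  while (offset < input.length && !sawIend) {
    if (offset + 12 > input.length) throw new InvalidFileFormatError("truncated chunk header");
    const end = offset + 12 + view.getUint32(offset);
    if (end > input.length) throw new InvalidFileFormatError("truncated chunk body");
    const type = String.fromCharCode(
      input[offset + 4], input[offset + 5], input[offset + 6], input[offset + 7],
    );
    // CRC covers type + data; a mismatch is an error, never silently fixed up.
    if (crc32(input.subarray(offset + 4, end - 4)) !== view.getUint32(end - 4)) {
      throw new InvalidFileFormatError(`CRC mismatch in ${type}`);
    }
    if (!seenIhdr && type !== "IHDR") throw new InvalidFileFormatError("first chunk is not IHDR");
    if (seenIhdr && type === "IHDR") throw new InvalidFileFormatError("duplicate IHDR");
    seenIhdr = true;

    if (KEEP.has(type)) kept.push(input.subarray(offset, end)); // verbatim, CRC included
    else metadataRemoved++;
    sawIend = type === "IEND"; // assumption: stop here, ignoring trailing bytes
    offset = end;
  }
  if (!sawIend) throw new InvalidFileFormatError("IEND never reached");

  const output = new Uint8Array(kept.reduce((n, part) => n + part.length, 0));
  let pos = 0;
  for (const part of kept) {
    output.set(part, pos);
    pos += part.length;
  }
  return { output, metadataRemoved };
}
```

A clean input passes through with `metadataRemoved` equal to 0, which is why callers should treat 0 as success rather than as a failure signal.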

Privacy note: HDR and stereo chunks are kept unconditionally

cICP, mDCv, cLLi, and sTER are display-essential and non-identifying: 4-, 24-, 8-, and 1-byte payloads respectively, all encoding numeric color/luminance/viewing-mode parameters with no user-attributable strings. Dropping them silently breaks rendering on HDR-aware or stereo-aware viewers (HDR PNG: wrong gamut and tone curve; stereo PNG: cross-fused/diverging confusion) for zero privacy benefit. They live in ALWAYS_KEEP alongside tRNS and the APNG animation chunks for the same reason: dropping them silently corrupts the image without removing any identifying information.
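
One way to picture the resulting keep/drop decision is the sketch below. Only the name ALWAYS_KEEP comes from the text; `COLOR_PROFILE_CHUNKS`, `StripOptions`, and `shouldKeep` are illustrative, and whether the real code stores critical chunks in the same set is an assumption.

```typescript
// ALWAYS_KEEP is named in the text; its exact contents here are assumed
// from the policy table rather than copied from png_strategy.ts.
const ALWAYS_KEEP: ReadonlySet<string> = new Set([
  "IHDR", "PLTE", "IDAT", "IEND", // critical chunks
  "tRNS", // transparency
  "acTL", "fcTL", "fdAT", // APNG animation control and frame data
  "cICP", "mDCv", "cLLi", // PNG 3rd Edition HDR signaling
  "sTER", // stereo indicator
]);

// Hypothetical name for the chunks gated by preserveColorProfile.
const COLOR_PROFILE_CHUNKS: ReadonlySet<string> = new Set([
  "iCCP", "cHRM", "gAMA", "sRGB",
]);

interface StripOptions {
  preserveColorProfile?: boolean;
}

function shouldKeep(type: string, options: StripOptions = {}): boolean {
  if (ALWAYS_KEEP.has(type)) return true;
  if (COLOR_PROFILE_CHUNKS.has(type)) return options.preserveColorProfile === true;
  // Everything else is dropped: text, EXIF, timestamps, private chunks,
  // and unknown public ancillaries.
  return false;
}
```

Keeping the decision in two small sets makes the policy table directly auditable against the code: each "keep" row should appear in exactly one of them.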

Privacy note: ICC profile preservation

Same trade-off as JPEG. Default-off errs toward privacy; users who need accurate color reproduction across devices can opt in.

Deferred to Phase 2 (if needed)

  • Granular ICC scrubbing — write back the ICC profile with the identity-revealing fields zeroed instead of all-or-nothing.
  • Comparison-corpus test against exiftool -all= on a diverse PNG fixture set (camera output, screenshot tools, Photoshop/GIMP exports, browser save-as) to expose any vendor-specific surprises.
  • Sub-error-codes so callers can distinguish "not a PNG" from "truncated PNG" from "corrupted CRC".