exifcleaner-web/docs/forensic/png.md

PNG forensic recovery test

Date: 2026-05-07. Re-verified 2026-05-08 after the keep-list expansion to cover PNG 3rd Edition HDR chunks (cICP, mDCv, cLLi) and the stereo indicator (sTER); zero-survival result unchanged. mat2 comparison column added 2026-05-17.

Goal: Verify that metadata stripped by PngStrategy cannot be recovered by an attacker with standard PNG forensic tooling. Compare against exiftool -all= as a reference point and against mat2 as a comparable FOSS privacy tool.

Reproducible at: tools/forensic/png.ts. Run npx tsx tools/forensic/png.ts from the project root.

Methodology

The runner generates a synthetic PNG fixture with 10 unique sentinel strings embedded across every metadata source the gap analysis identified. Each sentinel is a short ASCII string (about 30 characters) with a unique tail (e.g. FORENSIC-EXIF-ARTIST-FFFF6666, which is 29 characters) so any survivor can be unambiguously attributed to its source.

Note on IDAT: the IDAT payload was updated from a 9-byte hand-crafted zlib stream to the 13-byte stream emitted by Pillow for a 1×1 RGBA image. The original sequence produced "invalid stored block lengths" under GdkPixbuf (mat2's PNG backend), which requires a fully valid deflate stream. The 4-byte increase propagates to all output sizes; all sentinel-survival results are unchanged.
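As an illustration of what a fully valid IDAT payload for a 1×1 RGBA image looks like, the sketch below deflates one scanline with Node's zlib and inflates it again. It uses Node's encoder rather than Pillow, so the exact byte count may differ from the 13-byte stream in the fixture; the pixel values are arbitrary.

```typescript
import { deflateSync, inflateSync } from "node:zlib";

// One scanline of a 1x1 RGBA image: filter byte 0 (None) followed by R, G, B, A.
// The pixel colour here is an arbitrary choice for the demo.
const scanline = Buffer.from([0x00, 0xff, 0x00, 0x00, 0xff]);

// IDAT carries a complete zlib stream: 2-byte header, deflate blocks, Adler-32 trailer.
const idatPayload = deflateSync(scanline);

// A strict decoder must be able to inflate the payload back to width * 4 + 1 bytes.
const roundTrip = inflateSync(idatPayload);

console.log(`compressed: ${idatPayload.length} bytes, inflated: ${roundTrip.length} bytes`);
```

If inflateSync throws here, a strict backend such as GdkPixbuf would likely reject the stream as well.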

Sources covered:

| Sentinel | Where it lives | How it was injected |
| --- | --- | --- |
| TEXT_AUTHOR | tEXt chunk, keyword Author | hand-built chunk |
| TEXT_COPYRIGHT | tEXt chunk, keyword Copyright | hand-built chunk |
| TEXT_SOFTWARE | tEXt chunk, keyword Software | hand-built chunk |
| ITXT_XMP | iTXt chunk, keyword XML:com.adobe.xmp, dc:creator | hand-built XMP packet |
| ZTXT_COMMENT | zTXt chunk, keyword Comment (deflated) | hand-built chunk |
| EXIF_ARTIST | eXIf chunk, TIFF tag Artist (0x013B) | hand-built TIFF block |
| EXIF_COPYRIGHT | eXIf chunk, TIFF tag Copyright (0x8298) | hand-built TIFF block |
| ICCP_NAME | iCCP chunk, latin-1 profile name | hand-built chunk |
| ICCP_BODY | iCCP chunk, deflated profile body | hand-built chunk |
| PRIVATE_CHUNK | private ancillary chunk prVt | hand-built chunk |
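A "hand-built chunk" amounts to the standard PNG chunk layout: a 4-byte big-endian length, the 4-byte type, the payload, and a CRC-32 over type plus payload. The following is a minimal sketch of that layout, not the runner's actual code; the function names and the sentinel value are made up for illustration.

```typescript
// CRC-32 (reflected polynomial 0xEDB88320), computed bit by bit for clarity.
function crc32(bytes: Uint8Array): number {
  let crc = 0xffffffff;
  for (const byte of bytes) {
    crc ^= byte;
    for (let bit = 0; bit < 8; bit++) {
      crc = crc & 1 ? (crc >>> 1) ^ 0xedb88320 : crc >>> 1;
    }
  }
  return (crc ^ 0xffffffff) >>> 0;
}

// Serialise one chunk: length, type, data, CRC over (type + data).
function buildChunk(type: string, data: Uint8Array): Buffer {
  const body = Buffer.concat([Buffer.from(type, "latin1"), data]);
  const out = Buffer.alloc(4 + body.length + 4);
  out.writeUInt32BE(data.length, 0);
  body.copy(out, 4);
  out.writeUInt32BE(crc32(body), 4 + body.length);
  return out;
}

// tEXt payload: keyword, NUL separator, Latin-1 text.
function buildTextChunk(keyword: string, text: string): Buffer {
  const payload = Buffer.concat([
    Buffer.from(keyword, "latin1"),
    Buffer.from([0x00]),
    Buffer.from(text, "latin1"),
  ]);
  return buildChunk("tEXt", payload);
}

// Hypothetical sentinel value, shaped like the example in the Methodology text.
const authorChunk = buildTextChunk("Author", "FORENSIC-TEXT-AUTHOR-AAAA1111");
console.log(authorChunk.toString("hex"));
```

The zTXt and iCCP chunks follow the same pattern, with a compression-method byte and a deflated body after the NUL separator.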

Plus four binary chunks tracked by presence rather than sentinel string (timestamps and resolution fields are too short to carry attributable strings, but the chunk type alone is an encoder/capture-time fingerprint): tIME, pHYs, bKGD, sBIT.

The fixture is stripped three ways:

  1. PngStrategy — the hand-rolled chunk walker, default options (preserveColorProfile: false)
  2. exiftool -all= -overwrite_original — the canonical reference
  3. mat2 — FOSS privacy-tool reference (Debian/Ubuntu: sudo apt install mat2); skipped if absent

For each output, the runner applies three recovery techniques:

  1. Raw strings — finds sentinels left in unencoded form anywhere in the file
  2. exiftool -a -G1 -s — every visible metadata tag including hidden namespaces
  3. In-process chunk walk + zlib inflate — re-parses the PNG, walks every chunk, scans each chunk payload as latin-1, then searches the payload for zlib stream headers (0x78 0x9C / 0x78 0xDA / 0x78 0x01) and inflates whatever follows. This catches sentinels hidden inside zTXt, compressed iTXt, or iCCP payloads that the other two techniques miss.

Plus structural checks: list of remaining chunk types, binary-chunk presence, and a scan for stray exiftool/exifcleaner strings in the output bytes.
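The third technique and the chunk-type listing can be sketched as follows. This is an illustrative reimplementation under stated assumptions, not the runner's code: the names walkChunks, findSentinels, and rawChunk are hypothetical, the demo sentinels are invented, and the demo chunks carry a zeroed CRC because this walker does not verify it.

```typescript
import { deflateSync, inflateSync } from "node:zlib";

interface PngChunk {
  type: string;
  data: Buffer;
}

const PNG_SIGNATURE = Buffer.from([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]);
const ZLIB_SECOND_BYTES = [0x01, 0x9c, 0xda]; // second header byte after 0x78

// Walk length/type/data/CRC records after the 8-byte signature.
function walkChunks(png: Buffer): PngChunk[] {
  if (!png.subarray(0, 8).equals(PNG_SIGNATURE)) throw new Error("not a PNG");
  const chunks: PngChunk[] = [];
  let offset = 8;
  while (offset + 12 <= png.length) {
    const length = png.readUInt32BE(offset);
    const type = png.toString("latin1", offset + 4, offset + 8);
    const end = offset + 8 + length;
    if (end + 4 > png.length) break; // truncated chunk
    chunks.push({ type, data: png.subarray(offset + 8, end) });
    offset = end + 4; // skip the CRC
  }
  return chunks;
}

// Scan every payload as latin-1, then try to inflate from each zlib-looking offset.
function findSentinels(png: Buffer, sentinels: string[]): Set<string> {
  const found = new Set<string>();
  const scan = (bytes: Buffer): void => {
    const text = bytes.toString("latin1");
    for (const s of sentinels) if (text.includes(s)) found.add(s);
  };
  for (const { data } of walkChunks(png)) {
    scan(data);
    for (let i = 0; i + 1 < data.length; i++) {
      if (data[i] !== 0x78 || !ZLIB_SECOND_BYTES.includes(data[i + 1])) continue;
      try {
        scan(inflateSync(data.subarray(i)));
      } catch {
        // Header bytes matched by coincidence; not a valid stream at this offset.
      }
    }
  }
  return found;
}

// Demo helper: chunk with a zeroed CRC (the walker above never checks it).
function rawChunk(type: string, data: Buffer): Buffer {
  const header = Buffer.alloc(8);
  header.writeUInt32BE(data.length, 0);
  header.write(type, 4, "latin1");
  return Buffer.concat([header, data, Buffer.alloc(4)]);
}

const demo = Buffer.concat([
  PNG_SIGNATURE,
  rawChunk("tEXt", Buffer.from("Author\0DEMO-SENTINEL-PLAIN", "latin1")),
  rawChunk("zTXt", Buffer.concat([
    Buffer.from("Comment\0\0", "latin1"), // keyword, NUL, compression method 0
    deflateSync(Buffer.from("DEMO-SENTINEL-DEFLATED", "latin1")),
  ])),
  rawChunk("IEND", Buffer.alloc(0)),
]);

const survivors = findSentinels(demo, [
  "DEMO-SENTINEL-PLAIN",
  "DEMO-SENTINEL-DEFLATED",
  "DEMO-SENTINEL-ABSENT",
]);
console.log(walkChunks(demo).map((c) => c.type), [...survivors]);
```

Trying every candidate offset is deliberately brute-force: false header matches simply fail to inflate and are ignored, which is acceptable for fixtures of a few hundred bytes.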

Comparison reference: mat2

mat2 (Metadata Anonymisation Toolkit 2) is the privacy-tool reference used by Tails OS. For PNG it uses GdkPixbuf to re-encode the image: the original chunks are not copied through, and the pixel data is re-emitted via libpng. The output PNG therefore contains only what libpng emits during a fresh encode. On this fixture that is the critical chunks plus a regenerated bKGD (see Results); the sentinel-bearing metadata chunks are gone because none of them is read by the pixel-level decode path.

Results

| | Input fixture | PngStrategy | ExifTool -all= | mat2 0.13.4 |
| --- | --- | --- | --- | --- |
| Output size | 920 bytes | 70 bytes | 147 bytes | 86 bytes |
| Chunks remaining | 15 | 3 (IHDR, IDAT, IEND) | 6 (IHDR, sBIT, bKGD, prVt, IDAT, IEND) | 4 (IHDR, bKGD, IDAT, IEND) |
| Binary chunks present (of tIME, pHYs, bKGD, sBIT) | 4 / 4 | 0 / 4 | 2 / 4 (bKGD, sBIT) | 1 / 4 (bKGD) |
| Raw strings sentinels | 8 | 0 | 1 (PRIVATE_CHUNK) | 0 |
| ExifTool visible tags | 8 | 0 | 0 | 0 |
| Walk + decompress | 10 | 0 | 1 (PRIVATE_CHUNK) | 0 |
| Stray exiftool/exifcleaner markers | 0 | 0 | 0 | 0 |

PngStrategy: every recovery check returns []. Output is 70 bytes (8-byte signature, 25-byte IHDR, 25-byte IDAT carrying the 13-byte stream, 12-byte IEND); only the three critical chunks remain. No sentinels recoverable by any technique, no binary fingerprint chunks, no stray markers.

exiftool -all=:

  • PRIVATE_CHUNK survives every recovery technique: ExifTool does not drop unknown private ancillary chunks (the prVt chunk is copied through verbatim, sentinel and all).
  • bKGD and sBIT chunks survive: ExifTool keeps these encoder hints.
  • The other nine sentinels are correctly removed.

mat2:

  • Zero sentinel survival across all three recovery channels.
  • bKGD (suggested background color) survives: mat2's libpng encoder preserves the background-color hint from the input during re-encoding.
  • sBIT and prVt are gone — the re-encode doesn't propagate them.

Per-sentinel comparison: PngStrategy vs mat2

| Sentinel | Raw (input) | PngStrategy | mat2 |
| --- | --- | --- | --- |
| TEXT_AUTHOR | present | removed | removed |
| TEXT_COPYRIGHT | present | removed | removed |
| TEXT_SOFTWARE | present | removed | removed |
| ITXT_XMP | present | removed | removed |
| ZTXT_COMMENT | present | removed | removed |
| EXIF_ARTIST | present | removed | removed |
| EXIF_COPYRIGHT | present | removed | removed |
| ICCP_NAME | present | removed | removed |
| ICCP_BODY | present | removed | removed |
| PRIVATE_CHUNK | present | removed | removed |

Leaked by both: 0. Leaked by us only: 0. Leaked by mat2 only: 0.
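The three tallies are plain set differences over the survivor lists of two tools. A small sketch of that bookkeeping follows; compareLeaks is a hypothetical name, and the example inputs are taken from the Results table (PngStrategy leaves no survivors, exiftool -all= leaves PRIVATE_CHUNK), so it shows the ExifTool pairing rather than the mat2 one.

```typescript
interface LeakComparison {
  both: string[];
  oursOnly: string[];
  theirsOnly: string[];
}

// Partition surviving sentinels into the three buckets reported above.
function compareLeaks(ours: ReadonlySet<string>, theirs: ReadonlySet<string>): LeakComparison {
  return {
    both: [...ours].filter((s) => theirs.has(s)),
    oursOnly: [...ours].filter((s) => !theirs.has(s)),
    theirsOnly: [...theirs].filter((s) => !ours.has(s)),
  };
}

// Survivor sets as reported in the Results table.
const pngStrategySurvivors = new Set<string>();
const exiftoolSurvivors = new Set<string>(["PRIVATE_CHUNK"]);

const comparison = compareLeaks(pngStrategySurvivors, exiftoolSurvivors);
console.log(comparison);
```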

Interpretation

PngStrategy is strictly more aggressive than exiftool -all= for PNG. This is a meaningful finding: ExifTool removes the chunk types it recognizes as metadata and leaves everything else intact, including private chunks such as prVt and encoder hints such as bKGD and sBIT. Our walker inverts that logic: it uses a small keep-list (IHDR, PLTE, IDAT, IEND, tRNS, acTL, fcTL, fdAT, cICP, mDCv, cLLi, sTER) and drops everything else.
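The keep-list idea can be sketched in a few lines. This is an assumption-laden illustration, not PngStrategy itself: stripByKeepList and its options parameter are hypothetical names, kept chunks are copied verbatim with their original CRC, and the preserveColorProfile branch only mirrors the behaviour this document describes (keeping iCCP, cHRM, gAMA, sRGB).

```typescript
const KEEP_LIST: ReadonlySet<string> = new Set([
  "IHDR", "PLTE", "IDAT", "IEND", "tRNS", "acTL",
  "fcTL", "fdAT", "cICP", "mDCv", "cLLi", "sTER",
]);
const COLOR_PROFILE_CHUNKS = ["iCCP", "cHRM", "gAMA", "sRGB"];
const SIGNATURE = Buffer.from([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]);

// Copy the signature and every keep-listed chunk verbatim; drop everything else.
function stripByKeepList(png: Buffer, options: { preserveColorProfile?: boolean } = {}): Buffer {
  if (!png.subarray(0, 8).equals(SIGNATURE)) throw new Error("not a PNG");
  const keep = new Set(KEEP_LIST);
  if (options.preserveColorProfile) COLOR_PROFILE_CHUNKS.forEach((t) => keep.add(t));

  const parts: Buffer[] = [png.subarray(0, 8)];
  let offset = 8;
  while (offset + 12 <= png.length) {
    const length = png.readUInt32BE(offset);
    const type = png.toString("latin1", offset + 4, offset + 8);
    const next = offset + 12 + length; // length + type + data + CRC
    if (next > png.length) throw new Error(`truncated ${type} chunk`);
    if (keep.has(type)) parts.push(png.subarray(offset, next));
    offset = next;
    if (type === "IEND") break; // anything after IEND is discarded
  }
  return Buffer.concat(parts);
}
```

Because unknown chunk types fail the keep.has check by default, a private chunk such as prVt is dropped without the stripper needing to know it exists.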

PngStrategy and mat2 both achieve zero sentinel survival. The two tools differ in approach and in structural output but not in privacy outcome on this fixture:

  • PngStrategy produces the most minimal output: IHDR + IDAT + IEND only (70 bytes). No ancillary chunks survive at all.
  • mat2 re-encodes via GdkPixbuf/libpng and emits IHDR + bKGD + IDAT + IEND (86 bytes). The bKGD chunk is freshly generated by libpng during re-encode from the input's background-color hint — it carries no sentinel data and is low-risk (it reveals only an authoring-tool background-color preference, not a user identity).

mat2 is more aggressive than ExifTool on private chunks. The prVt private ancillary chunk (which carries PRIVATE_CHUNK verbatim) survives exiftool -all= but is completely absent from mat2's output. This is because mat2's re-encoding approach drops every chunk that GdkPixbuf doesn't pass through — no explicit knowledge of the prVt chunk type is required. PngStrategy's keep-list approach reaches the same result: prVt is not on the keep-list, so it is dropped.

eXIf chunk handling: both PngStrategy and mat2 drop the eXIf chunk (full EXIF block embedded in PNG). ExifTool also drops it via -all=. All three tools agree on this surface.

iCCP (ICC profile) handling: all three tools drop the iCCP chunk when run with default options. PngStrategy keeps it on opt-in via preserveColorProfile: true; mat2 has no equivalent opt-in (its re-encode never preserves ICC). ExifTool's -all= drops it and emits "ICC_Profile deleted. Image colors may be affected."

ICCP_BODY note: the ICCP_BODY sentinel is inside a zlib-compressed payload and does not appear in the raw strings scan of the input — it is only detected by the in-process walk + inflate channel. Both PngStrategy and mat2 remove it cleanly.

Note on the expanded keep-list

The keep-list addition (cICP, mDCv, cLLi, sTER) does not affect this test's results. Those chunks carry only fixed-size numeric payloads and have no string-attributable surface. The unit suite (tests/infrastructure/wasm/png_strategy.test.ts) verifies they are kept verbatim.

Caveats and limits of this test

  • The fixture is synthetic. Real-world PNGs from cameras, screenshot tools, and Photoshop/GIMP exports have richer chunk profiles that this fixture doesn't exercise. Our strategy drops every non-keep-listed chunk regardless, so the result should still be zero survival.
  • APNG-specific behaviour (acTL, fcTL, fdAT chunks) is not exercised in the forensic fixture. Those chunks are tested in the unit test suite.
  • The runner only exercises default options (preserveColorProfile: false). With preserveColorProfile: true, the iCCP/cHRM/gAMA/sRGB chunks are kept by design.
  • mat2 has no orientation-preservation or colour-preservation flags; it always performs a full re-encode.
  • The runner skips mat2 cleanly if it is not installed. All existing recovery channels still run.
  • This test is reproducible but not in CI yet.

Reproducing

```bash
# Prerequisites (mat2 optional; it adds the comparison column)
sudo apt install mat2

# From the project root
npx tsx tools/forensic/png.ts
```

Outputs go to /tmp/png-forensic/:

  • input.png: the rich fixture
  • our-stripped.png: PngStrategy output
  • exiftool-stripped.png: exiftool -all= output
  • mat2-stripped.png: mat2 output (if mat2 is installed)
  • report.json: structured per-output sentinel-survival data

Required tools: exiftool, strings. Both available on Debian/Ubuntu via apt (libimage-exiftool-perl, binutils). mat2 optional (sudo apt install mat2).