PNG forensic recovery test
Date: 2026-05-07. Re-verified 2026-05-08 after the keep-list expansion to cover PNG 3rd Edition HDR chunks (cICP, mDCv, cLLi) and the stereo indicator (sTER); zero-survival result unchanged. mat2 comparison column added 2026-05-17.
Goal: Verify that metadata stripped by PngStrategy cannot be recovered by an attacker with standard PNG forensic tooling. Compare against exiftool -all= as a reference point and against mat2 as a comparable FOSS privacy tool.
Reproducible at: tools/forensic/png.ts — npx tsx tools/forensic/png.ts from the project root.
Methodology
The runner generates a synthetic PNG fixture with 10 unique sentinel strings embedded across every metadata source the gap analysis identified. Each sentinel is a 28-character ASCII string with a unique tail (e.g. FORENSIC-EXIF-ARTIST-FFFF6666) so any survivor can be unambiguously attributed to its source.
Note on IDAT: the IDAT payload was updated from a 9-byte hand-crafted zlib stream to the 13-byte stream emitted by Pillow for a 1×1 RGBA image. The original sequence produced "invalid stored block lengths" under GdkPixbuf (mat2's PNG backend), which requires a fully valid deflate stream. The 4-byte increase propagates to all output sizes; all sentinel-survival results are unchanged.
Sources covered:
| Sentinel | Where it lives | How it was injected |
|---|---|---|
| TEXT_AUTHOR | tEXt chunk, keyword Author | hand-built chunk |
| TEXT_COPYRIGHT | tEXt chunk, keyword Copyright | hand-built chunk |
| TEXT_SOFTWARE | tEXt chunk, keyword Software | hand-built chunk |
| ITXT_XMP | iTXt chunk, keyword XML:com.adobe.xmp, dc:creator | hand-built XMP packet |
| ZTXT_COMMENT | zTXt chunk, keyword Comment (deflated) | hand-built chunk |
| EXIF_ARTIST | eXIf chunk, TIFF tag Artist (0x013B) | hand-built TIFF block |
| EXIF_COPYRIGHT | eXIf chunk, TIFF tag Copyright (0x8298) | hand-built TIFF block |
| ICCP_NAME | iCCP chunk, latin-1 profile name | hand-built chunk |
| ICCP_BODY | iCCP chunk, deflated profile body | hand-built chunk |
| PRIVATE_CHUNK | private ancillary chunk prVt | hand-built chunk |
Plus four binary chunks tracked by presence rather than sentinel string (timestamps and resolution fields are too short to carry attributable strings, but the chunk type alone is an encoder/capture-time fingerprint): tIME, pHYs, bKGD, sBIT.
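As an illustration of what "hand-built chunk" means for the tEXt rows, here is a minimal sketch of assembling a PNG chunk (4-byte big-endian length, 4-byte type, payload, CRC-32 over type plus payload). The helper names (crc32, buildChunk, buildTextChunk) and the sentinel value are assumptions made for this example; they are not taken from tools/forensic/png.ts.

```typescript
// Sketch only: helper names and the sentinel value are hypothetical.

// Standard CRC-32 table (reflected polynomial 0xEDB88320), the CRC PNG uses.
const CRC_TABLE = new Uint32Array(256);
for (let n = 0; n < 256; n++) {
  let c = n;
  for (let k = 0; k < 8; k++) c = c & 1 ? 0xedb88320 ^ (c >>> 1) : c >>> 1;
  CRC_TABLE[n] = c >>> 0;
}

function crc32(bytes: Uint8Array): number {
  let c = 0xffffffff;
  for (let i = 0; i < bytes.length; i++) {
    c = CRC_TABLE[(c ^ bytes[i]) & 0xff] ^ (c >>> 8);
  }
  return (c ^ 0xffffffff) >>> 0;
}

// Layout: length (4, big-endian) | type (4) | data | CRC-32 of type + data (4).
function buildChunk(type: string, data: Uint8Array): Uint8Array {
  const out = new Uint8Array(12 + data.length);
  const view = new DataView(out.buffer);
  view.setUint32(0, data.length); // DataView writes big-endian by default
  for (let i = 0; i < 4; i++) out[4 + i] = type.charCodeAt(i);
  out.set(data, 8);
  view.setUint32(8 + data.length, crc32(out.subarray(4, 8 + data.length)));
  return out;
}

// tEXt payload: latin-1 keyword, one NUL separator, latin-1 text.
function buildTextChunk(keyword: string, text: string): Uint8Array {
  return buildChunk("tEXt", Buffer.from(`${keyword}\x00${text}`, "latin1"));
}

// Hypothetical sentinel; the real fixture uses its own unique strings.
const EXAMPLE_SENTINEL = "FORENSIC-EXAMPLE-SENTINEL-0000";
const authorChunk = buildTextChunk("Author", EXAMPLE_SENTINEL);
const iendChunk = buildChunk("IEND", new Uint8Array(0));
```

Because tEXt is stored uncompressed, a sentinel injected this way is visible to a plain strings scan of the input, which is why the uncompressed sources are the ones the raw scan finds.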
The fixture is stripped three ways:
- PngStrategy: the hand-rolled chunk walker, default options (preserveColorProfile: false)
- exiftool -all= -overwrite_original: the canonical reference
- mat2: FOSS privacy-tool reference (Debian/Ubuntu: sudo apt install mat2); skipped if absent
For each output, the runner applies three recovery techniques:
- Raw strings: finds sentinels left in unencoded form anywhere in the file
- exiftool -a -G1 -s: every visible metadata tag including hidden namespaces
- In-process chunk walk + zlib inflate: re-parses the PNG, walks every chunk, scans the chunk payload as latin-1, and searches for zlib-compressed streams (header 0x78 0x9C / 0x78 0xDA / 0x78 0x01) and inflates them. This catches sentinels hidden inside compressed zTXt, iTXt-compressed, or iCCP payloads that the other two techniques miss.
Plus structural checks: list of remaining chunk types, binary-chunk presence, and a scan for stray exiftool/exifcleaner strings in the output bytes.
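The third recovery technique can be sketched roughly as follows. This is an assumption-laden illustration, not the runner's code: walkChunks, findSentinels, rawChunk, and the sentinel value are hypothetical names, and the demo fixture leaves CRCs zeroed because this walker does not validate them.

```typescript
import { deflateSync, inflateSync } from "node:zlib";

// Sketch only: function names and the sentinel are hypothetical.
const PNG_SIGNATURE = Buffer.from([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]);

interface Chunk { type: string; data: Buffer }

// Walk length/type/data/CRC records after the 8-byte signature.
function walkChunks(png: Buffer): Chunk[] {
  if (!png.subarray(0, 8).equals(PNG_SIGNATURE)) throw new Error("not a PNG");
  const chunks: Chunk[] = [];
  let offset = 8;
  while (offset + 12 <= png.length) {
    const length = png.readUInt32BE(offset);
    if (offset + 12 + length > png.length) break; // truncated chunk, stop
    chunks.push({
      type: png.toString("latin1", offset + 4, offset + 8),
      data: png.subarray(offset + 8, offset + 8 + length),
    });
    offset += 12 + length;
  }
  return chunks;
}

// Scan each payload as latin-1, then try to inflate from every offset
// that looks like a zlib header (0x78 followed by 0x01, 0x9C, or 0xDA).
function findSentinels(png: Buffer, sentinels: string[]): Set<string> {
  const found = new Set<string>();
  const scan = (bytes: Buffer): void => {
    const text = bytes.toString("latin1");
    for (const s of sentinels) if (text.includes(s)) found.add(s);
  };
  for (const { data } of walkChunks(png)) {
    scan(data);
    for (let i = 0; i + 1 < data.length; i++) {
      if (data[i] !== 0x78 || ![0x01, 0x9c, 0xda].includes(data[i + 1])) continue;
      try {
        scan(inflateSync(data.subarray(i)));
      } catch {
        // Bytes only resembled a zlib header; ignore and keep scanning.
      }
    }
  }
  return found;
}

// Demo fixture: a zTXt-style payload (keyword, NUL, method byte, deflate data).
function rawChunk(type: string, data: Buffer): Buffer {
  const header = Buffer.alloc(8);
  header.writeUInt32BE(data.length, 0);
  header.write(type, 4, "latin1");
  return Buffer.concat([header, data, Buffer.alloc(4)]); // CRC left as zeros
}

const DEMO_SENTINEL = "FORENSIC-EXAMPLE-SENTINEL-0000"; // hypothetical value
const ztxtPayload = Buffer.concat([
  Buffer.from("Comment\x00\x00", "latin1"),
  deflateSync(Buffer.from(DEMO_SENTINEL, "latin1")),
]);
const demoPng = Buffer.concat([
  PNG_SIGNATURE,
  rawChunk("zTXt", ztxtPayload),
  rawChunk("IEND", Buffer.alloc(0)),
]);
const rawScanHit = demoPng.toString("latin1").includes(DEMO_SENTINEL);
const walkHits = findSentinels(demoPng, [DEMO_SENTINEL]);
```

Comparing rawScanHit with walkHits on the demo fixture shows why the inflate pass exists: a deflated sentinel is only recoverable once the stream has been decompressed.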
Comparison reference: mat2
mat2 (Metadata Anonymisation Toolkit 2) is the privacy-tool reference used by Tails OS. For PNG it uses GdkPixbuf to re-encode the image, which drops all ancillary chunks and re-emits the pixel data via libpng. This means the output PNG contains only what libpng emits during a fresh encode — all original metadata chunks are gone because none of them are read by the pixel-level decode path.
Results
| | Input fixture | PngStrategy | ExifTool -all= | mat2 0.13.4 |
|---|---|---|---|---|
| Output size | 920 bytes | 70 bytes | 147 bytes | 86 bytes |
| Chunks remaining | 15 | 3 (IHDR, IDAT, IEND) | 6 (IHDR, sBIT, bKGD, prVt, IDAT, IEND) | 4 (IHDR, bKGD, IDAT, IEND) |
| Binary chunks present (of tIME, pHYs, bKGD, sBIT) | 4 / 4 | 0 / 4 | 2 / 4 (bKGD, sBIT) | 1 / 4 (bKGD) |
| Raw strings sentinels | 8 | 0 | 1 (PRIVATE_CHUNK) | 0 |
| ExifTool visible tags | 8 | 0 | 0 | 0 |
| Walk + decompress | 10 | 0 | 1 (PRIVATE_CHUNK) | 0 |
| Stray exiftool/exifcleaner markers | 0 | 0 | 0 | 0 |
PngStrategy — every recovery check returns []. Output is 70 bytes; only the three critical chunks remain. No sentinels recoverable by any technique, no binary fingerprint chunks, no stray markers.
exiftool -all=:
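- PRIVATE_CHUNK survives the raw strings scan and the chunk walk: ExifTool does not drop unknown private ancillary chunks (the prVt chunk is copied through verbatim, sentinel and all). ExifTool's own tag listing reports 0 visible tags for the same file, so the survivor is only caught by the other two techniques.
- bKGD and sBIT chunks survive: ExifTool keeps these encoder hints.
- The other nine string-bearing sentinels are correctly removed.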
mat2:
- Zero sentinel survival across all three recovery channels.
- bKGD (suggested background color) survives: mat2's libpng encoder preserves the background-color hint from the input during re-encoding.
- sBIT and prVt are gone: the re-encode doesn't propagate them.
Per-sentinel comparison: PngStrategy vs mat2
| sentinel | raw | PngStrategy | mat2 |
|---|---|---|---|
| TEXT_AUTHOR | present | removed | removed |
| TEXT_COPYRIGHT | present | removed | removed |
| TEXT_SOFTWARE | present | removed | removed |
| ITXT_XMP | present | removed | removed |
| ZTXT_COMMENT | present | removed | removed |
| EXIF_ARTIST | present | removed | removed |
| EXIF_COPYRIGHT | present | removed | removed |
| ICCP_NAME | present | removed | removed |
| ICCP_BODY | present | removed | removed |
| PRIVATE_CHUNK | present | removed | removed |
Leaked by both: 0. Leaked by us only: 0. Leaked by mat2 only: 0.
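The three tallies above are plain set arithmetic over the per-tool survivor sets. A minimal sketch, assuming each tool's survivors are available as a set of sentinel names (the function name compareLeaks and the variable names are hypothetical; the survivor sets are transcribed from the results reported in this document):

```typescript
// Sketch only: compareLeaks and the variable names are hypothetical.
const SENTINELS = [
  "TEXT_AUTHOR", "TEXT_COPYRIGHT", "TEXT_SOFTWARE", "ITXT_XMP", "ZTXT_COMMENT",
  "EXIF_ARTIST", "EXIF_COPYRIGHT", "ICCP_NAME", "ICCP_BODY", "PRIVATE_CHUNK",
];

interface LeakComparison { both: string[]; oursOnly: string[]; theirsOnly: string[] }

// Partition the leaked sentinels by which tool's output still contains them.
function compareLeaks(ours: Set<string>, theirs: Set<string>): LeakComparison {
  return {
    both: SENTINELS.filter((s) => ours.has(s) && theirs.has(s)),
    oursOnly: SENTINELS.filter((s) => ours.has(s) && !theirs.has(s)),
    theirsOnly: SENTINELS.filter((s) => !ours.has(s) && theirs.has(s)),
  };
}

// Survivor sets as reported in the Results section.
const pngStrategySurvivors = new Set<string>();
const mat2Survivors = new Set<string>();
const exiftoolSurvivors = new Set<string>(["PRIVATE_CHUNK"]);

const vsMat2 = compareLeaks(pngStrategySurvivors, mat2Survivors);
const vsExiftool = compareLeaks(pngStrategySurvivors, exiftoolSurvivors);
```

With these inputs, vsMat2 has three empty lists, and vsExiftool lists PRIVATE_CHUNK under theirsOnly.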
Interpretation
PngStrategy is strictly more aggressive than exiftool -all= for PNG. This is a meaningful finding: ExifTool removes only the chunks it recognizes as metadata and leaves everything else intact, which on this fixture includes the private prVt chunk and the bKGD and sBIT encoder hints. Our walker inverts that logic: it uses a small keep-list (IHDR, PLTE, IDAT, IEND, tRNS, acTL, fcTL, fdAT, cICP, mDCv, cLLi, sTER) and drops everything else.
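The keep-list approach can be illustrated with a short sketch. This is not PngStrategy's implementation: stripPng, chunkTypes, and fakeChunk are hypothetical names, the demo chunks carry zeroed payloads and CRCs, and the colour-profile opt-in is modelled on the preserveColorProfile behaviour described under the caveats (iCCP/cHRM/gAMA/sRGB kept when enabled).

```typescript
// Sketch only: not the real PngStrategy; all function names are hypothetical.
const KEEP_LIST = new Set([
  "IHDR", "PLTE", "IDAT", "IEND", "tRNS", "acTL",
  "fcTL", "fdAT", "cICP", "mDCv", "cLLi", "sTER",
]);
const COLOR_PROFILE_CHUNKS = new Set(["iCCP", "cHRM", "gAMA", "sRGB"]);

function typeAt(png: Uint8Array, offset: number): string {
  return String.fromCharCode(png[offset + 4], png[offset + 5], png[offset + 6], png[offset + 7]);
}

// Copy the signature plus every chunk whose type is on the keep-list.
function stripPng(png: Uint8Array, preserveColorProfile = false): Uint8Array {
  const view = new DataView(png.buffer, png.byteOffset, png.byteLength);
  const kept: Uint8Array[] = [png.subarray(0, 8)];
  let offset = 8;
  while (offset + 12 <= png.length) {
    const end = offset + 12 + view.getUint32(offset);
    if (end > png.length) throw new Error("truncated chunk");
    const type = typeAt(png, offset);
    const keep = KEEP_LIST.has(type) || (preserveColorProfile && COLOR_PROFILE_CHUNKS.has(type));
    if (keep) kept.push(png.subarray(offset, end));
    offset = end;
  }
  const out = new Uint8Array(kept.reduce((n, c) => n + c.length, 0));
  let pos = 0;
  for (const c of kept) { out.set(c, pos); pos += c.length; }
  return out;
}

function chunkTypes(png: Uint8Array): string[] {
  const view = new DataView(png.buffer, png.byteOffset, png.byteLength);
  const types: string[] = [];
  for (let o = 8; o + 12 <= png.length; o += 12 + view.getUint32(o)) types.push(typeAt(png, o));
  return types;
}

// Demo input: zero-filled payloads and CRCs, so structurally walkable only.
function fakeChunk(type: string, dataLength: number): Uint8Array {
  const c = new Uint8Array(12 + dataLength);
  new DataView(c.buffer).setUint32(0, dataLength);
  for (let i = 0; i < 4; i++) c[4 + i] = type.charCodeAt(i);
  return c;
}

const demoParts = [
  Uint8Array.from([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]),
  fakeChunk("IHDR", 13), fakeChunk("prVt", 5), fakeChunk("iCCP", 4),
  fakeChunk("IDAT", 13), fakeChunk("IEND", 0),
];
const demoInput = new Uint8Array(demoParts.reduce((n, p) => n + p.length, 0));
demoParts.reduce((pos, p) => (demoInput.set(p, pos), pos + p.length), 0);

const stripped = stripPng(demoInput);
const strippedKeepingColor = stripPng(demoInput, true);
```

With a 13-byte IHDR payload and a 13-byte IDAT payload, the stripped demo comes to 8 + 25 + 25 + 12 = 70 bytes, which is consistent with the 70-byte PngStrategy output reported in the results table.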
PngStrategy and mat2 both achieve zero sentinel survival. The two tools differ in approach and in structural output but not in privacy outcome on this fixture:
- PngStrategy produces the most minimal output: IHDR + IDAT + IEND only (70 bytes). No ancillary chunks survive at all.
- mat2 re-encodes via GdkPixbuf/libpng and emits IHDR + bKGD + IDAT + IEND (86 bytes). The bKGD chunk is freshly generated by libpng during re-encode from the input's background-color hint; it carries no sentinel data and is low-risk (it reveals only an authoring-tool background-color preference, not a user identity).
mat2 is more aggressive than ExifTool on private chunks. The prVt private ancillary chunk (which carries PRIVATE_CHUNK verbatim) survives exiftool -all= but is completely absent from mat2's output. This is because mat2's re-encoding approach drops every chunk that GdkPixbuf doesn't pass through — no explicit knowledge of the prVt chunk type is required. PngStrategy's keep-list approach reaches the same result: prVt is not on the keep-list, so it is dropped.
eXIf chunk handling: both PngStrategy and mat2 drop the eXIf chunk (full EXIF block embedded in PNG). ExifTool also drops it via -all=. All three tools agree on this surface.
iCCP (ICC profile) handling: all three tools drop the iCCP chunk when run with default options. PngStrategy keeps it on opt-in via preserveColorProfile: true; mat2 has no equivalent opt-in (its re-encode never preserves ICC). ExifTool's -all= drops it and emits "ICC_Profile deleted. Image colors may be affected."
ICCP_BODY note: the ICCP_BODY sentinel is inside a zlib-compressed payload and does not appear in the raw strings scan of the input — it is only detected by the in-process walk + inflate channel. Both PngStrategy and mat2 remove it cleanly.
Note on the expanded keep-list
The keep-list addition (cICP, mDCv, cLLi, sTER) does not affect this test's results. Those chunks carry only fixed-size numeric payloads and have no string-attributable surface. The unit suite (tests/infrastructure/wasm/png_strategy.test.ts) verifies they are kept verbatim.
Caveats and limits of this test
- The fixture is synthetic. Real-world PNGs from cameras, screenshot tools, and Photoshop/GIMP exports have richer chunk profiles that this fixture doesn't exercise. Our strategy drops every non-keep-listed chunk regardless, so the result should still be zero survival.
- APNG-specific behaviour (acTL, fcTL, fdAT chunks) is not exercised in the forensic fixture. Those chunks are tested in the unit test suite.
- The runner only exercises default options (preserveColorProfile: false). With preserveColorProfile: true, the iCCP/cHRM/gAMA/sRGB chunks are kept by design.
- mat2 has no orientation-preservation or colour-preservation flags; it always performs a full re-encode.
- The runner skips mat2 cleanly if it is not installed. All existing recovery channels still run.
- This test is reproducible but not in CI yet.
Reproducing
```sh
# Prerequisites (mat2 optional; adds the comparison column)
sudo apt install mat2
# From the project root
npx tsx tools/forensic/png.ts
```
Outputs go to /tmp/png-forensic/:
- input.png: the rich fixture
- our-stripped.png: PngStrategy output
- exiftool-stripped.png: exiftool -all= output
- mat2-stripped.png: mat2 output (if mat2 installed)
- report.json: structured per-output sentinel-survival data
Required tools: exiftool, strings. Both available on Debian/Ubuntu via apt (libimage-exiftool-perl, binutils). mat2 optional (sudo apt install mat2).