JPEG forensic recovery test
Date: 2026-05-07 (mat2 comparison column added 2026-05-17)
Goal: Verify that metadata stripped by JpegStrategy cannot be recovered by an attacker with standard JPEG forensic tooling. Cover both the default preserveOrientation: false strip and the preserveOrientation: true synthesized-APP1 path. Compare against exiftool -all= as the canonical reference and against mat2 as a comparable FOSS privacy tool.
Reproducible at: tools/forensic/jpeg.ts — npx tsx tools/forensic/jpeg.ts from the project root.
Methodology
The runner generates a baseline JPEG via Python/Pillow (Image.new('RGB',(8,8),'white').save(…,'JPEG')), then injects 10 unique sentinel strings plus the binary Orientation tag via exiftool in a single rewrite. Each sentinel is a 26–30-character ASCII string with a unique tail (e.g. FORENSIC-EXIF-ARTIST-DDDD4444) so any survivor can be unambiguously attributed to its source segment.
Why Pillow instead of the test fixture: the committed tests/fixtures/wasm/images/sample.jpg is deliberately minimal and lacks the SOF/DQT/DHT structure required by GdkPixbuf (mat2's image backend). A Pillow-generated JPEG is structurally complete and usable by all three strip tools.
Sources covered:
| Sentinel | Where it lives | Marker |
|---|---|---|
| EXIF_MAKE | EXIF IFD0 Make (0x010F) | APP1 EXIF |
| EXIF_MODEL | EXIF IFD0 Model (0x0110) | APP1 EXIF |
| EXIF_SOFTWARE | EXIF IFD0 Software (0x0131) | APP1 EXIF |
| EXIF_ARTIST | EXIF IFD0 Artist (0x013B) | APP1 EXIF |
| EXIF_COPYRIGHT | EXIF IFD0 Copyright (0x8298) | APP1 EXIF |
| EXIF_USERCOMMENT | EXIF SubIFD UserComment (0x9286) | APP1 EXIF |
| XMP_CREATOR | XMP dc:creator | APP1 XMP |
| XMP_TITLE | XMP dc:title | APP1 XMP |
| IPTC_BYLINE | IPTC By-line | APP13 (Photoshop / 8BIM) |
| JPEG_COMMENT | JPEG comment string | COM (0xFFFE) |
Plus the binary Orientation tag (0x0112) injected at value 6 (Rotate 90 CW). This is tracked separately because the value is too short to carry a string sentinel, and the policy specifically permits its survival under preserveOrientation: true.
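As a rough illustration of the single-rewrite injection, the sketch below builds one exiftool argument list from a sentinel map. The tag spellings follow the table above and standard exiftool tag names; the helper name buildInjectArgs, the mapping constant, and the way the real runner assembles its command line are assumptions, not taken from tools/forensic/jpeg.ts.

```typescript
// Sentinel-to-tag mapping taken from the "Sources covered" table.
// The constant and function names are hypothetical.
const EXIFTOOL_TAGS: Record<string, string> = {
  EXIF_MAKE: "EXIF:Make",
  EXIF_MODEL: "EXIF:Model",
  EXIF_SOFTWARE: "EXIF:Software",
  EXIF_ARTIST: "EXIF:Artist",
  EXIF_COPYRIGHT: "EXIF:Copyright",
  EXIF_USERCOMMENT: "EXIF:UserComment",
  XMP_CREATOR: "XMP-dc:Creator",
  XMP_TITLE: "XMP-dc:Title",
  IPTC_BYLINE: "IPTC:By-line",
  JPEG_COMMENT: "Comment", // written to the JPEG COM segment
};

// Build the arguments for one exiftool invocation that writes every
// sentinel plus Orientation = 6 in a single rewrite of `file`.
function buildInjectArgs(sentinels: Record<string, string>, file: string): string[] {
  const args: string[] = [];
  for (const [name, value] of Object.entries(sentinels)) {
    const tag = EXIFTOOL_TAGS[name];
    if (tag === undefined) throw new Error(`unknown sentinel: ${name}`);
    args.push(`-${tag}=${value}`);
  }
  args.push("-EXIF:Orientation#=6"); // "#" writes the raw numeric value
  args.push("-overwrite_original", file);
  return args;
}

// Usage with the one sentinel value quoted in the text.
const demoArgs = buildInjectArgs(
  { EXIF_ARTIST: "FORENSIC-EXIF-ARTIST-DDDD4444" },
  "input.jpg",
);
console.log(demoArgs.join(" "));
```

The resulting array can be handed to a process launcher such as Node's child_process.execFileSync("exiftool", demoArgs), which avoids shell quoting issues with the sentinel strings.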
The fixture is then stripped four ways:
- JpegStrategy default (preserveOrientation: false, preserveColorProfile: false)
- JpegStrategy with orientation kept (preserveOrientation: true, preserveColorProfile: false)
- exiftool -all= -overwrite_original, the canonical reference
- mat2, the FOSS privacy-tool reference (Debian/Ubuntu: sudo apt install mat2); skipped if absent
For each output, the runner applies three recovery techniques:
- Raw strings: finds sentinels left in unencoded form anywhere in the file.
- exiftool -a -G1 -s: lists every visible metadata tag, including hidden namespaces (EXIF, XMP, IPTC, ICC, MakerNotes).
- In-process marker walker: re-parses the JPEG, extracts every APP*/COM segment payload, and scans each as latin-1 for sentinels. This catches sentinels in segments the visible-tag scan might not surface (unusual MakerNote layouts, non-standard APP variants, malformed but parseable structures).
Plus structural checks: list of remaining APP*/COM segments by marker code and size; whether the literal string ExifTool appears anywhere in the output bytes; and the Orientation tag value as reported by exiftool -Orientation#.
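The marker-walk technique can be sketched as below. This is a minimal illustration and not the runner's actual code: the names Segment, latin1, walkMetadataSegments and survivingSentinels are hypothetical. It assumes a well-formed header in which every marker before SOS carries a length field, and it stops at SOS, EOI, or the first malformed segment.

```typescript
interface Segment {
  marker: number;      // second marker byte, e.g. 0xe1 for APP1, 0xfe for COM
  payload: Uint8Array; // bytes following the 2-byte length field
}

// Decode bytes as latin-1: each byte maps to the code point of the same value.
function latin1(bytes: Uint8Array): string {
  let out = "";
  for (let i = 0; i < bytes.length; i++) out += String.fromCharCode(bytes[i]);
  return out;
}

// Walk marker segments from SOI up to SOS/EOI and collect APPn/COM payloads.
function walkMetadataSegments(jpeg: Uint8Array): Segment[] {
  if (jpeg.length < 4 || jpeg[0] !== 0xff || jpeg[1] !== 0xd8) {
    throw new Error("not a JPEG: missing SOI");
  }
  const segments: Segment[] = [];
  let pos = 2;
  while (pos + 4 <= jpeg.length) {
    if (jpeg[pos] !== 0xff) break;                 // lost sync: stop rather than guess
    const marker = jpeg[pos + 1];
    if (marker === 0xff) { pos += 1; continue; }   // fill byte before a marker
    if (marker === 0xda || marker === 0xd9) break; // SOS or EOI: header is over
    const length = (jpeg[pos + 2] << 8) | jpeg[pos + 3]; // counts its own 2 bytes
    if (length < 2 || pos + 2 + length > jpeg.length) break; // truncated segment
    const isApp = marker >= 0xe0 && marker <= 0xef;
    const isCom = marker === 0xfe;
    if (isApp || isCom) {
      segments.push({ marker, payload: jpeg.subarray(pos + 4, pos + 2 + length) });
    }
    pos += 2 + length;
  }
  return segments;
}

// Return the sentinels that still appear in any APPn/COM payload.
function survivingSentinels(jpeg: Uint8Array, sentinels: string[]): string[] {
  const texts = walkMetadataSegments(jpeg).map((s) => latin1(s.payload));
  return sentinels.filter((s) => texts.some((t) => t.includes(s)));
}

// Usage: a synthetic SOI + COM + EOI stream carrying one made-up sentinel.
const demoText = "FORENSIC-COM-EXAMPLE";
const demoCom = [0xff, 0xfe, 0x00, demoText.length + 2,
  ...Array.from(demoText, (c) => c.charCodeAt(0))];
const demoJpeg = Uint8Array.from([0xff, 0xd8, ...demoCom, 0xff, 0xd9]);
console.log(survivingSentinels(demoJpeg, [demoText, "FORENSIC-ABSENT"]));
```

The list returned by walkMetadataSegments also gives the marker codes and payload sizes needed for the structural check of remaining APP*/COM segments.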
Comparison reference: mat2
mat2 (Metadata Anonymisation Toolkit 2) is the privacy-tool reference used by Tails OS. For JPEG it uses GdkPixbuf to re-encode the image rather than walking and filtering JPEG markers. This is a fundamentally different approach from our hand-rolled segment walker: GdkPixbuf reads the pixel data, re-encodes it as JPEG via libjpeg-turbo, and emits the result with a freshly generated JFIF APP0. All original metadata segments are gone because none of them are carried through the pixel-level decode path.
mat2 has no --preserve-orientation flag. Its JPEG strip is always full-strip; the comparison is only meaningful against our preserveOrientation: false output.
Results
| Check | Input fixture | JpegStrategy (default) | JpegStrategy (preserveOrientation) | ExifTool -all= | mat2 0.13.4 |
|---|---|---|---|---|---|
| Output size | 4 177 bytes | 613 bytes | 649 bytes | 613 bytes | 631 bytes |
| APP*/COM segments remaining | 5 (APP0, APP1×2, APP13, COM) | none | 1 (APP1 EXIF, 34 b) | none | 1 (APP0, 16 b) |
| Orientation tag | 6 | absent | 6 | absent | absent |
| ExifTool marker in bytes | yes | no | no | no | no |
| Raw strings sentinels | 10 | 0 | 0 | 0 | 0 |
| ExifTool visible tags | 10 | 0 | 0 | 0 | 0 |
| In-process marker walk | 10 | 0 | 0 | 0 | 0 |
Per-sentinel comparison: JpegStrategy (default) vs mat2
| sentinel | raw | JpegStrategy | mat2 |
|---|---|---|---|
| EXIF_MAKE | present | removed | removed |
| EXIF_MODEL | present | removed | removed |
| EXIF_SOFTWARE | present | removed | removed |
| EXIF_ARTIST | present | removed | removed |
| EXIF_COPYRIGHT | present | removed | removed |
| EXIF_USERCOMMENT | present | removed | removed |
| XMP_CREATOR | present | removed | removed |
| XMP_TITLE | present | removed | removed |
| IPTC_BYLINE | present | removed | removed |
| JPEG_COMMENT | present | removed | removed |
Leaked by both: 0. Leaked by us only: 0. Leaked by mat2 only: 0.
Interpretation
For the default strip, JpegStrategy and exiftool -all= are byte-for-byte equivalent on this fixture — both produce 613-byte outputs with zero APP*/COM segments and zero sentinel survival. The hand-rolled walker exactly matches the reference implementation's privacy guarantee.
JpegStrategy and mat2 both achieve zero sentinel survival — no sentinel string is recoverable from either output via any of the three recovery channels. The two tools differ in approach but not in outcome on this fixture:
- JpegStrategy drops APP0/JFIF (matching ExifTool policy), leaving no APP*/COM segments.
- mat2 re-encodes via GdkPixbuf and emits a freshly generated JFIF APP0 (18 bytes total: 2-byte marker + 2-byte length + 14-byte standard JFIF\0 header, i.e. the 16 b segment listed in the results table plus its marker). The APP0 carries no sentinel data and is not privacy-relevant: it contains only JFIF version, density units, and pixel density, all generated by libjpeg-turbo at encode time.
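To make the contents of such an APP0 concrete, here is a small sketch that decodes the 14-byte JFIF payload into its fixed fields. The field layout follows the JFIF specification; the function name parseJfifApp0 and the sample bytes are illustrative and were not read from the mat2 output of this test.

```typescript
interface JfifHeader {
  version: string;     // e.g. "1.01"
  units: number;       // 0 = aspect ratio only, 1 = dots/inch, 2 = dots/cm
  xDensity: number;
  yDensity: number;
  thumbWidth: number;  // 0 when no thumbnail is embedded
  thumbHeight: number;
}

// Parse an APP0 payload (the bytes after the 2-byte length field).
// Returns null if the payload is not a JFIF header.
function parseJfifApp0(payload: Uint8Array): JfifHeader | null {
  const id = [0x4a, 0x46, 0x49, 0x46, 0x00]; // "JFIF\0"
  if (payload.length < 14) return null;
  for (let i = 0; i < id.length; i++) {
    if (payload[i] !== id[i]) return null;
  }
  return {
    version: `${payload[5]}.${String(payload[6]).padStart(2, "0")}`,
    units: payload[7],
    xDensity: (payload[8] << 8) | payload[9],
    yDensity: (payload[10] << 8) | payload[11],
    thumbWidth: payload[12],
    thumbHeight: payload[13],
  };
}

// Usage with made-up sample bytes: version 1.01, no units, 1x1 density.
const sampleApp0 = Uint8Array.from([
  0x4a, 0x46, 0x49, 0x46, 0x00, // "JFIF\0"
  0x01, 0x01,                   // version 1.01
  0x00,                         // units
  0x00, 0x01, 0x00, 0x01,       // X and Y density
  0x00, 0x00,                   // no thumbnail
]);
console.log(parseJfifApp0(sampleApp0));
```

Every field is a fixed-width number, which is why a thumbnail-free segment of this shape has no room for string metadata.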
mat2 orientation behaviour: mat2 has no orientation-preservation flag; it drops orientation unconditionally. On this fixture its output reports Orientation: absent, same as our preserveOrientation: false mode. The preserveOrientation: true path (synthesizing a minimal APP1 with only the Orientation IFD0 entry) is an exclusive feature of JpegStrategy.
preserveOrientation: true is privacy-sound — the synthesized APP1 carries the orientation value and nothing else. See the original interpretation for the full structural argument; the mat2 run confirms the same conclusion from a different angle: even a tool that re-encodes the image from scratch would need to explicitly add an orientation tag to match our output.
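A minimal orientation-only APP1 can be sketched as follows. The layout is an assumption (big-endian TIFF header, a single IFD0 entry, no SubIFD), and buildOrientationApp1 is a hypothetical name rather than JpegStrategy's actual code. Under these assumptions the segment length field comes out at 34, which is consistent with the 34 b APP1 in the results table, and the 36 bytes including the marker are consistent with the difference between the 649-byte and 613-byte outputs.

```typescript
// Build an APP1 EXIF segment whose only IFD0 entry is Orientation (0x0112).
// Byte order "MM" (big-endian) is an assumption; "II" would work equally well.
function buildOrientationApp1(orientation: number): Uint8Array {
  if (!Number.isInteger(orientation) || orientation < 1 || orientation > 8) {
    throw new RangeError("orientation must be an integer from 1 to 8");
  }
  const seg = new Uint8Array(36);       // zero-filled
  const view = new DataView(seg.buffer); // DataView writes big-endian by default
  view.setUint16(0, 0xffe1);             // APP1 marker
  view.setUint16(2, 34);                 // length: counts itself, not the marker
  seg.set([0x45, 0x78, 0x69, 0x66, 0x00, 0x00], 4); // "Exif\0\0"
  // TIFF header begins at byte 10; IFD offsets are relative to it.
  seg.set([0x4d, 0x4d], 10);             // "MM"
  view.setUint16(12, 42);                // TIFF magic number
  view.setUint32(14, 8);                 // IFD0 starts right after the header
  view.setUint16(18, 1);                 // IFD0 holds one entry
  view.setUint16(20, 0x0112);            // tag: Orientation
  view.setUint16(22, 3);                 // type: SHORT
  view.setUint32(24, 1);                 // count: 1
  view.setUint16(28, orientation);       // value, left-justified in 4 bytes
  view.setUint32(32, 0);                 // no next IFD
  return seg;
}

// Usage: the value injected in this test, 6 (Rotate 90 CW).
const app1 = buildOrientationApp1(6);
console.log(app1.length, (app1[2] << 8) | app1[3]);
```

Nothing in this layout has room for a string: the only variable byte pair is the orientation value itself.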
One structural asymmetry vs mat2: mat2's GdkPixbuf re-encode adds a fresh JFIF APP0. This is not a privacy gap (no original metadata survives), but it is a fingerprint that the file was processed by GdkPixbuf. JpegStrategy leaves no such marker: the output is identical in structure to a JPEG that was never stamped with any metadata in the first place.
Caveats and limits of this test
- The fixture exercises 10 metadata sources but does not exercise: APP2 (ICC profiles; would require a synthesized profile file, and is tested separately by the unit-test suite via preserveColorProfile), APP14 (Adobe DCT; kept by JpegStrategy by design, whereas mat2's GdkPixbuf re-encode drops it, a meaningful policy difference not captured by this sentinel battery since APP14 carries no string data), and APP3..APP12 / APP15 (rare in real-world output; the walker drops them by the same code path).
- MakerNotes are not tested via this sentinel battery: exiftool refused to inject a custom MakerNote on the Pillow fixture without a recognised camera Make. The walker drops MakerNote-bearing APP1 EXIF segments entirely; the unit tests exercise this path separately.
- mat2's GdkPixbuf backend re-encodes the image at its own quality setting, potentially degrading the image. The sentinel battery only tests metadata recoverability, not image fidelity. For a privacy tool, the recoverability result is the relevant measure.
- mat2 has no orientation-preservation mode, so the preserveOrientation: true comparison is JpegStrategy-exclusive.
- The runner skips mat2 cleanly if it is not installed. The other strip paths (JpegStrategy, exiftool) still run, and the mat2 column shows (skipped) in the comparison table.
- This test is reproducible but not in CI yet. A natural follow-up is wiring tools/forensic/jpeg.ts (and the other forensic runners) into a release-gate test.
Reproducing
    # Prerequisites
    sudo apt install mat2   # optional; adds the comparison column
    # exiftool and python3 (with Pillow) are required

    # From the project root
    npx tsx tools/forensic/jpeg.ts
Outputs go to /tmp/jpeg-forensic/:
- input.jpg: the rich fixture (Pillow baseline + all sentinels injected)
- our-default.jpg: JpegStrategy output, default options
- our-preserve-orientation.jpg: JpegStrategy output, preserveOrientation: true
- exiftool-stripped.jpg: exiftool -all= output
- mat2-stripped.jpg: mat2 output (if mat2 installed)
- report.json: structured per-output sentinel-survival data
Required tools: exiftool, python3 (with Pillow), strings. All available on Debian/Ubuntu via apt (libimage-exiftool-perl, python3-pil, binutils). mat2 optional (sudo apt install mat2).