JPEG forensic recovery test

Date: 2026-05-07 (mat2 comparison column added 2026-05-17)

Goal: Verify that metadata stripped by JpegStrategy cannot be recovered by an attacker with standard JPEG forensic tooling. Cover both the default preserveOrientation: false strip and the preserveOrientation: true synthesized-APP1 path. Compare against exiftool -all= as the canonical reference and against mat2 as a comparable FOSS privacy tool.

Reproducible at: tools/forensic/jpeg.ts (run npx tsx tools/forensic/jpeg.ts from the project root).

Methodology

The runner generates a baseline JPEG via Python/Pillow (Image.new('RGB',(8,8),'white').save(…,'JPEG')), then injects 10 unique sentinel strings plus the binary Orientation tag via exiftool in a single rewrite. Each sentinel is a 26 to 30 character ASCII string with a unique tail (e.g. FORENSIC-EXIF-ARTIST-DDDD4444) so any survivor can be unambiguously attributed to its source segment.

Why Pillow instead of the test fixture: the committed tests/fixtures/wasm/images/sample.jpg is deliberately minimal and lacks the SOF/DQT/DHT structure required by GdkPixbuf (mat2's image backend). A Pillow-generated JPEG is structurally complete and usable by all three strip tools.

Sources covered:

| Sentinel | Where it lives | Marker |
| --- | --- | --- |
| EXIF_MAKE | EXIF IFD0 Make (0x010F) | APP1 EXIF |
| EXIF_MODEL | EXIF IFD0 Model (0x0110) | APP1 EXIF |
| EXIF_SOFTWARE | EXIF IFD0 Software (0x0131) | APP1 EXIF |
| EXIF_ARTIST | EXIF IFD0 Artist (0x013B) | APP1 EXIF |
| EXIF_COPYRIGHT | EXIF IFD0 Copyright (0x8298) | APP1 EXIF |
| EXIF_USERCOMMENT | EXIF SubIFD UserComment (0x9286) | APP1 EXIF |
| XMP_CREATOR | XMP dc:creator | APP1 XMP |
| XMP_TITLE | XMP dc:title | APP1 XMP |
| IPTC_BYLINE | IPTC By-line | APP13 (Photoshop / 8BIM) |
| JPEG_COMMENT | JPEG comment string | COM (0xFFFE) |

Plus the binary Orientation tag (0x0112) injected at value 6 (Rotate 90 CW). This is tracked separately because the value is too short to carry a string sentinel, and the policy specifically permits its survival under preserveOrientation: true.

The fixture is then stripped four ways:

  1. JpegStrategy default (preserveOrientation: false, preserveColorProfile: false)
  2. JpegStrategy with orientation kept (preserveOrientation: true, preserveColorProfile: false)
  3. exiftool -all= -overwrite_original — the canonical reference
  4. mat2 — FOSS privacy-tool reference (Debian/Ubuntu: sudo apt install mat2); skipped if absent

For each output, the runner applies three recovery techniques:

  1. Raw strings — finds sentinels left in unencoded form anywhere in the file
  2. exiftool -a -G1 -s — every visible metadata tag including hidden namespaces (EXIF, XMP, IPTC, ICC, MakerNotes)
  3. In-process marker walker — re-parses the JPEG, extracts every APP*/COM segment payload, scans each as latin-1 for sentinels. Catches sentinels in segments the visible-tag scan might not surface (unusual MakerNote layouts, non-standard APP variants, malformed but parseable structures)
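
The third technique can be sketched roughly as follows. This is a minimal illustration, not the code in tools/forensic/jpeg.ts: the names `Segment`, `latin1`, `walkSegments`, and `scanSegments` are hypothetical, and the sketch only walks the segments that precede the first SOS marker.

```typescript
// Sketch only: all names here are hypothetical, not taken from the runner.
interface Segment {
  name: string;        // "APP0".."APP15" or "COM"
  size: number;        // value of the 2-byte length field (excludes the marker)
  payload: Uint8Array; // bytes after the length field
}

// Latin-1 decode: each byte becomes the code point with the same value.
function latin1(bytes: Uint8Array): string {
  let out = "";
  for (let i = 0; i < bytes.length; i++) out += String.fromCharCode(bytes[i]);
  return out;
}

// Collect every APPn/COM segment that precedes the first SOS marker.
function walkSegments(jpeg: Uint8Array): Segment[] {
  if (jpeg.length < 2 || jpeg[0] !== 0xff || jpeg[1] !== 0xd8) {
    throw new Error("missing SOI marker");
  }
  const found: Segment[] = [];
  let pos = 2;
  while (pos + 4 <= jpeg.length) {
    if (jpeg[pos] !== 0xff) throw new Error("expected marker at offset " + pos);
    const marker = jpeg[pos + 1];
    if (marker === 0xff) { pos += 1; continue; }        // fill byte before a marker
    if (marker === 0xda || marker === 0xd9) break;      // SOS or EOI: stop walking
    if (marker === 0x01 || (marker >= 0xd0 && marker <= 0xd7)) {
      pos += 2;                                         // TEM/RSTn carry no length field
      continue;
    }
    const size = (jpeg[pos + 2] << 8) | jpeg[pos + 3];  // big-endian, counts itself
    if (size < 2 || pos + 2 + size > jpeg.length) throw new Error("bad segment length");
    const isApp = marker >= 0xe0 && marker <= 0xef;
    if (isApp || marker === 0xfe) {
      found.push({
        name: isApp ? "APP" + (marker - 0xe0) : "COM",
        size: size,
        payload: jpeg.subarray(pos + 4, pos + 2 + size),
      });
    }
    pos += 2 + size;
  }
  return found;
}

// Map each surviving sentinel to the names of the segments that contain it.
function scanSegments(jpeg: Uint8Array, sentinels: string[]): Record<string, string[]> {
  const hits: Record<string, string[]> = {};
  const segments = walkSegments(jpeg);
  for (let i = 0; i < segments.length; i++) {
    const text = latin1(segments[i].payload);
    for (let j = 0; j < sentinels.length; j++) {
      if (text.indexOf(sentinels[j]) !== -1) {
        if (!hits[sentinels[j]]) hits[sentinels[j]] = [];
        hits[sentinels[j]].push(segments[i].name);
      }
    }
  }
  return hits;
}
```

Because the walk stops at SOS, anything hidden after the scan data would be missed by this sketch; the raw strings pass over the whole file is what covers that case.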

Plus structural checks: list of remaining APP*/COM segments by marker code and size; whether the literal string ExifTool appears anywhere in the output bytes; and the Orientation tag value as reported by exiftool -Orientation#.
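
The raw strings pass and the ExifTool literal check both come down to substring search over the file's bytes. A minimal sketch, assuming ASCII-only needles (which holds for the sentinels described above); the function names are hypothetical and this stands in for, rather than reproduces, the runner's use of the strings tool:

```typescript
// Return the offset of the first occurrence of an ASCII needle, or -1.
function indexOfAscii(haystack: Uint8Array, needle: string): number {
  if (needle.length === 0) return 0;
  for (let start = 0; start + needle.length <= haystack.length; start++) {
    let k = 0;
    while (k < needle.length && haystack[start + k] === needle.charCodeAt(k)) k++;
    if (k === needle.length) return start;
  }
  return -1;
}

// Sentinels that still appear verbatim anywhere in the file.
function survivingSentinels(file: Uint8Array, sentinels: string[]): string[] {
  return sentinels.filter((s) => indexOfAscii(file, s) !== -1);
}

// True when the literal string "ExifTool" appears in the output bytes.
function hasExifToolLiteral(file: Uint8Array): boolean {
  return indexOfAscii(file, "ExifTool") !== -1;
}
```

A search like this only finds sentinels stored as plain single-byte text; a sentinel that had been re-encoded (for example as UTF-16) would need a separate check.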

Comparison reference: mat2

mat2 (Metadata Anonymisation Toolkit 2) is the privacy-tool reference used by Tails OS. For JPEG it uses GdkPixbuf to re-encode the image rather than walking and filtering JPEG markers. This is a fundamentally different approach from our hand-rolled segment walker: GdkPixbuf reads the pixel data, re-encodes it as JPEG via libjpeg-turbo, and emits the result with a freshly generated JFIF APP0. All original metadata segments are gone because the pixel-level decode path never reads them.

mat2 has no --preserve-orientation flag. Its JPEG strip is always full-strip; the comparison is only meaningful against our preserveOrientation: false output.

Results

| | Input fixture | JpegStrategy (default) | JpegStrategy (preserveOrientation) | ExifTool -all= | mat2 0.13.4 |
| --- | --- | --- | --- | --- | --- |
| Output size | 4 177 bytes | 613 bytes | 649 bytes | 613 bytes | 631 bytes |
| APP*/COM segments remaining | 5 (APP0, APP1×2, APP13, COM) | none | 1 (APP1 EXIF, 34 b) | none | 1 (APP0, 16 b) |
| Orientation tag | 6 | absent | 6 | absent | absent |
| ExifTool marker in bytes | yes | no | no | no | no |
| Raw strings sentinels | 10 | 0 | 0 | 0 | 0 |
| ExifTool visible tags | 10 | 0 | 0 | 0 | 0 |
| In-process marker walk | 10 | 0 | 0 | 0 | 0 |

Per-sentinel comparison: JpegStrategy (default) vs mat2

| sentinel | raw | JpegStrategy | mat2 |
| --- | --- | --- | --- |
| EXIF_MAKE | present | removed | removed |
| EXIF_MODEL | present | removed | removed |
| EXIF_SOFTWARE | present | removed | removed |
| EXIF_ARTIST | present | removed | removed |
| EXIF_COPYRIGHT | present | removed | removed |
| EXIF_USERCOMMENT | present | removed | removed |
| XMP_CREATOR | present | removed | removed |
| XMP_TITLE | present | removed | removed |
| IPTC_BYLINE | present | removed | removed |
| JPEG_COMMENT | present | removed | removed |

Leaked by both: 0. Leaked by us only: 0. Leaked by mat2 only: 0.
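
The three tallies above are a set comparison over per-sentinel survival flags. A small sketch of that comparison, assuming a hypothetical shape in which each tool's result maps a sentinel name to true when any recovery channel found it (the actual layout of report.json is not described here):

```typescript
// Hypothetical shape: sentinel name -> true when any recovery channel found it.
type Survival = Record<string, boolean>;

interface LeakSummary {
  both: string[];
  oursOnly: string[];
  theirsOnly: string[];
}

function compareLeaks(ours: Survival, theirs: Survival): LeakSummary {
  const summary: LeakSummary = { both: [], oursOnly: [], theirsOnly: [] };
  // Union of sentinel names from both results, without duplicates.
  const names = Object.keys(ours).concat(
    Object.keys(theirs).filter((n) => !(n in ours))
  );
  for (let i = 0; i < names.length; i++) {
    const a = ours[names[i]] === true;
    const b = theirs[names[i]] === true;
    if (a && b) summary.both.push(names[i]);
    else if (a) summary.oursOnly.push(names[i]);
    else if (b) summary.theirsOnly.push(names[i]);
  }
  return summary;
}
```

With every flag false, as in the table above, all three lists come back empty.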

Interpretation

For the default strip, JpegStrategy and exiftool -all= are byte-for-byte equivalent on this fixture — both produce 613-byte outputs with zero APP*/COM segments and zero sentinel survival. The hand-rolled walker exactly matches the reference implementation's privacy guarantee.

JpegStrategy and mat2 both achieve zero sentinel survival — no sentinel string is recoverable from either output via any of the three recovery channels. The two tools differ in approach but not in outcome on this fixture:

  • JpegStrategy drops APP0/JFIF (matching ExifTool policy), leaving no APP*/COM segments.
  • mat2 re-encodes via GdkPixbuf and emits a freshly generated JFIF APP0 (18 bytes total: marker + length + standard JFIF\0 header; the results table lists it as 16 b because the segment length excludes the 2-byte marker). The APP0 carries no sentinel data and is not privacy-relevant: it contains only JFIF version, density units, and pixel density, all generated by libjpeg-turbo at encode time.
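
To make the JFIF APP0 size arithmetic concrete, here is a sketch of a parser for a thumbnail-less JFIF APP0 segment, following the field order in the JFIF specification. The names `JfifHeader` and `parseJfifApp0` are hypothetical, and the sample bytes in the usage note are a typical header, not bytes copied from the mat2 output.

```typescript
interface JfifHeader {
  totalBytes: number;     // marker + length field + payload
  lengthField: number;    // value stored in the 2-byte length field
  version: string;        // e.g. "1.01"
  densityUnits: number;   // 0 = aspect ratio only, 1 = dots/inch, 2 = dots/cm
  xDensity: number;
  yDensity: number;
  thumbnailWidth: number;
  thumbnailHeight: number;
}

function parseJfifApp0(seg: Uint8Array): JfifHeader {
  if (seg.length < 18 || seg[0] !== 0xff || seg[1] !== 0xe0) {
    throw new Error("not an APP0 segment");
  }
  const lengthField = (seg[2] << 8) | seg[3];
  const id = String.fromCharCode(seg[4], seg[5], seg[6], seg[7]);
  if (id !== "JFIF" || seg[8] !== 0) throw new Error("APP0 is not JFIF");
  const minor = seg[10];
  return {
    totalBytes: 2 + lengthField,                        // add the 2-byte marker
    lengthField: lengthField,
    version: seg[9] + "." + (minor < 10 ? "0" : "") + minor,
    densityUnits: seg[11],
    xDensity: (seg[12] << 8) | seg[13],
    yDensity: (seg[14] << 8) | seg[15],
    thumbnailWidth: seg[16],
    thumbnailHeight: seg[17],
  };
}
```

For the typical header FF E0 00 10 "JFIF\0" 01 01 00 00 01 00 01 00 00, the length field is 16 and the total is 18 bytes, which is how the 16 b figure in the results table and the 18-byte figure above describe the same segment.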

mat2 orientation behaviour: mat2 has no orientation-preservation flag; it drops orientation unconditionally. On this fixture its output reports Orientation: absent, same as our preserveOrientation: false mode. The preserveOrientation: true path (synthesizing a minimal APP1 with only the Orientation IFD0 entry) is an exclusive feature of JpegStrategy.

preserveOrientation: true is privacy-sound — the synthesized APP1 carries the orientation value and nothing else. See the original interpretation for the full structural argument; the mat2 run confirms the same conclusion from a different angle: even a tool that re-encodes the image from scratch would need to explicitly add an orientation tag to match our output.
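
For illustration, a minimal APP1 that carries only an Orientation entry can be built as below. This is a sketch under assumptions, not JpegStrategy's actual code: the function names are hypothetical and the big-endian ("MM") byte order is an arbitrary choice. The layout it emits has a 34-byte segment length (36 bytes with the marker), which is consistent with the 34 b figure and the 649 versus 613 byte difference in the results table, but that match is an inference rather than something the source states.

```typescript
// Build an APP1 EXIF segment whose IFD0 holds a single Orientation entry.
function buildOrientationApp1(orientation: number): Uint8Array {
  if (Math.floor(orientation) !== orientation || orientation < 1 || orientation > 8) {
    throw new RangeError("orientation must be an integer from 1 to 8");
  }
  return new Uint8Array([
    0xff, 0xe1,                         // APP1 marker
    0x00, 0x22,                         // segment length = 34 (excludes the marker)
    0x45, 0x78, 0x69, 0x66, 0x00, 0x00, // "Exif\0\0"
    0x4d, 0x4d, 0x00, 0x2a,             // TIFF header: "MM" (big-endian), magic 42
    0x00, 0x00, 0x00, 0x08,             // offset of IFD0 from the TIFF header
    0x00, 0x01,                         // IFD0 holds exactly one entry
    0x01, 0x12,                         // tag 0x0112 = Orientation
    0x00, 0x03,                         // type 3 = SHORT
    0x00, 0x00, 0x00, 0x01,             // count = 1
    0x00, orientation, 0x00, 0x00,      // value, left-justified in the 4-byte field
    0x00, 0x00, 0x00, 0x00,             // no next IFD
  ]);
}

// Read Orientation back from an APP1 laid out as above.
// Assumes IFD0 directly follows the TIFF header; not a general EXIF parser.
function readOrientation(app1: Uint8Array): number | null {
  const tiff = 10;                      // 2 marker + 2 length + 6 "Exif\0\0"
  const little = app1[tiff] === 0x49;   // "II" = little-endian, "MM" = big-endian
  const u16 = (o: number): number =>
    little ? app1[o] | (app1[o + 1] << 8) : (app1[o] << 8) | app1[o + 1];
  const ifd0 = tiff + 8;
  const count = u16(ifd0);
  for (let i = 0; i < count; i++) {
    const entry = ifd0 + 2 + i * 12;    // each IFD entry is 12 bytes
    if (u16(entry) === 0x0112) return u16(entry + 8);
  }
  return null;
}
```

Every byte in this segment is either fixed structure or the orientation value itself, so there is no field left in which a sentinel string could survive.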

One structural asymmetry vs mat2: mat2's GdkPixbuf re-encode adds a fresh JFIF APP0. This is not a privacy gap (no original metadata survives), but it is a fingerprint that the file was processed by GdkPixbuf. JpegStrategy leaves no such marker: the output is identical in structure to a JPEG that was never stamped with any metadata in the first place.

Caveats and limits of this test

  • The fixture exercises 10 metadata sources but does not exercise: APP2 (ICC profiles — would require a synthesized profile file; tested separately by the unit-test suite via preserveColorProfile), APP14 (Adobe DCT — kept by JpegStrategy by design; mat2's GdkPixbuf re-encode drops it, which is a meaningful policy difference not captured by this sentinel battery since APP14 carries no string data), APP3..APP12 / APP15 (rare in real-world output; the walker drops them by the same code path).
  • MakerNotes are not tested via this sentinel battery — exiftool refused to inject a custom MakerNote on the Pillow fixture without a recognised camera Make. The walker drops MakerNote-bearing APP1 EXIF segments entirely; the unit tests exercise this path separately.
  • mat2's GdkPixbuf backend re-encodes the image at its own quality setting, potentially degrading the image. The sentinel battery only tests metadata recoverability, not image fidelity. For a privacy tool, the recoverability result is the relevant measure.
  • mat2 has no orientation-preservation mode, so the preserveOrientation: true comparison is JpegStrategy-exclusive.
  • The runner skips mat2 cleanly if it is not installed. All existing recovery channels (JpegStrategy, exiftool) still run. The mat2 column shows (skipped) in the comparison table.
  • This test is reproducible but not in CI yet. A natural follow-up is wiring tools/forensic/jpeg.ts (and the other forensic runners) into a release-gate test.

Reproducing

# Prerequisites
sudo apt install mat2        # optional — adds the comparison column
# exiftool and python3 (with Pillow) are required

# From the project root
npx tsx tools/forensic/jpeg.ts

Outputs go to /tmp/jpeg-forensic/:

  • input.jpg: the rich fixture (Pillow baseline + all sentinels injected)
  • our-default.jpg: JpegStrategy output, default options
  • our-preserve-orientation.jpg: JpegStrategy output, preserveOrientation: true
  • exiftool-stripped.jpg: exiftool -all= output
  • mat2-stripped.jpg: mat2 output (if mat2 installed)
  • report.json: structured per-output sentinel-survival data

Required tools: exiftool, python3 (with Pillow), strings. All available on Debian/Ubuntu via apt (libimage-exiftool-perl, python3-pil, binutils). mat2 optional (sudo apt install mat2).