Read and write custom metadata in JPEG

I want to be able to read and write some metadata consisting of

  • 2 Strings rarely exceeding 20 characters each
  • 4 integers

to/from a JPEG, in pure Rust.

Can you recommend any crates that might help with this, or offer any other relevant advice?

This looks like a good crate for your needs: crates.io: Rust Package Registry
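
For the payload itself, one option is to pack the two strings and four integers into a small byte buffer that can then be stored as the contents of a single segment. The following is a minimal std-only sketch, not something taken from any crate: the `Meta` struct, the `encode`/`decode` names, the one-byte length prefix, and the choice of big-endian `i32` for the integers are all assumptions made for illustration.

```rust
/// Hypothetical container for the metadata in the question:
/// two short strings and four integers (assumed here to fit in i32).
#[derive(Debug, Clone, PartialEq)]
pub struct Meta {
    pub strings: [String; 2],
    pub ints: [i32; 4],
}

/// Layout (an assumption of this sketch): each string is a one-byte
/// length followed by its UTF-8 bytes, then four big-endian i32 values.
/// Returns None if a string is longer than 255 bytes.
pub fn encode(meta: &Meta) -> Option<Vec<u8>> {
    let mut out = Vec::new();
    for s in &meta.strings {
        if s.len() > 255 {
            return None;
        }
        out.push(s.len() as u8);
        out.extend_from_slice(s.as_bytes());
    }
    for n in &meta.ints {
        out.extend_from_slice(&n.to_be_bytes());
    }
    Some(out)
}

/// Inverse of `encode`. Returns None on truncated input or invalid UTF-8.
/// Any bytes after the fourth integer are ignored.
pub fn decode(mut bytes: &[u8]) -> Option<Meta> {
    let mut strings = [String::new(), String::new()];
    for slot in &mut strings {
        let (&len, rest) = bytes.split_first()?;
        let len = len as usize;
        if rest.len() < len {
            return None;
        }
        let (text, rest) = rest.split_at(len);
        *slot = std::str::from_utf8(text).ok()?.to_owned();
        bytes = rest;
    }
    let mut ints = [0i32; 4];
    for slot in &mut ints {
        if bytes.len() < 4 {
            return None;
        }
        let (raw, rest) = bytes.split_at(4);
        *slot = i32::from_be_bytes([raw[0], raw[1], raw[2], raw[3]]);
        bytes = rest;
    }
    Some(Meta { strings, ints })
}
```

The resulting `Vec<u8>` can be handed to whatever segment-writing API ends up being used. A JPEG segment's length field is 16 bits and counts its own two bytes, so a payload can be at most 65,533 bytes, which is far more than this metadata needs.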


I've been banging my head against a brick wall for a while.

I can

  1. read a JPEG into memory
  2. retrieve its segments
  3. add a new segment to it, in-memory
  4. retrieve the segments, including the new one, from the modified in-memory JPEG
  5. write the modified JPEG back to its original location
  6. observe that its size grew by the size of my new segment, by
    • seeing how many bytes are written
    • observing that the size of the on-disk file has increased

but when I read it back in from disk, the new segment does not appear.

Here is my test program:
use std::env::args;
use img_parts::jpeg::{self, JpegSegment};

const OUR_MARKER: u8 = jpeg::markers::COM;

fn main() {
    let mut args = args();
    let _executable = args.next();
    let path = args.next().unwrap();
    println!("Opening {path}");
    let input = std::fs::read(&path).unwrap();
    let mut jpeg = jpeg::Jpeg::from_bytes(input.into()).unwrap();

    report_segments(&jpeg, OUR_MARKER, "Segments when loading");

    let segments = jpeg.segments_mut();
    segments.push(JpegSegment::new_with_contents(
        OUR_MARKER,
        img_parts::Bytes::from("**** WE WROTE THIS: THIS IS OURS *****"))
    );
    report_segments(&jpeg, OUR_MARKER, "Segments before writing");

    let output = std::fs::File::create(&path).unwrap();
    let bytes_written = jpeg.clone().encoder().write_to(output).unwrap();
    println!("Wrote {bytes_written} bytes to {path}");

    let it = jpeg.segment_by_marker(OUR_MARKER);
    println!("Our new segment retrieved: {it:?} {contents:?}", contents = std::str::from_utf8(it.unwrap().contents()));

    println!("Re-reading the jpeg we just wrote");
    let new_input = std::fs::read(&path).unwrap();
    let new_jpeg = jpeg::Jpeg::from_bytes(new_input.into()).unwrap();
    report_segments(&new_jpeg, OUR_MARKER, "Segments after re-reading");
}

fn report_segments(jpeg: &jpeg::Jpeg, our_marker: u8, msg: &str) {
    println!("=============== {msg} ===============");
    for (n, segment) in jpeg.segments().iter().enumerate() {
        let marker = segment.marker();
        println!("Marker {n:2} in input: {marker}");
    }
    println!("---- Contents of OUR segments ----");
    for segment in jpeg.segments_by_marker(our_marker) {
        let contents = std::str::from_utf8(segment.contents()).unwrap();
        println!("   {contents}");
    }
    println!("---- End of segment report ----");
}
Here is some sample output:
cargo run --release --bin wtf -- /tmp/test.jpg
Finished `release` profile [optimized] target(s) in 0.18s
     Running `target/release/wtf /tmp/test.jpg`
Opening /tmp/test.jpg
=============== Segments when loading ===============
Marker  0 in input: 224
Marker  1 in input: 219
Marker  2 in input: 219
Marker  3 in input: 192
Marker  4 in input: 196
Marker  5 in input: 196
Marker  6 in input: 196
Marker  7 in input: 196
Marker  8 in input: 218
---- Contents of OUR segments ----
---- End of segment report ----
=============== Segments before writing ===============
Marker  0 in input: 224
Marker  1 in input: 219
Marker  2 in input: 219
Marker  3 in input: 192
Marker  4 in input: 196
Marker  5 in input: 196
Marker  6 in input: 196
Marker  7 in input: 196
Marker  8 in input: 218
Marker  9 in input: 254
---- Contents of OUR segments ----
   **** WE WROTE THIS: THIS IS OURS *****
---- End of segment report ----
Wrote 5792 bytes to /tmp/test.jpg
Our new segment retrieved: Some(JpegSegment { marker: 254 }) Ok("**** WE WROTE THIS: THIS IS OURS *****")
Re-reading the jpeg we just wrote
=============== Segments after re-reading ===============
Marker  0 in input: 224
Marker  1 in input: 219
Marker  2 in input: 219
Marker  3 in input: 192
Marker  4 in input: 196
Marker  5 in input: 196
Marker  6 in input: 196
Marker  7 in input: 196
Marker  8 in input: 218
---- Contents of OUR segments ----
---- End of segment report ----

Can you spot what I'm doing wrong?

All I can think of is that you are reading stale data. You already have a handle to the file here:

let output = std::fs::File::create(&path).unwrap();

Just re-use it to get the file contents:

-let new_input = std::fs::read(&path).unwrap();
+let mut new_input = Vec::new();
+output.read_to_end(&mut new_input).unwrap();

The on-disk file grows by N bytes, every time I run this program, but the new segments never appear when it is read in.

Even new processes don't find the new segments when reading in a file that occupies more space than it did before the previous execution of the program.

I think this is inconsistent with the stale data hypothesis, unless I'm failing to grasp your point.

Here are the raw contents of the end of the file:
...
00001630: 669c a5c3 0268 df6d 7485 8978 69b3 d485  f....h.mt..xi...
00001640: 4a19 8317 2055 dff8 d63f ffd9 fffe 0028  J... U...?.....(
00001650: 2a2a 2a2a 2057 4520 5752 4f54 4520 5448  **** WE WROTE TH
00001660: 4953 3a20 5448 4953 2049 5320 4f55 5253  IS: THIS IS OURS
00001670: 202a 2a2a 2a2a fffe 0028 2a2a 2a2a 2057   *****...(**** W
00001680: 4520 5752 4f54 4520 5448 4953 3a20 5448  E WROTE THIS: TH
00001690: 4953 2049 5320 4f55 5253 202a 2a2a 2a2a  IS IS OURS *****
000016a0: fffe 0028 2a2a 2a2a 2057 4520 5752 4f54  ...(**** WE WROT
000016b0: 4520 5448 4953 3a20 5448 4953 2049 5320  E THIS: THIS IS 
000016c0: 4f55 5253 202a 2a2a 2a2a fffe 0028 2a2a  OURS *****...(**
000016d0: 2a2a 2057 4520 5752 4f54 4520 5448 4953  ** WE WROTE THIS
000016e0: 3a20 5448 4953 2049 5320 4f55 5253 202a  : THIS IS OURS *
000016f0: 2a2a 2a2a fffe 0028 2a2a 2a2a 2057 4520  ****...(**** WE 
00001700: 5752 4f54 4520 5448 4953 3a20 5448 4953  WROTE THIS: THIS
00001710: 2049 5320 4f55 5253 202a 2a2a 2a2a fffe   IS OURS *****..
00001720: 0028 2a2a 2a2a 2057 4520 5752 4f54 4520  .(**** WE WROTE 
00001730: 5448 4953 3a20 5448 4953 2049 5320 4f55  THIS: THIS IS OU
00001740: 5253 202a 2a2a 2a2a ffed 0028 2a2a 2a2a  RS *****...(****
00001750: 2057 4520 5752 4f54 4520 5448 4953 3a20   WE WROTE THIS: 
00001760: 5448 4953 2049 5320 4f55 5253 202a 2a2a  THIS IS OURS ***
00001770: 2a2a ffed 0028 2a2a 2a2a 2057 4520 5752  **...(**** WE WR
00001780: 4f54 4520 5448 4953 3a20 5448 4953 2049  OTE THIS: THIS I
00001790: 5320 4f55 5253 202a 2a2a 2a2a ffed 0028  S OURS *****...(
000017a0: 2a2a 2a2a 2057 4520 5752 4f54 4520 5448  **** WE WROTE TH
000017b0: 4953 3a20 5448 4953 2049 5320 4f55 5253  IS: THIS IS OURS
000017c0: 202a 2a2a 2a2a ffed 0028 2a2a 2a2a 2057   *****...(**** W
000017d0: 4520 5752 4f54 4520 5448 4953 3a20 5448  E WROTE THIS: TH
000017e0: 4953 2049 5320 4f55 5253 202a 2a2a 2a
  • There are multiple copies of the segment, one for each time the program was executed on this file.
  • The segment markers are there: fffe for each time it was run with COM and ffed for each time it was executed using APP13 as the marker.

So, clearly the segment is added to the file each time the program runs, but it is not being discovered when it is read back in.

This is insane: two JPEGs containing identical bytes have different segments, according to the segments and segment_by_marker methods!

Some code that demonstrates this:
use std::{env::args, io::Write};
use std::fs::File;
use std::path::Path;
use img_parts::jpeg::{self, JpegSegment, Jpeg};

const OUR_MARKER: u8 = jpeg::markers::APP7;

fn main() {
    // Get path of JPEG from CLI
    let mut args = args();
    let _executable = args.next();
    let path = args.next().unwrap();

    // Read a JPEG and report its segments
    let mut jpeg_original = read_jpeg(&path);
    report_segments(&jpeg_original, "Segments when first loaded");

    // Add our segment to a clone of the JPEG, and report the segments: new segment is found
    let jpeg_before_new_segment = jpeg_original.clone();
    let segments = jpeg_original.segments_mut();
    segments.push(make_our_segment());
    let jpeg_with_new_segment = jpeg_original; // this is a move
    report_segments(&jpeg_with_new_segment, "Segments after pushing new segment");

    // The bytes method gives something different before/after the segment has been added
    assert!(! compare_bytes_in_jpeg(jpeg_before_new_segment, jpeg_with_new_segment.clone()));

    // Recreate the JPEG by roundtrip via bytes
    let jpeg_via_bytes = bytes_to_jpeg(&jpeg_to_bytes(jpeg_with_new_segment.clone()));
    // ... the bytes in the new JPEG are identical to the ones in the old one ...
    assert!(compare_bytes_in_jpeg(jpeg_with_new_segment.clone(), jpeg_via_bytes.clone()));
    // ... but the new segment is not found in the copy-via-roundtrip
    report_segments(&jpeg_via_bytes, "Segments after pushing new segment and in-memory roundtrip");

    // Recreate the JPEG by roundtrip via file
    write_jpeg(jpeg_with_new_segment.clone(), &mut File::create(&path).unwrap());
    let jpeg_via_file = read_jpeg(&path);
    // The bytes are identical again ...
    assert!(compare_bytes_in_jpeg(jpeg_with_new_segment.clone(), jpeg_via_file.clone()));
    // ... but the new segment is missing, again.
    report_segments(&jpeg_via_file, "Segments after roundtrip via file");

    // Sanity check: is the new segment still present in the original file
    report_segments(&jpeg_with_new_segment, "Sanity check: segments in the only place where the new segment was found");
}

fn report_segments(jpeg: &jpeg::Jpeg, msg: &str) {
    println!("\n=============== {msg} ===============");
    println!("Bytes in JPEG = {}", jpeg_to_bytes(jpeg.clone()).len());
    for (n, segment) in jpeg.segments().iter().enumerate() {
        let marker = segment.marker();
        println!("Marker {n:2} in input: {marker:x}");
    }
    println!("---- Looking for OUR segments ----");
    if let Some(our_segment) = jpeg.segment_by_marker(OUR_MARKER) {
        let contents = std::str::from_utf8(our_segment.contents()).unwrap();
        println!("Our segment was found: `{our_segment:x?} {contents}`.")
    } else {
        println!("Our segment was NOT found.")
    }

    println!("---- End of segment report ----");
}

fn make_our_segment() -> JpegSegment {
    JpegSegment::new_with_contents(
        OUR_MARKER,
        img_parts::Bytes::from("<THIS IS DRIVING ME NUTS>")
    )
}

fn compare_bytes_in_jpeg(a: Jpeg, b: Jpeg) -> bool {
    let a = jpeg_to_bytes(a);
    let b = jpeg_to_bytes(b);
    a == b
    //println!("The bytes in these two JPEGs are {}", if a == b { "IDENTICAL" } else { "DIFFERENT" });
    // println!("Lengths: {} {}", a.len(), b.len());
    // for (i, (a, b)) in a.into_iter().zip(b).enumerate() {
    //     print!("{a:02x} {b:02x} {} ", if a==b {""} else {"XXXX"});
    //     if i%16 == 15 { println!(); }
    // }
    // println!();
}

fn bytes_to_jpeg(bytes: &[u8]) -> Jpeg {
    Jpeg::from_bytes(bytes.to_owned().into()).unwrap()
}

fn jpeg_to_bytes(jpeg: Jpeg) -> Vec<u8> {
    let mut bytes = vec![];
    write_jpeg(jpeg, &mut bytes);
    bytes
}

fn write_jpeg(jpeg: Jpeg, sink: &mut impl Write) {
    jpeg.encoder().write_to(sink).unwrap();
}

fn read_jpeg(path: impl AsRef<Path>) -> Jpeg {
    bytes_to_jpeg(&std::fs::read(&path).unwrap())
}

That's interesting. I'm as clueless as you are here. Personally, I would ask the crate's author; they are more likely to have insight into what's going on here.

Looking at the implementation, it seems that the from_bytes function stops looking for any further segments once it has found the EOI (End of Image) marker, and all the segments I'm adding are placed after EOI.

Looks like a bug. I'll open an issue.
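
To illustrate the behaviour described above, here is a simplified marker walker written against plain byte slices. It is a sketch of the general idea (stop at EOI and ignore whatever follows), not the actual img_parts implementation. The function name `markers_until_eoi` is made up for this example, and it does no validation of segment contents.

```rust
/// Returns the marker byte of every segment found between SOI and EOI.
/// Anything that follows EOI is never looked at.
/// Returns None if the data is truncated or does not look like a JPEG.
pub fn markers_until_eoi(data: &[u8]) -> Option<Vec<u8>> {
    // A JPEG starts with the SOI marker, 0xFF 0xD8.
    if data.len() < 2 || data[0] != 0xFF || data[1] != 0xD8 {
        return None;
    }
    let mut pos = 2;
    let mut found = Vec::new();
    loop {
        // Every marker is 0xFF followed by the marker byte.
        if *data.get(pos)? != 0xFF {
            return None;
        }
        let marker = *data.get(pos + 1)?;
        pos += 2;
        if marker == 0xD9 {
            // EOI: stop here, without checking for trailing bytes.
            return Some(found);
        }
        found.push(marker);
        // The big-endian length field counts its own two bytes.
        let len = u16::from_be_bytes([*data.get(pos)?, *data.get(pos + 1)?]) as usize;
        if len < 2 {
            return None;
        }
        pos += len;
        if marker == 0xDA {
            // SOS is followed by entropy-coded data; skip to the next real marker.
            loop {
                if *data.get(pos)? != 0xFF {
                    pos += 1;
                    continue;
                }
                match *data.get(pos + 1)? {
                    0x00 | 0xD0..=0xD7 => pos += 2, // stuffed byte or restart marker
                    0xFF => pos += 1,               // fill byte
                    _ => break,                     // a real marker ends the scan data
                }
            }
        }
    }
}
```

Fed a stream in which a COM segment trails the EOI marker, this returns only the markers before EOI, which matches the symptom reported earlier in the thread. With the same COM segment placed before SOS, it shows up in the result.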


That's probably why the examples use insert rather than push to add new segments.
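
A rough way to choose the index to pass to insert is to place the new segment right after the leading run of APPn segments, so that it sits well before SOS and EOI. The helper below is hypothetical (the name `metadata_insert_index` and the "after the leading APPn run" rule are assumptions of this sketch) and works on bare marker bytes so that it stays independent of any crate.

```rust
/// Given the marker bytes of a JPEG's segments in order, returns the index
/// just past the leading run of APPn markers (0xE0..=0xEF).
/// Inserting at this index keeps the new segment ahead of the image data.
pub fn metadata_insert_index(markers: &[u8]) -> usize {
    markers
        .iter()
        .take_while(|&&m| m >= 0xE0 && m <= 0xEF)
        .count()
}
```

For the marker list printed earlier in the thread (224, 219, 219, 192, 196, ..., 218) this gives index 1, directly after APP0. Assuming `segments_mut()` hands back a `Vec`, as the `push` calls above suggest, the markers could be collected with `segment.marker()` and the new segment added with `jpeg.segments_mut().insert(index, segment)` instead of `push`.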
