Writing jpg into PDF

After a long delay, I finally got around to implementing images for my minimal pdf writing crate.

This is a test example I made today for jpg. It seems a little messy, as it uses two different crates to access the jpg. Also, although it works, it isn't 100% clear to me from the Jpeg crate documentation WHY it works... so I am wondering if there is a better or neater way.

 
   // Setup the PDF Writer
   let mut doc = pdf_min::Writer::default();
   doc.b.nocomp = true;

   // Read jpg from file
   let file_bytes = std::fs::read("one.jpg").unwrap();

   // Use jpeg_decoder::Decoder to get jpg info ( color space, bits_per_component, width, height ).
   let mut decoder = jpeg_decoder::Decoder::new(std::io::Cursor::new(&file_bytes));
   decoder.read_info().unwrap();
   let info = decoder.info().unwrap();

   use jpeg_decoder::{PixelFormat};
   
   let color_space: &[u8] = match info.pixel_format { 
       PixelFormat::RGB24 => b"/DeviceRGB",
       PixelFormat::CMYK32 => b"/DeviceCMYK",
       PixelFormat::L8 | PixelFormat::L16 => b"/DeviceGray",
   };

   let bits_per_component = match info.pixel_format { 
       PixelFormat::L16 => 16,
       _ => 8
   };

   // Use img_parts::jpeg::Jpeg to make DCT (Discrete Cosine Transform) compressed data.
   let cdata = 
   {
       let mut cdata = Vec::new();
       let jpeg = img_parts::jpeg::Jpeg::from_bytes(file_bytes.into()).unwrap();
       jpeg.encoder().write_to(&mut cdata).unwrap();
       cdata
   };

   // Make the ImageSpec.
   use pdf_min::{Px, image::{ImageSpec, Image}};
   let ims = ImageSpec {
       data: &cdata,
       width: info.width as Px,
       height: info.height as Px,
       color_space,
       bits_per_component,
       other: b"/Filter/DCT",
   };
   
   // Make the Image from the ImageSpec.
   let im = Image::new(&ims, &mut doc.b);

   // Draw the image on the current page.
   im.draw(&mut doc.p, 20.0, 40.0, 0.20);

   // Save the pdf as a file.
   let bytes = doc.finish();
   let mut file = std::fs::File::create("jpg_image_test.pdf").unwrap();
   use std::io::Write;
   file.write_all(bytes).unwrap();

You are actually decoding and then re-encoding the image data, and that's not optimal. Have you tried simply including the JPEG bytes in the PDF? If that works, you can then optimize the process by stripping all unneeded headers.

I did try that and it works, but it means the PDF is much larger, as the DCT compression is not done, or rather is undone.

I don't know anything about the jpg format or how these crates work (I have not looked at the source at all yet), but if the meta info is at the start of the file, so that Decoder only looks at the first few bytes, and img_parts doesn't actually decompress the data and then re-compress it, it may be fairly efficient.

The trouble is that there is no real documentation to say what img_parts actually does. For example, does it always yield DCT compressed data for a jpg file? Probably, I suppose!

That's not what @fogzot was talking about.

You don't need to know. You just need to add the JPEG unmodified, without even touching it, like img2pdf does. Copy the bytes and don't look inside; the PDF viewer will do that later.

It's faster, and you don't destroy quality in your converter that way.

I had a look at img-parts source code and it seems that it only manages "segments", i.e. the various chunks of information and data that you find inside a JPEG file. So, you are not really doing any decompression/compression but just copying the JPEG to your PDF. I don't know why the size is different: I just tried the following code:

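use std::{env, fs, fs::File};
use img_parts::jpeg::Jpeg;

// Read the input file named by the first command-line argument.
let args: Vec<String> = env::args().collect();
let input = fs::read(&args[1]).unwrap();
let jpeg = Jpeg::from_bytes(input.into()).unwrap();

// Write the parsed segments back out to the second argument.
let output = File::create(&args[2]).unwrap();
jpeg.encoder().write_to(output).unwrap();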

and the output file is identical (as expected) to the input one.

I'd say that what you are doing is correct (even if you use two different libraries). In theory you could improve the code by:

  1. using only img-parts: extract the segments representing image metadata and process them yourself to get image dimensions and pixel format;
  2. using the low-level segments iterator of img-parts to serialize only what you're interested in (for example you could ignore comments, EXIF data and so on) to reduce the file size.
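
To illustrate point 1, here is a rough sketch of reading the dimensions and component count straight from the frame header (SOFn) segment. It uses only the standard library rather than img-parts, so it shows the idea, not the img-parts API. The names `JpegInfo`, `jpeg_info` and `pdf_color_space` are made up for this example, and the mapping from component count to PDF colour space is a simplifying assumption (it ignores things like Adobe-style CMYK files, which may need extra handling).

```rust
/// Basic frame information read from a JPEG SOFn segment (hypothetical type).
#[derive(Debug, PartialEq)]
struct JpegInfo {
    width: u16,
    height: u16,
    bits_per_component: u8,
    components: u8,
}

/// Walk the marker segments and return the frame header info, or None if
/// the data is not a JPEG, is truncated, or has no frame header before
/// the scan data.
fn jpeg_info(data: &[u8]) -> Option<JpegInfo> {
    // Every JPEG file starts with the SOI marker FF D8.
    if !data.starts_with(&[0xFF, 0xD8]) {
        return None;
    }
    let mut pos = 2;
    while pos + 4 <= data.len() {
        if data[pos] != 0xFF {
            return None;
        }
        let marker = data[pos + 1];
        if marker == 0xFF {
            // Fill byte before a marker: skip it.
            pos += 1;
            continue;
        }
        if marker == 0xDA || marker == 0xD9 {
            // Start of scan or end of image reached without a frame header.
            return None;
        }
        // The two-byte big-endian length includes the length bytes themselves.
        let len = u16::from_be_bytes([data[pos + 2], data[pos + 3]]) as usize;
        if len < 2 {
            return None;
        }
        // SOF markers are C0..=CF except C4 (DHT), C8 (JPG) and CC (DAC).
        let is_sof = matches!(marker, 0xC0..=0xCF) && !matches!(marker, 0xC4 | 0xC8 | 0xCC);
        if is_sof {
            let seg = data.get(pos + 4..pos + 2 + len)?;
            if seg.len() < 6 {
                return None;
            }
            // Payload layout: precision, height, width, component count.
            return Some(JpegInfo {
                bits_per_component: seg[0],
                height: u16::from_be_bytes([seg[1], seg[2]]),
                width: u16::from_be_bytes([seg[3], seg[4]]),
                components: seg[5],
            });
        }
        pos += 2 + len;
    }
    None
}

/// Assumed mapping from component count to a PDF colour space name.
fn pdf_color_space(components: u8) -> Option<&'static [u8]> {
    let name: &'static [u8] = match components {
        1 => b"/DeviceGray",
        3 => b"/DeviceRGB",
        4 => b"/DeviceCMYK",
        _ => return None,
    };
    Some(name)
}
```

Calling `jpeg_info(&file_bytes)` on the bytes read from the file would then give the width, height and bits per component needed for the image dictionary, without a second crate.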

Hope this helps.

Ah, I get it now, thanks all.

I don't need img_parts at all unless I want to strip info out, which for the purposes of this example I don't. I can just write the whole file into the pdf.
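
For reference, stripping could look roughly like the sketch below. It is standard-library only and is not the img_parts segments iterator mentioned above. The function name `strip_metadata` is made up, and the choice to drop only COM (comment) and APP1 (typically Exif/XMP) segments is an assumption, made because other APPn segments can carry data that affects how the image is displayed. It also assumes a simple layout where every marker before the scan has a length field, and gives up otherwise.

```rust
/// Copy a JPEG, leaving out COM (FF FE) and APP1 (FF E1) segments.
/// Returns None if the data does not look like a well-formed JPEG.
fn strip_metadata(data: &[u8]) -> Option<Vec<u8>> {
    // Every JPEG file starts with the SOI marker FF D8.
    if !data.starts_with(&[0xFF, 0xD8]) {
        return None;
    }
    let mut out = Vec::with_capacity(data.len());
    out.extend_from_slice(&data[..2]);
    let mut pos = 2;
    loop {
        // Each segment header is FF, marker, then a two-byte length.
        let header = data.get(pos..pos + 4)?;
        if header[0] != 0xFF {
            return None;
        }
        let marker = header[1];
        if marker == 0xDA {
            // Start of scan: the compressed data follows, copy the rest as-is.
            out.extend_from_slice(&data[pos..]);
            return Some(out);
        }
        // The length counts its own two bytes but not the marker.
        let len = u16::from_be_bytes([header[2], header[3]]) as usize;
        if len < 2 {
            return None;
        }
        let segment = data.get(pos..pos + 2 + len)?;
        if !matches!(marker, 0xE1 | 0xFE) {
            out.extend_from_slice(segment);
        }
        pos += 2 + len;
    }
}
```

The compressed image data is never touched, so the result should still be usable as the stream data in the same way as the original bytes.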

I do need jpeg_decoder to get the width, height etc., but I think that is fine; its API implies it has the ability to just read the meta data.

Revised code:

   // Setup the PDF Writer
   let mut doc = pdf_min::Writer::default();
   doc.b.nocomp = true;

   // Read jpg from file
   let file_bytes = std::fs::read("one.jpg").unwrap();

   // Use jpeg_decoder::Decoder to get jpg info ( color space, bits_per_component, width, height ).
   let mut decoder = jpeg_decoder::Decoder::new(std::io::Cursor::new(&file_bytes));
   decoder.read_info().unwrap();
   let info = decoder.info().unwrap();

   use jpeg_decoder::{PixelFormat};
   
   let color_space: &[u8] = match info.pixel_format {
       PixelFormat::RGB24 => b"/DeviceRGB",
       PixelFormat::CMYK32 => b"/DeviceCMYK",
       PixelFormat::L8 | PixelFormat::L16 => b"/DeviceGray",
   };

   let bits_per_component = match info.pixel_format {
       PixelFormat::L16 => 16,
       _ => 8
   };

   // Make the ImageSpec.
   use pdf_min::{Px, image::{ImageSpec, Image}};
   let ims = ImageSpec {
       data: &file_bytes,
       width: info.width as Px,
       height: info.height as Px,
       color_space,
       bits_per_component,
       other: b"/Filter/DCT",
   };
   
   // Make the PDF Image from the ImageSpec.
   let im = Image::new(&ims, &mut doc.b);

   // Draw the image on the current page.
   im.draw(&mut doc.p, 20.0, 40.0, 0.20);

   // Save the pdf as a file.
   let bytes = doc.finish();
   let mut file = std::fs::File::create("jpg_image_test.pdf").unwrap();
   use std::io::Write;
   file.write_all(bytes).unwrap();