Porting mozjpeg to Rust ($10k gig)

The specifications are available in parts (see https://stackoverflow.com/questions/8560571/where-can-i-get-free-specifications-of-jpeg-jfif-exif-etc)

For defining “good enough”, we have a few dimensions:

  1. Does it decode all files that mozjpeg can decode? If not, it’s a bug. (A corpus-test sketch for this follows the list.)
  2. Does it report decode errors descriptively? If not, it’s a bug.
  3. Does it encode files with the same resultant quality (DSSIM) as mozjpeg? If not, there’s a math bug somewhere.
  4. Does it decode and encode quickly, within 20% of the speed of mozjpeg? If not, it’s a bug.
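As a rough illustration of criterion 1, here is a sketch of a corpus test that treats mozjpeg’s djpeg tool as the oracle: any file djpeg can decode must also decode in the Rust port. The rust_jpeg::decode call is a placeholder for whatever API the port ends up exposing, and the test assumes djpeg is on the PATH and that a tests/corpus directory of sample files exists.

```rust
use std::fs;
use std::process::Command;

#[test]
fn decodes_everything_mozjpeg_decodes() {
    for entry in fs::read_dir("tests/corpus").unwrap() {
        let path = entry.unwrap().path();
        // Treat mozjpeg's djpeg as the oracle: does it decode this file?
        let mozjpeg_ok = Command::new("djpeg")
            .arg(&path)
            .output()
            .map(|o| o.status.success())
            .unwrap_or(false);
        if mozjpeg_ok {
            let bytes = fs::read(&path).unwrap();
            // Hypothetical API of the Rust port.
            assert!(
                rust_jpeg::decode(&bytes).is_ok(),
                "mozjpeg decodes {:?} but we don't",
                path
            );
        }
    }
}
```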

As far as API surface and functionality exposed, I think that will be driven by:

  1. Speed. We want zero-cost abstraction in both synchronous and async usage contexts. Decoding from buffered or contiguous memory, a memory-mapped file, a file descriptor, or an IO descriptor (a struct with read/seek/pos/write fn pointers) would be important (see the sketch after this list). Also, mozjpeg is currently single-threaded; with a Rust codebase it would be much easier to reason about parallel encoding or decoding.
  2. File size. We would want to match mozjpeg’s encode features.
  3. Backwards compatibility - It would be quite nice to have backwards compatibility with the libjpeg API, even if as a C shim. This is optional, but would expand the potential user base significantly.
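To make the first point concrete, here is a minimal sketch (with hypothetical names) of one way to get a zero-cost “decode from anything” surface: make the decoder generic over std’s Read + Seek, so a contiguous buffer (via Cursor), a file, a memory-mapped region, or a custom callback-backed source all monomorphize into specialized code rather than going through dynamic dispatch.

```rust
// Hypothetical decoder type, generic over any readable/seekable source.
use std::fs::File;
use std::io::{Cursor, Read, Seek};

pub struct Decoder<R: Read + Seek> {
    src: R,
}

impl<R: Read + Seek> Decoder<R> {
    pub fn new(src: R) -> Self {
        Decoder { src }
    }

    pub fn read_marker(&mut self) -> std::io::Result<[u8; 2]> {
        let mut marker = [0u8; 2];
        self.src.read_exact(&mut marker)?;
        Ok(marker)
    }
}

fn main() -> std::io::Result<()> {
    // Contiguous memory: wrap the slice in a Cursor.
    let buf: Vec<u8> = vec![0xFF, 0xD8];
    let mut from_memory = Decoder::new(Cursor::new(&buf[..]));
    assert_eq!(from_memory.read_marker()?, [0xFF, 0xD8]); // SOI marker

    // A file (or a memory-mapped region exposed through the same traits).
    let mut from_file = Decoder::new(File::open("test.jpg")?);
    let _ = from_file.read_marker()?;
    Ok(())
}
```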

I think that a function-by-function review makes sense, but that significant refactoring would be required - too much for a function-by-function translation to work as-is.

We currently have two donors putting up $18k USD, but perhaps we could find more. This is a big project.

1 Like
  1. Let X be a file that is invalid according to the JPEG standard, but which mozjpeg manages to decode. Do you require the Rust code to decode X? In the same way mozjpeg decodes it? If so, this means the contractor can’t code against the JPEG standard and has to code against mozjpeg – which would likely imply a function-by-function translation.

  2. (3 & 4): Do you require this for every file in existence? (If so, this seems impossible to satisfy, as one can construct adversarial examples.) If not, and it’s only required for “99%” of the files, it seems the contractor would need a set of files to test on.

4 Likes

Hi! I would like to suggest Apriorit for the contractual work, as I have had a good experience with them porting the GFWX codec from C++ to Rust, resulting in gfwx-rs. That said, even though GFWX is not a “simple” codec, a proper JPEG implementation like the ones in mozjpeg or libjpeg-turbo is a significant amount of work, with years of performance tweaking and stabilization behind it. The core issue here really is SIMD optimization, because that takes a lot of time and money to implement properly for all processor architectures.

I have personally implemented quite a few codecs in the past through the FreeRDP project, and my experience with JPEG has mostly come from reverse engineering the Apple Remote Desktop (ARD) protocol. ARD has a proprietary progressive codec based on JPEG that uses the same discrete cosine transform as JPEG, so my experience is mostly with the DCT part.

Anyway, there is one (boring) question that must be asked: how much work would it take to make a version of mozjpeg without setjmp/longjmp, even if it would mean breaking API compatibility? If it’s actually within the range of money currently offered for the gig, it might be worth considering, as it would be a safe bet.

Another suggestion I could make regarding the SIMD optimization is that there exists a way to make something extremely performant for much less (manual) development work: Halide (https://halide-lang.org). The major pain point of Halide is just the compile-time dependency, since you can have it generate AOT code that is then linked into a library. I have successfully used it to generate optimized image processing functions in a utility library of mine, alongside manually written SIMD functions: https://github.com/devolutions/xpp

I can tell you Halide is probably the best bang for your buck in terms of results for the development effort and cost. It can generate optimized functions for all processor extensions and even for GPUs. This might have a chance of getting something close to the existing JPEG libraries in terms of performance.

4 Likes

For defining “good enough”, we have a few dimensions:

  1. Does it decode all files that mozjpeg can decode? If not, it’s a bug.
  2. Does it report decode errors descriptively? If not, it’s a bug.
  3. Does it encode files with the same resultant quality (DSSIM) as mozjpeg? If not, there’s a math bug somewhere.
  4. Does it decode and encode quickly, within 20% of the speed of mozjpeg? If not, it’s a bug.

Does this not fall under Hyrum’s Law that was introduced earlier in this thread?

1 Like

To extend on my reply from yesterday:

For DSSIM comparison, https://crates.io/crates/dssim is a really neat tool that can also be used as a library for unit tests. Because of its license, though, it should only be used in unit tests or invoked as an external program.
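Here is a minimal sketch of the “external program” route in a test, assuming the dssim binary is on the PATH and prints the score as the first whitespace-separated field of each output line; the file names and the 1% tolerance are placeholders.

```rust
use std::process::Command;

fn dssim_score(reference: &str, candidate: &str) -> f64 {
    // Run dssim as an external tool and parse the score it prints.
    let out = Command::new("dssim")
        .args([reference, candidate])
        .output()
        .expect("failed to run dssim");
    let stdout = String::from_utf8_lossy(&out.stdout);
    stdout
        .split_whitespace()
        .next()
        .and_then(|s| s.parse().ok())
        .expect("unexpected dssim output")
}

#[test]
fn quality_matches_mozjpeg() {
    // Both encoders' outputs are decoded to PNG beforehand (e.g. via a
    // djpeg pipeline); the threshold would come from a corpus-wide baseline.
    let ours = dssim_score("reference.png", "ours.png");
    let mozjpeg = dssim_score("reference.png", "mozjpeg.png");
    assert!(ours <= mozjpeg * 1.01, "ours={ours}, mozjpeg={mozjpeg}");
}
```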

Regarding halide-lang, let me clarify where most of the complexity lies with it: the way to use Halide is to compile a “generator” that uses the Halide API to describe the image-processing operations. This compiled generator program can then produce object code or a library for the target of your choice, with specific CPU options or even GPU languages. This part (the AOT code) is actually quite small and can be wrapped behind a C API, even though it uses C++ under the hood for its (small) runtime. Unlike manually written code, Halide can regenerate code with minimal effort, making it possible to explore a much wider space of ways to write an optimized SIMD function.

My suggestion would be to define a first milestone of this project where you have most of what you want in pure Rust, with minimal optimization work done at that point. The jpeg-decoder library mentioned earlier looks like a good starting point. Get it working the way you want at a functional level, run a profiler on the code, and identify the major hot spots; those become the second milestone, where the optimization work is done with whatever solution fits (manual SIMD code, rayon, crossbeam, Halide, etc.). For SIMD, a list of CPU targets would also need to be defined: which Intel and ARM processors, which SSE/AVX levels, and don’t forget aarch64.
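For that optimization milestone, here is a minimal sketch of the kind of runtime dispatch that usually falls out of defining such a CPU matrix: detect features once and pick a kernel, so one binary covers the SSE/AVX/NEON spread. The kernel bodies here are placeholders, not a real IDCT.

```rust
pub fn idct_block(coeffs: &[i16; 64], out: &mut [u8; 64]) {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            // SAFETY: only called after the avx2 runtime check above.
            return unsafe { idct_block_avx2(coeffs, out) };
        }
    }
    idct_block_scalar(coeffs, out);
}

#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn idct_block_avx2(coeffs: &[i16; 64], out: &mut [u8; 64]) {
    // Placeholder: a real kernel would use core::arch::x86_64 intrinsics
    // (or reuse libjpeg-turbo's assembly, as suggested later in the thread).
    idct_block_scalar(coeffs, out);
}

fn idct_block_scalar(coeffs: &[i16; 64], out: &mut [u8; 64]) {
    // Placeholder scalar fallback, not a real inverse DCT.
    for (o, c) in out.iter_mut().zip(coeffs) {
        *o = (*c).clamp(0, 255) as u8;
    }
}
```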

I should mention that we have contracted the primary libjpeg-turbo developer to write SIMD functions for our own needs, unrelated to JPEG specifically. Those optimized SIMD functions are now in libxpp alongside the Halide ones. There were a few cases where the hand-written SIMD was better, but Halide was actually faster in others. The thing is, even though Darrell is one of the best SIMD developers I’ve seen (he clearly knew his stuff, down to the behaviour of specific ARM CPUs), we’ve still managed to get quite good results on our side using Halide. In a way, Halide almost feels like cheating.

Halide currently exposes a C++ generator API, but I’ve been told this API could be exposed in other languages, avoiding the step of building a C++ generator that then generates usable object code. There’s a Python wrapper, and someone could work on a Rust wrapper. It uses LLVM under the hood, so cross-compilation is built in already. Even without that, the current C++ generator API could easily be wrapped in a build.rs script that compiles and runs the generator, adding the resulting .a library to the link libraries. The compile-time requirements would be similar to what you need to run rust-bindgen on the fly.
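Here is a rough build.rs sketch of that wrapping, assuming the generator binary was built separately (its path passed via an environment variable) and that it accepts Halide’s usual -g/-o/target= arguments; the generator and function names are hypothetical, and the exact name of the emitted archive can vary with the Halide version.

```rust
// build.rs — a rough sketch of wrapping a Halide AOT generator.
use std::env;
use std::process::Command;

fn main() {
    let out_dir = env::var("OUT_DIR").unwrap();
    let generator = env::var("HALIDE_GENERATOR")
        .unwrap_or_else(|_| "./halide/bin/jpeg_generators".to_string());

    // Run the generator; it emits the static library and header into OUT_DIR.
    let status = Command::new(&generator)
        .args(["-g", "idct8x8", "-o", out_dir.as_str(), "target=host"])
        .status()
        .expect("failed to run Halide generator");
    assert!(status.success(), "Halide generator failed");

    // Depending on the Halide version the archive comes out as `idct8x8.a`;
    // copy it to the `lib<name>.a` name rustc looks for (ignored if absent).
    let _ = std::fs::copy(
        format!("{out_dir}/idct8x8.a"),
        format!("{out_dir}/libidct8x8.a"),
    );

    // Link the generated code; bindings to its C ABI can then be produced
    // with bindgen or written by hand. The small Halide runtime is C++, so
    // the platform's C++ standard library may also need to be linked.
    println!("cargo:rustc-link-search=native={out_dir}");
    println!("cargo:rustc-link-lib=static=idct8x8");
    println!("cargo:rerun-if-changed=build.rs");
}
```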

1 Like

Alternatives to a complete rewrite:

It should be possible to refactor mozjpeg/libjpeg-turbo to use non-unwinding error handling, i.e., everywhere the code calls ERREXIT(), change it to something like return -1 and ensure these return values are handled and forwarded correctly all the way up. The maintainer of libjpeg-turbo doesn’t want such a widespread change, so it’d have to be a fork.

A less intrusive variant of this is to use C++ exceptions for unwinding and translate them to error codes only at the highest level of the API.

Another option is to ignore the classic libjpeg API and use the libjpeg-turbo API (a high-level wrapper on top of the libjpeg API that uses setjmp/longjmp only internally and exposes error codes to the world). The downside is that it’s a simple high-level API, so it probably won’t satisfy advanced use cases.

6 Likes

If you’re going to rewrite mozjpeg:

  • I do not recommend using the libjpeg API/ABI internally in a new library. It expects applications to mess with the cinfo struct directly, so it’s hard for both the applications and the library to keep the state valid and consistent. If you’d like to have compatibility with the API/ABI, I’d recommend adding a “fake” cinfo struct on top of an internal, cleaner API (a rough sketch of such a shim follows this list).

  • libjpeg-turbo’s SIMD functions (such as DCT/IDCT) are mostly in separate assembly files, so very easy to extract and reuse. They could be added to any Rust library to get a speed boost without pulling in the rest of libjpeg.

  • libjpeg-turbo’s optimized Huffman C implementation may be worth keeping.

  • libjpeg-turbo is now officially the reference JPEG implementation, so it’s better to follow it instead of trying to read the JPEG spec. For example, chroma upsampling hasn’t been defined in the spec, but libjpeg-turbo uses a triangle filter, so that’s the method to follow.

  • libjpeg’s design causes unnecessary quality loss by converting the DCT output to 8-bit YUV, and then again to 8-bit RGB. That rounds twice and requires buffering entire rows. Implementing the pipeline as per-block IDCT -> clamped float YUV -> optional color profile conversion -> 8-bit RGB would avoid any rounding beyond the final RGB values, so it’d achieve higher quality and could more easily support 10/16-bit outputs (that’s a deviation from the reference implementation worth making).

  • libjpeg’s support for progressive rendering is wasteful — each re-render does decoding from scratch. A better implementation (with a better API) could keep more partially-decoded state, and do lower-res IDCT and better smoothing of partial outputs.

  • For encoding, mozjpeg’s trellis quantization is pretty valuable, and it’s implemented and tuned well from a quality/compression perspective. However, the implementation takes a shortcut and compresses the whole file twice just to get statistics for the Huffman encoder. A dedicated implementation could be faster (but the main loop of the trellis is OK).

  • The “jpegcrush” technique for compressing progressive images is very useful for the quality/filesize ratio, but the implementation in mozjpeg is fragile spaghetti code. It replaced a Perl script, so it’s a big improvement over what was there before, but I wouldn’t recommend porting it literally.
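As a rough sketch of the “fake cinfo on top of a cleaner API” idea from the first bullet: the C-visible struct mirrors only the fields legacy callers poke at, plus an opaque pointer to the real Rust state. The struct layout and the rjpeg_* function names here are hypothetical; a real shim would have to reproduce jpeg_decompress_struct field-for-field to be ABI-compatible.

```rust
use std::os::raw::c_int;

pub struct Decoder; // stand-in for the internal, cleaner Rust API

impl Decoder {
    pub fn new() -> Self { Decoder }
    pub fn start(&mut self) -> Result<(), ()> { Ok(()) }
}

#[repr(C)]
pub struct FakeDecompressStruct {
    // Fields legacy callers read/write directly (a hypothetical subset):
    pub image_width: c_int,
    pub image_height: c_int,
    pub output_components: c_int,
    // Opaque handle to the internal state; never touched by C code.
    inner: *mut Decoder,
}

#[no_mangle]
pub extern "C" fn rjpeg_create_decompress(cinfo: &mut FakeDecompressStruct) {
    cinfo.inner = Box::into_raw(Box::new(Decoder::new()));
}

#[no_mangle]
pub extern "C" fn rjpeg_start_decompress(cinfo: &mut FakeDecompressStruct) -> c_int {
    // SAFETY: `inner` was set by rjpeg_create_decompress above.
    let decoder = unsafe { &mut *cinfo.inner };
    // The shim's job is to sync the public fields with internal state here.
    match decoder.start() {
        Ok(()) => 1,
        Err(()) => 0,
    }
}

#[no_mangle]
pub extern "C" fn rjpeg_destroy_decompress(cinfo: &mut FakeDecompressStruct) {
    if !cinfo.inner.is_null() {
        // SAFETY: reclaim the Box allocated in rjpeg_create_decompress.
        drop(unsafe { Box::from_raw(cinfo.inner) });
        cinfo.inner = std::ptr::null_mut();
    }
}
```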

11 Likes

My company (Immunant) has been working on open-source tooling to make porting C code to Rust faster and easier (C2Rust). Our approach to porting is to first translate the C code into unsafe but equivalent Rust then iteratively refactor and rewrite that on the Rust side. This sounds like it might be a good fit for this project since you want to preserve a legacy-compatible ABI along with feature and performance parity. We could largely leave the performance-critical sections as-is and reuse the existing SIMD asm code, while refactoring the internal structure and APIs into idiomatic Rust.

Since C2Rust doesn’t support setjmp/longjmp, we’d either need to do what kornel suggested and rewrite the error handling before transpiling, or substitute a placeholder before translating and then rewrite the Rust code to use Result or some other form of error handling.
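For illustration, here is a minimal sketch of the direction that Result-based rewrite could take: every path that previously called ERREXIT()/longjmp returns a typed error instead and lets ? unwind the stack safely. The error variants and parsing code are illustrative, not libjpeg’s actual message table.

```rust
use std::fmt;

#[derive(Debug)]
pub enum JpegError {
    UnexpectedEof,
    BadMarker(u8),
    Unsupported(&'static str),
}

impl fmt::Display for JpegError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            JpegError::UnexpectedEof => write!(f, "unexpected end of data"),
            JpegError::BadMarker(m) => write!(f, "unexpected marker 0x{m:02x}"),
            JpegError::Unsupported(what) => write!(f, "unsupported feature: {what}"),
        }
    }
}

impl std::error::Error for JpegError {}

fn read_marker(data: &[u8], pos: usize) -> Result<u8, JpegError> {
    match data.get(pos..pos + 2) {
        Some([0xFF, m]) => Ok(*m),
        Some(_) => Err(JpegError::BadMarker(data[pos])),
        None => Err(JpegError::UnexpectedEof),
    }
}

pub fn parse_header(data: &[u8]) -> Result<(), JpegError> {
    // Where the C code would ERREXIT(cinfo, ...), we just propagate with `?`.
    let soi = read_marker(data, 0)?;
    if soi != 0xD8 {
        return Err(JpegError::BadMarker(soi));
    }
    Ok(())
}
```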

Even at cost, I doubt we could port all of libjpeg-turbo for $18k, but we’d be willing to look at covering the difference as a contribution to the community. It sounds like there’s sufficient interest, so if no one else is taking this project on, we can start exploring the effort required and evaluate whether we have the time. @lilith, @notriddle: what kind of timeframe are you on?

6 Likes