Gzip compression with flate2

Hey there,

I am using Rust to compress a .json file to a .gz. I am new to Rust and testing it for performance reasons.
Together with ChatGPT I came up with the following. It works, but is not as efficient as expected (Node.js is actually faster):

use aws_config::load_from_env;
use aws_sdk_s3::Client;
use serde_json::{self, Value};
use tokio::io::AsyncReadExt;
use std::fs::File;
use std::io::{BufWriter, Write};
use flate2::write::GzEncoder;
use flate2::Compression;

...

let response: aws_sdk_s3::operation::get_object::GetObjectOutput = client.get_object()
            .bucket(bucket_name)
            .key(key)
            .send()
            .await?;
    
let stream = response.body;
let file = File::create(output_path)?;
let writer = BufWriter::new(file);

let mut encoder = GzEncoder::new(writer, Compression::default());       
let mut body = stream.into_async_read();
let mut buffer = [0; 8192];

loop {
  
    let len = match body.read(&mut buffer).await {
        Ok(0) => break, 
        Ok(len) => len,
        Err(e) => return Err(Error::from(e)),
    };
    encoder.write_all(&buffer[0..len])?;
}
encoder.finish()?;

The .json is read from an S3 bucket, but the important part is the compression itself. I think I made it very inefficient. Maybe you have a better solution for this? How can I make it more efficient?

Thank you very much and sorry for my rookie code :sweat_smile:

How are you measuring its performance?

In time - for how long the program runs

Can you be more explicit? Describe how are you running it, i.e. by pressing the keys on your keyboard.

Mandatory question: did you build your code in --release mode?

In production, the code is run automatically.
In development, I just do cargo run --release to test it. I've put print statements in between to measure the time, and the loop takes too many seconds. So I wanted to ask if there is anything I can optimize :confused:

Yes,
the code is built via cargo lambda, which builds in --release mode.

Can you post the Node.js version as well? Maybe the two implementations differ at something that might explain the observed difference in performance.

This is the code Node uses. It uses the zlib library:

  const body = (await s3Client.send(command)).Body;
  if (body instanceof Readable) {
    const gzip = zlib.createGzip();
    const outputPath = join("/tmp", "compressed.gz");
    const outputStream = createWriteStream(outputPath);

    await new Promise<void>((resolve, reject) => {
      body
        .pipe(gzip)
        .pipe(outputStream)
        .on("finish", () => {
          resolve();
        })
        .on("error", reject);
    });
  }

It is very simple. But since I am new to Rust, I was thinking that the Rust code I came up with is not quite right. What do you think about the provided Rust code?

The first thing that pops out is that in the Rust version you are doing two things: reading the body into a buffer, and then reading that buffer while you write to the output file, while in the Node.js version you do that in a single step by piping the body's data into the write stream.

Yes, thank you - that is true.

The Node implementation is the current production one. But I want to use Rust to make it more efficient. It is kind of a "showcase" function for getting more into Rust because of its performance benefits. So it was kind of depressing to see that the Node implementation is better (for now).
Is there anything I could do to make it perform better?

The Node implementation is just calling into the zlib module to do the compression work, which is just a C library, so how fast the compression is has very little to do with Node or JavaScript. You shouldn't really expect using Rust to be any faster here - flate2 uses either a native Rust implementation or the exact same zlib C library depending on which features you select, and the docs for the crate mention that there isn't a significant performance difference between the two.
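For reference, which backend flate2 uses is selected via Cargo features; a sketch of opting into the zlib C backend might look like this (feature names per the flate2 docs; the default is the pure-Rust miniz_oxide backend):

```toml
[dependencies]
# Default: pure-Rust miniz_oxide backend.
# flate2 = "1.0"

# Opt into the zlib C library backend instead:
flate2 = { version = "1.0", default-features = false, features = ["zlib"] }
```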

The rest of the work here is just IO, which as long as it's done sensibly is unlikely to have significantly different performance in different languages.

So, this just isn't a good example to show that Rust can be faster - you would be more likely to show a benefit if you picked an example that has complex logic that's actually written in JavaScript. But.. even then, V8 is a very good JIT compiler and you might find it difficult to beat without a lot of effort.


Wow! Thank you, this makes total sense.
I hadn't thought about the fact that the zlib library is actually written in C. But with that in mind, the results make total sense.
Thank you so much for clarifying this. I will try to find other use cases!

When comparing the general performance of an ahead-of-time compiled/optimized language like Rust to a VM with an optimizing JIT like V8, it's often the case that the easiest differences to see are:

  1. Startup time: use cases where a program is launched, does something that doesn't take very long, and then exits. VMs often have considerable overhead just to start themselves up, and if the actual code you're running doesn't execute for very long the JIT compiler won't have an opportunity to do anything clever with it. But, for lots of use cases this doesn't matter very much - if your program typically runs for more than a few minutes then the startup cost and initial slower performance may just be lost in the noise and not matter much overall.

  2. Memory usage: the JIT-compiled version of the code takes up memory, and the VM's own state and data structures do too; ahead-of-time compiled languages avoid a lot of this (or just use less memory for it). But, again, it doesn't always matter very much: if your program uses lots of memory for its actual data then the overhead of the VM might not be worth worrying about.

There certainly can be significant differences in actual runtime "compute" performance in some cases, even if the code in both languages is well-written and the VM being used is good, but it's often hard to guess which cases this will apply to; optimizing JIT compilers can be pretty surprising and unintuitive about what kinds of code they do a good or bad job with.

Lambda to compile? Well I never!

An easy example, if you actually want to make Node look bad, is writing a very simple parser for, e.g., INI (split by line then by comma: easy, right?), and using that to aggregate numbers in some large (e.g. 100MB-1GB) file.

Take all these values with a grain of salt, it's from my memory from several months ago, but:

With natural-looking code, I got around 20s/GB with around 40 lines of Node (which has to manually deal with buffering to be able to split a stream by line), vs 2s/GB with the trivial 10 lines of Rust that just need:

for line in buf_reader.lines() {
    let line = line?;
    for (index, value) in line.split(',').enumerate() {
        let value: f64 = value.parse()?;
        // ...aggregate value by index...
    }
}

Which (asymptotically) never allocates.

You can then forestall the "but you can write the JavaScript more efficiently" tack by showing what a JavaScript implementation that avoids any allocation looks like. I managed to get the Node version down to around 3s/GB with a couple hours of work and 200 lines (which have to reimplement int parsing on a buffer with start and end indices to avoid allocating an argument, and abuse knowledge of the input data to avoid using a Map).

Because of this, I expect that if you used a library to do the same, you would get results more similar to the first version, or even worse!

Note that this is not a particularly realistic or useful example: at the extreme, if you are writing Lambda services that inherently handle a single request at a time per process and spend most of their time waiting for other services to return, then Rust isn't going to be able to do much for you (at least performance-wise); your best bet there is to show performance at the bottom end of the instance sizing to make it a financial pitch, given how drastically AWS scales down CPU performance at the bottom-end 128MB size. (But then again, AFAICT establishing SSL is a large part of the time there, so maybe Rust is just as bad?)

In the end, taking an existing piece of production code that shows bad performance and improving it is the most effective way to show results, because they're immediately real, but that isn't always plausible.


I'll note that I haven't really made the Rust pitch internally at our company; I've used it in a couple of separate projects where it makes sense personally, and reviewers haven't revolted, but we already have a pretty loose culture about this, so I don't generally recommend that everyone do the same! Remember that Rust, as much as I like using it, isn't always the best option for every problem, especially when taking training up developers on it into account (though that's much less bad than people expect when there's an existing user around to ask things of).

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.