Flate3 code review

I just released a new version of flate3, it's an alternative to the flate2 crate, claiming better and faster compression than flate2. It uses multiple threads to compress faster. The latest version uses u64::to_le_bytes to append words to the output byte vector, a trick I didn't know about a year ago, and the lastest version of crossbeam.

For example, I just ran a test compressing a font file (FreeSans.ttf) 20 times, with this output ( release mode ):

flate2 compressed size=148336
flate2 test completed ok, n=20 time elapsed=240 milli sec.
flate3 compressed size=146969
flate3 test completed ok, n=20 time elapsed=164 milli sec.

Anyway, I'd welcome comments on the code ( help me get it to run even faster!). I originally wrote it about a year ago, so I guess there are some things I would do differently now compared to then.

5 Likes

One thing I noticed is that there's quite a few for 0..n loops that could be replaced with functional code. For instance, this:

    for sym in 0..self.symbols {
        bl_count[bits[sym] as usize] += 1;
    }

could be written like this:

    for b in &bits[0..self.symbols] {
        bl_count[*b as usize] += 1;
    }

And there's some places where the loop could be eliminated entirely, and replaced with iterators. Iterator code is (arguably) cleaner, but (definitely) allows the compiler to eliminate a range check on each pass of the loop. (E.g., there's no need for a range check on bits on every pass of the loop in this example, although there's still one an unavoidable one for bl_count).

I have no idea whether this would make any measurable difference, but it might be worth a try. In my code, I have occasionally converted for/loop code to iterator code, and seen a noticeable speed boost.

3 Likes

Thanks for the suggestion. My first reaction to your post was to guess that I wasn't very conscious of Rust iterators when I wrote the code, but on closer inspection, it seems I was, and I seem to have made moderately rational choices on when to use them and when not to bother.

I just ran clippy, and it didn't come up with much other than a suggestion to use sort_unstable rather than sort.

One problem I have is I am running on a windows laptop, and it seems almost impossible to measure the effect of small changes to the code as the run time varies by several percent apparently quite randomly with different runs of the test ( even if the number of iterations is large ). I think I used a profiler at one point in the past, but have forgotten what I used ( my memory is not good ).

For now, my tentative conclusion is that it won't be easy to find significant gains, although I suspect someone who really knows what they are doing could find something.

If you have Visual Studio installed, you can use its profiler on Rust programs. Build your program with a modified 'release' profile - you have to make sure symbols are included, which 'release' does not do by default - and then just load the .exe into VS and profile it as you would a C/C++ program.

I have tried to do this, but cannot figure it out. I don't seem to be able to set breakpoints, although "run to cursor" works ok.

Update: I found a setting that fixed the problem, hooray!

Now I simply have the problem that I don't have a "Diagnostic Tools window" as described here. I don't seem to have a "Debug" menu either. I am confused.

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.