Success story: new Rustacean beating C perf in first week

So, it's been a while, and my colleague has continued fiddling with the original application in between his other tasks.
It's now basically finished, so you can think of this post as the closing update.

tl;dr: Rust now does more for us, and our implementation got even faster! :smiley:

A big part of the speed increase was discovering the wonderful "needletail" crate (crates.io).
Needletail uses byte-reading, memchr and reused buffers to do (almost) zero-copy iteration. Basically all the things you wonderful people suggested we do, upthread :wink:
In practice, it builds a reader over a file, and that reader then calls your closure with each record in the file. Your closure gets handed a lifetime-bounded record backed directly by the read buffer!
As a bonus, it transparently reads gzipped files, which was a separate step in our old Perl-based system.
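
For anyone curious what the "reader calls your closure with a borrowed record" pattern looks like, here is a minimal std-only sketch. To be clear, this is not needletail's API or implementation: `Record`, `trim_eol` and `for_each_record` are hypothetical names made up for illustration, it assumes plain four-line FASTQ records, and it skips the memchr and gzip parts entirely.

```rust
use std::io::{self, BufRead};

/// One FASTQ record, borrowing from the line buffers that the reader reuses.
/// (Hypothetical type for illustration, not needletail's record type.)
pub struct Record<'a> {
    pub id: &'a [u8],
    pub seq: &'a [u8],
    pub qual: &'a [u8],
}

/// Strip a trailing "\n" or "\r\n" from a line.
fn trim_eol(line: &[u8]) -> &[u8] {
    let line = line.strip_suffix(b"\n").unwrap_or(line);
    line.strip_suffix(b"\r").unwrap_or(line)
}

/// Read four-line FASTQ records and hand each one to `callback`.
/// The four buffers are allocated once and cleared between records, so the
/// record passed to the closure is only valid for the duration of that call.
/// Returns the number of records seen.
pub fn for_each_record<R, F>(mut reader: R, mut callback: F) -> io::Result<u64>
where
    R: BufRead,
    F: FnMut(Record<'_>),
{
    let (mut id, mut seq, mut plus, mut qual) =
        (Vec::new(), Vec::new(), Vec::new(), Vec::new());
    let mut count = 0;
    loop {
        id.clear();
        if reader.read_until(b'\n', &mut id)? == 0 {
            return Ok(count); // end of input
        }
        if trim_eol(&id).is_empty() {
            continue; // tolerate blank lines between records or at the end
        }
        seq.clear();
        plus.clear();
        qual.clear();
        reader.read_until(b'\n', &mut seq)?;
        reader.read_until(b'\n', &mut plus)?;
        reader.read_until(b'\n', &mut qual)?;

        let header = trim_eol(&id);
        if !header.starts_with(b"@") || !plus.starts_with(b"+") {
            return Err(io::Error::new(
                io::ErrorKind::InvalidData,
                "malformed FASTQ record",
            ));
        }
        callback(Record {
            id: &header[1..], // drop the leading '@'
            seq: trim_eol(&seq),
            qual: trim_eol(&qual),
        });
        count += 1;
    }
}
```

Because the record only borrows the buffers, the borrow checker stops the closure from keeping a record around after the call; anything you want to keep has to be copied out explicitly (e.g. `rec.seq.to_vec()`).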

As suggested, we switched to the regex::bytes submodule, to avoid all the UTF-8 validation when reading our (pure-ASCII) FASTQ files.

In addition to Needletail, we've also started using Rust-bio (crates.io) to handle the nucleotides in our files. It has a nice, fast reverse-complement algorithm that even works on raw bytes!
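
To illustrate what "reverse complement on raw bytes" means, here is a small std-only sketch. It is not Rust-bio's implementation (theirs is certainly better tuned); `complement` and `revcomp` here are my own illustrative functions, and I'm assuming that anything other than A/C/G/T, such as N, should simply pass through unchanged.

```rust
/// Complement of a single nucleotide given as an ASCII byte.
/// Bytes other than A/C/G/T (upper or lower case) are returned unchanged.
pub fn complement(base: u8) -> u8 {
    match base {
        b'A' => b'T',
        b'T' => b'A',
        b'C' => b'G',
        b'G' => b'C',
        b'a' => b't',
        b't' => b'a',
        b'c' => b'g',
        b'g' => b'c',
        other => other,
    }
}

/// Reverse complement of a sequence of ASCII bytes:
/// walk the input back to front and complement each base.
/// No UTF-8 validation or `String` conversion is involved.
pub fn revcomp(seq: &[u8]) -> Vec<u8> {
    seq.iter().rev().map(|&b| complement(b)).collect()
}
```

For example, `revcomp(b"AACG")` gives the bytes of `CGTT`. Working on `&[u8]` means the sequence slice from the reader can be passed in directly.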

To repeat the context: we use this in production for our CRISPR-analyzer webservice, which lets you look for CRISPR/Cas9 target sites in your samples, bringing sophisticated biological screens within reach of labs without dedicated bioinformatics staff.
Speeding up this webservice means more customer-friendliness (less waiting!) and fewer resources expended on our side.

timings with time:

-- original Perl version: --
1) gunzip inputfile:                      real 0m 9.674s
2) extract matching fastQ reads:          real 0m46.133s
3) Map to reference genome with bowtie2:  real 0m30.978s  (constant)
4) Map reads to genes:                    real 2m 3.399s
   TOTAL:                                    * 3m30.184s *

++ RUST version ++
1+2) extract fastQ reads from gzip'd input:   real 0m26.334s
  ALT: same, non-gzip'd input:               (real 0m10.971s)
3) Map to reference genome with bowtie2:      real 0m30.978s (constant)
4) Map reads to genes:                        real 0m19.490s
   TOTAL                                           1m16.802s (63% shorter overall!)
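
For anyone who wants to double-check the totals and the 63% figure, the arithmetic can be sketched like this. The step times are copied from the listings above and converted to seconds; the constant and function names are just made up for this snippet.

```rust
/// Wall-clock seconds per step for the Perl pipeline
/// (gunzip, extract reads, bowtie2, map reads to genes).
pub const PERL_STEPS: [f64; 4] = [9.674, 46.133, 30.978, 123.399];

/// Wall-clock seconds per step for the Rust pipeline on gzip'd input
/// (extract reads, bowtie2, map reads to genes).
pub const RUST_STEPS: [f64; 3] = [26.334, 30.978, 19.490];

/// Sum of all step times, in seconds.
pub fn total(steps: &[f64]) -> f64 {
    steps.iter().sum()
}

/// How much shorter `after` is than `before`, as a percentage of `before`.
pub fn percent_shorter(before: f64, after: f64) -> f64 {
    (1.0 - after / before) * 100.0
}
```

Here `total(&PERL_STEPS)` comes to 210.184 s (3m30.184s) and `total(&RUST_STEPS)` to 76.802 s (1m16.802s), so `percent_shorter` lands at a bit over 63%.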

Updated code is (still) here on github

For those of you wondering why these timings are longer than the original 3-4 seconds I posted: the 3-second time was for an incomplete port of the original, just the bare minimum of iterating and regex-matching (but of course, with the same incomplete feature set for both C and Rust).
These new timings are for the full, does-everything, version. Features cost runtime :slight_smile:

Thanks, everyone, for helping us shave two whole minutes off our webservice!
