I'm starting to reimplement some genomics analysis code from Python in Rust (I'm very new to Rust), and I'm getting a strange result where regex in Rust appears to be very slow: roughly 8X slower than Python.
Python online check: usually takes 0.25 to 0.4 milliseconds
Rust online check: usually takes 1.3 milliseconds
Could anyone experienced take a quick look and point out all the (probably many) things I'm doing wrong?
Yeah, @cuviper , I realized that almost immediately after submitting, much to my chagrin. Still, I'm a bit surprised it's even 200 µs for a few regexes.
Am I overusing String when I should use &str more?
200 µs is quite good! I'm surprised it's even that fast. Compiling regexes is quite slow and you're compiling several (albeit fairly simple ones). Still, the overhead of regexes adds up.
It would almost certainly help here. parse_genome is called 100 times, and parse_genome also calls parse_seq. Each time those functions are called, it compiles several regexes.
Ah I see. Well I guess the repetition is done to give a more accurate value for the runtime of a single invocation. So I guess it depends if you want to time the regex compilation too.
Nonetheless, OnceCell/Lazy is probably a good idea because the function will likely be called more than once.
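For anyone following along, here is a minimal sketch of the compile-once pattern being suggested. It uses only the standard library (`std::sync::OnceLock`, stable since Rust 1.70) so it runs without extra crates. `CompiledPattern`, `COMPILE_COUNT`, `header_pattern` and `has_header` are hypothetical names made up for illustration and are not from the original code. In real code the closure would call `regex::Regex::new(...)` and the static would hold a `Regex`.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Counts how often the (pretend) expensive compilation runs.
pub static COMPILE_COUNT: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical stand-in for a compiled `regex::Regex`.
pub struct CompiledPattern {
    needle: String,
}

impl CompiledPattern {
    /// Pretend this is as costly as compiling a regex.
    fn compile(src: &str) -> Self {
        COMPILE_COUNT.fetch_add(1, Ordering::SeqCst);
        CompiledPattern { needle: src.to_owned() }
    }

    pub fn is_match(&self, haystack: &str) -> bool {
        haystack.contains(self.needle.as_str())
    }
}

/// Returns the shared pattern, building it on the first call only.
fn header_pattern() -> &'static CompiledPattern {
    static PATTERN: OnceLock<CompiledPattern> = OnceLock::new();
    PATTERN.get_or_init(|| CompiledPattern::compile(">"))
}

/// Called many times; it never recompiles the pattern.
pub fn has_header(record: &str) -> bool {
    header_pattern().is_match(record)
}
```

Calling `has_header` 100 times in a loop leaves `COMPILE_COUNT` at 1, which is the behaviour you want from `parse_genome` and `parse_seq`. `once_cell::sync::Lazy` gives the same effect with slightly less ceremony.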
Note that you are doing a whole lot of other unnecessary things. You are converting str slices to strings just to pass them to functions that don't need ownership of them; you are implicitly cloning other Strings by calling .to_string() on them, etc. You are also using regexes completely unnecessarily for simple replacement of different kinds of newlines with \n.
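A rough sketch of both points, assuming the goal is to turn `\r\n` and lone `\r` into `\n`. The function names `normalize_newlines` and `count_bases` are hypothetical and are not taken from the original code. Both functions borrow a `&str`, so the caller keeps ownership of its `String` and nothing is cloned just to make the call.

```rust
/// Normalizes Windows ("\r\n") and old Mac ("\r") line endings to "\n"
/// using plain string replacement instead of a regex.
pub fn normalize_newlines(text: &str) -> String {
    // Replace "\r\n" first so it does not become two newlines.
    text.replace("\r\n", "\n").replace('\r', "\n")
}

/// Counts A/C/G/T characters; borrows the input, so no ownership is needed.
pub fn count_bases(seq: &str) -> usize {
    seq.chars()
        .filter(|c| matches!(c, 'A' | 'C' | 'G' | 'T'))
        .count()
}
```

A caller holding a `String` can pass `&owned` to either function and keep using `owned` afterwards, with no `.to_string()` or `.clone()` needed.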
Thanks, everyone, especially @H2CO3 for the full rewrite; I'm learning a great deal.
The full PyO3 module test case (rust_quma) currently shows a 20X speed-up over the Python / C implementation, and I know I have a great deal more optimization yet to go.