I am currently writing a data-processing program that handles text files larger than 10 GB.
What's really interesting is that I found the program runs incredibly slowly, even in a release build.
At first I thought the problem was that BufReader validates the UTF-8 and makes a lot of copies, so I used File directly and used
std::mem::transmute instead of
BufRead::lines. That turned out to be only a little faster (roughly 50% faster, and still surprisingly slow).
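For anyone curious, here is a minimal sketch of the byte-level approach (not my actual program): read lines with `read_until` into a reusable buffer, so no `String` is allocated per line, and validate UTF-8 only on the small field slices. `sum_floats` and the whitespace-separated ASCII input format are just illustrative assumptions.

```rust
use std::io::{BufRead, BufReader};

// Illustrative helper: sum whitespace-separated floats from raw bytes,
// reusing one Vec<u8> buffer instead of allocating a String per line.
fn sum_floats(data: &[u8]) -> f64 {
    let mut reader = BufReader::new(data);
    let mut line = Vec::new();
    let mut sum = 0.0f64;
    while reader.read_until(b'\n', &mut line).unwrap() > 0 {
        for field in line.split(|&b| b == b' ' || b == b'\n') {
            if field.is_empty() {
                continue;
            }
            // Validate only this small field slice, not a whole owned line.
            let s = std::str::from_utf8(field).expect("ASCII input assumed");
            sum += s.parse::<f64>().expect("valid float assumed");
        }
        line.clear(); // reuse the buffer for the next line
    }
    sum
}

fn main() {
    println!("{}", sum_floats(b"1.5 2.5\n3.5 4.5\n"));
}
```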
However, I just realized that once I switched to
atof, the program ran 8 times faster (148s vs 18s).
I am curious whether there is any reason for Rust's
from_str being so much slower than atof.
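In case it matters for reproducing the comparison, this is roughly how atof can be called from Rust without any extra crates, assuming a platform where std links the C library by default (the `c_atof` wrapper name is mine):

```rust
use std::ffi::CString;
use std::os::raw::{c_char, c_double};

extern "C" {
    // atof from the C standard library, linked by default on typical platforms.
    fn atof(s: *const c_char) -> c_double;
}

// Thin safe wrapper (illustrative name) around the C call.
fn c_atof(s: &str) -> f64 {
    let c = CString::new(s).expect("no interior NUL bytes");
    unsafe { atof(c.as_ptr()) }
}

fn main() {
    println!("{}", c_atof("3.14"));
}
```

Note that atof silently returns 0.0 on parse failure, while from_str reports an error, so they are not exactly equivalent.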
It turns out Rust uses bignum for float parsing.
I have dug into the libcore code a little bit, and apparently there's a pitfall in the string-to-float conversion in libcore.
The code first checks whether the string has more digits than an f64 can hold. If so, it falls back to bignum, which carries a huge performance penalty. I suspect this is overkill, because we should be able to prove that the excess digits won't affect the result, so I believe the bignum path is avoidable.
And it seems all the overflow cases could be handled by comparing against a pre-computed maximum and minimum decimal.
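To illustrate what I mean by pre-computed bounds: any decimal whose exponent is far above f64::MAX (~1.7977e308) must overflow to infinity, and anything far below the smallest subnormal (~4.9e-324) must underflow to zero, and the parser already reports both outcomes as Ok values, so no exact arithmetic is needed for those ranges:

```rust
fn main() {
    // Exponent far beyond f64::MAX: overflows to infinity, returned as Ok(inf).
    let big: f64 = "1e400".parse().unwrap();
    assert!(big.is_infinite());

    // Exponent far below the smallest subnormal: underflows to zero.
    let tiny: f64 = "1e-400".parse().unwrap();
    assert_eq!(tiny, 0.0);

    println!("{} {}", big, tiny);
}
```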
Any thoughts?