I’ve got a few points to discuss about parsing numbers in the wild.
I have a ~200MB JSON file with a lot of lat/lon coordinates in it, about 8 million floating point numbers in total.
When parsing strings (well, &str's) into floats the standard way — str.parse() — my profiler (Linux perf) tells me that the program spends roughly 47% of its time in the guts of number parsing in core::num (filtered and de-mangled for sanity):
    17.68%  num::dec2flt::num::digits_to_big
    13.31%  num::dec2flt::rawfp::big_to_fp
     6.76%  num::dec2flt::parse::parse_decimal
     2.96%  num::diy_float::normalize
     2.69%  num::dec2flt::rawfp::fp_to_float::fp_to_float
     1.49%  num::bignum::bit_length
     1.23%  num::dec2flt::rawfp::encode_normal::encode_normal
     0.94%  num::diy_float::mul
The whole file is parsed in ~3 secs.
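For reference, the baseline looks roughly like this (a minimal sketch; the JSON tokenization is elided and `parse_coords` is a made-up name):

```rust
/// Minimal sketch of the baseline. Assumes the numbers have already been
/// pulled out of the JSON as string slices; tokenization is elided.
fn parse_coords(tokens: &[&str]) -> Vec<f64> {
    tokens
        .iter()
        .map(|s| s.parse::<f64>().expect("malformed number"))
        .collect()
}
```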
After seeing this, I tried to implement my own parsing function as a proof of concept. It turned out to be much faster: number parsing dropped to 18% of the total runtime, and the overall time fell to ~1.3 secs.
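The fast path was along these lines (a hypothetical sketch for illustration, not my exact proof-of-concept code; `parse_simple_float` is a made-up name):

```rust
// Accumulate the digits into a u64 mantissa and apply the decimal exponent
// in one float operation at the end. This is only exact while the mantissa
// fits in f64's 53-bit significand, which holds for coordinates with a
// handful of decimal places; anything else (exponent notation, overflow)
// should fall back to str::parse().
fn parse_simple_float(s: &str) -> Option<f64> {
    let bytes = s.as_bytes();
    let neg = bytes.first() == Some(&b'-');
    let mut i = if neg { 1 } else { 0 };
    let mut mantissa: u64 = 0;
    let mut exponent: i32 = 0;
    let mut seen_dot = false;
    let mut seen_digit = false;
    while i < bytes.len() {
        match bytes[i] {
            d @ b'0'..=b'9' => {
                // Bail out (to the slow path) on mantissa overflow.
                mantissa = mantissa.checked_mul(10)?.checked_add((d - b'0') as u64)?;
                if seen_dot {
                    exponent -= 1;
                }
                seen_digit = true;
            }
            b'.' if !seen_dot => seen_dot = true,
            _ => return None, // e.g. "1e5": not handled by this fast path
        }
        i += 1;
    }
    if !seen_digit {
        return None;
    }
    // Small powers of ten are exact in f64, so a single multiplication or
    // division stays correctly rounded for the inputs this path accepts.
    let value = if exponent < 0 {
        mantissa as f64 / 10f64.powi(-exponent)
    } else {
        mantissa as f64 * 10f64.powi(exponent)
    };
    Some(if neg { -value } else { value })
}
```

The point is that for plain digits-and-a-dot input, one u64 accumulation plus a single float operation replaces all the bignum machinery visible in the profile above.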
Now, on to my questions:
Why is Rust’s parsing of floating point numbers from strings slow? Is there some design compromise that I’m missing, or is it just something that never got much attention? So far I’ve only looked through libcore/num/dec2flt/parse.rs, and there doesn’t seem to be anything awfully different from what I came up with.
I don’t believe my case is niche (after all, this is the most common floating point format out there), so I think that a library author like myself should not need to implement number parsing by hand. What would be the best way to solve this? Fix libcore/num? Make a third-party crate? Make it a part of JSON parsing in serde?
Regardless of parsing speed, does anyone feel it would be better for a JSON parsing library to not use IEEE floats at all (due to their intrinsic precision problems) and choose some decimal implementation instead? Or would working with decimals seem completely foreign to an unsuspecting library user?
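To make the precision concern concrete: even a short decimal like 0.1 has no exact binary representation, so it changes the moment it enters an f64:

```rust
fn main() {
    // The nearest f64 to 0.1 is slightly larger than 0.1 itself,
    // which shows up as soon as you print more digits.
    let x: f64 = "0.1".parse().unwrap();
    println!("{:.20}", x); // prints 0.10000000000000000555
}
```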