Parsing floats fast

Hello everyone,

I've got a few points to discuss about parsing numbers in the wild.

I have a ~200MB JSON file with a lot of lat/lon coordinates in it, about 8 million floating point numbers in total.

When using the standard way of parsing strings (well, &str's) into floats, str.parse(), my profiler (Linux perf) tells me that the program spends roughly 47% of its time in the guts of number parsing in "num" (filtered and demangled for sanity):

    17.68% num::dec2flt::num::digits_to_big
    13.31% num::dec2flt::rawfp::big_to_fp
     6.76% num::dec2flt::parse::parse_decimal
     2.96% num::diy_float::normalize
     2.69% num::dec2flt::rawfp::fp_to_float::fp_to_float
     1.49% num::bignum::bit_length
     1.23% num::dec2flt::rawfp::encode_normal::encode_normal
     0.94% num::diy_float::mul

The whole file is parsed in ~3 secs.

After seeing this, I tried implementing my own parsing function as a proof of concept. It turned out to be much faster, taking only 18% of the total runtime and reducing the overall time to ~1.3 secs.

Now, on to my questions:

  • Why is Rust's parsing of floating point numbers from strings slow? Is there some design compromise that I'm missing, or is it just something that never got much attention? I've only looked through libcore/num/dec2flt/parse.rs so far, and there doesn't seem to be anything awfully different from what I came up with.

  • I don't believe my case is niche (after all, this is the most common floating point format out there), so I think that a library author like myself should not need to implement number parsing by hand. What would be the best way to solve this? Fix libcore/num? Make a third-party crate? Make it a part of JSON parsing in serde?

  • Parsing speed aside, does anyone feel it would be better for a JSON parsing library to avoid IEEE floats entirely (due to their intrinsic precision problems) and use some decimal implementation instead? On the other hand, wouldn't working with decimals seem completely foreign to an unsuspecting library user?
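For concreteness, the precision problem mentioned in the last question is the classic one: most short decimal fractions have no exact binary (f64) representation, so even trivial arithmetic is inexact:

fn main() {
    // 0.1, 0.2 and 0.3 are each rounded to the nearest f64, and the
    // rounding errors don't cancel out:
    assert_ne!(0.1 + 0.2, 0.3);
    println!("{:.17}", 0.1 + 0.2); // prints 0.30000000000000004
}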

Thanks!

Some time ago I saw that this landed in 1.4:

https://github.com/rust-lang/rust/pull/27307

It has to be correct. The naive integer implementation isn't guaranteed to round the decimal input to the closest float, and in fact doesn't:

extern crate ieee754; // [dependencies] ieee754 = "0.2"
use ieee754::Ieee754;

// Accumulate all digits into one integer, counting how many of them came
// after the decimal point, then scale by the matching power of ten.
fn parse_naive(s: &str) -> f64 {
    let mut x: i64 = 0;
    let mut seen_dot = false;
    let mut pow = 0;
    for b in s.bytes() {
        match b {
            b'.' => {
                seen_dot = true;
            }
            b'0'..=b'9' => {
                // Every digit past the dot shifts the decimal point one
                // more place to the left.
                if seen_dot {
                    pow -= 1;
                }
                x = 10 * x + (b - b'0') as i64;
            }
            _ => panic!(),
        }
    }
    x as f64 * 10_f64.powi(pow)
}

fn main() {
    let mut count = 0;
    let mut errs = vec![];
    for x in 0.1_f32.upto(10_f32) { // f32 so this loop finishes
        let x = x as f64;
        count += 1;
        // Rust's default formatting prints enough digits to uniquely
        // identify x, so a correct parser must recover exactly x.
        let s = format!("{}", x);

        assert_eq!(s.parse::<f64>().unwrap(), x);
        let parsed = parse_naive(&s);
        if parsed != x {
            // Measure the error in units in the last place (ULPs).
            let ulp_err = (x - parsed).abs() / x.ulp().unwrap();
            let bits_wrong = ulp_err.round() as u32;
            errs.push(bits_wrong);
        }
    }

    println!("{}/{} incorrect, max bits wrong {}",
             errs.len(), count,
             errs.iter().cloned().max().unwrap_or(0))
}

This prints:

18463635/55784244 incorrect, max bits wrong 2

That is, the integer parsing technique cannot correctly parse what Rust prints by default (which is an unambiguous representation of the float) for ~33% of the 56 million f32 values between 0.1 and 10 (inclusive). Of course, there are infinitely many strings that should parse to any particular float, so other strings that aren't tested here may be even more wrong, and I'm only testing a very small subset of the possible f64 values.

The maximum error for these cases is just the lowest two bits being wrong (a maximum relative error of approximately 5e-16), and I'm sure this is small enough for most uses. However, it isn't good enough for a default: not being able to round-trip is unfortunate, and theorems about floats and algorithms using them rely on operations being correctly rounded, or at least having well-defined/bounded errors (I have no idea if this technique has small guaranteed bounds).


The fast path is implemented at https://github.com/rust-lang/rust/blob/master/src/libcore/num/dec2flt/algorithm.rs#L48 and is basically equivalent to your int as f64 * (10.0f64).powi(pow as i32), with some small refinements. The problem, as you've discovered, is that it's really easy to fall off the fast path: it only works for numbers with a small number of digits and a tiny exponent, to avoid rounding errors. (digits_to_big etc. are only used on the slow, soft-float path.)
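To illustrate the idea (a simplified sketch, not the actual libcore code; it assumes the digits and decimal exponent have already been split out): f64 represents every integer below 2^53 exactly and every power of ten up to 10^22 exactly, and a single IEEE multiply or divide of exact operands is correctly rounded, so short inputs can be finished with one float operation:

// Powers of ten up to 10^22 are exactly representable in f64.
const POW10: [f64; 23] = [
    1e0, 1e1, 1e2, 1e3, 1e4, 1e5, 1e6, 1e7, 1e8, 1e9, 1e10, 1e11,
    1e12, 1e13, 1e14, 1e15, 1e16, 1e17, 1e18, 1e19, 1e20, 1e21, 1e22,
];

// Returns None when the input doesn't qualify for the fast path and the
// slow, correct-rounding machinery has to take over.
fn fast_path(mantissa: u64, exp: i32) -> Option<f64> {
    if mantissa >= (1u64 << 53) {
        return None; // too many digits: mantissa not exactly representable
    }
    let m = mantissa as f64; // exact conversion
    match exp {
        0..=22 => Some(m * POW10[exp as usize]), // one correctly rounded multiply
        -22..=-1 => Some(m / POW10[(-exp) as usize]), // one correctly rounded divide
        _ => None, // exponent too large in magnitude
    }
}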

That said, there are probably some potential refinements to the current code... for example, rust/num.rs at 219eca11b044de3644b3e9101124513c1a842b09 · rust-lang/rust · GitHub is very slow.

Thanks for the pointers, everyone! Correctness seems to be the sensible choice; that's what I was missing from the picture. My naive implementation is also hopelessly incorrect, but I had the impression that fixing it wouldn't affect performance too much. Apparently I was wrong :)

I also did more tests with my data regenerated so that all numbers have no more than 7 fractional digits, which fits Rust's fast path. This improved things considerably:

  • Rust slow path: 3 secs
  • Rust fast path: 1.7 secs
  • Naive incorrect parsing: 1.3 secs

… which, I think, is acceptable.
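For reference, a simple wall-clock harness is enough to reproduce this kind of measurement; in this sketch the file name and the crude number extraction are my assumptions:

use std::time::Instant;

fn main() {
    let data = std::fs::read_to_string("coords.json").unwrap();
    let start = Instant::now();
    // Crude tokenization: split on anything that can't be part of a
    // number and parse whatever remains.
    let total: f64 = data
        .split(|c: char| !c.is_ascii_digit() && c != '.' && c != '-')
        .filter_map(|tok| tok.parse::<f64>().ok())
        .sum();
    println!("sum = {} ({:?} elapsed)", total, start.elapsed());
}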

The problem here, I guess, is that while having 8 million numbers with long fractional parts to parse can be considered a corner case, it happens to be a real-world one. I wonder if we could implement a fast parsing mode that would deliberately cut precision at a given limit? If that sounds like a good idea conceptually, I'd be happy to attempt an implementation. Something along these lines:

use std::cmp::min;

// Keep the integer part plus at most 4 fractional digits, so that short
// inputs land on the standard parser's fast path.
fn fast_float(input: &str) -> f64 {
    let point = input.find('.').unwrap_or(input.len());
    let cutoff = min(point + 5, input.len()); // the '.' plus 4 digits
    input[..cutoff].parse().unwrap()
}

If you have that much data, it is worthwhile to consider storing it in a binary format instead.
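For instance, a minimal sketch of the raw-binary approach, assuming the whole dataset fits in memory and a fixed little-endian f64 layout:

use std::convert::TryInto;
use std::fs::File;
use std::io::{self, Read, Write};

fn write_floats(path: &str, data: &[f64]) -> io::Result<()> {
    let mut f = File::create(path)?;
    for &x in data {
        // 8 bytes per float, no decimal parsing needed on the way back in.
        f.write_all(&x.to_le_bytes())?;
    }
    Ok(())
}

fn read_floats(path: &str) -> io::Result<Vec<f64>> {
    let mut bytes = Vec::new();
    File::open(path)?.read_to_end(&mut bytes)?;
    Ok(bytes
        .chunks_exact(8)
        .map(|c| f64::from_le_bytes(c.try_into().unwrap()))
        .collect())
}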

I was wondering about the standard library. Floats are tricky enough that ad-hoc wrappers like this are easy to get wrong at the application level. For example, yours would work fine for 5 decimal digits, but at 16 it would only work sometimes.

My idea was that, along with the fallback from the fast-correct path to the slow-correct one, we might also want the choice of falling back to a fast-incorrect path, where "incorrect" means cutting the already-parsed value to a more limited precision instead of recovering the true value of the last digit.
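A hypothetical sketch of that mode (the function name, the digit limit, and the structure are all inventions for illustration; it assumes plain ASCII decimal notation with no exponent part):

fn parse_lossy(s: &str) -> f64 {
    // 15 significant digits always fit f64's exactly-representable
    // integer range (< 2^53), so the shortened input can take the fast path.
    const MAX_DIGITS: usize = 15;
    let point = s.find('.').unwrap_or(s.len());
    let int_digits = s[..point].chars().filter(|c| c.is_ascii_digit()).count();
    if int_digits >= MAX_DIGITS {
        // The integer part alone can't be shortened without changing the
        // magnitude, so parse the input as-is.
        return s.parse().unwrap();
    }
    // Keep the whole integer part and only as many fractional digits as
    // still fit into the digit budget; drop (don't round) the rest.
    let end = (point + 1 + (MAX_DIGITS - int_digits)).min(s.len());
    s[..end].parse().unwrap()
}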

It's not my data, it's just some public data on the web that happens to be in JSON.