Why does floating point multiplication not work?

Why do I get a wrong result when trying to multiply two floating point numbers?

println!("res: {}", 0.28_f64 * 2244098675250432000000_f64);
// res: 628347629070121000000

I expected the result to be "628347629070120960000".

Is this number too large for the f64 data type, if yes, what are the alternatives?

In decimal terms, f64 only guarantees 15 digits of precision. That's what you got.

The better data type depends on what you need. You could get a software f128 implementation, or arbitrary-precision decimal numbers. Or, there might be a solution based on “fixed-point” arithmetic (appropriately scaled integers).

Where do these numbers come from? What do they represent? What are their possible ranges, and how much precision do they have (e.g. will the number that is 0.28 here always have two fractional digits and no more)? How much error is acceptable?

Answer those questions and we can recommend a better solution.

4 Likes

One input value is too large to precisely capture - it's 70 bits wide, and uses enough of those bits to exceed the 52 bit manitssa of an f64. It gets rounded, to 2244098658322444451840 (if I've done the math right, anyways), stored as 1.9008255004882812 * 2.pow(70) (precisely).

The other input value is also not precisely captured, because it isn't representable as a fraction whose denominator is a power of two. It also gets rounded, this time to 1.12 * 2.pow(-2) (also precisely), or 0.280000000000000026645.

So, right out of the gate, you're multiplying a value slightly smaller than your original input by a value slightly larger than your input. The results will not be exactly what the arithmetic product of the two input numbers would be. Whether the resulting error is acceptable or not is an application design question; the floating-point result is within a small fraction of a percent of the arithmetic result, so for engineering purposes it might be fine, but for doing number theory (where precise results down to the integer matter), it would not be.

7 Likes

With a small update to rust 1.86

assert!(628347629070120960000_f64 <= (0.28_f64 * 2244098675250432000000_f64).next_up());
assert!(628347629070120960000_f64 >= (0.28_f64 * 2244098675250432000000_f64).next_down());

No idea if it holds for any single multiplication but working here.

1 Like

On the playground I get:

u128: 2244098675250432000000
 f32: 2244098658322444451840
 f64: 2244098675250432049152

You can also see this on Float Exposed as a C float or double.

3 Likes

Please read "What Every Programmer Should Know About Floating-Point Arithmetic".

5 Likes

The example provides is only an example.
The formula could be expressed like this x/n * n! where the following rules apply:
x >= 0 && x <= n
n >= 1 && n <= 25
In the example above x = 7 and n = 25.
I need to find a solution with no data loss, as even the smallest error will make the result unusable.

As it stands I don't see how you can ever get zero data loss. For example when x = 1.0 and n = 3.0 the x/n requires an infinite number of bits to represent.

I'd start by rearranging things into x * (n - 1)!. Then at least your calculation can be done with integers an give an exact result.

I think i128 is big enough.

3 Likes

You might be looking for a rational numbers library. (Disclaimer: I've never used the linked crate.)

2 Likes

Depending on how big your numbers are, you could use the rust_decimal crate (96-bit fixed point numbers - enough to handle your example).

use rust_decimal::prelude::*;
use rust_decimal_macros::dec;

fn main() {
    let a = dec!(0.28);
    let b = dec!(2244098675250432000000);
    assert_eq!(a * b, dec!(628347629070120960000))
}

Right off the bat, I would simplify that to x * (n - 1)!. The division and subsequent multiplication will eat a lot of precision when n is significantly larger than x.

If x and n are integers, then the result is also an integer. In that case, I would not use floating point arithmetic at all; it's not designed for implementing integer arithmetic precisely at all scales. Given your constraints, the largest possible output is (x = 24) * ((n = 25) - 1)!, or 24 * 24!, which is 14890761641597746544640000 - about an 84-bit number. This, and all of the intermediate products, are representable as a u128.

That gives this program:

fn factorial(n: u128) -> u128 {
    let mut prod = 1;
    for v in (1..=n) {
        prod *= v;
    }
    prod
}

fn foo(x: u8, n: u8) -> u128 {
    // assert!(0 <= x); // always true as x is an unsigned value
    assert!(x <= n);
    assert!(1 <= n);
    assert!(n <= 25);
    
    (x as u128) * factorial((n as u128) - 1)
}

fn main() {
    let result = foo(24, 25);
    println!("{result}");
}
2 Likes

You got exactly the correct result, under the restrictions of the numbers that are representable in f64.

The float literal `0.28_f64` represents a value of 
 exactly 1261007895663739⁄4503599627370496
     because 2.79999999999999971e-1 is too small
 compared to 2.80000000000000027e-1
         and 2.80000000000000082e-1 is too big

The float literal `2244098675250432000000_f64` represents a value of 
 exactly 2244098675250432049152⁄1
     because 2.24409867525043179e21 is too small
 compared to 2.24409867525043205e21
         and 2.24409867525043231e21 is too big

The product is exactly 2698732517375308729890581677353⁄4294967296
 which gets rounded to 628347629070121041920⁄1
     because 6.28347629070120911e20 is too small
 compared to 6.28347629070121042e20
         and 6.28347629070121173e20 is too big

Thus why you see 6.28347629070121e20 from debug
             and 628347629070121000000 from display

Playground code I used to generate this explanation: https://play.rust-lang.org/?version=nightly&mode=debug&edition=2024&gist=275a52a80b0fb9c91ee583fb7cd0138a

6 Likes

Tangent: TIL about the automatic super/subscript around U+2044 FRACTION SLASH -- nice!

2 Likes