Why does floating point multiplication not work?

rustaceanvim · March 3, 2025, 4:47pm

Why do I get a wrong result when trying to multiply two floating point numbers?

println!("res: {}", 0.28_f64 * 2244098675250432000000_f64);
// res: 628347629070121000000

I expected the result to be "628347629070120960000".

Is this number too large for the f64 data type, if yes, what are the alternatives?

kpreid · March 3, 2025, 5:06pm

In decimal terms, f64 only guarantees 15 digits of precision. That's what you got.

The better data type depends on what you need. You could get a software f128 implementation, or arbitrary-precision decimal numbers. Or, there might be a solution based on “fixed-point” arithmetic (appropriately scaled integers).

Where do these numbers come from? What do they represent? What are their possible ranges, and how much precision do they have (e.g. will the number that is 0.28 here always have two fractional digits and no more)? How much error is acceptable?

Answer those questions and we can recommend a better solution.

derspiny · March 3, 2025, 5:07pm

One input value is too large to precisely capture - it's 70 bits wide, and uses enough of those bits to exceed the 52 bit manitssa of an f64. It gets rounded, to 2244098658322444451840 (if I've done the math right, anyways), stored as 1.9008255004882812 * 2.pow(70) (precisely).

The other input value is also not precisely captured, because it isn't representable as a fraction whose denominator is a power of two. It also gets rounded, this time to 1.12 * 2.pow(-2) (also precisely), or 0.280000000000000026645.

So, right out of the gate, you're multiplying a value slightly smaller than your original input by a value slightly larger than your input. The results will not be exactly what the arithmetic product of the two input numbers would be. Whether the resulting error is acceptable or not is an application design question; the floating-point result is within a small fraction of a percent of the arithmetic result, so for engineering purposes it might be fine, but for doing number theory (where precise results down to the integer matter), it would not be.

jonh · March 3, 2025, 5:27pm

With a small update to rust 1.86

assert!(628347629070120960000_f64 <= (0.28_f64 * 2244098675250432000000_f64).next_up());
assert!(628347629070120960000_f64 >= (0.28_f64 * 2244098675250432000000_f64).next_down());

No idea if it holds for any single multiplication but working here.

cuviper · March 3, 2025, 6:49pm

On the playground I get:

u128: 2244098675250432000000
 f32: 2244098658322444451840
 f64: 2244098675250432049152

You can also see this on Float Exposed as a C float or double.

jwodder · March 3, 2025, 7:34pm

Please read "What Every Programmer Should Know About Floating-Point Arithmetic".

rustaceanvim · March 4, 2025, 7:00am

The example provides is only an example.
The formula could be expressed like this x/n * n! where the following rules apply:
x >= 0 && x <= n
n >= 1 && n <= 25
In the example above x = 7 and n = 25.
I need to find a solution with no data loss, as even the smallest error will make the result unusable.

ZiCog · March 4, 2025, 7:40am

As it stands I don't see how you can ever get zero data loss. For example when x = 1.0 and n = 3.0 the x/n requires an infinite number of bits to represent.

I'd start by rearranging things into x * (n - 1)!. Then at least your calculation can be done with integers an give an exact result.

I think i128 is big enough.

SReichelt · March 4, 2025, 8:18am

You might be looking for a rational numbers library. (Disclaimer: I've never used the linked crate.)

Jesper · March 4, 2025, 1:00pm

Depending on how big your numbers are, you could use the rust_decimal crate (96-bit fixed point numbers - enough to handle your example).

use rust_decimal::prelude::*;
use rust_decimal_macros::dec;

fn main() {
    let a = dec!(0.28);
    let b = dec!(2244098675250432000000);
    assert_eq!(a * b, dec!(628347629070120960000))
}

derspiny · March 4, 2025, 6:04pm

Right off the bat, I would simplify that to x * (n - 1)!. The division and subsequent multiplication will eat a lot of precision when n is significantly larger than x.

If x and n are integers, then the result is also an integer. In that case, I would not use floating point arithmetic at all; it's not designed for implementing integer arithmetic precisely at all scales. Given your constraints, the largest possible output is (x = 24) * ((n = 25) - 1)!, or 24 * 24!, which is 14890761641597746544640000 - about an 84-bit number. This, and all of the intermediate products, are representable as a u128.

That gives this program:

fn factorial(n: u128) -> u128 {
    let mut prod = 1;
    for v in (1..=n) {
        prod *= v;
    }
    prod
}

fn foo(x: u8, n: u8) -> u128 {
    // assert!(0 <= x); // always true as x is an unsigned value
    assert!(x <= n);
    assert!(1 <= n);
    assert!(n <= 25);
    
    (x as u128) * factorial((n as u128) - 1)
}

fn main() {
    let result = foo(24, 25);
    println!("{result}");
}

scottmcm · March 20, 2025, 12:45am

You got exactly the correct result, under the restrictions of the numbers that are representable in f64.

The float literal `0.28_f64` represents a value of 
 exactly 1261007895663739⁄4503599627370496
     because 2.79999999999999971e-1 is too small
 compared to 2.80000000000000027e-1
         and 2.80000000000000082e-1 is too big

The float literal `2244098675250432000000_f64` represents a value of 
 exactly 2244098675250432049152⁄1
     because 2.24409867525043179e21 is too small
 compared to 2.24409867525043205e21
         and 2.24409867525043231e21 is too big

The product is exactly 2698732517375308729890581677353⁄4294967296
 which gets rounded to 628347629070121041920⁄1
     because 6.28347629070120911e20 is too small
 compared to 6.28347629070121042e20
         and 6.28347629070121173e20 is too big

Thus why you see 6.28347629070121e20 from debug
             and 628347629070121000000 from display

Playground code I used to generate this explanation: https://play.rust-lang.org/?version=nightly&mode=debug&edition=2024&gist=275a52a80b0fb9c91ee583fb7cd0138a

cuviper · March 21, 2025, 5:24pm

Tangent: TIL about the automatic super/subscript around U+2044 FRACTION SLASH -- nice!

system · June 19, 2025, 5:24pm

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.

Topic		Replies	Views
Bug in f64 when given a large integer within precision?	11	448	December 12, 2024
Create correct result with operation over floats help	7	466	May 15, 2021
Conversion from f32 to i64 help	6	2145	January 31, 2022
Is there a thing like a flexible number data type? help	6	621	September 24, 2023
Help with precision using astro_float::BigFloat help	6	263	December 2, 2024

Why does floating point multiplication not work?

Related topics