Strategies for converting a floating point literal to the correct precision

dhfi25 · October 8, 2019, 5:07am

What are some reasonable strategies for converting a floating point literal into the correct precision within a routine?

As an example, consider the following code using the f128 crate:

// External dependencies
use ::f128::f128;

fn main() {
    // Inject a float into f128
    let x: f128 = 0.45.into();
    let y = f128::parse("0.45").unwrap();

    // Print the value
    println!("x: {:?}", x);
    println!("y: {:?}", y);
}

After running we receive:

x: 0.450000000000000011102230246251565404
y: 0.45000000000000000000000000000000001

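Basically, we lose precision when using the straightforward into() function: the literal 0.45 is first rounded to an f64 and only then widened to f128, so x carries the f64 rounding error. To obtain the correct precision, we needed to convert from a string.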
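
The same effect can be reproduced with only the standard library, using f32 and f64 as stand-ins for f64 and f128. This is just a sketch to illustrate the mechanism; the helper name widened_from_f32 is made up for this example and is not part of any crate.

```rust
/// Hypothetical helper: round 0.45 to f32 first, then widen the result to f64.
fn widened_from_f32() -> f64 {
    f64::from(0.45_f32)
}

fn main() {
    let widened = widened_from_f32();
    let direct = 0.45_f64;

    println!("widened: {:?}", widened);
    println!("direct:  {:?}", direct);

    // Widening is exact, so the f32 rounding error is carried along unchanged
    // and the widened value is not the closest f64 to 0.45.
    assert_ne!(widened, direct);

    // Narrowing the widened value gives back the original f32.
    assert_eq!(widened as f32, 0.45_f32);
}
```

Widening never adds information, so a constant has to be rounded once, directly to the target type, to be as accurate as that type allows.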

More generally, I run into this a fair amount when writing numerical code that's generic over floating point types, that is, a function of the form fn foo<T>(x: T) -> T. Often, I have some pretty simple numbers, literals, that I need to use as parameters in an algorithm. I'd like each literal translated to the correct precision regardless of whether we use f32, f64, f128, or something else as the type parameter. Outside of converting the number from a string, which is pretty slow when done often, is there a good strategy for injecting such constants?

daboross · October 8, 2019, 5:34am

In this particular situation, what about using the f128::f128! macro?

If I understand it correctly, it looks like this will parse a number at compile time and insert code with the right byte values to create the f128. This is a common strategy if you have something that can be parsed (and possibly validated) at compile time, and there are advantages to doing that over doing it at runtime.

For the general case, I'm not sure I have any good advice. Maybe the best way would be to create the literal with the highest-precision (f128), and always convert "downwards" from there?

dhfi25 · October 8, 2019, 6:14am

Ah, cool! I didn't know about the f128! macro. In case anyone else is curious about working code, it requires a nightly feature gate to work:

// Needed for f128 macro
#![feature(proc_macro_hygiene)]

// External dependencies
use ::f128::{f128,f128_inner};

fn main() {
    // Inject a float into f128
    let x: f128 = 0.45.into();
    let y = f128::parse("0.45").unwrap();
    let z = f128!(0.45);

    // Print the value
    println!("x: {:?}", x);
    println!("y: {:?}", y);
    println!("z: {:?}", z);
}

which gives

x: 0.450000000000000011102230246251565404
y: 0.45000000000000000000000000000000001
z: 0.45000000000000000000000000000000001

As for the general case, I essentially do what you suggested already. I create the literal as an f64 and then downcast it to f32 when that code runs. It works, but I was hoping for something better in case f80 or f128 gets implemented more natively. Your suggestion would work more reliably for now, though, at the cost of requiring nightly.

If anyone else has a better suggestion, I'm open to it. Thanks for the pointer to the macro above.

pcpthm · October 8, 2019, 7:15am

Casting from a higher-precision literal to a lower-precision number doesn't always produce the correctly rounded result. The phenomenon is known as double rounding. Playground example: https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=ab79e8720125062d972f6193fb7cf78f
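
For readers who can't open the playground link, here is a separate minimal sketch of double rounding (not the code from the gist). It assumes a hand-picked decimal just above 1 + 2^-24, the midpoint between the adjacent f32 values 1.0 and 1.0 + 2^-23; the names TEXT, direct_f32 and via_f64 are made up for illustration. It parses strings rather than using literals only to make the two rounding paths explicit.

```rust
// Slightly above 1 + 2^-24 = 1.000000059604644775390625,
// the midpoint between the f32 values 1.0 and 1.0 + 2^-23.
const TEXT: &str = "1.0000000596046447753906251";

/// Round the decimal once, directly to f32.
fn direct_f32() -> f32 {
    TEXT.parse().unwrap()
}

/// Round the decimal to f64 first, then round that f64 to f32.
fn via_f64() -> f32 {
    let wide: f64 = TEXT.parse().unwrap();
    wide as f32
}

fn main() {
    // The value lies above the midpoint, so a single rounding goes up.
    assert_eq!(direct_f32(), 1.0 + f32::EPSILON);

    // The f64 rounding lands exactly on the midpoint; the second rounding
    // then breaks the tie towards the even mantissa, which is 1.0.
    assert_eq!(via_f64(), 1.0);

    println!("direct:  {:?}", direct_f32());
    println!("via f64: {:?}", via_f64());
}
```

The two results differ by one f32 ulp, which is why always converting "downwards" from the widest type is not guaranteed to give the correctly rounded constant.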

dhfi25 · October 8, 2019, 8:36am

Well, darn. Nice example. What's the correct way to accomplish this, then?

pcpthm · October 8, 2019, 10:42am

Other than string parsing, you can represent the value as the ratio of two integers (or of any two values that can be represented exactly), though this is not always applicable.

use std::ops::Div;

// `Output = T` rather than `Self`: there is no `Self` in a free function.
fn generic<T: Div<Output = T> + From<i16>>() -> T {
    // 0.45
    T::from(45) / T::from(100)
}
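
As a usage sketch, the same idea can take the numerator and denominator as arguments. The function name ratio and its signature are assumptions made for this example. It relies on From<i16> being implemented for f32 and f64 in the standard library; whether a third-party type such as f128 implements it would need to be checked.

```rust
use std::ops::Div;

/// Hypothetical helper: build the constant num/den in any float type
/// that can be constructed exactly from an i16.
fn ratio<T>(num: i16, den: i16) -> T
where
    T: Div<Output = T> + From<i16>,
{
    // Both conversions are exact, so the division is the only rounding step.
    T::from(num) / T::from(den)
}

fn main() {
    let a: f32 = ratio(45, 100);
    let b: f64 = ratio(45, 100);

    // Each result matches the literal rounded directly to that type.
    assert_eq!(a, 0.45_f32);
    assert_eq!(b, 0.45_f64);

    println!("f32: {:?}", a);
    println!("f64: {:?}", b);
}
```

Because IEEE 754 division is correctly rounded and both operands are exact, the quotient is rounded only once. The approach stops working when the numerator or denominator needed for a constant is not exactly representable in the target type.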

