# Floating-point variable ranges

Hello,
What is the range of the `f32` and `f64` types?

Thank you.

`f32::MIN` to `f32::MAX`, and similarly for `f64`... if you don't count `NEG_INFINITY` or `INFINITY` (and, sort of, `NAN`).
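For reference, all of those limits are associated constants on the primitive types; a quick sketch:

```rust
fn main() {
    // The finite range of each type is symmetric around zero.
    assert_eq!(f32::MIN, -f32::MAX);
    println!("f32: {:e} ..= {:e}", f32::MIN, f32::MAX); // ~ -3.4e38 ..= 3.4e38
    println!("f64: {:e} ..= {:e}", f64::MIN, f64::MAX); // ~ -1.8e308 ..= 1.8e308

    // The special values sit outside that range.
    assert!(f32::INFINITY > f32::MAX);
    assert!(f32::NEG_INFINITY < f32::MIN);
    assert!(f32::NAN != f32::NAN); // NAN compares unequal to everything, itself included
}
```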

More generally, as the docs say, these are IEEE 754-conformant types. If you search the web for "IEEE 754" you will find more information on the topic than you ever wanted to know, probably. Here are some specific recommendations to learn more about them that have been made in the past.

2 Likes

For most applications, the range of both floating-point types is effectively infinite; the more interesting property is usually their precision.

To a first approximation, floating-point types store numbers in scientific notation, e.g. 1.2345×10³; the precision is how many digits can be represented in the first part (the "mantissa", 1.2345 here). As you do a sequence of calculations, most intermediate results can't be expressed exactly in this form, so small errors will creep in when the computer rounds these results to the closest number that can be expressed exactly: `f32` calculates with about 7 significant figures, and `f64` with about 16.
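You can watch one of those rounding errors creep in with a classic example (a sketch):

```rust
fn main() {
    // Neither 0.1 nor 0.2 is exactly representable in binary, so the
    // sum picks up a tiny rounding error:
    let sum: f64 = 0.1 + 0.2;
    assert!(sum != 0.3);
    println!("{:.17}", sum); // 0.30000000000000004

    // The error is tiny relative to the values involved, though:
    assert!((sum - 0.3).abs() < 1e-15);
}
```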

In reality, it's all powers of 2 instead of 10, but all of the principles are the same.

8 Likes

Importantly, you should be aware that floating-point types represent a fixed amount of precision for a number in scientific notation: for example, an `f64` is one sign bit, 11 exponent bits, and 52 stored "mantissa" bits (53 bits of effective precision, thanks to an implicit leading 1), where the mantissa is essentially the part of the scientific notation before the "× 10":

```
value = mantissa × 2 ** exponent
```
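If you're curious, you can pull those fields apart yourself with `to_bits`; a sketch, using `f64`'s 1/11/52 bit layout and its exponent bias of 1023:

```rust
fn main() {
    let bits = 6.5f64.to_bits(); // 6.5 = 1.625 × 2^2

    let sign = bits >> 63;
    let exponent = ((bits >> 52) & 0x7FF) as i32 - 1023; // stored exponent is biased by 1023
    let fraction = bits & ((1u64 << 52) - 1); // the implicit leading 1 is not stored

    assert_eq!(sign, 0);
    assert_eq!(exponent, 2);
    // mantissa = 1 + fraction / 2^52 = 1.625, so fraction = 0.625 × 2^52
    assert_eq!(fraction, (0.625 * (1u64 << 52) as f64) as u64);
}
```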

There are quite a few extra details I'm going to handwave here, but the short version is that you're basically never going to run out of range in a realistic situation with floating point: even `f32::MAX` is about 3.4×10³⁸, larger than the number of seconds since the big bang, the number of stars in the universe, and lots of other stupidly big values. And `f64` goes even bigger!

The trick is that `f32` is only accurate to about one part in 10 million, e.g. if you have a value of 1 billion, it's only accurate to (not quite) the nearest 100. `f64` is accurate to the rather absurd 2×10⁻¹⁶, or about one part in 5 quadrillion.
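Those relative accuracies are exposed as the `EPSILON` constants; a sketch:

```rust
fn main() {
    // EPSILON is the gap between 1.0 and the next representable value:
    // ~1.2e-7 for f32 ("one part in 10 million"), ~2.2e-16 for f64.
    println!("f32::EPSILON = {:e}", f32::EPSILON);
    println!("f64::EPSILON = {:e}", f64::EPSILON);

    // Adding anything much smaller than EPSILON (relative to 1.0) just vanishes:
    assert!(1.0f32 + f32::EPSILON > 1.0);
    assert_eq!(1.0f32 + f32::EPSILON / 2.0, 1.0);
}
```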

Unfortunately, this is where it starts getting really complicated, and you start bleeding from the eyes reading articles about numerical stability. The TL;DR is: use `f32` until something goes wonky, then bump to `f64`, and if it's still off, give up and become a monk.

Of course @2e71828 beats me to it. I don't know why I bother writing on my phone!

6 Likes

One of the most common mistakes along these lines is attempting to use a floating-point variable as an accumulator. At some point, the collected value gets big enough that the increment you're adding doesn't actually change anything.

For example, 1 year ≈ 3×10⁷ seconds, which is beyond the integer precision of an `f32` (2²⁴ ≈ 1.7×10⁷). If you tried to keep track of time by adding `1.0` to an `f32` clock every second, the clock would stop after about six months, and it would start showing odd behavior well before that.
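Here's that stuck clock in miniature (a sketch; 2²⁴ = 16 777 216 is the point past which `f32` can no longer represent odd integers):

```rust
fn main() {
    let mut clock: f32 = 16_777_216.0; // 2^24 seconds, roughly six months in
    clock += 1.0;                      // tick...
    assert_eq!(clock, 16_777_216.0);   // ...but the clock didn't move:
    // 2^24 + 1 is halfway between two representable values, and
    // round-to-nearest-even sends it right back to 2^24.
}
```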

Given the above, I'd add a suggestion to consider using an integer type in place of floating-point if the primary operations are addition & subtraction.
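A sketch of that suggestion (the variable names here are just for illustration):

```rust
fn main() {
    // A u64 of milliseconds never silently drops a tick, and it won't
    // overflow for roughly 584 million years.
    let mut elapsed_ms: u64 = 0;
    for _ in 0..1_000 {
        elapsed_ms += 1; // exact, no matter how large elapsed_ms gets
    }

    // Convert to floating point only at the edges, e.g. for display:
    let seconds = elapsed_ms as f64 / 1000.0;
    assert_eq!(seconds, 1.0);
}
```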

7 Likes

Hello,
Are the ranges of `f32` and `f64` the same?

No, both the range and the precision differ. See IEEE 754.

Edit: Unless you count positive and negative infinity as part of the range. Then both have the same range, of course. (because both `f32` and `f64` go from minus infinity to plus infinity)
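In code, the finite ranges nest, while the infinities coincide (a sketch):

```rust
fn main() {
    // Every finite f32 fits comfortably inside f64's finite range...
    assert!((f32::MAX as f64) < f64::MAX);
    assert!((f32::MIN as f64) > f64::MIN);

    // ...but both types share the same infinities at the ends.
    assert_eq!(f32::INFINITY as f64, f64::INFINITY);
    assert_eq!(f32::NEG_INFINITY as f64, f64::NEG_INFINITY);
}
```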

3 Likes

Hello,
What is the range of the `f64`?

The 11-bit width of the exponent allows the representation of numbers between roughly 10⁻³⁰⁸ and 10³⁰⁸, with full 15–17 decimal digits of precision. By compromising precision, the subnormal representation allows even smaller values, down to about 5×10⁻³²⁴.
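Those boundary values are available directly from the standard library (a sketch):

```rust
fn main() {
    // Smallest positive *normal* f64: ~2.2e-308
    println!("{:e}", f64::MIN_POSITIVE);

    // Smallest positive *subnormal* f64: ~4.9e-324 (all-zero exponent,
    // significand of 1). The literal 5e-324 rounds to exactly this value.
    assert_eq!(f64::from_bits(1), 5e-324);
    assert!(f64::from_bits(1) < f64::MIN_POSITIVE);

    // Largest finite f64: ~1.8e308
    println!("{:e}", f64::MAX);
}
```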

Note that for numbers greater than 2⁵³ (which is a bit less than 10¹⁶), not all integer values can be represented (i.e. the precision is less than `1`).
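That 2⁵³ cutoff is easy to check (a sketch):

```rust
fn main() {
    let limit: f64 = 9_007_199_254_740_992.0; // 2^53

    // Below the cutoff, consecutive integers are all representable:
    assert_eq!(limit - 1.0 + 1.0, limit);

    // At the cutoff, adding 1 is lost to rounding...
    assert_eq!(limit + 1.0, limit);

    // ...because the gap between representable values has grown to 2:
    assert_eq!(limit + 2.0, 9_007_199_254_740_994.0);
}
```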

2 Likes

I'm curious what possible need you have to know the range of an `f64`; even an `f32` can represent effectively any realistic real-world value at a realistically necessary precision. `f64` is more about avoiding precision loss in intermediate values.

If you need to represent very large numbers, for mathematical or cryptography-related reasons, for example, then you're better off using an arbitrary-precision type like the num-bigint crate on Lib.rs.

If you're just curious, that's fine, I suppose, though the documentation also has all this information, which would be a bit faster, if nothing else!