What is the reason of the exact value of f64::MIN_POSITIVE ? It is a very small number, but it seems to me that if I do f64::MIN_POSITIVE / 1000 I still get a positive number. Is the word "normal" carrying some heavy load here, or is the constant just wrong (that sounds very unlikely to me)?

I've seen other claims that the smallest "double" (i.e. 64-bit floating point number in something that is reasonably close to IEEE standard) is 5e-324, which is a lot smaller than the rust value.

The f64::MIN_POSITIVE constant is the smallest "normal" positive number a f64 can represent, where a "normal" number is any float that can be expressed without having a leading zero in the significand (sometimes also called the "mantissa").

The f64::is_subnormal() method includes this example to show how it works in relation to Rust code.

let min = f64::MIN_POSITIVE; // 2.2250738585072014e-308_f64
let max = f64::MAX;
let lower_than_min = 1.0e-308_f64;
let zero = 0.0_f64;
assert!(!min.is_subnormal());
assert!(!max.is_subnormal());
assert!(!zero.is_subnormal());
assert!(!f64::NAN.is_subnormal());
assert!(!f64::INFINITY.is_subnormal());
// Values between `0` and `min` are Subnormal.
assert!(lower_than_min.is_subnormal());

Ok, so given f64::MIN_POSITIVE, and some fiddling around, it seems to me that the minimum expressible positive value in an f64 is the smallest possible mantissa multiplied by the smallest normal number, or 2.0f64.powi(1 - (f64::MANTISSA_DIGITS as i32)) * f64::MIN_POSITIVE. This is not a normal number, but it is expressible, and it is > 0.

Is thera an existing constant for this? Or should I just calculate is as above?

This has the feel of an XY problem. The reason subnormals exist is that there are certain algorithms which are only numerically stable if intermediate results are represented with loss of precision, rather than being allowed to underflow to 0.

You very rarely want to directly use subnormals; they're slower on most hardware than normals, and they (by definition) lose precision as compared to normals, since some of the mantissa digits are effectively being reused to extend the exponent.

I think that might have been the rationale, but it's a dubious one. Once you get in the underflow regime, some algorithms become unstable no matter what, and subnormals only complicate the picture. One could argue the same way that gradual overflow could also be useful for some algorithms, and yet for overflow there is a hard cutoff. IEEE-754 is a mess.

One reason subnormals exist is that in the normal mantissa+exponent regime there's no way to represent zero.

f16 is SEEEEEMMMMMMMMMMM, representing 1.MMMMMMMMMMM×2^{EEEEE-15}. But if that was always what it meant, then f16::from_bits(0) would be 0.000030517578125, not zero.

So you need some special case. And you have the choice of saying "well, only (the two) zeros are special" or "everything with EEEEE = 00000 is special".

Thus subnormals make the ULP size not work differently for zero. Is that worth it overall? I don't know.

But in general, there's lots of effort put into making things work well around zero (leading to things like exp_m1) so it doesn't seem to surprising that this one was picked.

There's an underlying reason for this - IEEE 754 has two conflicting goals to support:

If you know how to do numerical analysis of your algorithms, you must be able to verify that your algorithm is stable, and calculate the domain of your algorithm given the IEEE 754 types you support. In this case, it's considered OK to make the analysis a little harder, because you're already doing hard work here.

Naïve users of floating point (those who don't or can't do numerical analysis) should, wherever possible, get either correct results or obvious errors. This is the underlying reason for providing subnormals to get gradual underflow, but not "supernormals" to get gradual overflow - overflow is fairly obvious when it happens, and it's clear to a naïve programmer that you've hit infinity (because infinity is "sticky" in the sense that once you have an infinity, you can only get NaNs or other infinities via IEEE 754 arithmetic). Underflow is only obvious if you have trap on underflow behaviour enabled, which isn't the default - otherwise, something becomes zero when it should have been tiny.

It's that second group, the naïve users, who benefit from gradual underflow, because it means that their code works well enough even when it's just good luck that they underflowed their f64 by 10 bits but only needed 20 bits of mantissa precision and not good planning.

Well, that was an interesting rabbit hole! Thanks to everyone who answered.

I'm not really doing any real math at the moment, all I'm trying to do is implement the sass:math module in rsass, and in particular defining the new $min-number constant in a way that is compatible with dart sass. So for now I'll go with 2.0f64.powi(1 - (f64::MANTISSA_DIGITS as i32)) * f64::MIN_POSITIVE, but I'll add a comment about this beeing subnormal and that maybe just using f64::MIN_POSITIVE would be better.

I find this rationale weak, because subnormal numbers will only (partially) save you from losing all your precision if a very rare scenario: when the intermediate numbers in your calculation happen to fall in the subnormal range of between 1e-324 and 1e-308. That's a pretty small range of exponents. If your calculation is using numbers smaller than 1e-308, it is likely also using numbers smaller than 1e-324, so you're going to get truncated to 0 anyway and subnormals aren't going to save you.

According to William Kahan, one of the primary authors of the IEEE 754 standard, the main reason for the existence of subnormal floats is to fill in the "underflow gap", i.e. to hide the fact that the smallest normal floating-point number is much closer to the next larger float than it is to 0. (I'm using Julia for illustration because I'm much more fluent in that language, but everything it does is based on IEEE 754 and hence applies equally to Rust.)

julia> x = floatmin(Float64)
2.2250738585072014e-308
julia> nextfloat(x) - x
5.0e-324

This underflow gap is a problem because it means your rounding errors will be much larger if your result happens to be between 0 and floatmin() than if it is between floatmin() and 2*floatmin().
The underflow gap therefore makes it harder for experts to write correct code, and it may be a footgun for moderately experienced programmers.

Moreover, subnormal numbers have the added benefit of ensuring that x != y is equivalent to x - y == 0. This is not true without subnormals.

julia> # Standard IEEE 754 arithmetic:
x = floatmin()
y = nextfloat(x)
x - y
-5.0e-324 # <- Subnormal number
julia> # With subnormals disabled:
set_zero_subnormals(true)
x = floatmin()
y = nextfloat(x)
x - y
-0.0

Subnormals numbers therefore avoid footguns like these:

if x != y
z /= (x-y)
end
if z != z # Surely this can never happen. Right?
crash_and_burn()
end

The second if statement here is obviously somewhat artificial, but it is quite easy to inadvertently write equivalent code if you forget about NaNs. And even if you remember NaNs, you might think you are safe because if x, y and z are non-NaN in the beginning, you might expect that this is also true at the end. This is indeed correct, but it is only correct thanks to subnormals.

Yes, I understand this rationale, but I think if it helps then it only helps for some extremely specific algorithms. I would like to see an explicit example, rather than the rationale in the abstract.

Normally what floating point gives you is relative error guarantees, not absolute error, so it doesn't really matter whether the absolute error remains the same for subnormals, the relative error grows, and you're back to the same infinite relative error once you get below min_subnormal.

I would like to see a specific algorithm where subnormals save the day. I can imagine there is maybe some scenario where subnormals help, but I just don't think it's worth the hassle, and more often it just creates extra pain.

Apparently, according to the interesting article you quote, there was a heated discussion about it back in the day, so clearly it's not a clear-cut win at least.

This could have been solved by having underflow always round up to f64::MIN_POSITIVE rather than to 0. It makes sense to me in a way, because rounding to 0 always has infinite error on the log scale. It would also basically remove the rationale for the -0.0 and +0.0 weirdness.

Fair enough. I'm afraid I can't help you there, but I'm equally curious. If anyone reading this can help us out, I would love to hear!

The answer not being obvious doesn't mean the answer isn't clear. I'd say the fact that all major manufacturers accepted subnormals eventually is a strong indication that they must have some practical use.

Always rounding up to f64::MIN_POSITIVE leads to arbitrarily large relative errors if you let the input go to zero. On the other hand, rounding to zero has a relative error of at most 1.

Signed zeros definitely serve a practical purpose, namely they represent one-sided limits. This may be not that useful in real arithmetic, but it frequently comes in handy when dealing with branch cuts of complex functions. For example:

Again, I would like to see an example of actual code that relies on this. It's one thing to demonstrate a funny thing on a carefully crafted example, totally another thing to see how it's useful in an algorithm.

I can't imagine any code computing sqrt(-x - 0.0 i) and relying on getting to -i sqrt(x) rather than +i sqrt(x).

If the -0.0i is just hardcoded into the program, nobody would write it that way. Instead they would simply say directly what they really want to get, i.e. -i * sqrt(x).

If it's not hardcoded but the result of some calculation, then it's an approximation to some actual value that is very close to 0.0i, and there are no guarantees about which side of the line you're going to end up on due to rounding errors, so the code wouldn't be relying on it.

IEEE-754 tries to do many things at once -- in this case, trying to compute a precise limit rather than working with rational approximations which it normally does, but only for this one special kind of limit. f64 is not really the right tool if you want to calculate limits.

This example gives a false sense of security. Subnormals don't save you from other, very similar footguns if you assume floating point arithmetic doesn't round:

if x != y
z /= (x-y) / 2
end
if z != z # Surely this can never happen. Right?
crash_and_burn()
end

(Not sure why we're using some other language rather than Rust, but I am sticking to the same language to show how artificial and fragile that example was.)