Why is the additive identity of floats -0.0?

I think I’d prefer +0 to be the additive identity even if that means x - x must be -0. I also dislike how +0 == -0; distinguishable numeric values should not be equal.

But taken together, these two preferences would mean that 5 - 5 == 0 is false (because the left-hand side would be -0), which also feels wrong.


I have. It's not that weird when mapping linear systems to matrix equations that sometimes a term wants to be infinite with the correct sign so that when you plug it in and solve everything falls out. This can include multiple layers of inversion. I encountered it in steady state circuit analysis. IEEE 754 mostly works great for modeling physical systems.
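Not the actual circuit code, but a toy version of the idea (names and values here are just for illustration; the resistances are powers of two so the reciprocals are exact and the assertions hold exactly): resistances add in series and combine through a reciprocal in parallel, so an ideal open circuit is naturally an infinite resistance, and it falls out correctly through the layers of inversion without any special-casing.

fn parallel(r1: f64, r2: f64) -> f64 {
    1.0 / (1.0 / r1 + 1.0 / r2)
}

fn main() {
    let open = f64::INFINITY; // ideal open circuit: infinite resistance
    let short = 0.0;          // ideal short: zero resistance

    // A resistor in parallel with an open branch is just the resistor:
    assert_eq!(parallel(64.0, open), 64.0);

    // A shorted branch dominates anything in parallel with it:
    assert_eq!(parallel(64.0, short), 0.0);

    // Two layers of inversion: a branch that is itself open (a resistor in
    // series with an open circuit) still drops out correctly one level up:
    assert_eq!(parallel(32.0, 64.0 + open), 32.0);
}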


Thanks for the example. I would like to see that code. I do believe the signed infinities are useful; I am skeptical about the usefulness of "multiple layers of inversion" with these infinities.

I was thinking about a better design of floating point and I came up with this:

  • get rid of NaN
  • get rid of infinity
  • only one 0
  • saturate to f32::MAX on overflow
  • saturate to f32::MIN on underflow! not to 0

The last two points make sense because you would get a smaller multiplicative error that way than by going all the way to infinity or 0. This would also basically preserve all the benefits of signed zero without the associated weirdness, and you would never have to deal with infinities or NaNs.

So that f32::MAX * f32::MAX == f32::MAX without any sign that something's gone wrong by a lot of orders of magnitude?

Your design idea sounds very similar to posits, right?

Well, today you have the "overflow" and "underflow" flags that you can potentially check after the calculation; you could still have those.

Today, if you don't check the flags, you can also easily fail to notice that something has gone wrong due to overflow. For example, 1.0 + f32::MAX / (2.0 * f32::MAX) will give you 1.0 instead of 1.5, with the overflow flag being the only sign that something has gone wrong.
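For concreteness, that example as a runnable snippet (the variable name is just for illustration):

fn main() {
    // 2.0 * f32::MAX overflows to +infinity, then f32::MAX / +infinity is 0.0,
    // so the result silently becomes 1.0 instead of the intended 1.5.
    let silently_wrong = 1.0 + f32::MAX / (2.0 * f32::MAX);
    println!("{silently_wrong}"); // prints 1
}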

Posits in their latest version (type III unums) have NaN (renamed NaR), which I would get rid of, and they have dynamic precision, so that's different. They only have one 0 rather than two, and the saturation on overflow and underflow is the same as what I propose, so there is some similarity.

Wow, the relevant change looks like it’s remarkably recent…

Can we really just do this!? Just changing (kind of breaking, even…) the behavior of reasonable code like this?

fn report_length(parts: impl IntoIterator<Item = f64>) {
    println!("overall length is {} cm", parts.into_iter().sum::<f64>());
}

fn main() {
    report_length([1.0, 2.5]);
    report_length([1.0]);
    report_length([]);
}
This prints:

overall length is 3.5 cm
overall length is 1 cm
overall length is -0 cm

Really, the only reason why -0.0 + 0.0 and 0.0 + -0.0 are defined the way they are is not because “-0 is deemed the preferred neutral element”; no, the principle is rather “when in doubt, choose +0 over -0”. Making the sum of an empty list return -0 goes pretty much exactly against this principle!

Regarding neutrality, either of +0 or -0 is completely fine anyway! This is because -0.0 == 0.0 is true! In most ways, they are supposed to be the same thing.
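To make that concrete, a quick check of how the two zeros behave today:

fn main() {
    // Adding zeros of opposite sign yields +0 under the default
    // round-to-nearest mode: the "when in doubt, choose +0" rule in action.
    assert!((-0.0_f64 + 0.0).is_sign_positive());
    assert!((0.0_f64 + -0.0).is_sign_positive());

    // And the two zeros compare equal; only the sign bit tells them apart.
    assert!(-0.0_f64 == 0.0);
    assert!((-0.0_f64).is_sign_negative());
}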


I’m also surprised at how little discussion such a change needs… and I’m also trying to find any prior art for this new behavior. I’ve tested Haskell, Python, and JavaScript just now, and they all do it the way Rust used to do it (producing +0.0 even for non-empty lists of -0.0 values).


I could probably buy the argument that the sum of [-0.0] or [-0.0, -0.0] should stay with the new behavior of producing -0.0, following some logic that summing [a] should be like a itself and [a, b] like doing a + b. Though even that’s a change compared to all the prior art I could find, so it’s at least reasonable to discuss keeping it the way it used to be, too.
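For reference, here is the kind of check I mean, assuming I understand the change correctly:

fn main() {
    // With the previous Sum implementation all three of these print "0";
    // with a -0.0 starting value they all print "-0".
    let empty: f64 = std::iter::empty::<f64>().sum();
    let single: f64 = [-0.0_f64].into_iter().sum();
    let pair: f64 = [-0.0_f64, -0.0].into_iter().sum();
    println!("{empty} {single} {pair}");
}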


Would vfredusum qualify? That's how hardware behaves, after all.

But these languages are not known to be speed daemons, are they? And they don't follow the rules that hardware sets.

Rust also deviates in some cases (e.g. with division), but usually there needs to be a reason for that.

Would you really prefer underflow — an error because the exponent became so negative that the value can't be distinguished from zero — to become a massive negative value instead of zero? That seems much more illogical than the IEEE behavior, especially if you're arguing that negative overflow should become a massive positive value.

(People use "integer underflow" to mean negative overflow, but that isn't really a thing.)

I suppose you could argue for a representation with only two special reserved values, where the largest positive representable value is reserved for E_TOO_LARGE and the largest negative one for E_TOO_SMALL, and those values sort as if they weren't reserved; but as far as arithmetic goes, they'd still need to be special cases.
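Roughly something like this, with names and behavior invented purely to illustrate (sign interactions between the two markers are glossed over):

const E_TOO_LARGE: f32 = f32::MAX; // reserved: result overflowed upwards
const E_TOO_SMALL: f32 = f32::MIN; // reserved: result overflowed downwards

fn mul(a: f32, b: f32) -> f32 {
    // The reserved values sort like ordinary numbers, but arithmetic has to
    // detect and propagate them explicitly.
    if a == E_TOO_LARGE || b == E_TOO_LARGE {
        return E_TOO_LARGE;
    }
    if a == E_TOO_SMALL || b == E_TOO_SMALL {
        return E_TOO_SMALL;
    }
    let r = a * b;
    if r >= E_TOO_LARGE {
        E_TOO_LARGE
    } else if r <= E_TOO_SMALL {
        E_TOO_SMALL
    } else {
        r
    }
}

fn main() {
    assert_eq!(mul(2.0, E_TOO_LARGE), E_TOO_LARGE); // existing markers propagate
    assert_eq!(mul(1e30, 1e30), E_TOO_LARGE);       // fresh positive overflow
    assert_eq!(mul(1e30, -1e30), E_TOO_SMALL);      // fresh negative overflow
}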

I presume they mistook MIN for MIN_POSITIVE, the smallest positive normal value.

I don't hate it, I guess? But 0 meaning "any non-negative value too small to be represented" and -0 meaning the negation of that is relatively clean with respect to how they relate to the infinite values.

Ah yes, I meant f32::MIN_POSITIVE on underflow, not f32::MIN obviously.
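For concreteness, roughly the multiplication behavior I have in mind, sketched against today's f32 (purely illustrative; conveniently, today's signed zero still carries the sign of an underflowed product):

fn saturating_mul(a: f32, b: f32) -> f32 {
    let r = a * b;
    if r == 0.0 && a != 0.0 && b != 0.0 {
        // The product underflowed all the way to zero: clamp its magnitude up
        // to the smallest positive normal value, keeping the sign of the
        // signed zero.
        f32::MIN_POSITIVE.copysign(r)
    } else if r.is_infinite() {
        // The product overflowed: clamp its magnitude down to the largest
        // finite value, keeping the sign.
        f32::MAX.copysign(r)
    } else {
        r
    }
}

fn main() {
    assert_eq!(saturating_mul(1e30, 1e30), f32::MAX);            // overflow
    assert_eq!(saturating_mul(1e30, -1e30), -f32::MAX);          // negative overflow
    assert_eq!(saturating_mul(1e-30, 1e-30), f32::MIN_POSITIVE); // underflow
    assert_eq!(saturating_mul(-1e-30, 1e-30), -f32::MIN_POSITIVE);
}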