Why is the additive identity of floats -0.0?

I was surprised to find that calling Iterator::sum on an empty vec of floats returns -0.0, negative zero.

According to the documentation, this is because it's the "additive identity":

An empty iterator returns the additive identity (“zero”) of the type, which is 0 for integers and -0.0 for floats.

Looking through the source code, I see that it's a hard-coded constant, which means it's a deliberate choice, not a technical necessity. It seems like a puzzling choice to me, though, as I can't think of a single reason to ever want this result.

Therefore I have two questions:

  1. Why was -0.0 chosen as the "additive identity" for floats?
  2. Is there a simple and low cost way to convert -0.0 to 0.0 without negating any other value?

Playground example

3 Likes

Found this explanation, which—never having read the standard—I'd consider likely to be the answer to why -0.0 is the additive identity for floats and 0.0 isn't:

Of course it's the fault of an edge-case:

Either of -0.0f and +0.0f can be additive identities for values, except zero. When the value is zero, complications arise due to the way the standard has been defined for adding either of these zero representations to each other.

12 Likes

So to summarize: it's so that when you call [-0.0f64].iter().sum::<f64>(), you get -0.0. If sum started from +0.0, you'd get +0.0 as the result because +0.0 + (-0.0) is +0.0.
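
A quick way to see both behaviours side by side (just a sketch; the explicit fold stands in for what sum would do if it started from +0.0):

fn main() {
    let v = [-0.0f64];

    // sum starts from the additive identity -0.0, so the sign survives:
    let s: f64 = v.iter().sum();
    assert!(s == 0.0 && s.is_sign_negative()); // prints as -0

    // A fold starting from +0.0 loses it, because +0.0 + (-0.0) is +0.0:
    let f = v.iter().fold(0.0f64, |a, &b| a + b);
    assert!(f == 0.0 && f.is_sign_positive());
}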

Personal opinion: I think negative zero is a horrible misfeature of the floating point standard.

10 Likes

If it's defined like that, then maybe it would be good to mention somewhere that if you call a C module that calls the C standard function fesetround, the correct result is no longer guaranteed?

Could be a surprise to someone…

That's stated here:

Note that modifying the masking flags, rounding mode, or denormals-are-zero mode flags leads to immediate Undefined Behavior

2 Likes

But that's not the most common way of achieving that, and not even the most obvious one.

Looking into the documentation for an obscure x86-only function is not something I would do if I'm dealing with ARM or MIPS.

Using the standard-provided function from C would probably be more common.

We may pretend that C doesn't exist, but in practice Rust will be using the C library for decades to come…

Indeed, because sum is implemented as a fold. But that's an implementation detail. Intuitively, I would expect sum to sum only the elements, not including the additive identity. And it's not clear to me that in the case of an empty sequence, the default should be the additive identity, especially when doing so exposes this quirk of the floating point standard. It makes sum significantly less useful, and the only gain seems to be a "cleaner" implementation.

As for how to avoid this issue with sum, I suppose one could always just use .fold(0.0, |a, b| a + b) instead, since there's 0% probability that distinguishing -0.0 and 0.0 will actually ever matter, unless one is working in an esoteric niche involving imaginary numbers or something.
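
For what it's worth, a sketch of that workaround, plus the related trick for the second question in the original post: adding +0.0 at the end maps -0.0 to +0.0 and, under the default rounding mode, leaves every other value unchanged.

fn main() {
    let values = [-0.0f64, -0.0];

    // Folding from an explicit +0.0 seed: the empty case and the
    // all-negative-zero case both come out as +0.0.
    let folded = values.iter().fold(0.0f64, |a, &b| a + b);
    assert!(folded == 0.0 && folded.is_sign_positive());

    // Adding +0.0 afterwards normalizes -0.0 to +0.0 without
    // changing any other value (under the default rounding mode).
    let normalized = values.iter().sum::<f64>() + 0.0;
    assert!(normalized == 0.0 && normalized.is_sign_positive());
    assert_eq!(1.5f64 + 0.0, 1.5); // non-zero values are untouched
}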

What else could it be?

Additive identity makes sense (although it depends on rounding mode); anything else would be a strange (and pretty much arbitrary) decision. Not impossible, but… why?

4 Likes

Well, the standard says

When neither the inputs nor result are NaN, the sign of a product or quotient is the exclusive OR of the operands’ signs; the sign of a sum, or of a difference x − y regarded as a sum x + (−y), differs from at most one of the addends’ signs; and […]. These rules shall apply even when operands or results are zero or infinite.

When the sum of two operands with opposite signs (or the difference of two operands with like signs) is exactly zero, the sign of that sum (or difference) shall be +0 under all rounding-direction attributes except roundTowardNegative; under that attribute, the sign of an exact zero sum (or difference) shall be −0. However, under all rounding-direction attributes, when x is zero, x + x and x − (−x) have the sign of x.

So +0 + +0 ⇒ +0 and -0 + -0 ⇒ -0, but +0 + -0 ⇒ +0 and -0 + +0 ⇒ +0.

That means that |x| 0.0 + x is not an identity, because putting in -0 gives you +0, and thus +0 is not an additive identity. But |x| -0.0 + x does give you -0 if you put in -0, as well as +0 if you put in +0, making it an identity function, and -0 the additive identity.
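
A direct check of that, for the only two inputs where it matters (any nonzero x is unchanged by adding either zero):

fn main() {
    for &x in &[0.0f64, -0.0] {
        // |x| -0.0 + x keeps the sign of both zeros, so it's an identity:
        assert_eq!((-0.0 + x).is_sign_negative(), x.is_sign_negative());
    }

    // |x| 0.0 + x is not: it turns -0.0 into +0.0.
    assert!((0.0f64 + -0.0).is_sign_positive());
}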

You can also see this clearly in godbolt, as LLVM knows which is actually an identity, and optimizes based on that fact: https://rust.godbolt.org/z/saaWqchqx.

I mean, this part is really clear to me. You're using the addition monoid, so of course the result of the sum of an empty sequence should be the identity from that monoid. That's the only way you get x.sum() and let (a, b) = x.split(i); a.sum() + b.sum() to give you the same result.

And that's not just a theoretical property. It's exactly what you want so that you can sum a VecDeque by summing the two as_slices() and adding the result, for example.
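
A small check of the VecDeque case (the contents here are chosen just to exercise the sign of zero; one of the two slices is empty):

use std::collections::VecDeque;

fn main() {
    let deque: VecDeque<f64> = VecDeque::from(vec![-0.0]);
    let (front, back) = deque.as_slices();

    let whole: f64 = deque.iter().sum();
    let split = front.iter().sum::<f64>() + back.iter().sum::<f64>();

    // The empty slice contributes the identity -0.0, and s + (-0.0) == s
    // for every s, so the split sum agrees with the whole sum (both -0.0).
    assert!(whole == 0.0 && whole.is_sign_negative());
    assert!(split == 0.0 && split.is_sign_negative());
}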

(Well, except that floating-point addition isn't technically a monoid because it's non-associative, but that's a whole different problem. Sum::sum, at least, at the trait level should obviously be defined to return the additive identity element when passed an empty iterator. TBH people probably would be more annoyed if we said you can't .sum() floats -- they certainly complain plenty about the more important difference that's PartialOrd!)

11 Likes

That's a good point. I agree. It's hard to be sane in a world built on a crazy standard.

1 Like

I know that people love to hate on IEEE 754, but doing something materially better is hard.

Even these "weird" things like -0 being the additive identity are just consequences of other things that would probably be considered more weird and would need to be less consistent in order to make +0 be the additive identity. For example, take these rules:

  • +0 + +0 ⇒ +0? Yes, that makes sense.
  • -0 + -0 ⇒ -0? Yes, that makes sense.
  • +5 - +5 ⇒ +0? Yes, that makes sense.
  • +0 - +0 ⇒ +0? Yes, that makes sense.
  • x + (-y) ⇔ x - y ⇔ (-y) + x? Yes, that's what I expect from math.

They all seem entirely reasonable, but they force the additive identity to be -0.
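
A quick check of each of those rules under the default rounding mode, and of the consequence that rules out +0:

fn zero_sign(x: f64) -> char {
    if x.is_sign_negative() { '-' } else { '+' }
}

fn main() {
    assert_eq!(zero_sign(0.0 + 0.0), '+');   // +0 + +0 ⇒ +0
    assert_eq!(zero_sign(-0.0 + -0.0), '-'); // -0 + -0 ⇒ -0
    assert_eq!(zero_sign(5.0 - 5.0), '+');   // +5 - +5 ⇒ +0
    assert_eq!(zero_sign(0.0 - 0.0), '+');   // +0 - +0 ⇒ +0
    // Rewriting +0 - +0 as +0 + (-0) must then also give +0,
    // which is exactly the case that makes +0 fail as an identity:
    assert_eq!(zero_sign(0.0 + -0.0), '+');
}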

13 Likes

All these properties would also be true if -0 was simply the same thing as +0 (I didn't want to say "equal to" because they are already "equal"), as is the case in all of mathematics outside of IEEE 754.

1 Like

Unfortunately, if -0 and +0 become the same thing, you get oddities like the sign of the result disagreeing with the sign of the operands:

fn main() {
    let a = -1e-100f64;
    let b = 1e300f64;
    let c: f64 = a / b; // should c's sign be positive or negative?
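    // the exact quotient is about -1e-400, far below f64's range, so it underflows to a signed zero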
    let signum = c.signum();
    
    println!("{c} {signum}");
}

This program prints -0 -1, and indeed the algebraic result of dividing a tiny negative number by a huge positive number should be negative. If you squash negative zero to positive zero, though, the sign of the result disagrees with that.

It's fiendishly hard to devise a real-number scheme for computers that only has anomalies you can tolerate. IEEE floating point math has plenty of anomalies, but they're mostly reasonable within the intentions of the spec.

Zero doesn't just represent exactly zero. In floating point arithmetic, zero represents the interval between zero and the smallest representable nonzero value.

7 Likes

Well, unless it rounds to 0, in which case I'd expect it to be 0 rather than negative (if I hadn't heard of IEEE-754). We already accept that floating point rounds sometimes, and that tiny numbers underflow to 0, so rounding to 0 with no sign in those cases would be perfectly consistent with those rules.

The argument that it should be negative because the exact answer is negative is like saying: "we know that mathematically 1 - 1e-100 is smaller than 1, so it should be less than 1.0".

I know that's the rationale, but it's inconsistent with the rest of IEEE-754 where a number represents a single number rather than an interval.

I know there might be some rare cases where the sign of negative 0 might be useful in some calculation, but I've dealt with floating point calculations for years and not once have I thought "thank you IEEE that -0 behaves differently from +0, it's exactly what I need here". It's always been an annoyance.

2 Likes

It seems to me that a hypothetical floating point format that used two's complement for the mantissa rather than a separate sign bit would avoid this issue with no major downsides.

Of course, this is extremely unlikely to gain any traction at this point: it offers only small benefits and the standard is very entrenched. Too much inertia.

But that's natural: when the difference between +0.0 and -0.0 saves your hide… you hardly notice it… but when that difference leads to problems, you notice it just fine.

Your words would have made sense if you had also used some other CPU where +0.0 and -0.0 are identical, and it had never been a problem there. But I'm not sure such a CPU even exists, so obviously you haven't used one.

The major loss would be that today 1/(1/-∞) is -∞, whereas in a world without signed zeros it would be +∞.
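
Concretely, in today's arithmetic:

fn main() {
    let through_zero = 1.0f64 / f64::NEG_INFINITY; // -0.0
    assert!(through_zero == 0.0 && through_zero.is_sign_negative());

    // Dividing by that signed zero recovers the original sign:
    assert_eq!(1.0 / through_zero, f64::NEG_INFINITY);
}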

And maybe you'd say that that's fine, but then you're in Projectively extended real line - Wikipedia instead of Extended real number line - Wikipedia, and you end up losing things like ∞ having a meaningful ordering with finite numbers.

Arguably what you want is to have something like Hyperreal number - Wikipedia, so that 1/0 is actually NaN, but you have 1/±ε = ±ω. But that adds its own problems, because if you switch to that system then you still have the question of which of {-ε, 0, +ε} you should get for "simple" things like 5.0 - 5.0, and we're basically back to the signed-zero questions again.


As annoying as signed zeros are, they really do help for things like cut points.

I'm glad that

  • atan2(1/2, -1), atan2(1/4, -1), atan2(1/8, -1), … → π, and
  • atan2(-1/2, -1), atan2(-1/4, -1), atan2(-1/8, -1), … → -π

even when they hit zero from underflow.
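
A small check of that, with y pushed to zero by underflow (the magnitudes are arbitrary):

use std::f64::consts::PI;

fn main() {
    let y_pos = 1e-200f64 * 1e-200;  // underflows to +0.0
    let y_neg = -1e-200f64 * 1e-200; // underflows to -0.0

    // atan2(±0, -1) lands on the correct side of the cut along the
    // negative x axis: π from above, -π from below.
    assert!((y_pos.atan2(-1.0) - PI).abs() < 1e-12);
    assert!((y_neg.atan2(-1.0) + PI).abs() < 1e-12);
}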

5 Likes

I've never written or seen real code relying on this.

Typically when you're dividing by 0 it's either a bug, or you don't particularly care whether you get +∞ or -∞, especially since the "true" number being approximated by 0 might have been a tiny number that changed sign due to rounding.

I can probably imagine some hypothetical scenarios where this matters, but they would be very specialized and I'd think in those scenarios you probably want to use something like true interval arithmetic rather than regular floating point numbers.

Do you have an example of real code that relies on this?

I have used atan2 plenty of times, and it would never matter whether I get +π or -π for anything infinitesimally close to the x axis.

How common is the problem that -0 solves?

1 Like

Thanks for that explanation. The links were very interesting. To me those things would indeed matter less.

Floating point numbers aren't really those number lines you mentioned, though:

  • They aren't reals, but describe a subset of the rational numbers. They are also cut off in the magnitudes they can represent (finite precision and finite range). Fine, integers in computing are also finite size. And there are bignum versions of both if you really care.
  • They have NaNs which don't exist on any number line. And strange things like signaling vs quiet NaNs.

Ideally I think treating the type as something akin to Result<ValidFiniteFloat, FloatError> where

enum FloatError {
    Nan,
    Inf,
    NegInf,
}

would be interesting. Have a single (or a very small set of) NaNs, and treat them as the error conditions they usually are. Maybe this falls flat, but it seems like an interesting thought experiment at least.

And the ship on this has obviously sailed. But maybe, as a NonZero-like wrapper / pattern type, you could have a FiniteFloat<T> like that.
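
Something along those lines could look like this (names are made up for the sketch; a real design would also have to decide what to do about -0.0 and about arithmetic that overflows back out of the finite range):

#[derive(Debug, PartialEq)]
enum FloatError {
    Nan,
    Inf,
    NegInf,
}

#[derive(Debug, Clone, Copy, PartialEq)]
struct FiniteF64(f64);

impl FiniteF64 {
    fn new(x: f64) -> Result<Self, FloatError> {
        if x.is_nan() {
            Err(FloatError::Nan)
        } else if x == f64::INFINITY {
            Err(FloatError::Inf)
        } else if x == f64::NEG_INFINITY {
            Err(FloatError::NegInf)
        } else {
            Ok(FiniteF64(x))
        }
    }
}

fn main() {
    assert!(FiniteF64::new(1.5).is_ok());
    assert_eq!(FiniteF64::new(f64::NAN), Err(FloatError::Nan));
    assert_eq!(FiniteF64::new(1.0 / 0.0), Err(FloatError::Inf));
}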

I don't think that's right. The infinities are not nearly as much of a problem as NaN (they admit Ord, for example), and if overflow to infinity is a problem, then underflow to zero should also be in this Error enum, since they're essentially equivalent.

As my usual example of how having infinities inline as values is really helpful: if you're calculating a geometric mean, it's pretty easy to end up with infinite error, like

std::iter::repeat_n(2.0_f32, 1000).product::<f32>().powf(1.0/1000.0) = inf

So the typical way to avoid that is to use logarithms, as

(std::iter::repeat_n(2.0, 1000).map(f32::ln).sum::<f32>() / 1000.0).exp() = 2.0000134

is already way better, and a better summation or averaging algorithm would improve it more.

(https://play.rust-lang.org/?version=stable&mode=debug&edition=2024&gist=d433d835c5a2c1b22ee3914794e26cad)

But then the ln(0) ⇒ -∞ and exp(-∞) ⇒ +0 behaviour of having the infinities available as "normal" values is really helpful in that it naturally works if you have a zero in the input to the geometric mean calculation.
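
For instance, with the same log-domain approach (a sketch; the inputs are arbitrary):

fn geometric_mean(xs: &[f32]) -> f32 {
    (xs.iter().map(|&x| x.ln()).sum::<f32>() / xs.len() as f32).exp()
}

fn main() {
    // A zero in the input flows through naturally:
    // ln(0.0) is -inf, the sum is -inf, and exp(-inf) is +0.0.
    assert_eq!(geometric_mean(&[2.0, 8.0, 0.0]), 0.0);

    // Without zeros it behaves as expected (the geometric mean of 2 and 8 is 4).
    assert!((geometric_mean(&[2.0, 8.0]) - 4.0).abs() < 1e-5);
}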

3 Likes