Fast-posit: Software implementation of the Posit floating point format

Thanks for the detailed response!

Gotcha, I understand what you mean now. Yeah the wording is not great. I meant round_from = "the conversion prescribed by the standard", but that's not what is implied in the general documentation of the RoundFrom trait (even if the specific impls have more detailed comments).

Regarding uX, yes, I agree that it is a mistake. I realise now that the standard most likely refers to signed integers only. I understand the reasoning: since iX::MIN is a value unlike the others (it's the only value whose negation overflows, and "removing" it from the domain leaves iX balanced around 0), treating it specially is the least bad option if you want a reversible mapping from iX to Posit. But for unsigned integers that makes no sense. Regardless, the wording in the standard document is not clear.
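The overflow quirk of iX::MIN mentioned above is easy to check in plain Rust (a standalone illustration, not the crate's API):

```rust
fn main() {
    // i8::MIN = -128 is the only i8 whose negation overflows,
    // since +128 is not representable in i8:
    assert_eq!(i8::MIN.checked_neg(), None);
    // every other value negates fine; excluding MIN leaves the
    // domain {-127, ..., 127}, balanced around 0:
    assert_eq!((-127i8).checked_neg(), Some(127));
    assert_eq!(127i8.checked_neg(), Some(-127));
}
```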

Ideally, you could just make them either just inherent methods – or at least split it into 2 separate traits for the integral types and the IEEE floating-point type + quire conversions

That does make sense, thanks for the suggestion!

Ideally, we’ll be getting a properly functioning NonMin/…n/…

It's been on my wishlist for a long time! :slight_smile:


Exactly. For one, I would be highly highly surprised if ±0f64 mapped to anything other than posit 0. For two, it's what's in the standard, and the crate aims to offer (a superset of) the functions described in the standard.

Whether ±∞_f64 should be mapped to ±MAX instead of NaR is another question.


seems to contradict

The standard you cited in your original post states:

In converting a float value to a posit value, all forms of infinity and NaN convert to NaR.

So there is no decision to be made if your crate actually is a superset of functions in the standard: you have no choice but to map said floats to NaR.


a superset of

To be clear: the functions in the standard will always exist with the same semantics, but that doesn't mean there can't be alternatives / extra fns (e.g. round_from(i32) being the standard fn and try_from(i32) not).
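The same split already exists for integer conversions in std, which may make the distinction concrete (an analogy, not the crate's actual signatures): try_from is lossless-or-fail, while a total conversion always succeeds by rounding or saturating, in the same spirit as a standard round_from:

```rust
fn main() {
    // lossless-or-fail: 300 is not exactly representable in u8,
    // so the fallible conversion reports an error
    assert!(u8::try_from(300i32).is_err());
    assert_eq!(u8::try_from(255i32), Ok(255u8));
    // total conversion: always succeeds, saturating to the nearest
    // representable value instead of failing
    assert_eq!(300i32.clamp(0, 255) as u8, 255u8);
}
```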


Oh man, how did we ever get any useful computing done all these decades when we haven't even got a good grasp on our numbers yet? :slight_smile:


I mean, it's been almost 40 years since Numerical Recipes first came out. "We" absolutely know how to do useful computing with numbers -- even if lots of individuals don't.


In IEEE-754 it doesn't actually work for NaN: playground

If you exclude NaN, it also works for +0. For example: playground.

I wasn't using = as the equivalence relation defined by IEEE 754 (which is why I kept math notation and didn't use ==); I was using it as true mathematical equality. Additionally, that same argument would suggest that -0 is just as good as +0 when deciding which float Posit 0 should be mapped to, since 3.0 - 3.0 == -0.0 as well.
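The playground links above boil down to the following quick sanity checks in plain Rust:

```rust
fn main() {
    // IEEE-754 `==` is not mathematical equality:
    let nan = f64::NAN;
    assert!(nan != nan); // reflexivity fails for NaN
    // +0 and -0 compare equal...
    assert!(0.0f64 == -0.0f64);
    // ...yet are distinguishable values:
    assert!(0.0f64.is_sign_positive());
    assert!((-0.0f64).is_sign_negative());
    // so 3.0 - 3.0 == -0.0 holds as well, even though the result is +0:
    assert!(3.0f64 - 3.0 == -0.0);
    assert!((3.0f64 - 3.0).is_sign_positive());
}
```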

The point I was trying to make is that it makes sense either to have no notion of "unsigned" 0 or to treat all floats that are IEEE-754 "equivalent" to +0/-0 as "unsigned 0". If one is forced to pick a single IEEE-754 value to assign "true" 0, though, then I find picking -0 at least as reasonable as picking +0, which at the very least makes +0 not the "only" reasonable choice.


That was a bit tongue in cheek. My real point is that trying to maintain various algebraic identities exactly in IEEE-754 is a fool's errand, especially so for trying to maintain the distinction between -0 and +0. It's impossible. Trying to do it artificially only creates more problems. IEEE-754 was not designed with any such consistency in mind, and especially the -0 / +0 hack.

Clearly +0 is the "default" zero in IEEE-754. That's true for the results of arithmetic operations, and it's also true for integer-to-float conversion:

Integer zeros without signs are converted to +0.

BTW in Rust std [].iter().sum() gives -0.0 with the same reasoning. I think it's a mistake for the same reason. It's super confusing and doesn't achieve anything useful. I guess somebody wanted to make sure [-0.0].iter().sum() gives -0.0, but if we have to pick one I'd rather have both return +0.0 than both return -0.0 simply because [] is more common than [-0.0].
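The "+0 is the default zero" behaviour described above can be checked directly (a couple of quick checks; I'm deliberately not asserting what the empty sum returns, since that's exactly what's under debate):

```rust
fn main() {
    // arithmetic that produces zero from nonzero operands yields +0:
    assert!((3.0f64 - 3.0).is_sign_positive());
    // integer-to-float conversion of an unsigned zero yields +0:
    assert!((0i32 as f64).is_sign_positive());
    // even -0 + +0 gives +0 (in the default round-to-nearest mode):
    assert!((-0.0f64 + 0.0).is_sign_positive());
}
```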

Agree to disagree I suppose. I think it's silly to distinguish between -0 and +0, and IEEE-754 essentially says as much. At that point +0 is neither better nor worse than -0 when deciding what value unsigned 0 should get mapped to. Once you make arguments like "results of arithmetic operations return positive 0", you've decided not to use == as defined by IEEE-754 but instead true mathematical equality; from there, you can see that there are "arithmetic operations" that return negative 0, most notably the definition of the additive identity. There are arithmetic operations and conversions that suggest positive 0 is better, though; and as stated in the initial response, the point is essentially moot since the Posit Standard states Posit 0 gets mapped to positive 0. I think Rust made the correct choice of using -0[1], since it's the actual additive identity under the strictest definition of =; and at "worst" it's just as good as +0, which IEEE-754 says is equivalent anyway.


  1. Note I'm not considering the "breaking" change that Rust decided to do. I may be inclined to agree that +0 is "good enough" as the additive identity to not warrant a change to -0; however I believe -0 is the "better choice" when it's the original choice. ↩︎


The IEEE-754 standard never explicitly "defines" -0 to be the additive identity. It's basically accidental. Arithmetic wasn't designed with the idea of making sure -0 is the neutral element. The only reason it's accidentally become the only neutral element is that when somebody thought "what should 0 + -0 be", they said "+0, obviously".

I agree that it was accidental, but that doesn't change my stance that -0 is the "better" initializer for summing. Again, I think the bigger issue is caring about -0 vs. +0; and if you do, then that seems like a bigger violation of IEEE-754. Since we must (or at least should) pick one though, then -0 has better algebraic properties as the additive identity in my opinion even if it wasn't the intention of the standard.
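The "better algebraic properties" claim can be made concrete: -0 is a two-sided neutral element for + (it leaves every non-NaN value unchanged, sign of zero included), while +0 is not, because it flips the sign of -0. A quick check:

```rust
fn main() {
    // adding +0 is NOT the identity: it turns -0 into +0
    assert!((-0.0f64 + 0.0).is_sign_positive());
    // adding -0 leaves every zero's sign (hence every value) intact:
    assert!((-0.0f64 + -0.0).is_sign_negative()); // -0 + -0 == -0
    assert!((0.0f64 + -0.0).is_sign_positive());  // +0 + -0 == +0
    assert_eq!(1.5f64 + -0.0, 1.5);
}
```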

Of potential interest: Branch Cuts for Complex Elementary Functions or Much Ado About Nothing's Sign Bit written by William Kahan himself in 1986. Interesting snippet:

The IEEE standards 754 and p854 take a different approach. They prescribe representations for both +0 and -0 but do not distinguish between them during ordinary arithmetic operations, so the ambiguity is benign. Rather than think of +0 and -0 as distinct numerical values, think of their sign bit as an auxiliary variable that conveys one bit of information (or misinformation) about any numerical variable that takes on 0 as its value. Usually this information is irrelevant; the value of 3 + x is no different for x := +0 than for x := -0, and the same goes for the functions signum(x) and sign(y,x) mentioned above. However, a few extraordinary arithmetic operations are affected by zero's sign; for example 1/(+0) = +∞ but 1/(-0) = -∞. To retain its usefulness, the sign bit must propagate through certain arithmetic operations according to rules derived from continuity considerations; for instance (-3)(+0) := -0, (-0)/(-5) = +0, (-0) - (+0) = -0, etc. These rules are specified in the IEEE standards along with the one rule that had to be chosen arbitrarily:

s - s := +0 for every string s representing a finite real number.

Consequently when t = s, but 0 ≠ t ≠ -∞, then s - t and t - s both produce +0 instead of opposite signs. (That is why, in IEEE style arithmetic, s - t and -(t - s) are numerically equal but not necessarily indistinguishable.)

The fact that s - s := +0 for all finite s was chosen "arbitrarily" while (-0) - (+0) = -0 was chosen purposefully "from continuity considerations" makes me feel more strongly that -0 is the better additive identity. From this, I wish IEEE-754 hadn't chosen "arbitrarily" to assign +0 to s - s but instead -0.
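Kahan's rules quoted above are all observable in Rust's f64, which follows IEEE-754 in the default round-to-nearest mode:

```rust
fn main() {
    // the sign of zero matters for a few extraordinary operations:
    assert_eq!(1.0f64 / 0.0, f64::INFINITY);      // 1/(+0) = +inf
    assert_eq!(1.0f64 / -0.0, f64::NEG_INFINITY); // 1/(-0) = -inf
    // sign propagation "from continuity considerations":
    assert!((-3.0f64 * 0.0).is_sign_negative());  // (-3)(+0) = -0
    assert!((-0.0f64 / -5.0).is_sign_positive()); // (-0)/(-5) = +0
    assert!((-0.0f64 - 0.0).is_sign_negative());  // (-0) - (+0) = -0
    // and the one rule chosen "arbitrarily": s - s := +0
    assert!((3.0f64 - 3.0).is_sign_positive());
}
```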


This, for me, is not a convincing reason to have -0. Once you have underflow you lose all precision and no wonder you start getting wrong results. You should expect that if you have calculations that have discontinuity at 0. If you somehow keep getting correct results it might just lull you into a false sense of security.

-0 will not save you in many other underflow cases. The expression "x / 2 * 2 / x" will suddenly jump from 1 to 0 to NaN when you start underflowing. In some rare, carefully constructed cases like the one you show -0 will save you and allow you to get the right results despite underflow. But it's the exception rather than the rule.
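The jump can be demonstrated directly: for the smallest positive subnormal, x/2 rounds to zero and the identity collapses (a sketch under round-to-nearest, the Rust default):

```rust
fn main() {
    let expr = |x: f64| x / 2.0 * 2.0 / x;
    // normal range: the identity holds exactly (dividing by 2 is exact)
    assert_eq!(expr(1.0), 1.0);
    // smallest positive subnormal: x/2 underflows to 0, so we get 0, not 1
    let tiny = f64::from_bits(1); // ~5e-324
    assert_eq!(expr(tiny), 0.0);
    // at exactly 0 the expression becomes 0/0 = NaN
    assert!(expr(0.0).is_nan());
}
```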

To me that doesn't seem worth the trouble of adding a special value, having weird properties of ==, etc. It's like trying to maintain correct results after an i32 overflow by adding some strange special infinities to i32.

Now, the example might be a reasonable argument that flushing underflow to 0 is worse than saturating underflow to f64::MIN_POSITIVE. This wouldn't require any special values. For any x > 0, the relative difference going back from 0 to x is infinite whereas the relative difference when going back from f64::MIN_POSITIVE to x is at least finite. It's a much simpler solution than signed zeros. I think it would also satisfy all Kahan's branch cut arguments for signed zeros.


I can't find the quote now (I think it was Joel Spolsky, maybe?) but I heard once that computers are generally bad at three things: numbers, text, and time - so it's a bit unfortunate that's what we use them for! Specifically, IEEE 754, Unicode, and I guess any half-realistic date and time library show that these human concepts just don't map very well to hard, repeatable rules and logic.

Well, this discussion certainly proves the point that signed zero raises more problems than it solves! The corner cases it does "solve" aren't an issue in the first place if you don't underflow to 0, but only to ±MIN_POSITIVE.

Well, Numerical Recipes is, precisely, a collection of workarounds (incantations?) by practicing scientists and engineers to mitigate the limitations of floating point. And we can (obviously) do many useful things with IEEE floats despite all their flaws. The claim is only that we can do better than a format that was stabilized, half by accident, in the 1970s, under the historical design and economic constraints of a specific Intel co-processor.

And we can! Not only posits but other alternative formats do improve in objectively measurable ways. Which is only natural after nearly half a century of theoretical and engineering progress :slight_smile:


Given that catastrophic cancellation by definition will exist in all fixed-width formats, aren't we going to need a collection just like this even if we move to a better format?
