# Total order for floats?

Who cares about orderable? They all are, even using more strict definition of "ordering", assuming the appropriate axioms.

And the question of appropriateness of these axioms if precisely what separates mathematicians into different camps.

But NaN was introduced to make it possible to fit ℝ̅ approximation into hardware! It's easy for the mathematician to say that adding `+∞` and `-∞` is “not defined”, but what the pieces of silicone is supposed to do if one attempts that? CPU which explodes when you attempt such operation is not much useful.

If you think about that you'll understand that much of the complexity (may be all of the complexity) which programming deals with is this dilemma when you only have a partial function in the domain space, but have to, somehow, map it into total function. Because exploding CPUs are not well received by general public. Freezing senators are fine, freezing CPUs are not.

2 Likes

Other solutions are possible. For example, `u32` addition also has out of domain values (overflow!). What we do is that in debug mode we panic, in release mode we take a semi-arbitrary value and continue. That would be one reasonable alternative to NaNs.

Another alternative would be to simply say that ∞ - ∞ = ∞ or something like that, on the theory that this is as good approximation as you can pick, and all our operations are approximate anyway. This would be a numerically stable choice, because that's the correct answer for some values "close" to infinity (in the appropriate topology), which is the definition of numerical stability.

2 Likes

My impression was to invent NaN to make the standard operations total. Also, floats are not reals, they don't prentend to be, they don't approximate them. They are finite (!) subset of rationals. Plus some extra values.
But then again, this:

1 Like

That's the next step. Note that most CPUs ignore division by zero (as usual it's `x86` that's weirdo, not other architectures), but some don't and Rust decided that it's important enough to detect.

I guess situation with floats NaN is different in a subtle way: most modern CPUs actually agree on what is supposed to happen when you do the impossible calculations and exposing precisely that to programmer looked like a sensible choice… but that made ordering non-total.

Well… everything in a computer is finite. The big question is whether auto-bignum (like in Python) makes some sense or not. Give that even python decided that it needs to limit it's auto-bignums I think what Rust did makes sense.

If you can write down a definition of real numbers (eg. as cauchy sequences or dedekind cuts) then you can write that definition in a theorem prover and start talking about specific real numbers. And then you can show that there's no function which takes two real numbers and returns `bool` such that the function models the concept of equality.

And yes, floats can be ordered (since they're finite sets) but only if you stop pretending that they represent numbers: eg. that `+` is addition, `*` is multiplication etc. But the whole point of floats is to treat them like numbers which is why Rust's float types implement `Add`, `Mul`, etc. Rust's floats don't implement `Eq` or `Ord` though and I'm saying that this is a good thing and would still be a good thing even if NaNs didn't exist. Giving users the ability to treat floats like numbers (add them, multiply them etc.) and also the ability to check whether two floats are equal enables them to conflate the thing being represented with the representation itself and then they end up shooting themselves in the foot with stuff like `0 < 0.4 + 0.2 - 0.6`. Note that you don't actually need `<` to sort a list of floats for the sake of computing the median - `min` and `max` suffice.

In my ideal language the mathematical operators would be required to obey the field axioms. There'd be a `RawFloat32` type which can be checked for equality but which only supports IEEE "addition", "multiplication" etc. as methods (eg. `x.add(y)`). Then there'd be abstract float-backed numeric types which do support the mathematical operators but where the field axioms are enforced by making it impossible to implement non-continuous functions on then. Specifically, you wouldn't be able to check them for equality/ordering but you also wouldn't be able to write numerically unstable code at all. And since these types would obey the field axioms the compiler would be free to use those axioms to perform optimizations (eg. `--ffast-math`). Then there'd be a way to mediate between these two layers with some kind of effects system (though I haven't figured out how that part would work yet).

Yes, it really is hard. That's why LLVM is moving to more poison, less UB, for examples. Integer divide by zero is incredibly annoying. It causes all kinds of problems for speculation, for example, because of that fault. Similarly, it's extremely difficult for a compiler to hoist it from a loop, again because of that signal.

Reall values that are consumed normally are much better than weird side-channels. See, for example, how `Result` is nice because it's a normal value, and exceptions can be a pain because they're a side-channel. And how languages with exceptions end up with things like ExceptionDispatchInfo.Capture(Exception) Method (System.Runtime.ExceptionServices) | Microsoft Learn to turn the weird side-channel thing into something you can actually pass around normally again.

Now, I actually do wish that Rust had a `NonNanF32` or similar as an easy way for Rust code to work in that land where possible -- especially for storage.

But in the hardware and in optimizations, NAN really is better than trapping on stuff.

6 Likes

I agree that trapping is bad, but there is a third option: best effort!

Floating point is already best effort, basically all floating point results are approximations, so no reason to error out on something like division by zero and stuff like that, because that 0 in the denominator might be the result of rounding errors. Well, actually 1 / 0 is already not an error, but I'd argue it would do no harm to extend that logic to all operations and return something that best approximates the operation in all cases.

I realize this is a wild idea (primarily because it's not what the hardware does), but this thread is already deep into wild speculation, so I allowed myself this.

Not any better than unwrapping `Option::None`, though?

I think the problem with NaN has an equivalence. If the program unwraps `None`, it results in a panic which is the only reasonable thing to do, because that operation is a bug. I could argue that when `NaN` is used in an arithmetic operator (or results from an arithmetic operation), it should result in a panic, because that operation is also a bug. And again, it's the only reasonable thing to do.

If we let `None` propagate and infect values when using them, it would just be the nullability problem.

2 Likes

Hmm...

It's always about the optimisers. Be they in the hardware or software. All about performance.

My feeling is that if program correctness were our top priority instead, then an erroneous operation, int overflow, creating a NaN, whatever would immediately hit the emergency button, halt the program. Continuing in an erroneous state would not be acceptable.

At least I could enable that for debugging, get a core dump, and see what happened. Frankly I think a hard stop in release/production code would be preferable. If we care about correctness.

Every processor I have ever programmed in assembler had an overflow flag. The processor is trying to tell us the error of our ways.

Can't we give processors an emergency stop button without impacting any optimisations that may be done? After all optimised instruction sequences are just instruction sequences. The processor does not know or care.

And then you would be able to discuss things related that specific subset of real numbers set.

Which would tell us nothing about how real numbers work. Why do you think there are both Gödel's completeness theorem and Gödel's incompleteness theorems? How could both exist? Please answer than then we can discuss other things.

They already have that ability with integers where `X + 1 < Y + 1` doesn't imply `X < Y`.

Which would make your “perfect language” mostly useless because most actual computations hit various corner cases of floats all the time.

Try to imagine how LLM would work in your language, e.g.

Yes, and even with all these optimizations everything would work so slow that no one would use such language, practically speaking.

If limitations of floats are too problematic then usually you don't need floats at all but need something else. Symbolic math, or something like that. That:

just wouldn't work. Integers are not a field, floats are not a field and anything you may, realistically, invent without turning your floats into dynamically-sized structs which can occupy unlimited memory wouldn't be fields.

And bignums are much less useful, in practice, than people imagine. Most of the time when you hit limitations of floats which can be overcome by switching to bignums it's not time to bring bignum library, but time to stop and rething what you are doing.

But we do allow propagation of `None`'s. There are plenty of code which converts `Option`s into `Option`s.

It's monads all over again. Maybe the best treatment of floats would have been with trearting floats as `Option<NonNanFloat>`s and going from there.

Then treatments of `NaN`s, suddenly, obvious and practical. And like one couldn't do math on top of NonZero ints one wouldn't be able to use `NonNanFloat` for calculations, too.

That one is solved problem: RISC-V doesn't have an overflow flag.

Obviously not, if you judge from that example.

Processor, of course, does know and care, that's the issue. Hardware designers long dreamed of removing these pesky flags which make hardware optimizations hard. They managed to do that in most popular architecture in certain subsets (both ARM and x86 don't support flags in SIMD instructions and Intel finally plans to suppress them in regular instructions, too). And entirely abolished them in RISC-V.

1 Like

Damn it! We are going backwards! Love the RISC V though.

What I meant was that a processor does not know or care if a given code sequence is the generated by an optimising compiler or some naive crap I have written myself. It just does it's best to get it done ASAP.

How does one detect a carry out/overflow in RISC V by the way?

Edit: Seems it can be done in four instructions: Wanna quick solution to identify overflows? – Use RISC-V branches – VLSI System Design

Hope that works out comparably to having a carry flag. Assuming a RISC V with pipelines, parallel dispatch, branch predictors, speculative execution, etc, etc.

Things like pipelining, speculative execution, and branch prediction are happening at the microcode level, though, and every possible branch point makes those harder/less effective— I don't work down at that level, but there's got to be a pretty hard limit on how many branch points the predictor can keep track of at one time.

Why do you say that? CPUs are, finally, made to behave like high-level languages expect them to behave.

It was always amusing, kinda: high-level languages haven't used overflow and carry (except is some rare optimizations), yet CPUs always diligently calculated them (which made speculations hard).

Now the issue is solved.

Actually almost all CPUs that we are using were designed for “some naive crap extra hand-optimized assembler written by hand”.

And high-level languages were just using them. Attempts to create CPUs optimized specifically for high-level languages were made regularly (starting from B5000, then iAPX432, then Itanic, etc), but they always ended up as total flops because they didn't give the ability to write these all-important superoptimized routines like `memcpy` or `strlen` in assembler.

RISC-V is a compromise: it's, mostly, designed for high-level language, but there are also some facilities which allow one to write efficient `strlen`, if needed.

It's not the branch points per see. Flags just mean that your instruction doesn't touch just one register (that said instruction “officially” touches), but it also affects 3 or 4 extra registers (4 if you threat all generated flags separately, 3 if you merge `AF` and `OF`).

This increases work for register allocator, renamer and so on significantly for very little reason.

Susprisingly enough `SF` and silly x86-specific `PF` don't cause much grief because they are not additional results of instruction, they are just bits in that some result register that you are generating anyway.

1 Like

Flags are just crappy registers that don't work with normal instructions.

I'd much rather that there was a normal wrapping `add` that didn't update anything other than the target registers, and a wide one that updated two target instructions.

MULX — Unsigned Multiply Without Affecting Flags is easy proof that we could just have instructions that put the extra information in real registers, rather than wasting space updating flags all over the place.

2 Likes

This seems like the fact that eggs make birds and birds make eggs. Unlikely one is going to get a cat from that process. Languages use instructions, instruction sets get designed to support languages....if you see what I mean. Meanwhile, if correctness were the top priority all the way up from the bottom to the top, what instruction set would be required to support it? How would the language use it?

Indeed. Back in the day I observed our new micro-processors sprouted instructions to make life easier for assembler programmers. Like the Z80 extension of Intel 8085. Like the Motorola 6909 followup to 6800. They were not thinking of high level languages and compiler writers. Later I found that was true much earlier. IBM machines had hundreds/thousands of microcoded instructions to make life easier for assembler programmers.

Then high level languages caught on and it was noticed that compilers did not use most of that stuff. Enter RISC...

Personally, having read the iAPX432 data books when it was new on the block I felt it would never catch on. Us assembler programmers had no idea how to use it

Bottom line was people wanted performance over whatever safety the 432 offered.

Later I had a similar feeling about the short lived Intel i860. Having the assembler programmer deal with the complexities of its pipelining and parallel dispatch of int and float operations was just too hard to deal with.

Seems the compiler writers had the same problem using those beasts effectively.

Not in the same way that NaN propagates unchecked. The property that makes option types successful is that they enforce explicit handling of exceptional cases. NaN is an exceptional case, and we have no way to handle them explicitly.

It is. The monadic behavior of option types has solved a whole class of problems with nullability. But we are still at the mercy of NaNability with floats. It's a similar problem with the same solution.

That isn't quite the same thing, but point taken. If we assume all floats are wrapped, then it becomes harder to perform arithmetic with them until they are explicitly unwrapped.

I would pose the solution differently: `NonNanFloat` should act more like the normal integers with respect to overflow behavior. The integers are not wrapped. Arithmetic operations on them are allowed to overflow in optimized builds. Your integer arithmetic code contains overflow checks everywhere with `debug_assert`. I could envision `f32` and `f64` having similar debug assertions inserted everywhere to detect the presence of NaN.

All that said, I think this ship has long-since sailed. This is likely an intractable problem in Rust [1]. But it's a good cautionary tale for any languages inspired by it: Do not inherit C-flavored floats and all of their quirks (but allow an opt-in escape hatch if you need compatibility).

1. I hope this is incorrect! If the probability of acceptance for an RFC to introduce `debug_assert!(!(\$expr).is_nan())` is high enough, I would write it myself. ↩︎

No, I don't, sorry.

Except we did. Most CPUs are calculating flags which are very much not used, most of the time, by compilers. And are just needlessly complicating hardware.

It's hard to say because devil lies in details. I would assume that if correctness was the priority from the beginning then things like dependent typing would have been mainstream, not purely academic study with some fringe experiments on the side.

And in that world flags don't really have much use, too.

Sure, because that:

is not a CPU optimized for the compiler, but hardware simplified on the naïve assumption that compiler would save it. It haven't always worked.

Yes, RISC made life of the compiler's writer simpler because it removed some of the translation layer between actual hardware and machine code. Even x86, these days, executes something like `INC [RAX]` as three separate micro-instructions, RISC just gives both humans and compiler ability to explicitly include these three instructions in your program.

But these simplifications are equally relevant for assembler programmers, too.

I am playing with the idea of 'floating rationals': (m1/m2)*(2^e), where m1:int64, m2:u64, e:int64.
Strict rationals suffer from 'denominator explosion' on trivial operations like addition but there is nothing to stop us using exponent shifts and approximations, same as with floats. Apart from more precision, the main benefit would be representational, e.g. n/0 being a legitimate object (approximate infinity). Thus no special values need to be dedicated to +infinity, -infinity and NaN.

Also, divisions by primes, such as n/3 currently cause pretty horrific truncation errors. These will be much better, though of course truncation (or rounding) errors will still arise in many operations.