Floating point number tricks

A lot of negative statements about floating point here. Maybe the folks doing the 7-day weather predictions should rethink things.

Floating point arithmetic is indeed a mess, but there's plenty of evidence that you can do useful work with it. Even people who don't understand the dangers can go quite far using libraries written by people who do.

That exact quote applies to memory safety as well:

The C specification is indeed a mess, but there's plenty of evidence that you can do useful work with it. Even people who don't understand the dangers can go quite far using libraries written by people who do.

That's not so much a 'problem' as just the way float works. You're making it sound like any float operation is liable to explode with unbounded error. The error is bounded by how far away from 1.0 your operands are. Yes, you have to design algorithms so that your important calculations are unit-scaled, but that's not a major challenge: you just have to scale your inputs.
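For concreteness, here's a small Python sketch of that scaling point: the spacing between adjacent doubles (one "ulp") grows with magnitude, so the absolute error of a single operation depends on how far its operands sit from 1.0 (`math.ulp` needs Python 3.9+).

```python
import math

# The gap between adjacent doubles (one "ulp") grows with magnitude,
# so the absolute rounding error of an operation scales with how far
# the operands are from 1.0.
print(math.ulp(1.0))    # 2**-52, about 2.2e-16
print(math.ulp(1e9))    # about 1.2e-7
print(math.ulp(1e18))   # 128.0 -- adjacent doubles differ by 128 here
```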

That's the problem I'm having with this discussion: you keep saying "in general" and "independent of the values" as though that's the problem you have to solve. If you're saying you'll take a 2x performance hit to avoid scaling your inputs one time, well why would you do that? Isn't it easier to pay the one time cost of learning how to use floats correctly than it is to suffer a performance penalty in perpetuity?

Nobody is saying floats will solve all your problems, or that they don't have trade-offs. Only that they are very efficient in terms of what you get for the trade-offs you make. That's why they are so compelling at the hardware level.

Unexpected rounding errors happen in fixed point much more readily. Unexpected performance degradation happens in rationals much more readily. Unexpected overflow errors happen in integers much more readily. There really is no way for a novice to expect everything. Every number system will act unexpectedly, and the raison d'être of floats is that you can coast a long way without it behaving unexpectedly, as long as a reasonable amount of error is tolerable.
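As a toy illustration of the fixed-point claim (a hypothetical two-decimal-digit "cents" representation, not any particular library):

```python
# Hypothetical fixed point with two decimal digits (scale = 100).
a, b = 105, 105              # both encode 1.05
product = a * b // 100       # true product 1.1025 truncates to 1.10
print(product)               # 110, i.e. 1.10 -- the 0.0025 is silently gone

# The same product in binary floating point is good to ~16 digits:
print(1.05 * 1.05)           # approximately 1.1025, off by at most an ulp
```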

The C specification is indeed a mess, but there's plenty of evidence that you can do useful work with it.

There are many more alternatives to C than there are to floats. And in some domains, there aren't, and you see people using C because they have to. What you're arguing is merely rhetorical, while alankharp is giving practical advice. You can argue C is awful rhetorically, but people can still use it and be correct in doing so.


Yes, it's the way float works. If you want to say "these values have to be bounded near 1.0" then okay, that's one problem solved(ish) in return for another one: making sure your values are always bounded near 1.0, which may or may not be easy to do depending on what you are doing. Most operations make the values a bit larger than they were to start with, so whatever your bound, you will sooner or later exceed it. Maybe in your application that's okay because you are doing at most N operations, or maybe it's okay because you renormalize periodically, or maybe it's okay because you can't get outside of the game arena (or at least, you hope you can't).
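The periodic-renormalization strategy can be sketched in a few lines of Python (a toy 2D rotation loop, not taken from any real codebase):

```python
import math

# Repeatedly rotating a 2D vector by a small angle: each step is nearly
# length-preserving, but rounding lets the length drift away from 1.0.
c, s = math.cos(0.001), math.sin(0.001)
x, y = 1.0, 0.0
for _ in range(100_000):
    x, y = x * c - y * s, x * s + y * c
drift = abs(math.hypot(x, y) - 1.0)

# Periodic renormalization resets the accumulated error:
n = math.hypot(x, y)
x, y = x / n, y / n
renorm_err = abs(math.hypot(x, y) - 1.0)
print(drift, renorm_err)
```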

It's a nontrivial verification condition, but you can justify it in a lot of real-world cases. Of course most people don't bother to go through the effort of doing this, or lack the knowledge to do so. That's where floats are most insidious - unless you know what every computer scientist should know about floating point and take the proper precautions, you will get numbers out that have nothing to do with the value you wanted to calculate. The analogy with memory safety seems apt: you will get crazy results unless you follow some rules that very few people understand well, and nothing will tell you that you've broken the rules in many cases.
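A concrete example of that silent failure mode, using Python's `math.fsum` as the "proper precaution":

```python
import math

# Summing a thousand small terms into a huge accumulator: each 1.0 is
# below half an ulp of 1e16, so naive left-to-right summation drops
# every single one of them.
vals = [1e16] + [1.0] * 1000
naive = sum(vals) - 1e16        # 0.0 -- all thousand additions were no-ops
exact = math.fsum(vals) - 1e16  # 1000.0 -- fsum tracks the lost low bits
print(naive, exact)
```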

I don't intend to make it sound hopeless, but it's definitely a tool with a lot of sharp edges, and I would think twice before introducing approximations into any application. Rust has your back when it comes to pointer handling, but it doesn't prevent you from shooting yourself in the foot with bad FP handling, so you just have to either learn the arcana or stay away (or be comfortable with your results possibly being nonsense).

It's possible to use C correctly, and it's possible to use floats correctly. They both have complicated domains of correctness, are easy to pick up and use without worrying about those conditions, and work okay even if you don't care - unless you are unlucky and hit some critical safety bug, or send a missile off course due to rounding error.

I really do think that the comparison is appropriate.


Isn't that true about everything, though? You also need to know to avoid Schlemiel the Painter algorithms if you want to handle large input sizes. You need to implement Ord properly if you want sorting to work. You need to implement Hash + PartialEq correctly if you want the HashMap lookups to do the right thing.

Ironically, the Ariane 5's infamous first test flight failure was because of truncation to a 16-bit integer. And one engineering issue with the Therac-25 was setting a flag by incrementing, resulting in occasional overflows that bypassed safety checks.
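A toy reconstruction of that kind of truncation (not the actual Ada flight code; `to_int16` is a made-up helper):

```python
# Wrapping a float into a 16-bit signed integer, roughly the conversion
# that overflowed on Ariane 5 (toy model, not the flight software).
def to_int16(x: float) -> int:
    v = int(x) & 0xFFFF
    return v - 0x10000 if v >= 0x8000 else v

print(to_int16(20000.0))   # 20000 -- fits in 16 bits
print(to_int16(40000.0))   # -25536 -- silently wrapped negative
```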

Safety-critical software is hard. I know no reason to assume that avoiding floats would make it materially easier.

(And, really, I figure these would generally try to rely on some sort of feedback anyway, since there's no way the physical system is directly controllable to 52 bits of precision. Even the earth's gravitational field varies by a few permille depending where you are on the surface. So ideally whatever adjusts the flight path to account for things like wind is more than enough to account for any floating-point roundoff error. I certainly don't know how to build an IMU accurate enough to dead-reckon the position of a missile, but that's not because of any worries about floating-point.)

It is, although what "the proper precautions" actually are is a thorny business. One area where floats are unusually bad compared to other areas of program correctness is that the input/output domain of correctness in FP algorithms is almost never spelled out, despite the critical importance of this to (correct) modular design. For most operations you can look at the types and the asserts and more or less figure out the preconditions and postconditions of a function, at least at a basic level, but with floats who knows how big things are supposed to get or what the tolerance of individual functions is.
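A small example of such an unstated precondition, comparing an illustrative `hypot_naive` against Python's `math.hypot`:

```python
import math

# Nothing in the signature says x*x must stay in range, yet that is a
# real precondition of this naive implementation:
def hypot_naive(x: float, y: float) -> float:
    return math.sqrt(x * x + y * y)

print(hypot_naive(1e200, 1e200))   # inf -- the intermediate x*x overflowed
print(math.hypot(1e200, 1e200))    # ~1.414e200 -- library rescales internally
```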

Avoiding floats trades one problem for another. The cards are stacked against you in both cases - it's not easy to write correct code with current tools either way, but floats are simply more complicated in every sense: they have more failure modes, they require more complicated circuitry, the representation has more special cases, there is less architectural agreement about how they work, and there is more compiler variation in how they get optimized. That all translates to more cognitive complexity when writing code that takes all that into account, and a higher chance for bugs.

The missile example I was thinking of was the 1991 Patriot missile failure, by the way, which was an actual rounding error, although it used 24-bit fixed point instead of IEEE floats.
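A back-of-the-envelope reconstruction of that bug in Python (the 23-fraction-bit truncation follows the published GAO analysis of the incident):

```python
# 0.1 has no finite binary expansion; the Patriot's clock counted tenths
# of a second with 0.1 chopped to 23 binary fraction bits.
approx = int(0.1 * 2**23) / 2**23
err = 0.1 - approx                 # about 9.5e-8 lost on every tick
# At 10 ticks per second over 100 hours of uptime:
drift = err * 10 * 100 * 3600
print(drift)                       # about 0.34 seconds of clock skew
```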
