Floating point number tricks

What's an example that fails to be commutative? (Other than which value of NaN is used (i.e. quiet vs signaling, and what the payload is), which AFAIK cannot be relied on anyways.)

Are (+0.0) + (-0.0) and (+0.0) * (-0.0) commutative? What are the signs of the resulting floating-point zeros?


Yes, both are commutative. In the default round-to-nearest mode, (+0.0) + (-0.0) is +0.0 regardless of operand order, and since the sign of a product is the XOR of the operand signs, (+0.0) * (-0.0) is -0.0 either way.
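A quick check of the zero-sign rules, assuming IEEE 754 semantics in the default round-to-nearest mode:

```rust
fn main() {
    let p = 0.0_f64;
    let n = -0.0_f64;
    // Adding zeros of opposite sign yields +0.0 in round-to-nearest, either order.
    assert!((p + n).is_sign_positive());
    assert!((n + p).is_sign_positive());
    // The product's sign is the XOR of the operand signs, so it's -0.0 either way.
    assert!((p * n).is_sign_negative());
    assert!((n * p).is_sign_negative());
}
```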


It's a bit more nuanced, unfortunately. The facts above do not imply that a*x + b*y is commutative in the two products, since the generated code may use a fused multiply-add instruction internally. Likewise, anything that can introduce an intermediate rounding from 80-bit to 64-bit float, depending on how the code is compiled, can make the results differ. Practical floating-point computations simply cannot be trusted to be predictable based on apparent commutativity, even in the simplest of cases.

Not in Rust. It was proposed (RFC #2686), but didn't happen. If you want fma you need to call mul_add yourself.

EDIT: The portable-simd project is looking at maybe exporting different versions of the types that will allow transformations like multiplying by a reciprocal instead of dividing, contracting with fma, etc. So there may be versions of this that happen in the future.
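To illustrate the difference, `f64::mul_add` performs the multiply and add with a single rounding, which can produce a different result from the separately rounded expression:

```rust
fn main() {
    let a = 1.0_f64 + f64::EPSILON; // 1 + 2^-52
    let b = 1.0_f64 - f64::EPSILON; // 1 - 2^-52
    // The exact product is 1 - 2^-104; rounded separately it becomes exactly 1.0.
    let plain = a * b - 1.0;        // 0.0: the intermediate product was rounded
    let fused = a.mul_add(b, -1.0); // -2^-104: kept by the single rounding
    assert_eq!(plain, 0.0);
    assert!(fused < 0.0);
}
```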


There exist subsets of floats where computation is guaranteed to produce an exact value. For example, the integers. If you are working within such a subset, then exact equality makes sense.

In my ijson crate, I use exact equality in several places to determine whether a particular conversion is lossless. For example, if I want to know whether a particular f64 value is representable in an f32, then the easiest thing is to cast it back and forth and check for exact equality.
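A minimal sketch of that round-trip check (an illustration, not the actual ijson code):

```rust
/// Returns true if `x` survives a round trip through f32,
/// i.e. the f64 -> f32 conversion is lossless.
fn is_lossless_as_f32(x: f64) -> bool {
    // Note: NaN compares unequal to itself, so NaN reports false here.
    (x as f32) as f64 == x
}

fn main() {
    assert!(is_lossless_as_f32(0.5));  // an exact power of two survives
    assert!(!is_lossless_as_f32(0.1)); // 0.1 rounds differently in f32 and f64
}
```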

There's also a fun trick you can do to determine if an integer can be exactly represented by a float:

fn can_represent_as_f64(x: u64) -> bool {
    // f64 has a 53-bit significand; the other 64 - 53 = 11 bits must be zero at the ends.
    x.leading_zeros() + x.trailing_zeros() >= 11
}

Floating-point is somewhat counter-intuitive, but not pure magic. It's just that some of our assumptions from integer arithmetic don't apply to floating-point math.

I used to have the same question as you: why don't compilers implement float equality as 'difference less than epsilon' by default? What I found is that doing so would (1) make things more confusing in some situations, (2) break the rare cases where we still need the 'accurate' comparison, and (3) waste the native comparison instructions that many hardware targets provide, which compilers can translate == into directly.

Precision loss in float math can have various causes. For example, 0.1 in decimal is not exactly representable as a binary float. Think of 1/3 in decimal: the number exists, but you can never write down its exact value with finitely many digits (0.33333...). This kind of loss comes from the binary format itself.

Another common source of precision loss is how float numbers are represented underneath: sign * mantissa * 2^exponent. This lets them hold large but imprecise numbers like 111111..0000000000000000 (in binary), and it's why they're called 'floating point'. When a very large number is added to a very small one, precision is lost because the exponents have to be shifted to match before the mantissas can be added. This leads to a surprising fact: not every i64 is exactly representable in f64 (for example 1241523563262624633), even though the maximum f64 is about 1.8 × 10^308. That's also why some float operations are commutative but not associative.
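Both effects are easy to observe directly (the i64 value is the one from the paragraph above):

```rust
fn main() {
    // Not every i64 fits exactly in f64: only 53 significand bits are available.
    let n: i64 = 1241523563262624633;
    let round_trip = n as f64 as i64;
    assert_ne!(n, round_trip);

    // Addition is commutative but not associative: at 1e16 the spacing
    // between adjacent f64 values is 2, so adding 1.0 twice is absorbed,
    // while adding 2.0 once is not.
    let big = 1e16_f64;
    assert_ne!((big + 1.0) + 1.0, big + (1.0 + 1.0));
}
```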

Even beyond precision, there are other 'dark corners' in floating-point math: special values for the infinities and NaN (it's really, really not a number, so you can't use == to compare with it; in fact NaN != NaN), signed zeros, rounding modes, and exceptions. Like the law: boring, but sometimes helpful.
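A few of those dark corners, demonstrated:

```rust
fn main() {
    // NaN compares unequal to everything, itself included.
    assert!(f64::NAN != f64::NAN);
    assert!(!(f64::NAN == f64::NAN));
    // Signed zeros compare equal under ==, despite differing bit patterns.
    assert!(0.0_f64 == -0.0_f64);
    // Infinity is a real value that sorts above every finite float.
    assert!(f64::INFINITY > f64::MAX);
}
```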


Also because some of our knowledge of the properties of real numbers do not apply to floating point math. For example, real arithmetic is associative and distributive but floating-point arithmetic is not.


It's important to recognize that there's no one-size-fits-all way to compare, even using epsilon. The page linked by @ZiCog earlier covers two "good" ways (along with a handful of "bad" ways) but all of them have tradeoffs. Not to mention that doing repeated arithmetic on floating-point numbers can cause the error to exceed any threshold if you happen to be unlucky enough that all the rounding errors point in the same direction.

In general, you need some information about your application to know what kind of comparison is "close enough". Sometimes only the exact same bit pattern is correct. Sometimes it's sufficient to know whether the numbers are within 1% of each other, or whether they're in the same "bucket" by some arbitrary standard. Sometimes you need extra logic around 0 or when the numbers are of opposite signs; sometimes you know by construction that isn't possible, so it would be a waste of time or even incorrect to try to account for it.

That is, to me, the most compelling argument against doing any kind of epsilon comparison by default: it's often not appropriate. In any given situation where you think you need floating point equality, you probably need either bitwise equality -- which is the simplest option -- or you actually need to compare the absolute difference against a threshold that is determined by your use case. Any global threshold, even one based on epsilon, is probably wrong more often than it is right.


Decades ago the project manager I was working for reviewed a new team member's code and told him:
"If you think you need floating point to solve the problem then you don't understand the problem." He was kind of blunt that way.

On overhearing that I took it very practically, after all the processors we were using on that embedded system did not support floating point and the language we were using, Coral 66, had great fixed point type support.

As the decades rolled by I realized the corollary to that statement: "If you really do need floating point to solve the problem, then you have a problem you won't understand."

Threads about floating point, like this one, pop up all over the place from time to time. They always remind me of my old project manager.


Note that even if epsilon comparison were globally appropriate (which it's not, as you say), it's not a legal thing to do in a PartialOrd implementation anyway.

PartialOrd requires transitivity, that a=b ∧ b=c ⇒ a=c. But there's no epsilon for which that holds in general, since you can make a counterexample with a + ⅔ϵ = b = c - ⅔ϵ.
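Concretely, with a hypothetical epsilon-equality (the helper and EPS here are made up for illustration):

```rust
const EPS: f64 = 1e-9;

// Hypothetical epsilon-equality: "equal" when within EPS of each other.
fn approx_eq(x: f64, y: f64) -> bool {
    (x - y).abs() < EPS
}

fn main() {
    let b = 1.0_f64;
    let a = b - 2.0 * EPS / 3.0;
    let c = b + 2.0 * EPS / 3.0;
    assert!(approx_eq(a, b));  // a "equals" b
    assert!(approx_eq(b, c));  // b "equals" c
    assert!(!approx_eq(a, c)); // but a does not "equal" c: transitivity fails
}
```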

I would say this is overstated. I agree that for normal CS101 stuff you don't need floating-point. But if you have a scale-invariant problem, then floating point is completely appropriate -- and that's not uncommon for anything doing geometry. If it doesn't matter (other than constants, of course) whether you measured things in metres or feet, then floating-point is fine.

Of course, many people think they have a scale-invariant problem when they actually have a translation-invariant problem. My favourite example of someone running into that:

That's also a great example because it's also a demonstration that the floating-point was deterministic for them. It took some work, but like many RTS games before them, they had it producing consistent results even cross-platform:

At this point everything was going along fine. Contraptions were running exactly the same on Windows and Macs. Development was cruising along. Parts were being implemented. Floating point didn't seem to be causing a divergence to occur.


Personally I prefer when a language sticks to some standardised way of dealing with floats. It might not be intuitive but at least it's standardised so I only have to learn the behaviour once and not for each language out there.

Current Rust behaviour for a == b suits me. I wouldn't want Rust to apply epsilon in the comparison. If it did, should it then use epsilon for > and < too? Would it be possible for a == b and a < b and a > b to be all true?

My preference is to implement any epsilon calculation explicitly. Rust's traits make it more natural than some other languages. Also if I want total ordering of floats, it can be done using total_cmp.
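For example, sorting with total_cmp gives the IEEE 754 total order, in which -0.0 sorts before +0.0 and NaN has a defined place:

```rust
fn main() {
    let mut v = vec![3.0_f64, f64::NAN, -0.0, 1.0, 0.0];
    // total_cmp implements IEEE 754 totalOrder; a positive NaN sorts last.
    v.sort_by(|a, b| a.total_cmp(b));
    assert!(v[0] == 0.0 && v[0].is_sign_negative()); // -0.0 comes first
    assert!(v[1] == 0.0 && v[1].is_sign_positive()); // then +0.0
    assert!(v[4].is_nan());                          // NaN at the end
}
```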

The current Rust implementation also makes it impossible to use floating point types as keys in sets and dicts by default, which I think is a good thing. Especially when you consider NaN as a key.

Perhaps I should say that my "corollary" was devised partially tongue in cheek.

It's just the simple notion that for many situations floating point is not even needed. Thus if you think you need floats you don't understand the problem. But if the problem really does call for using floats then you have to understand all of this: https://www.itu.dk/~sestoft/bachelor/IEEE754_article.pdf

Of course most of the time most programmers don't worry about all of that. Until they find something weird going on...


May I know what's the book?


It may be tongue in cheek, but it's also patronizing. You can't just say "if you think you need float, you don't", because that implies floats are completely useless, not just that you probably don't need them. It's like saying "you'll understand when you're older" to someone who's currently willing and capable of understanding something.


I'm not sure how "tongue in cheek" it really is if you then double-down.

Floating-point is appropriate whenever you care about relative, not absolute differences. This is incredibly common in real life.

If you're making a baguette and you're a tablespoon short on yeast, you have a major problem. But if you're a tablespoon short on flour you won't even notice. That's because you actually care about relative differences -- within 10% is generally fine.

If your GPS is 5 minutes off going to the corner store you'll be annoyed, but if it's 5 minutes off for a Paris-to-Berlin drive you probably wouldn't notice. Because, again, you care about relative error in the estimate.
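A sketch of a comparison along those relative-difference lines (approx_eq_rel is a made-up helper, not from any crate):

```rust
// Hypothetical helper: equal within a relative tolerance of the larger magnitude.
fn approx_eq_rel(a: f64, b: f64, rel_tol: f64) -> bool {
    (a - b).abs() <= rel_tol * a.abs().max(b.abs())
}

fn main() {
    // One part per billion off a billion passes at a 1e-6 tolerance...
    assert!(approx_eq_rel(1.0e9, 1.0e9 + 1.0, 1e-6));
    // ...while a 2x difference between tiny numbers does not.
    assert!(!approx_eq_rel(1.0e-9, 2.0e-9, 1e-6));
}
```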

All approximations of real numbers have their issues; it's not like fixed-point is perfect either. Just like i32 isn't an integer. Programming is hard; that doesn't mean the problem is misunderstood.


This is a really nice thread - great resources and insights here. Floating Point is one of those roller-coaster rides of "I think I get it, no, I know nothing, I understand more, ..." so common throughout computer science. I found this fascinating article by Hans-J. Boehm, about the math library (in Java) he created to reduce the number of "bug" reports in the Android Calculator.

The library is described in a bit more detail in this paper.

I like this summary of the problem:

"We really want to ask an alternate, less well-studied, question: Can we decide equality for recursive reals computed from integer constants by a combination of the following operations, which we will refer to as "calculator operations":
1. The four basic arithmetic operations, and square roots.
2. The sin, cos, and tan trigonometric functions and their inverses.
3. Exponential and (natural) logarithm functions."

Which he says is mostly solved in the paper, "The Identity Problem for Elementary Functions and Constants"

I don't know of any ports of this work to other platforms or languages, but it seems like the kind of thing the Rust community will do, eventually.

I was reading MEAP 14 of "Rust in Action" and the code I'm talking about is on page 37. I don't know if there is a more recent release of the book that fixes this.


If only IEEE 754 were a thing of the past and a better format like posits [https://posithub.org/khub_doc] were used in modern hardware and software instead. It wouldn't solve equality testing, of course, but fewer errors propagate with that format.

Posits are slightly more precise for their storage size, but don't solve catastrophic cancellation (nothing finite-sized can), so really don't fundamentally change anything. Numerically-stable algorithms are still required.