 # Efficient f32 -> ascii -> f32 conversion?

See examples in playground:
https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=d4e7a6461cf2e18311d37cdd85b808da

If this line of logic were true, couldn't one conclude that all source code and all articles are lies because they have a shorter representation via `bzip -9`?

Try taking one of those giant decimal examples, and change some of the last few digits. Or add some more digits, anything you want. Parsing that back to a floating point number will give the exact same value as before your changes. So yes, it's a lie to claim that level of floating point precision.

That's a weird conclusion, but `bzip` is a lossless compression algorithm. Parsing a decimal value into floating point is very often lossy -- even `0.1` isn't perfectly represented.
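Both points can be checked with a few lines of Rust. This is only a sketch using the standard library's formatting and parsing; the helper name `same_f32` is made up for the example. It prints the exact expansion of the f32 nearest to `0.1`, then shows that scrambling or dropping the trailing digits parses back to the same bits.

```rust
/// Returns true if both decimal strings parse to the same f32 bit pattern.
/// (Hypothetical helper, not part of any library.)
fn same_f32(a: &str, b: &str) -> bool {
    let x: f32 = a.parse().expect("valid float text");
    let y: f32 = b.parse().expect("valid float text");
    x.to_bits() == y.to_bits()
}

fn main() {
    // Exact decimal expansion of the f32 nearest to 0.1 (27 fractional digits).
    let full = format!("{:.27}", 0.1f32);
    println!("{}", full);

    // Changing the trailing digits does not change the parsed value...
    println!("{}", same_f32(&full, "0.100000001999999999999999999"));
    // ...and neither does dropping them entirely.
    println!("{}", same_f32(&full, "0.1"));
    // A change in an early enough digit does land on a different f32.
    println!("{}", same_f32(&full, "0.10000001"));
}
```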

Perhaps. It's beside the point though. We are not talking about compressing human readable source code into some smaller binary gunk that is not human readable.

We are talking about making a human readable representation out of a binary bit pattern that represents a floating point number in a running program. For example, turning the 32 bits of an f32 into the 12 ASCII characters of scientific notation. The opposite of compression!

When you do that using the standard formatting to print an f32 you get more digits than you actually need to reproduce the original 32 bit binary pattern in memory when you parse it back again. Those extra and redundant digits are the lie here. They do not exist in the original f32.

As cuviper points out above, if you remove all those extra digits from the decimal string you will get the same f32.
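Here is a rough way to test that claim over many values rather than one. It assumes that `{:e}` without an explicit precision emits the shortest digits that still round-trip, which is what current Rust appears to do; `round_trips` is a name invented for this sketch.

```rust
/// Formats the f32 with the given bit pattern in scientific notation and
/// checks that parsing the text restores the identical bits.
/// (Hypothetical helper for this example.)
fn round_trips(bits: u32) -> bool {
    let x = f32::from_bits(bits);
    let text = format!("{:e}", x);
    match text.parse::<f32>() {
        Ok(y) => y.to_bits() == bits,
        Err(_) => false,
    }
}

fn main() {
    let mut longest = 0;
    // Sample the 2^32 bit patterns sparsely; skip NaN and the infinities.
    for bits in (0..=u32::MAX).step_by(65_537) {
        let x = f32::from_bits(bits);
        if !x.is_finite() {
            continue;
        }
        assert!(round_trips(bits), "bits {:#010x} did not round-trip", bits);
        longest = longest.max(format!("{:e}", x).len());
    }
    println!("longest text seen in this sample: {} bytes", longest);
}
```

The sweep only samples the space, so it is evidence rather than proof.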

I think this is the heart of the disagreement.

Let x = some f32
x_short = shortest str s.t. the f32 closest to x_short is x
x_full = full expansion of x

x_short = "truth"
x_full = lying with all the extra digits

In my mind,
x_full = by definition, truth
x_short = some approximation of x, with the property that out of all f32 representable values, x is closest to x_short

To further argue my point, consider having a big decimal class, and doing the following:

big_decimal::from_f32(x),
big_decimal::from_str(x_full),
big_decimal::from_str(x_short),

we are going to always have:

big_decimal::from_f32(x) = big_decimal::from_str(x_full)

we will also often have:

big_decimal::from_f32(x) != big_decimal::from_str(x_short)

====

x_short is absolutely an approximation. It just so happens to have the property that out of all values that f32 can represent, x is the one closest to x_short

This is not correct. That definition you are claiming does not exist.

Consider the case of: x = 1f64 / 3f64;

The standard formatter will produce your "x_full" of:

```
0.333333333333333314829616256247390992939472198486328125
```

The minimal representation (x_short) is:

```
0.3333333333333333
```

Three points to notice:

1. Neither of those is true. As you know 1/3 is an infinite string of 3's after the point. In that sense they are both lying.

2. Both of those will result in the exact same binary bits in an f64 when parsed back in from the strings. In that sense they are both equivalent. There is no advantage in keeping all those noise bits when converting to ASCII decimal and back again.

3. All those extra "noise" bits of x_full don't even try to be correct. They may as well be random numbers. No "truth" there.
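Point 2 is easy to check. In this sketch I am relying on Rust's `{}` printing the short form and an explicit precision such as `{:.54}` printing the long one; `f64_bits_of` is just a helper name I made up.

```rust
/// Parses decimal text as f64 and returns the raw bit pattern.
/// (Hypothetical helper for this example.)
fn f64_bits_of(text: &str) -> u64 {
    text.parse::<f64>().expect("valid float text").to_bits()
}

fn main() {
    let x = 1f64 / 3f64;

    // Shortest round-tripping text vs. the exact 54-digit expansion.
    let x_short = format!("{}", x);
    let x_full = format!("{:.54}", x);
    println!("{}\n{}", x_short, x_full);

    // Both strings restore the identical 64 bits.
    assert_eq!(f64_bits_of(&x_short), x.to_bits());
    assert_eq!(f64_bits_of(&x_full), x.to_bits());
}
```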

My use of the word "lying" is a bit emotive but there is a reason for it. In the world of physics or engineering or anywhere you measure something if you write down 10 decimal places you are claiming that the measurement you made is accurate to 1 part in 10000000000. If your measurement equipment is only accurate to 1% then you are claiming accuracy you do not have. You are lying!

In this case the standard formatter is outputting a whole load of digits as if what it is printing is more accurate than the minimal length string. It is not. It is lying by claiming accuracy it does not have.

No it is not. It holds exactly the correct information required to recover the float value that it came from. No more, no less.

Of course that original float is likely an approximation to some idealized value you have in mind, like 1/3, but that is in the nature of floats.

I think you completely ignored my argument and instead are arguing the fact that "1/3 can not be represented as a f32 or a f64."

I agree with you that 1/3 can not be represented as a f32 or a f64.

Even, in the above example, let

x = 1f64 / 3f64

We now have the situation:

big_decimal::from_f64(x) = big_decimal::from_str(x_full)
big_decimal::from_f64(x) != big_decimal::from_str(x_short)
big_decimal::from_f64(x) = big_decimal::from_f64(f64::new(x_short))
big_decimal::from_str(x_short) = big_decimal::from_str(f64::new(x_short))

Again, x_full is the precise mathematical representation of x.
x_short is a mathematical approximation of x with the property that the f64 closest to x_short is x.

EDIT:

I'm agreeing with you that x != the rational number 1/3.

What I am arguing is that for whatever value that x:f64 is representing, x_full is the precise value and x_short is an approximation.

Maybe this will make it clearer.

R = the reals
F64 = subset of R that a f64 can represent
x: f64
x_full, x_short = strings

Consider the functions:
to_real: f64 -> R
to_real: string -> R
parse_f64: string -> f64

I think we both agree that:

1. parse_f64(x_short) = x;
2. parse_f64(x_full) = x;
3. to_real(x) = to_real(x_full);
4. to_real(x) != to_real(x_short); // EDIT: often unequal, sometimes equal

It is because of (4) that I argue that x_short is an approximation to x, while x_full is the precise representation of x.
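Claim (4) can be checked without any big decimal crate by cross-multiplying in integers. This is a sketch under narrow assumptions: `decompose` and `cmp_exact` are names I invented, and they only handle positive normal values below 1 whose products fit in a `u128`.

```rust
use std::cmp::Ordering;

/// Splits a positive, normal f64 into (m, e) so the value is exactly m * 2^e.
fn decompose(x: f64) -> (u64, i32) {
    let bits = x.to_bits();
    let exp_field = ((bits >> 52) & 0x7ff) as i32;
    let fraction = bits & ((1u64 << 52) - 1);
    // Normal numbers carry an implicit leading 1 bit above the 52 stored bits.
    (fraction | (1u64 << 52), exp_field - 1075)
}

/// Compares to_real(x) with digits / 10^scale using exact integer arithmetic.
/// Sketch only: panics instead of handling cases outside its narrow range.
fn cmp_exact(x: f64, digits: u128, scale: u32) -> Ordering {
    let (m, e) = decompose(x);
    assert!(e < 0 && e >= -64, "sketch only handles small negative exponents");
    let pow10 = 10u128.checked_pow(scale).expect("10^scale overflows u128");
    let pow2 = 1u128 << ((-e) as u32);
    // m / 2^(-e)  vs  digits / 10^scale, cross-multiplied.
    let lhs = (m as u128).checked_mul(pow10).expect("overflow");
    let rhs = digits.checked_mul(pow2).expect("overflow");
    lhs.cmp(&rhs)
}

fn main() {
    // x_short for 1/3 is 0.3333333333333333 = 3333333333333333 / 10^16.
    println!("{:?}", cmp_exact(1f64 / 3f64, 3_333_333_333_333_333, 16));
    // For 0.5 the short text "0.5" is exact, so the two sides are equal.
    println!("{:?}", cmp_exact(0.5, 5, 1));
}
```

So the inequality in (4) holds for 1/3, and the "sometimes equal" case shows up for values like 0.5.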

Firstly, I'd like to say that what I've been suggesting here is an answer to the original question in your opening post, which was about efficient f32 -> ascii -> f32 conversion. The solution I suggest is efficient in processing time (see the benchmarks here: https://github.com/dtolnay/ryu) and optimally efficient in space for representing floats as human readable decimal numbers.

Likely one can do better in both speed and space if one drops the requirement for human readable ASCII.
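For instance, if readability does not matter, the raw bit pattern as hex is a fixed 8 ASCII characters per f32 and needs no decimal conversion at all. A minimal sketch, with `to_hex` and `from_hex` being names I made up:

```rust
/// Encodes an f32 as exactly 8 lowercase hex characters (its raw bit pattern).
fn to_hex(x: f32) -> String {
    format!("{:08x}", x.to_bits())
}

/// Decodes the 8-character form back into the identical f32.
/// Returns None for anything that is not exactly 8 hex digits.
fn from_hex(s: &str) -> Option<f32> {
    if s.len() != 8 || !s.bytes().all(|b| b.is_ascii_hexdigit()) {
        return None;
    }
    u32::from_str_radix(s, 16).ok().map(f32::from_bits)
}

fn main() {
    let x = 0.1f32;
    let text = to_hex(x);
    let back = from_hex(&text).expect("8 hex digits");
    println!("{} -> {} -> {}", x, text, back);
    assert_eq!(back.to_bits(), x.to_bits());
}
```

I have not benchmarked this against ryu, so treat the speed claim as a guess.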

I know what you are getting at but it has no bearing on the question at hand. And is arguably wrong.

The problem here is that you are talking about the Reals (R). Well, there are no reals in our computers. (The number of reals we can represent in our machines is so small that we might as well assume there are none). Most of the possible values of Real never make it inside our computers, there is no way to represent them!

So, your conclusion that: to_real(x) != to_real(x_short); may or may not be true but it has no meaning as we can never see those mythical reals you are converting to. The only way we can observe them with our software is to convert them back to F64 or x_short. At which point we find that x = x_short. For practical purposes then to_real(x) == to_real(x_short);

You have mentioned "big_decimal". That gets us more accuracy but no matter how many bits you can find to represent your floats the above is still true. You have just moved the goal posts.

Thinking about it, this is not true either.

According to your definitions x is our original F64. x_full is the string representation of it we get from the standard formatter.

Looking at that 1/3 again, we see that "0.3333333333333333" is all we need to represent our original F64 precisely.

Whereas the x_full version, 0.333333333333333314829616256247390992939472198486328125, is a representation of something else that never existed in our system.

We can of course extract the precise representation from that x_full by ignoring the digits "148...". But that is problematic because the representation itself does not give any clue as to which digits matter and which are just noise.

1. Point 5, from unedited question, states that human readability is NOT a requirement.

2. You made the following claims:

I disagree on both points. The full representation is accurate. The ryu representation is an approximation -- with the property that constrained to the f32 or f64 domain, you can recover the full original data by taking the closest point.

To see that it's an approximation, just note that over the reals (or rationals), it often represents a different point.

3. Even though I asked the original question, I will not be responding further.

Yes, exactly. I think we are in agreement.

Except:

It is no more accurate an approximation to the original F32/64 that we started with than the ryu representation. You can prove that to yourself by removing all those "noise" bits and parsing it back into an F32/64. You will see that those noise bits change nothing.

Yes, quite likely. That much is pretty obvious.

That does not matter, we are not dealing with Reals, we cannot. We are dealing with F32/64 or perhaps even big decimals and decimal string representations of them.

Fair enough. It's been an interesting discussion nonetheless.

I'd add that, if the question were asked as "make an efficient, short but human-readable ASCII representation of f32/f64", I'd probably go with the hexadecimal exponential notation like `1.2C40E6p1D`. This is the equivalent of the standard exponential notation in base-16 positional notation, which is exact for standard floating point and much more readable than the straight bytes-to-hex converted version (but a little bit longer).
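As far as I know the Rust standard library has no hex-float formatter, so here is a hand-rolled sketch of the idea. Assumptions: it uses a C99 `%a`-like layout (`0x1.` prefix, six hex fraction digits, decimal power-of-two exponent), which is not the same spelling as the example above, and it only handles normal finite f32 values. `to_hex_float` and `from_hex_float` are hypothetical names.

```rust
/// Formats a normal, finite f32 as e.g. "0x1.800000p+1". None otherwise.
fn to_hex_float(x: f32) -> Option<String> {
    if !x.is_normal() {
        return None; // zero, subnormals, infinities and NaN are out of scope
    }
    let bits = x.to_bits();
    let sign = if bits >> 31 == 1 { "-" } else { "" };
    let exp = ((bits >> 23) & 0xff) as i32 - 127;
    // Pad the 23 fraction bits to 24 so they fill six hex digits.
    let frac = (bits & 0x007f_ffff) << 1;
    Some(format!("{}0x1.{:06x}p{:+}", sign, frac, exp))
}

/// Parses the exact layout produced by `to_hex_float`.
fn from_hex_float(s: &str) -> Option<f32> {
    let (sign, rest) = match s.strip_prefix('-') {
        Some(r) => (1u32, r),
        None => (0u32, s),
    };
    let rest = rest.strip_prefix("0x1.")?;
    let (frac_hex, exp_str) = rest.split_once('p')?;
    if frac_hex.len() != 6 || !frac_hex.bytes().all(|b| b.is_ascii_hexdigit()) {
        return None;
    }
    let frac = u32::from_str_radix(frac_hex, 16).ok()?;
    if frac & 1 != 0 {
        return None; // the padding bit must be zero
    }
    let exp: i32 = exp_str.parse().ok()?;
    if !(-126..=127).contains(&exp) {
        return None;
    }
    Some(f32::from_bits((sign << 31) | (((exp + 127) as u32) << 23) | (frac >> 1)))
}

fn main() {
    for x in [3.0f32, 0.1, -1.0 / 3.0].iter() {
        let text = to_hex_float(*x).expect("normal value");
        let back = from_hex_float(&text).expect("well-formed text");
        println!("{} -> {}", x, text);
        assert_eq!(back.to_bits(), x.to_bits());
    }
}
```

Every digit here maps directly onto bits of the float, so there is no rounding in either direction.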

Interesting. Sounds like an excellent idea.

Unfortunately Rust does not support hexadecimal exponential notation in literals, apparently because it does not sit well with the Rust syntax parser, as discussed here: https://internals.rust-lang.org/t/pre-rfc-hex-float-literals/5883. There is a crate that does it, though.

Perhaps that will get added to Rust at some point. We now have a group dedicated to "parity with C" and hexadecimal exponential notation is in C99.

I think it would be better to state that floating point values are not numbers. They represent intervals: any reals that fall within that interval belong to the same floating point value and are thus all equally valid. Some printing functions are just not very picky about choosing a good one.
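To make the interval view concrete, here is a small sketch that computes the interval for one f32 by taking the midpoints to its two neighbours. `rounding_interval` is a name I made up, and it assumes a positive finite value that is neither zero nor `f32::MAX`. The endpoints themselves are ties, so I treat the interval as open.

```rust
/// Returns the open interval (lo, hi) of reals that round to `x`.
/// Sketch only: `x` must be positive, finite, non-zero and below f32::MAX.
fn rounding_interval(x: f32) -> (f64, f64) {
    let bits = x.to_bits();
    // For positive floats, adjacent bit patterns are adjacent values.
    let below = f32::from_bits(bits - 1) as f64;
    let above = f32::from_bits(bits + 1) as f64;
    let mid = x as f64;
    // Midpoints between neighbouring f32 values are exact in f64.
    ((below + mid) / 2.0, (mid + above) / 2.0)
}

fn main() {
    let x = 0.1f32;
    let (lo, hi) = rounding_interval(x);
    println!("{:e} .. {:e}", lo, hi);

    // Any f64 strictly inside the interval converts back to the same f32.
    for t in [0.1f64, 0.5, 0.9].iter() {
        let sample = lo + (hi - lo) * *t;
        assert_eq!(sample as f32, x);
    }
}
```

Both the short text `0.1` and the long expansion land inside this interval, which is why they parse to the same value.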

While interesting, all but the 2nd, 4th, and 17th posts in this discussion avoid responding completely to the OP's 4th and 5th requirements, instead digressing into long arguments about different human readable representations.

Not quite: Somewhere I said:

Should have said 5 And:

I think they were taken as given and did not need much comment.

Sorry about the tedious digression. I think it's good to try and get these technical details correct. Rust is about correctness, right?

Speaking of which, I have a growing feeling that it's not just a case of "Some printing functions are just not very picky about choosing a good one", as the8472 said above. I have come to think Rust is actually wrong to do that. By printing all those digits it is claiming to have information it does not. In many circles this is not acceptable. Why should it be in our programming languages?

I know, the other guys do it too and that's the way it's always been done. Is that really an excuse for being wrong?
