Should f64::algebraic_add(1.0, 2.0) allow non-determinism?

Algebraic floating point functions were recently added to increase performance by allowing compiler optimizations like re-association, which in turn enables better auto-vectorization of code. The documentation notes the following:

Because of the unpredictable nature of compiler optimizations, the same inputs may produce different results even within a single program run.

And Miri in turn emulates this unpredictable nature by applying a small random error to such operations. This can be seen in the playground:

#![feature(float_algebraic)]

pub fn main() {
    dbg!(f64::algebraic_add(1.0, 2.0));
}

Output:

[src/main.rs:4:5] f64::algebraic_add(1.0, 2.0) = 2.9999999999999982

I'm wondering whether the goal here is really to be unpredictable even for simple operations like addition / subtraction, with operands and results that are exactly defined by the floating point standards. In my opinion, these methods should allow optimizations like reordering of operations, fusing multiplies with adds, or using reciprocals, but allowing random errors on exact inputs does not seem to be "algebraic" at all.
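As a small, plain-IEEE illustration of the kind of rewrite I mean (no algebraic_* involved, values picked purely for demonstration): fusing a multiply and an add into a single rounding step already gives an observably different, yet still algebraically justified, result:

fn main() {
    let (a, b, c) = (0.1_f64, 10.0, -1.0);
    // Unfused: the product is rounded to f64 first, then the add is rounded again.
    let unfused = a * b + c;
    // Fused: a single rounding of the exact value of a*b + c.
    let fused = a.mul_add(b, c);
    assert_ne!(unfused, fused); // 0.0 vs. roughly 5.55e-17
}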

3 Likes

I agree with you. It is incorrect to add error to FP arithmetic operations where no algebraic rearrangement of them could have produced that error. This will produce unrealistic failures.

1 Like

This seems like a reasonable thing for Miri to be doing to help you test your code, though? What you're saying by using the operation is that you're willing to allow some error on this operation depending on how it's used, but "how it's used" is a hard thing to test. That's especially true in Miri, which wants to disable optimizations to make sure what you wrote is correct, but that means it usually can't see the kinds of interprocedural uses that are where algebraic_* are most useful.

What's your real scenario? What are you doing where you're using algebraic_add but the result being off by 1 ULP is breaking your code?

My argument is that the operation should not allow "some error"; it should allow any transformation that would be valid for arithmetic expressions on real numbers. And I agree this would require interprocedural rewrites and would probably be very difficult to implement in Miri.

My use case is a simple aggregation kernel:

pub fn sum(v: &[f64]) -> f64 {
    v.iter().copied().fold(0_f64, f64::algebraic_add)
}

I could try to optimize this myself, using multiple accumulators or using intrinsics, but with the algebraic functions, the compiler can do a better job of that for any specified target-feature or target-cpu.
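For reference, here's a sketch of the manual multi-accumulator version I'd rather not have to write (the chunk width of 4 is an arbitrary illustrative choice, not tuned for any target):

pub fn sum_manual(v: &[f64]) -> f64 {
    // Four independent accumulators so the adds can overlap / vectorize.
    let mut acc = [0.0_f64; 4];
    let mut chunks = v.chunks_exact(4);
    for chunk in &mut chunks {
        for (a, x) in acc.iter_mut().zip(chunk) {
            *a += x;
        }
    }
    let tail: f64 = chunks.remainder().iter().sum();
    acc.iter().sum::<f64>() + tail
}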

I of course also want to write a test for this function:

#[test]
fn test_sum() {
    assert_eq!(sum(&[1.0, 2.0, 3.0]), 6.0);
}

With this known input, any valid transformation of the sum function should return exactly 6.0. If I want to run this test with Miri, there are now two options:

  1. Change the test to do an approximate comparison. The choice of epsilon has to be documented: is it f64::EPSILON, 1 ULP of the exact result, more than 1 ULP? And other junior developers looking at this code will be confirmed in their belief that floating point operations are somehow magic and can never be exact.

  2. Change the code to use f64::add when cfg!(miri) is set, thus no longer testing the production code. (Both options are sketched below.)
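For concreteness, the two workarounds would look roughly like this (the epsilon is an arbitrary illustrative pick, which is exactly the documentation problem from option 1):

// Option 1: approximate comparison with a hand-waved epsilon.
#[test]
fn test_sum_approx() {
    let result = sum(&[1.0, 2.0, 3.0]);
    assert!((result - 6.0).abs() <= 1e-12, "got {result}");
}

// Option 2: fall back to plain addition under Miri, so the algebraic_add
// path used in production is no longer what gets tested.
pub fn sum(v: &[f64]) -> f64 {
    if cfg!(miri) {
        v.iter().copied().fold(0_f64, |a, b| a + b)
    } else {
        v.iter().copied().fold(0_f64, f64::algebraic_add)
    }
}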

Neither option is good. In tests it is possible to provide inputs that have known exact outputs. We should instead teach developers how floating point works and which compiler transformations are allowed, rather than pretending that floating point operations can introduce random errors.

2 Likes

That doesn't follow from your definition. 1.0 + 2.0 + 3.0 + 1e30 - 1e30 is a transformation that is valid in real numbers, and results in 0.0 in f64.

5 Likes

No. That's the idea: if a sequence of additions can be rearranged in a way that would produce the exact same result with real numbers, but produces something different with floating point numbers, then that is valid output from algebraic_add; but if no rearrangement could produce such an output, then it isn't valid.

In particular 1.0 + 2.0 + 3.0 + 1e30 - 1e30 may give us any of seven answers: 0.0, 1.0, 2.0, 3.0, 4.0, 5.0, or 6.0 – but anything else would be invalid.

And for 1.0 + 2.0 + 3.0 (without these 1e30 and -1e30 that may muddy the waters) only 6.0 remains.

IOW: Miri is permitted to produce anything that vfredusum.vs may produce… but nothing else.
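For illustration, here's how three different association orders of those same five terms land on different values, using plain (non-algebraic) f64 adds:

fn main() {
    let absorb_all = ((1.0_f64 + 2.0 + 3.0) + 1e30) - 1e30;   // the 6.0 is absorbed into 1e30
    let cancel_first = (1.0_f64 + 2.0 + 3.0) + (1e30 - 1e30); // the big terms cancel first
    let absorb_one = ((1.0_f64 + 1e30) - 1e30) + 2.0 + 3.0;   // only the 1.0 is absorbed
    assert_eq!(absorb_all, 0.0);
    assert_eq!(cancel_first, 6.0);
    assert_eq!(absorb_one, 5.0);
}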

2 Likes

Miri would need interprocedural analysis to even see that you have such a string of adds at all, since f64::algebraic_add alone is just a function with two parameters. Then consider that the optimizing compiler may apply a whole lot of transforms before reaching a point where "algebraic" actually makes a difference, and I think it's hopeless to expect Miri to accurately model that.

1 Like

Miri already has to track additional information, outside the program memory, for the checks it does for pointers. What if each float had additional information, some kind of note about how to “unmake” it? For example,

  • When evaluating let x = algebraic_add(0.2, 0.1), store either _ - 0.2 or _ - 0.1 as the note.
  • When evaluating algebraic_add(x, 0.3), use the note to reconstruct a reordered addition (0.1 + 0.3) + 0.2 or (0.2 + 0.3) + 0.1.

Importantly, if the note is missing (e.g. due to a float-to-int conversion), this is not an error; it just means that fewer possibilities are explored. (Those are similar to the possibilities an optimizer would miss, which is not actually good for Miri’s goals, but is probably pragmatic.)
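To make that a bit more concrete, the note could be shaped something like this (all names here are invented for the sketch; this is not an existing Miri data structure):

// Hypothetical per-value metadata Miri could carry alongside each float,
// recording how the value was built up from algebraic operations.
enum FloatNote {
    // No history is known; the value is treated as opaque.
    Opaque,
    // A constant that appeared directly in the program.
    Const(f64),
    // Produced by algebraic_add of two tracked operands.
    AlgebraicAdd(Box<FloatNote>, Box<FloatNote>),
}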

1 Like

And this was a massive amount of work -- for example, copying byte-by-byte needs to split up this information and rebuild it from the parts.

It's worth it for pointers because they're so important, but I really don't think it's worth it for algebraic things that can introduce 100% error in computations anyway.

I mean, the algebraic operations on them are magic. They exist to be magic. You always have the non-magic ones available if you want.

1 Like

There's a well-defined range in which integers stored as floating point are represented exactly, and arithmetic on them is fully deterministic as long as they don't exceed that range and aren't mixed with fractions.

In JS, this is used very often to get 53-bit integers inside 64-bit floats.

So I think simply adding a random error is incorrect, because for IEEE floats 1.0 + 2.0 == 3.0 is reliably true, and there are no algebraic optimizations that break it.
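A quick check of that exact-integer range for f64, where integers are exact up to 2^53:

fn main() {
    let limit = 2.0_f64.powi(53);
    assert_eq!(1.0_f64 + 2.0, 3.0);          // small integers: always exact
    assert_eq!((limit - 1.0) + 1.0, limit);  // still exact right at the boundary
    assert_eq!(limit + 1.0, limit);          // 2^53 + 1 is no longer representable
}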

It would be good if Miri could simulate the optimizations better.

6 Likes

That's not the definition @jhorstmann proposed. He said any transformation that would be valid in real numbers should be allowed. So that doesn't restrict it only to rearranging additions in a different order.

And it's also not what the documentation says: it says the set of valid optimizations is unspecified, so that also doesn't restrict it only to doing additions in a different order.

The fact that the expression before optimization only involves exactly represented integers doesn't imply that the modified expression after optimization only involves exactly represented integers. Surely that would be too restrictive as a general rule since even changing (a+b)+c to a+(b+c) would break it.
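For example (plain f64, values picked purely for illustration), all three inputs below are exactly represented integers, yet reassociating (a + b) + c into a + (b + c) changes the result because one intermediate leaves the exact range:

fn main() {
    let (a, b, c) = (1.0_f64, 2.0_f64.powi(53), -2.0_f64.powi(53));
    let left = (a + b) + c;  // a + b rounds: 2^53 + 1 is not representable, so the 1.0 is lost
    let right = a + (b + c); // b + c cancels exactly, so the 1.0 survives
    assert_eq!(left, 0.0);
    assert_eq!(right, 1.0);
}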

The set of allowed algebraic optimizations is not specified, so this doesn't necessarily follow.

3 Likes

You are technically correct, but I don't think that's reason enough to allow it.

Certain optimizations are permitted, even if, applied to floating point numbers, they can produce different results. That's the whole point of algebraic_add: it promises to return something that may differ from a regular add, if that's the result of some algebraic transformation.

The documentation lists some optimizations that may change the result slightly in some corner cases, but it doesn't say that optimizations may produce radically different results even if the inputs are small integers. If that's permitted, then it should be spelled out explicitly. Readers and writers of the documentation are humans (on both sides), thus some amount of common sense has to be assumed.

I would say that an optimization that permits transforming 1.0 + 2.0 + 3.0 into something other than 6.0 goes way beyond what's permitted in hardware and most compilers.

Yes, but your definition allows an implementation that always returns 0.0. Calculate the result, pretend a large enough value was added and subtracted… voilà: you made it! The usability of such an implementation is… questionable. The documentation talks about certain optimizations and tells us that the list is not exhaustive, but it doesn't tell us the result may always be zero, unconditionally.

I don't know how to formally define what is and isn't allowed by algebraic_add, but if Rust really permits an implementation that always returns 0.0, then that should be spelled out explicitly. And then we may as well simplify it to "algebraic_add returns some random value that may or may not be related to the inputs at all… don't use it".

Because such an implementation is no longer useful for anything but language-lawyering.

2 Likes

If f32::from(1u8).algebraic_add(f32::from(2u8)) ever gives anything other than 3.0 (exactly), in any configuration at any optimization level, then Rust or LLVM is broken and needs to be fixed.

IEEE 754 floats are quirky, and optimizations and precision are hard to reason about, but they're not some unknowable magic or an analog fuzzy system.

4 Likes

I agree + must return 3.0, but we're talking about algebraic_add where there is no such guarantee.

1 Like

Same for algebraic add. There's absolutely no reason for the optimizer to destroy precision when there's no beneficial operation requiring it.

1 Like

Are you claiming there is such a guarantee in the current nightly documentation? I can't find it. This suggests otherwise:

Unsafe code must not rely on any property of the return value for soundness.

It's hard to say without seeing all the rest of the code whether or not there is a performance benefit from losing precision. For instance:

let a = b.algebraic_add(c);
let d = a.algebraic_add(e);

I can imagine a possible performance optimization that causes a to be dropped from a register and later recomputed as d - e, which may well cause a value different from 3.0 when b=1.0 and c=2.0.
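A plain-arithmetic sketch of how such a rematerialization could go wrong (e chosen large purely for illustration):

fn main() {
    let (b, c, e) = (1.0_f64, 2.0, 1e30);
    let a = b + c;                   // 3.0
    let d = a + e;                   // the 3.0 is absorbed into 1e30
    let rematerialized = d - e;      // "recompute a as d - e"
    assert_eq!(a, 3.0);
    assert_eq!(rematerialized, 0.0); // not 3.0
}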

But regardless of whether such an optimization exists, that's a different issue from what is guaranteed by the language. An optimization might not exist in one version of the compiler, and might be introduced in a later version of the compiler or in a different compiler if the spec allows it.

3 Likes

The issue is that the algebraic_* operations cannot be analyzed in this way. For example, if we have

let x = A.algebraic_add(B);
let y = C.algebraic_add(D);
dbg!(x, y);
let z = x.algebraic_add(y);
dbg!(z);

it may be the case that z != x + y, if the algebraic transformation of (A + B) + (C + D) shakes out that way. It probably won't, since the intermediate results are required to be manifested in this example, but it's allowed.

I expect that for a single (non-trig) algebraic operation, the output will be the same as the non-algebraic operation. But as soon as you have a second algebraic operation, all bets are off; you can end up with whatever output is most convenient for the compiler, with the expectation that the result is algebraically related to the code's computation.

This is also why trying to do something like bound error to a specific number of ULP is pointless; the individual operations fundamentally cannot be analyzed independently, and that's the entire point of the algebraic_* functions.

hint::black_box is similarly a function which provides no guarantees about fitness to purpose; it's perfectly allowed for black_box to do nothing at all other than be an identity function. But that doesn't mean that it's useless; as a Quality Of Implementation thing we try very hard to ensure that black_box behaves reasonably.

This is, imo, clearly false. It should be allowed to logically transform iter::repeat(a).take(b).fold(0.0, algebraic_add) to a.algebraic_mul(b), but merely modifying the reduction order is not sufficient to justify this transformation.
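A plain-f64 sketch of why that rewrite is more than a reordering: repeated addition and a single multiplication genuinely disagree when the repeated value isn't exactly representable:

fn main() {
    // Summing 0.1 ten times accumulates a rounding error at every step...
    let repeated = std::iter::repeat(0.1_f64).take(10).fold(0.0, |acc, x| acc + x);
    // ...while a single multiplication rounds only once.
    let multiplied = 0.1_f64 * 10.0;
    assert_eq!(multiplied, 1.0);
    assert_ne!(repeated, multiplied);
}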

FWIW, I suspect the feeling of floating point being non-deterministic comes in part from overuse of -ffast-math style optimization flags. Under the fast-math flags, floating point is effectively non-deterministic, as the exact same computation can produce different results depending on ambient context.

2 Likes

Yet we need some bounds. If the guarantees of algebraic_add cannot be described at all, then these operations are pointless, and it would be better to simply remove them.

We should apply that to algebraic_add, then. No one (except maybe the most die-hard language-lawyering fans) would accept an algebraic_add that returns anything other than a pure and guaranteed 6.0 for 1.0 + 2.0 + 3.0. Anything else is way outside of "reasonable behavior": the inputs are very far from the limits, there is no imprecision involved (as with 0.1 and 0.2), and there is absolutely no reason not to expect a precise result.

Then we need some other, yet still bounded, definition of what optimizations can and cannot do.

If we can't bound them in the general case, we may, at least, give some guarantees. E.g. I would expect that computations that would produce the exact same result when done in real numbers and when done with integers of half the float's size (i32 for f64, i16 for f32, i8 for f16) would stay undisturbed by any sane optimizations.

Everyone knows that when floating point numbers are "pushed to the limit" and all the discussions about ULPs start being meaningful, optimizations may lead to "strange" results; yet no one but language lawyers expects that when you are doing calculations that are not even remotely close to the boundaries… the results would still be unpredictable and random.

FWIW I agree that algebraic_add does not seem useful as is. If I'm writing some numeric code, I want to be able to bound the error. I'd much rather implement my own specialized vector summing code if necessary than risk unbounded errors.

f32::sin also has this problem as it doesn't specify the possible error, but at least I can guess an undocumented guarantee that it's never more than 5 ulps or so because math libraries typically guarantee that. I think there should be some precision guarantee on f32::sin documented by Rust.

hint::black_box also has no guarantees, but I'm not going to use it for anything other than benchmarks, and benchmarks are best-effort with no guarantees anyway.

Giving such a guarantee would defeat the whole purpose of these operations. It would mean that a sum can't even get accumulated in parallel as separate partial sums to be added at the end, which as I understand is one of the motivating use cases.

Answering the original question: it's impossible to guarantee both
f64::algebraic_add(1.0, 2.0) == 3.0 and
f64::algebraic_add(3.0, -2.0) == 1.0
without defeating the very optimizations that these new operations are supposed to enable.

Suppose you write this function:

fn sum(n: u32) -> f32 {
    let mut a: f32 = 1.0;
    for _ in 0..n {
        a = a.algebraic_add(2.0);
        a = a.algebraic_add(-2.0);
    }
    return a;
}

If the guarantee is to be upheld, this should behave the same as the following:

fn sum1(n: u32) -> f32 {
    let mut a: f32 = 1.0;
    for _ in 0..n {
        a += 2.0;
        a += -2.0;
    }
    return a;
}

But the point of algebraic_add is to enable optimizations such as computing the sum in parallel:

fn sum2(n: u32) -> f32 {
    let mut a1: f32 = 1.0;
    let mut a2: f32 = 0.0;
    for _ in 0..n {
        a1 += 2.0;
        a2 += -2.0;
    }
    return a1 + a2;
}

But sum1 and sum2 return different values!
sum1(1000000000) == 1.0, sum2(1000000000) == 0.0.

Which shows that algebraic_add must be allowed to be nondeterministic even for tiny integers if it is to achieve its optimization purpose.

5 Likes