There's a bug that appeared with GCC and x87: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=323
If an intermediate result is not stored from the 80-bit FPU register into a 32-bit variable, the result can change, and it depends on the optimization options.
If we use SSE, the problem does not occur.

I'm compiling my program for the i686-pc-windows-gnu target.
Does the compiler use SSE instructions by default?
Can I expect the same formula to give the same results with the same inputs in different contexts?

That bug "appeared" a long time ago, in the year 2000, and was closed as "not a bug" as far as I can tell.

You should not expect precision in your results greater than what the IEEE 754 standard specifies. Some Intel floating-point units have 80-bit precision internally, so expect some small differences.

Also, that test checks two floating-point numbers for equality, which is generally a bad idea.
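To illustrate (a minimal sketch, not taken from the bug report itself): two expressions that are equal in real arithmetic can compare unequal after rounding, which is why exact equality tests are fragile:

```rust
fn main() {
    // 0.1 and 0.2 are not exactly representable in binary, so the sum
    // picks up rounding error and is not bit-identical to 0.3:
    let a = 0.1_f64 + 0.2;
    let b = 0.3_f64;
    assert_ne!(a, b);

    // Comparing against a tolerance is usually the safer pattern:
    assert!((a - b).abs() < 1e-12);
}
```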

I'm not talking about comparing numbers calculated by different formulas. 2.0.sqrt() * 2.0.sqrt() is not equal to 2.0 in floating-point arithmetic; I know that.
I'm talking about determinism. Is 2.0.sqrt() * 2.0.sqrt() equal to 2.0.sqrt() * 2.0.sqrt() or not?
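For what it's worth, the basic operations and sqrt are required by IEEE 754 to be correctly rounded, so on a target that evaluates f64 in actual 64-bit precision (e.g. x86-64, where SSE2 is the default) the same expression produces the same bits every time. A small sketch of that, assuming no x87 excess precision:

```rust
fn main() {
    // sqrt and * are correctly rounded, so this value is fully
    // determined by IEEE 754 on targets that evaluate f64 as 64-bit:
    let r = 2.0_f64.sqrt() * 2.0_f64.sqrt();
    assert_ne!(r, 2.0); // off by a rounding error from 2.0...
    assert_eq!(r, 2.0_f64.sqrt() * 2.0_f64.sqrt()); // ...but reproducibly so
}
```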

Even with SSE enabled, the x86 C calling convention states that floats have to be returned in an x87 register, not an SSE one. And libm functions like sin and sqrt will likely still use x87 internally.

fma(x, y, z) has to return the same thing no matter what: the closest number to x*y + z. ("Closest" depends on your rounding mode which as far as I know you can't even set in Rust.) But x86 has seemingly redundant fma instructions that differ in which NaN propagates to the result (i.e., if all three inputs are NaN, which input should the output copy?).

The three “redundant” instructions are not redundant at all: as with many RISC designs, you pick which of the three input operands the output overwrites.

And the trouble with fma is not that it's defined differently on different CPUs (it isn't), but that when you replace x * y + z with a single fma instruction you may get a different result.

It's, in some sense, “better” (there are numerous papers on how important fma is for many algorithms), but for the context of this thread, the important thing is that it's different.
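A quick way to see that difference (a sketch using Rust's f64::mul_add, which performs a single correctly rounded fused multiply-add whether or not the CPU has a dedicated fma instruction):

```rust
fn main() {
    // x is sqrt(2) rounded to f64; its exact square differs slightly from 2.
    let x = 2.0_f64.sqrt();

    // Two roundings: x * x is rounded to the nearest f64 first,
    // then the subtraction happens.
    let unfused = x * x - 2.0;

    // One rounding: the exact value of x*x - 2 is rounded once.
    let fused = x.mul_add(x, -2.0);

    // Same formula, different results:
    assert_ne!(fused, unfused);
}
```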

The only way to guarantee that floating point is 100% stable is to encode everything manually in assembler, I'm afraid.

And even then you have to remember which instructions produce different results on different CPUs.

That's why I wrote "seemingly." The difference between overwriting x or y in fma(x, y, z) has to do with NaN.

Is it legal for rustc to optimize the expression x*y + z to an fma? This seems bad for code that needs to recover the low bits of the product with fma(x, y, -x*y).

I would be pretty upset if fma(x, y, -x*y) got optimized to 0. But I would be happy with a multiplication operator that couldn't be fused with an addition.
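For reference, the idiom in question (sometimes called TwoProduct) recovers the rounding error of a product, and it only works if the plain multiplication is *not* fused with the surrounding fma. A sketch with made-up inputs:

```rust
fn main() {
    // p is the rounded product; err is the part that rounding discarded.
    // In exact arithmetic, x*y == p + err.
    let x = 2.0_f64.sqrt();
    let y = 3.0_f64.sqrt();
    let p = x * y;
    let err = x.mul_add(y, -p); // exact x*y - p, representable in one f64

    // The recovered error is bounded by the rounding of p. If the
    // compiler were allowed to fuse the x * y above, err would
    // silently collapse to 0, which is exactly the concern here.
    assert!(err.abs() <= p.abs() * f64::EPSILON);
}
```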