I was really confused yesterday when trying out some things on Compiler Explorer (godbolt.org): the generated assembly suddenly seemed super unstable, and at first I even suspected some kind of weird bug. I had a small function called log, was interested in the kind of modern CPU instructions it would generate, and found it became
example::log:
bsr eax, edi
ret
when fully optimized via -C opt-level=3 -C target-cpu=native. Or actually… maybe it became
example::log:
lzcnt eax, edi
xor eax, 31
ret
instead!? The output would vary whenever I changed the whitespace in the code, such as by adding newlines at the end. Of course feel free to try out the experience for yourself.
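For anyone who wants to reproduce this, here is a minimal sketch of a function with that shape. It assumes the input is a NonZeroU32 and the body is just ilog2; that is my guess at what gives such short assembly, not necessarily the original snippet. The two helpers are purely illustrative names. They spell out why both outputs compute the same thing: for a nonzero 32-bit value the leading-zero count is between 0 and 31, so subtracting it from 31 and XOR-ing it with 31 give the same result.

```rust
use std::num::NonZeroU32;

/// Assumed stand-in for the function discussed above (not the original code).
pub fn log(x: NonZeroU32) -> u32 {
    x.ilog2()
}

/// Index of the highest set bit, which is what a bare `bsr` returns.
/// Hypothetical helper, only here for comparison.
pub fn log_via_sub(x: NonZeroU32) -> u32 {
    31 - x.leading_zeros()
}

/// The same value in the `lzcnt` + `xor 31` form.
/// Hypothetical helper, only here for comparison.
pub fn log_via_xor(x: NonZeroU32) -> u32 {
    x.leading_zeros() ^ 31
}
```

Pasting this into Compiler Explorer with -C opt-level=3 and a fixed target-cpu should let you compare the three functions side by side; which instructions you get depends on the CPU you select.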
It must have taken me multiple minutes, including testing on my local machine where no weird behavior showed up, to come up with the obvious insight:
Matt Godbolt’s Compiler Explorer uses a whole pool of machines – not all of the same type – to process the requests and -C target-cpu=native produces different results depending on the machine being used! Also it uses caching, so the correlation between choice of input whitespace and resulting output assembly seemed weirdly random-but-deterministic.
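One way to see what native actually resolved to on a given machine is to ask the compiler which target features it enabled. This is a rough sketch: the function name and the choice of features are mine. It only covers feature flags, so it will not show differences that come purely from per-CPU tuning, which may well be part of what happens here.

```rust
/// Reports a few x86 target features as the compiler saw them at build time.
/// `cfg!` is evaluated during compilation, so each entry is a constant that
/// depends on the -C target-cpu / -C target-feature flags in effect.
pub fn enabled_features() -> [(&'static str, bool); 4] {
    [
        ("lzcnt", cfg!(target_feature = "lzcnt")),
        ("bmi1", cfg!(target_feature = "bmi1")),
        ("avx2", cfg!(target_feature = "avx2")),
        ("avx512f", cfg!(target_feature = "avx512f")),
    ]
}
```

With -C target-cpu=native the booleans are baked into the generated code, so I would expect them to flip between requests in the same way the assembly does. Locally, rustc --print cfg -C target-cpu=native lists the same information without compiling anything.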
I’m just posting this experience so maybe someone else who thought of putting -C target-cpu=native into the compiler flags on Compiler Explorer can avoid the same confusing experience.
So yeah… either don’t use -C target-cpu=native on Compiler Explorer, or brace yourself for seemingly inconsistent, randomly changing assembly output.
I usually use znver<N>. Much easier to remember than some Canadian lakes and also easier to type than x86-64-v3, for which I always get the underscores vs. dashes wrong.
znver3 for a fairly recent CPU with AVX2.
znver4 for AVX512, which also tends to make autovectorized code much more readable.