If you think "memory is slow, so avoiding it makes things faster", then the version that reads from a constant and then writes must be slower than the one that writes an immediate, no?
However, you're right that this can hurt ABI efficiency for small struct returns. But in the example I provided, 1 million such 'records' would use 24 MB of memory in Rust versus 40 MB in C++.
Rust chose memory efficiency as the default because:
- It benefits arrays and cache behavior (the common case)
- When you need an ABI-optimal layout, #[repr(C)] gives you explicit control (and that isn't especially uncommon)
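The 24-vs-40-byte gap can be reproduced with a struct like the following (a hypothetical shape with field names of my choosing, not the original example). Note that repr(Rust) layout is unspecified; the first size below is what current rustc happens to produce by sorting fields to minimize padding:

```rust
use std::mem::size_of;

// Default Rust layout: the compiler is free to reorder fields to reduce padding.
struct Reordered {
    a: u8,
    b: f64,
    c: u8,
    d: f64,
    e: u8,
}

// #[repr(C)]: fields stay in declaration order, as a C/C++ compiler lays them out.
#[repr(C)]
struct DeclarationOrder {
    a: u8,
    b: f64,
    c: u8,
    d: f64,
    e: u8,
}

fn main() {
    // Current rustc packs the three u8s after the two f64s: 8+8+1+1+1 = 19, padded to 24.
    assert_eq!(size_of::<Reordered>(), 24);
    // Declaration order needs 7 bytes of padding before each f64 and at the end: 40 total.
    assert_eq!(size_of::<DeclarationOrder>(), 40);
    println!("default: {}, repr(C): {}", size_of::<Reordered>(), size_of::<DeclarationOrder>());
}
```

At 1 million elements, that per-element difference is exactly the 24 MB vs 40 MB mentioned above.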
EDIT: The point I am trying to make here is that the same case can be conversely made for C++ too. I am not familiar with zig, so it may optimize for both small and large structs by default? I am sure someone experienced will be able to provide a more nuanced answer here.
So I have to order the fields manually. The effect is that LLVM is able to auto-vectorize (auto SIMD) the manually ordered version, but cannot auto-vectorize the default version.
I've only skimmed it, but this article from a year and a half ago looks like it goes into detail on the issue you're seeing. It seems that for calling conventions rustc just punts to LLVM, which has some surprisingly not-so-great defaults when compared to C.
It does not. Rustc computes the calling convention on its own, except for which exact registers to use and the stack layout. For the Rust ABI we do cast values that fit in a single register so they are passed in one:
but we don't do this for values that need two registers anymore since
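For concreteness, the size distinction being drawn here is roughly the following (my own illustration, not rustc internals; size is only the first gate, and the backend ultimately decides what travels in registers). On a typical 64-bit target:

```rust
use std::mem::size_of;

// 8 bytes: fits a single 64-bit register, so it is a candidate
// for the single-register cast described above.
struct OneReg(u32, u32);

// 16 bytes: would need two registers, the case that is no longer cast.
struct TwoRegs(u64, u64);

fn main() {
    assert_eq!(size_of::<OneReg>(), 8);
    assert_eq!(size_of::<TwoRegs>(), 16);
    println!("{} {}", size_of::<OneReg>(), size_of::<TwoRegs>());
}
```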