It's not useful to over-analyze this kind of thing in debug mode. In release mode, this crazy_stuff gets completely optimized away, and in general useless memcpy calls get reduced too. But if you really want, you can use the playground to examine rustc's MIR output:
So _1 is the array local, and it looks like _2 is a temporary for the dereferenced value that will be written to the array. The first memcpy fills _2, and the second moves it to _1.
I think it's pretty common to start with a simple-minded translation of the code, get it correct, then let the optimizer go to town. Rust could do some optimizations up front in MIR before handing it off to the backend, and IIRC they are planning to do so.
If you are analyzing functions, then it's better to use godbolt: Compiler Explorer. As you can see, the function is fully optimized.
Not only does it connect the assembly with the code (when it can), it also leaves out the unnecessary utility assembly (because the playground compiles a binary, not a lib).
No memcpy call does not mean there is no memory copy happening; in this case the copy is optimized to use xmm registers.
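For anyone who wants to try this in Compiler Explorer, here is a minimal sketch of the kind of function to paste in. The original code isn't shown in this thread, so `fill_from` and the 32-byte array size are made up for illustration. The point is only that a `pub` function in a lib is kept in the output without the binary's startup code around it.

```rust
// Hypothetical stand-in for the function under discussion; the name and the
// array size are assumptions, not the original poster's code.
// `pub` keeps the function in the output when the crate is built as a lib.
pub fn fill_from(value: &[u8; 32]) -> [u8; 32] {
    // Copy the dereferenced value into an array local, then return it.
    let array = *value;
    array
}
```

Compile it with optimizations enabled (for example `-C opt-level=3`) and check whether a call to memcpy remains or the copy is done through registers. The exact instructions depend on the compiler version and target.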
As for the dual copy, I'm not sure how I read this assembly the first time; there seems to be only one copy there. I am still a bit unsure as to why the stack is so big.
fmt::Arguments captures a reference to each value -- I'm guessing that LLVM thinks that a pointer within the array could legally access other parts of the same array in the callee.
You can force it to use temporaries with println!("{}-{}", {array[0]}, {array[2]}), and then the array is optimized away.
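Here is a small sketch of the two variants side by side, so they can be compared in Compiler Explorer. The function names and the `[u32; 4]` element type are assumptions for illustration, and `format!` stands in for `println!` so the result can be checked; both go through the same fmt::Arguments machinery.

```rust
// Captures `&array[0]` and `&array[2]`: references into the array itself
// are handed to the formatting machinery.
pub fn by_reference(array: [u32; 4]) -> String {
    format!("{}-{}", array[0], array[2])
}

// Each `{ ... }` block copies the element into a temporary first, so only
// the temporaries are borrowed, not the array.
pub fn by_temporary(array: [u32; 4]) -> String {
    format!("{}-{}", { array[0] }, { array[2] })
}
```

Both produce the same string; the difference, if any, shows up only in the generated code, and whether the array is actually optimized away will depend on the compiler version.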
I wonder if there's a way to help LLVM understand that it's not legal in Rust for those original references to access the rest of the object?
But you can always write unsafe { std::ptr::read((val as *const u8).offset(100500)) }, so I don't think that it will be possible without LTO. (Well, we could mark those references somehow, but I doubt LLVM has such functionality.)
Have the unsafe rust guidelines addressed this kind of possibility? It seems like an obvious hazard, if not full UB, since you can't know what's happening in the rest of the object. The other parts could be mutably borrowed elsewhere, for instance.
LLVM does not know about any guidelines; you must somehow prove (or at least declare) to it that the given reference will be used only for reading one byte. I think the potential performance improvements pale in comparison to the added complexity.
Sure, this comes in two parts -- decide whether it should be legal, and then express that as much as possible to the backend. We have a similar situation with mutable aliasing, which still isn't declared noalias AFAIK.