I've been working on a serializer (ser_raw) which does a lot of memory copies, and have been investigating how fast different methods of copying are. One thing has me confused.
I would have expected that all of the following would produce the same assembly:
use std::ptr;
#[repr(C)]
pub struct Foo { x: u32, y: u16, z: u16 }
#[repr(C)]
pub struct Bar { x: u32, y: u16, z: u16 }
pub unsafe fn write_foo(foo: &Foo, out: *mut u8) {
ptr::write(out as *mut Foo, Foo { ..*foo });
}
pub unsafe fn write_foo_as_bar(foo: &Foo, out: *mut u8) {
ptr::write(out as *mut Bar, Bar { x: foo.x, y: foo.y, z: foo.z });
}
pub unsafe fn copy_foo(foo: &Foo, out: *mut u8) {
ptr::copy_nonoverlapping(foo as *const Foo, out as *mut Foo, 1);
}
However, Godbolt (Compiler Explorer) says otherwise, and the Rust Playground confirms it.
The first 2 functions are compiled to:
mov eax, dword ptr [rdi]
mov dword ptr [rsi], eax
mov eax, dword ptr [rdi + 4]
mov dword ptr [rsi + 4], eax
But copy_foo is only 2 instructions:
mov rax, qword ptr [rdi]
mov qword ptr [rsi], rax
What has me really confused is that the compiler does combine the two u16 read+writes into a single u32 read+write, but it stops there, rather than combining again to end up with just a single 8-byte read+write, as in copy_foo.
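For comparison, the single 8-byte read+write can also be spelled out by hand. This is only a sketch: `copy_foo_as_u64` and `round_trip_u64` are hypothetical names (not part of ser_raw), the code assumes Foo is exactly 8 bytes with no padding, and no claim is made here about the assembly it produces.

```rust
use std::ptr;

#[repr(C)]
pub struct Foo { x: u32, y: u16, z: u16 }

// Assumption made explicit: Foo is 8 bytes with no padding, so it can be
// moved around as a single u64.
const _: () = assert!(std::mem::size_of::<Foo>() == 8);

/// Hypothetical helper: copies the 8 bytes of `foo` as one `u64`.
///
/// # Safety
/// `out` must be valid for writes of 8 bytes.
pub unsafe fn copy_foo_as_u64(foo: &Foo, out: *mut u8) {
    unsafe {
        // Foo is only 4-byte aligned, so use the unaligned read/write helpers.
        let bits = ptr::read_unaligned(foo as *const Foo as *const u64);
        ptr::write_unaligned(out as *mut u64, bits);
    }
}

/// Copies a Foo into a second Foo and returns the destination's fields.
pub fn round_trip_u64() -> (u32, u16, u16) {
    let src = Foo { x: 0xDEAD_BEEF, y: 0x1234, z: 0x5678 };
    let mut dst = Foo { x: 0, y: 0, z: 0 };
    unsafe { copy_foo_as_u64(&src, &mut dst as *mut Foo as *mut u8) };
    (dst.x, dst.y, dst.z)
}
```

Pasting `copy_foo_as_u64` next to the three functions above in Compiler Explorer would show whether it matches copy_foo.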
I suspect it has something to do with alignment, because if Foo's fields are instead u16, u8, u8, that does get reduced to a single 4-byte read+write. But still, why? The same compiler quite happily produces unaligned read/write instructions for other code.
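For reference, the u16, u8, u8 variant mentioned above could look like the following. It is a minimal sketch: `SmallFoo`, `write_small_foo` and `round_trip_small` are made-up names for illustration, with the body mirroring write_foo.

```rust
use std::ptr;

// Hypothetical 4-byte counterpart of Foo: one u16 followed by two u8s.
#[repr(C)]
pub struct SmallFoo { x: u16, y: u8, z: u8 }

/// Same shape as write_foo, but for the smaller struct.
///
/// # Safety
/// `out` must be valid for writes of 4 bytes and aligned for `SmallFoo`.
pub unsafe fn write_small_foo(foo: &SmallFoo, out: *mut u8) {
    unsafe { ptr::write(out as *mut SmallFoo, SmallFoo { ..*foo }) };
}

/// Writes one SmallFoo over another and returns the destination's fields.
pub fn round_trip_small() -> (u16, u8, u8) {
    let src = SmallFoo { x: 0x1234, y: 5, z: 6 };
    let mut dst = SmallFoo { x: 0, y: 0, z: 0 };
    unsafe { write_small_foo(&src, &mut dst as *mut SmallFoo as *mut u8) };
    (dst.x, dst.y, dst.z)
}
```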
Hope someone can explain these mysteries!