In my code I work directly with pages allocated using mmap, wrapped roughly into:
pub struct Page([u8; 4096]);
But I wonder whether #[repr(align(4096))] has any practical advantage over #[repr(align(64))] (the latter allows aligned loads/stores with AVX-512). Is the Rust compiler able to use the stricter alignment information, e.g. by using the low bits as a niche? Or is there no practical difference between them right now?
It matters because a page may get copied to the stack in some scenarios, and forcing 4096-byte alignment there could be quite expensive.
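To make the two options concrete, here is a minimal sketch that compares them. The names Page64, Page4096 and is_aligned_to are made up for illustration and are not from the original code; the sketch only checks size and alignment and does not measure the cost of realigning the stack.

```rust
use std::mem::{align_of, size_of};

// Hypothetical variants of the `Page` wrapper, one per alignment under discussion.
#[allow(dead_code)]
#[repr(align(64))]
pub struct Page64([u8; 4096]);

#[allow(dead_code)]
#[repr(align(4096))]
pub struct Page4096([u8; 4096]);

/// Returns true when `value` sits at an address that is a multiple of `align`.
fn is_aligned_to<T>(value: &T, align: usize) -> bool {
    (value as *const T as usize) % align == 0
}

fn main() {
    // Both wrappers are exactly one page in size: 4096 is a multiple of
    // either alignment, so no padding is added.
    assert_eq!(size_of::<Page64>(), 4096);
    assert_eq!(size_of::<Page4096>(), 4096);

    // Only the required alignment differs.
    assert_eq!(align_of::<Page64>(), 64);
    assert_eq!(align_of::<Page4096>(), 4096);

    // Stack copies must honour the declared alignment, so the compiler has to
    // place (and possibly realign the frame for) the 4096-aligned value.
    let a = Page64([0; 4096]);
    let b = Page4096([0; 4096]);
    assert!(is_aligned_to(&a, 64));
    assert!(is_aligned_to(&b, 4096));
}
```

With align(4096), a stack copy can waste up to almost a full page of padding in the frame, which is the cost the question is worried about.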
To my understanding, even AVX-512 loads or stores at most 512 bits in a single operation. These accesses incur a performance penalty (but not a segfault, as in SSE) if the address is not aligned to a multiple of 64 bytes (512 bits). But how would an alignment of more than 64 bytes ever be advantageous?
I don't think it'd make a difference in terms of compiler optimisations, but alignments greater than 64 bytes can make a lot of sense when you start talking about things that interact with the OS.
For example, pages of memory given to you by the OS will be aligned to whatever page size it uses. Similarly, if you used mmap to map a file into memory as a &mut [Foo], making sure each Foo's size is a multiple of the page size will probably lead to more efficient IO.
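One place where the declared alignment matters in practice is when turning a raw pointer from the OS into a pointer to your page type: the address has to satisfy the type's alignment. The sketch below is only an illustration under some assumptions: cast_to_page is a hypothetical helper, and std::alloc stands in for mmap so the example needs no external crate.

```rust
use std::alloc::{alloc_zeroed, dealloc, Layout};
use std::mem::align_of;

#[repr(align(4096))]
pub struct Page([u8; 4096]);

/// Hypothetical helper: reinterprets `ptr` as a `Page` pointer, but only if it
/// is non-null and satisfies `Page`'s alignment requirement.
fn cast_to_page(ptr: *mut u8) -> Option<*mut Page> {
    if !ptr.is_null() && (ptr as usize) % align_of::<Page>() == 0 {
        Some(ptr.cast::<Page>())
    } else {
        None
    }
}

fn main() {
    // Stand-in for mmap: request one page-sized, page-aligned, zeroed block.
    let layout = Layout::new::<Page>();
    let raw = unsafe { alloc_zeroed(layout) };
    assert!(!raw.is_null(), "allocation failed");

    // A page-aligned address can be viewed as a `Page`.
    let page = cast_to_page(raw).expect("allocation should be page-aligned");

    // An address one byte into the block is not page-aligned and is rejected.
    assert!(cast_to_page(unsafe { raw.add(1) }).is_none());

    unsafe {
        (*page).0[0] = 42;
        assert_eq!((*page).0[0], 42);
        dealloc(raw, layout);
    }
}
```

With #[repr(align(64))] the same check would accept any 64-byte-aligned address, so the stricter attribute mainly documents and enforces the page-aligned invariant.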
Page-aligned data also makes better use of the TLB, because a page-sized object that is page-aligned never straddles two pages. The CPU caches the physical location of only a limited number of virtual pages, and if you touch more pages than that, memory access becomes slower.
Niches in the unused low alignment bits have been proposed, but are not currently implemented: https://github.com/rust-lang/rfcs/pull/3204
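A small sketch of what this means for type sizes, assuming the Page wrapper from the question with align(4096). The ThreeState enum is hypothetical and exists only to need more than one niche value. The only layout fact relied on here is the documented guarantee that Option<&T> is the same size as &T; the enum's size is printed rather than asserted because enum layout is unspecified.

```rust
use std::mem::size_of;

#[allow(dead_code)]
#[repr(align(4096))]
pub struct Page([u8; 4096]);

// Hypothetical enum that needs two spare values besides a valid reference.
#[allow(dead_code)]
enum ThreeState<'a> {
    Mapped(&'a Page),
    Unmapped,
    Poisoned,
}

fn main() {
    // The null-pointer niche is used: `None` is stored as the null address.
    assert_eq!(size_of::<Option<&Page>>(), size_of::<&Page>());

    // Two extra variants need two niche values. If only the null value is
    // available, a separate tag has to be stored; an alignment niche could in
    // principle use the twelve always-zero low bits of the address instead.
    println!(
        "reference: {} bytes, ThreeState: {} bytes",
        size_of::<&Page>(),
        size_of::<ThreeState<'static>>()
    );
}
```

If the enum prints as larger than a reference, the compiler is not using the low address bits as a niche, regardless of whether the type is aligned to 64 or 4096 bytes.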