In C and Rust we can rely on the compiler to lay out memory in a way that "works" for the type and platform at hand. "Works" means the address the pointer refers to falls where the CPU architecture expects it, so the value can be read without shift operations. "Works" also means that, in the context of a collection, the pointer lands on a similarly aligned address each time it is advanced by its stride. Here is an example I took note of a while back:
Note: This example is not intended to represent how Rust (or any other language) actually lays out memory; it is just a sketch to "level-set" my understanding of the benefits of, and requirements for, aligned data.
struct {
a: char;
b: i32;
c: i16;
}
The compiler will lay out an instance on a 32-bit machine in a way that aligns with the platform's memory layout (its stride):
-----
| a | 0x0000
| |
| |
| |
-----
| b | 0x0004
| b |
| b |
| b |
-----
| c | 0x0008
| c |
-----
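To check the sketch above against a real compiler, here is a minimal Rust program. It assumes a hypothetical `Sketch` struct in which `u8` stands in for the one-byte `char`, and it uses `#[repr(C)]` so the fields stay in declaration order (Rust's default representation is free to reorder them). On common targets where `i32` is 4-byte aligned, it should report offsets 0, 4, and 8, plus two bytes of trailing padding for a total size of 12.

```rust
use std::mem::{align_of, size_of};

// Hypothetical stand-in for the sketch above; `u8` plays the role of the
// one-byte `char`. `repr(C)` keeps the fields in declaration order.
#[repr(C)]
struct Sketch {
    a: u8,
    b: i32,
    c: i16,
}

/// Returns the byte offsets of `a`, `b`, and `c` from the start of the struct.
fn field_offsets() -> (usize, usize, usize) {
    let s = Sketch { a: 0, b: 0, c: 0 };
    let base = &s as *const Sketch as usize;
    (
        &s.a as *const u8 as usize - base,
        &s.b as *const i32 as usize - base,
        &s.c as *const i16 as usize - base,
    )
}

fn main() {
    println!(
        "size = {}, align = {}",
        size_of::<Sketch>(),
        align_of::<Sketch>()
    );
    println!("offsets (a, b, c) = {:?}", field_offsets());
}
```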
But say we transferred the structure over a network in a format that packed the data, dropping the padding:
-----
| a | 0x0000
-----
| b | 0x0001 🚫
| b |
| b |
| b |
-----
| c | 0x0005 🚫
| c |
-----
This memory is no longer aligned. In order to read b there are now several steps:
- go to 0x0000 (the memory tables don't include 0x0001)
- read 0x0000 into an output register (=> abbb)
- shift left 1 byte (abbb => bbb0)
- read 0x0004 into a tmp register (=> bccx)
- shift right 3 bytes (bccx => 000b)
- combine with bitwise or: bbb0 | 000b => bbbb
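The steps above can be sketched in Rust. This is only an illustration of the shift-and-combine idea, not what any particular CPU does: the function names and the sample bytes are made up, and the words are interpreted big-endian so that the shifts run in the same direction as in the list.

```rust
/// Rebuilds the 4-byte value that starts at byte offset 1 using only two
/// word-sized reads (offsets 0 and 4), two shifts, and a bitwise OR.
fn read_b_with_shifts(buf: &[u8; 8]) -> u32 {
    let word0 = u32::from_be_bytes([buf[0], buf[1], buf[2], buf[3]]); // abbb
    let word1 = u32::from_be_bytes([buf[4], buf[5], buf[6], buf[7]]); // bccx
    let high = word0 << 8; // abbb => bbb0
    let low = word1 >> 24; // bccx => 000b
    high | low // bbb0 | 000b => bbbb
}

/// Reference result: assemble the same four bytes directly.
fn read_b_directly(buf: &[u8; 8]) -> u32 {
    u32::from_be_bytes([buf[1], buf[2], buf[3], buf[4]])
}

fn main() {
    // Packed layout: a = 0xAA, b = 0x11223344, c = 0xC1C2, then one spare byte.
    let buf = [0xAA, 0x11, 0x22, 0x33, 0x44, 0xC1, 0xC2, 0x00];
    assert_eq!(read_b_with_shifts(&buf), read_b_directly(&buf));
    println!("b = {:#010x}", read_b_with_shifts(&buf));
}
```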
When I allocate memory for a concrete instance of a slice &[T], the compiler will lay out the memory so it "works" (at the expense of wasting bits to ensure each stride lands on an aligned address, so values can be read without shifts).
The question: What is it about using vectors (in SIMD code) that might somehow
- "corrupt" the sequential layout
- put into doubt that the starting position of the pointer is aligned with what the platform expects (to read without shifts)
Scenarios
Vertical computations (computations that merge lanes from different vectors)?
aligned, aligned -> aligned
Horizontal computations
aligned first element, aligned second element... -> aligned result in the first element position
If I take bytes from a buffer fed by a file or stream
ptr = &[0..] aligned -> ptr = &[n..] aligned
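One way to probe the buffer scenario is to check addresses directly. The sketch below assumes a hypothetical 16-byte-aligned `Chunk16` type standing in for a SIMD vector type. A `&[u8]` only promises 1-byte alignment, so whether `&buf[n..]` starts on a 16-byte boundary depends on `n`. The standard `slice::align_to` method splits a slice into an unaligned head, an aligned middle, and a tail.

```rust
// Hypothetical 16-byte-aligned wrapper standing in for a SIMD vector type.
#[allow(dead_code)]
#[repr(C, align(16))]
#[derive(Clone, Copy)]
struct Chunk16([u8; 16]);

/// True when the first byte of `bytes` sits on a 16-byte boundary.
fn starts_on_16(bytes: &[u8]) -> bool {
    bytes.as_ptr() as usize % 16 == 0
}

/// Splits `bytes` into an unaligned head, 16-byte-aligned chunks, and a tail.
fn split_aligned(bytes: &[u8]) -> (&[u8], &[Chunk16], &[u8]) {
    // Sound here because every bit pattern is a valid `Chunk16`.
    unsafe { bytes.align_to::<Chunk16>() }
}

/// Checks offsets 0, 1, and 16 of a buffer whose start is 16-byte aligned.
fn offsets_aligned() -> (bool, bool, bool) {
    let storage = [Chunk16([0u8; 16]); 4]; // 64 bytes, 16-byte-aligned start
    let bytes: &[u8] = unsafe {
        std::slice::from_raw_parts(
            storage.as_ptr().cast::<u8>(),
            std::mem::size_of_val(&storage),
        )
    };
    (
        starts_on_16(bytes),
        starts_on_16(&bytes[1..]),
        starts_on_16(&bytes[16..]),
    )
}

fn main() {
    println!("offsets 0, 1, 16 aligned to 16? {:?}", offsets_aligned());

    let data = vec![0u8; 100];
    let (head, middle, tail) = split_aligned(&data[3..]);
    println!(
        "head = {} bytes, middle = {} chunks, tail = {} bytes",
        head.len(),
        middle.len(),
        tail.len()
    );
}
```

The element type's alignment is preserved by any sub-slice, but a wider vector type has a stricter requirement than the elements it is loaded from, which is why the head and tail have to be handled separately in this sketch.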
Finally, if I ask Rust to allocate for Vec<MyU8> on a 64-bit machine (memory layout), will it use 64 bits for each MyU8 (8 bits)? To minimize "the waste" in this case, I could create another type with 8 x MyU8 as the element/item in my Vec; is that required? But nothing that I can imagine so far throws off or changes the value of ptr = &[0..].
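The size question can be tested directly. The sketch below assumes `MyU8` is a newtype around `u8` (the question does not show its definition) and marks it `#[repr(transparent)]` so its layout is guaranteed to match `u8`. It measures the distance between consecutive elements of a `Vec<MyU8>`, which should equal `size_of::<MyU8>()`, independent of the machine's word size.

```rust
use std::mem::{align_of, size_of};

// `MyU8` is named in the question; this definition is an assumption.
#[allow(dead_code)]
#[repr(transparent)]
#[derive(Clone, Copy)]
struct MyU8(u8);

/// Distance in bytes between consecutive elements of a `Vec<MyU8>`.
fn stride_in_bytes() -> usize {
    let v = vec![MyU8(0); 8];
    let first = &v[0] as *const MyU8 as usize;
    let second = &v[1] as *const MyU8 as usize;
    second - first
}

fn main() {
    println!(
        "size = {}, align = {}, stride = {}",
        size_of::<MyU8>(),
        align_of::<MyU8>(),
        stride_in_bytes()
    );
}
```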
Thank you in advance to anyone who can help clarify the alignment "gotcha" when working with vectorized code.
- E