Is it UB to byte-copy memory containing repr(C) structs with padding?

Hi,

I have a large contiguous block of memory (in reality mmap’d shared memory; here simplified as a u8 array). Into this memory I write several different #[repr(C)] structs sequentially using std::ptr::write::<T>(). Some of these structs may contain implicit padding.

Later, I want to copy the used portion of the memory as raw bytes (just a byte-for-byte copy).

My concern is the following:

  • #[repr(C)] structs may contain implicit padding.
  • Padding bytes are allowed to be uninitialized.
  • Writing a struct with std::ptr::write::<T>() makes its padding bytes uninitialized again, so the u8 array now contains uninitialized bytes even though it was fully initialized before.
  • Copying the memory as u8 would read those padding bytes.
  • Reading uninitialized memory is UB.

Is this reasoning correct? If so, what is the correct way to handle this pattern?

Example

use std::mem::{align_of, size_of};
use std::ptr;

#[repr(C)]
struct A {
    x: u8,
    y: u32,
}

#[repr(C)]
struct B {
    a: u32,
    b: u64,
}

fn align_up(off: usize, align: usize) -> usize {
    (off + align - 1) & !(align - 1)
}

fn main() {
    let mut region = [0u8; 1024];
    let mut snapshot = [0u8; 1024];
    let mut cursor = 0usize;

    unsafe {
        cursor = align_up(cursor, align_of::<A>());
        ptr::write(region.as_mut_ptr().add(cursor) as *mut A, A { x: 1, y: 2 });
        cursor += size_of::<A>();

        cursor = align_up(cursor, align_of::<B>());
        ptr::write(region.as_mut_ptr().add(cursor) as *mut B, B { a: 3, b: 4 });
        cursor += size_of::<B>();

        // Later: byte-copy the populated prefix
        ptr::copy_nonoverlapping(region.as_ptr(), snapshot.as_mut_ptr(), cursor);
    }
}

No, copying uninit memory is not necessarily UB.

However, in your case you are copying into an [u8; 1024], and that type does not permit uninitialized memory, so the example code does trigger UB.

If you used a type such as [MaybeUninit<u8>; 1024] or MaybeUninit<[u8; 1024]> where uninit bytes are permitted, your code would be acceptable.
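
A minimal sketch of that suggestion, assuming the simplified stack-buffer setup from the question (the real mmap'd region would be page-aligned, so here `write_unaligned`/`read_unaligned` are used since a local buffer of `MaybeUninit<u8>` has no alignment guarantee):

```rust
use std::mem::{size_of, MaybeUninit};
use std::ptr;

#[repr(C)]
struct A {
    x: u8,
    y: u32, // 3 padding bytes sit between x and y
}

fn main() {
    // Element type MaybeUninit<u8> permits uninitialized (padding) bytes.
    let mut region = [MaybeUninit::<u8>::uninit(); 1024];
    let mut snapshot = [MaybeUninit::<u8>::uninit(); 1024];
    let mut cursor = 0usize;

    unsafe {
        // write_unaligned because a stack byte buffer has no alignment guarantee.
        ptr::write_unaligned(region.as_mut_ptr() as *mut A, A { x: 1, y: 2 });
        cursor += size_of::<A>();

        // Byte-copying the prefix is fine: MaybeUninit<u8> tolerates uninit bytes.
        ptr::copy_nonoverlapping(region.as_ptr(), snapshot.as_mut_ptr(), cursor);

        // Only read back through the struct type, never the padding bytes directly.
        let a: A = ptr::read_unaligned(snapshot.as_ptr() as *const A);
        assert_eq!((a.x, a.y), (1, 2));
    }
}
```

This version passes under Miri because no uninitialized byte is ever read at type u8; the padding only ever flows through MaybeUninit<u8> and back into A, where uninit padding is allowed.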


Thank you! That is clever! So, as long as both region and snapshot are [MaybeUninit<u8>; 1024], I can copy them freely, and as long as I make sure I only ever read initialised bytes from these arrays, I should be OK.

If you control the struct definitions, you can alternatively get rid of uninitialised bytes by declaring the padding fields manually. For example:


#[repr(C)]
struct A {
    x: u8,
    __pad1: [u8; 3],
    y: u32,
}

#[repr(C)]
struct B {
    a: u32,
    __pad1: [u8; 4], // or [u32; 1]
    b: u64,
}

A better way to do this is to simply sort the fields from biggest to smallest:

#[repr(C)]
struct A {
    y: u32,
    x: u8,
    // compiler will add padding here automatically
}

#[repr(C)]
struct B {
    b: u64,
    a: u32,
    // same here
}

The compiler will then insert the necessary tail padding automatically. This is better because the struct is smaller while every field stays well aligned for CPU access, and a smaller size also means better CPU cache utilization. Compare the manual-padding version with the biggest-to-smallest ordering:

use std::mem::size_of;

#[repr(C)]
struct Data1 {
    a: u8,           // 1 byte
    _pad0: [u8; 7],  // 7 bytes (manual padding so that b is 8-byte aligned)
    b: u64,          // 8 bytes
    c: u8,           // 1 byte
    _pad1: [u8; 3],  // 3 bytes (manual padding so that d is 4-byte aligned)
    d: u32,          // 4 bytes
    e: u8,           // 1 byte
    _pad2: [u8; 7],  // 7 bytes (manual tail padding to make the size a multiple of 8: 32 bytes)
}

#[repr(C)]
struct Data2 {
    b: u64,          // 8 bytes
    d: u32,          // 4 bytes
    a: u8,           // 1 byte
    c: u8,           // 1 byte
    e: u8,           // 1 byte
    // every field already sits at a well-aligned offset for its type;
    // the compiler adds 1 byte of tail padding to reach a multiple of 8: 16 bytes total
}

fn main() {
    println!("Data1: {} bytes", size_of::<Data1>()); // Output: 32
    println!("Data2: {} bytes", size_of::<Data2>()); // Output: 16
}

So the manually padded version is 32 bytes, while the biggest-to-smallest ordering is 16 bytes: half the memory usage.

Also, CPUs generally do not fetch data one byte at a time; they fetch an entire cache line, commonly 64 bytes. So with Data1, one cache-line fetch holds 2 objects (each object being 32 bytes), whereas with Data2 one fetch holds 4 objects.

Other reasons:

  1. Less room for human error
  2. No need to recalculate padding when adding a new field; just place it in the right position
  3. Cleaner code (no padding noise: a struct with 8 fields could otherwise grow to as many as 16 fields, since up to 8 padding declarations may be needed)

Manual padding is mostly useful for:

  1. Mapping memory onto hardware registers or a wire protocol
  2. Avoiding false sharing in multithreaded code

You still have to pad the end manually if you want to memcpy the value into a byte buffer. Alternatively, make sure to copy exactly the sum of the field sizes rather than size_of::<T>() bytes.
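
Concretely, the biggest-to-smallest B above (b: u64, a: u32) still ends with four uninitialized tail bytes. A sketch of declaring that tail padding explicitly (the field name `_tail` is just illustrative):

```rust
use std::mem::size_of;

#[repr(C)]
struct B {
    b: u64,
    a: u32,
    _tail: [u8; 4], // without this, size_of::<B>() is still 16,
                    // but the last 4 bytes would be uninitialized padding
}

fn main() {
    assert_eq!(size_of::<B>(), 16);
    let v = B { b: 4, a: 3, _tail: [0; 4] };
    // Now every one of the 16 bytes is initialized, so viewing the
    // whole struct as a byte slice is sound:
    let bytes =
        unsafe { std::slice::from_raw_parts(&v as *const B as *const u8, size_of::<B>()) };
    assert_eq!(bytes.len(), 16);
}
```
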


Thanks, I just understood the uninitialized thing :>

So I can't memcpy uninitialized memory into initialized memory. The part that's good info in general is the manual struct-layout tip (independent of the uninit memcpy issue).

The first comment says it works if the buffer type is MaybeUninit, i.e. memcpying uninit to uninit is fine. I tried it in the Rust Playground with Miri enabled, and indeed Miri reports no UB.


The zerocopy crate formalizes when it is safe to convert a type to bytes. It has a trait called IntoBytes that requires the type to have no padding (see here), and it won't let you attach #[derive(IntoBytes)] unless the conversion is safe.


Do you think this approach is better than using Deku or BinRW for serializing into bytes?

I do not have experience with either deku or binrw, but zerocopy is not always appropriate for serialization. In particular, you can't use it to convert data to bytes on one computer and then from those same bytes on a different computer, especially if one of them has an exotic architecture. (usize might be fewer bytes; the byte ordering might be reversed; etc.) It also has alignment requirements which you can satisfy with copying (use the crate to convert &mut T to &mut [u8] and write the bytes of a T into that), but zero-copy deserialization requires whatever is framing your data to respect its alignment requirements. If you can guarantee alignment and consistency, it's great. (E.g., you're moving data between x86-64 and a GPU, you are only ever going to use x86-64 in your application, etc.)
