T: Copy, slice, copy, UB?

  1. We are guaranteed that T: Copy, T is repr(C).

  2. There might be 'holes', i.e. for example:

#[repr(C, align(8))]
pub struct T {
  x: u64,
  y: u8
}
  3. Suppose we have two programs.

Program 1 has a: [T; N]
Program 2 has b: &mut [T; N]

Suppose program 1 uses slice::from_raw_parts to construct a &[u8], and sends it over the network.

Suppose program 2 uses slice::from_raw_parts_mut to construct a &mut [u8] from &mut [T; N] and copies in the data.

Is this UB?

Why or why not ?

pre-emptive questions:

alignment: b: &mut [T; N] is properly aligned, so when we construct a &mut [u8] from it via slice::from_raw_parts_mut, we're good

same representation in both programs?

T is repr(C)

why would you think this is UB ?

In the case above T has alignment 8 but ends in a u8, so there are 7 wasted (padding) bytes. It seems like either reading or writing those bytes may end up triggering UB.
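The number of wasted bytes can be checked directly. This is a minimal sketch, assuming the struct from the question is declared `#[repr(C, align(8))]`; the `padding_bytes` helper is a hypothetical name used only for illustration.

```rust
use std::mem::{align_of, size_of};

#[derive(Clone, Copy)]
#[repr(C, align(8))]
pub struct T {
    pub x: u64,
    pub y: u8,
}

// The explicit align(8) fixes the alignment regardless of target.
const _: () = assert!(align_of::<T>() == 8);

/// Bytes of `T` that belong to no field (hypothetical helper).
pub const fn padding_bytes() -> usize {
    // Total size minus the bytes actually occupied by `x` and `y`.
    size_of::<T>() - (size_of::<u64>() + size_of::<u8>())
}
```

The fields occupy 9 bytes, and the size is rounded up to a multiple of the alignment, so `size_of::<T>()` is 16 and the remaining 7 bytes are padding.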

If the above is UB, how do we fix this ?

If T has any padding, then the send is UB: writing the data out to the network involves reading those uninitialized padding bytes through the &[u8]. (Writing to padding bytes, as program 2 does, is OK.)

One way to fix it is to add fields for the padding:

#[repr(C)]
pub struct T {
  x: u64,
  y: u8,
  _padding: [u8; 7],
}
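With explicit padding fields, both byte views from the question can be written by hand. The following is only a sketch under the assumptions stated in the thread (T: Copy, repr(C), integer fields only); `as_bytes` and `as_bytes_mut` are hypothetical helper names, not an existing API.

```rust
use std::mem::{size_of, size_of_val};
use std::slice;

#[derive(Clone, Copy, Debug, PartialEq)]
#[repr(C)]
pub struct T {
    pub x: u64,
    pub y: u8,
    pub _padding: [u8; 7],
}

// Compile-time check: the fields account for every byte of T,
// so the compiler inserted no implicit padding.
const _: () = assert!(size_of::<T>() == 8 + 1 + 7);

/// View a slice of T as raw bytes (what program 1 would send).
pub fn as_bytes(ts: &[T]) -> &[u8] {
    // SAFETY: T has no padding, so every byte is initialized;
    // u8 has alignment 1; the length covers exactly the same memory.
    unsafe { slice::from_raw_parts(ts.as_ptr().cast::<u8>(), size_of_val(ts)) }
}

/// View a slice of T as writable bytes (what program 2 would fill).
pub fn as_bytes_mut(ts: &mut [T]) -> &mut [u8] {
    // SAFETY: as above, and every bit pattern is a valid T because
    // all fields are plain integers, so arbitrary writes are fine.
    unsafe { slice::from_raw_parts_mut(ts.as_mut_ptr().cast::<u8>(), size_of_val(ts)) }
}
```

On the receiving side, something like `as_bytes_mut(&mut b).copy_from_slice(received)` would then copy the data in, assuming `received` has exactly the right length and both programs run on targets with the same endianness.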

If you use the derive macros in the zerocopy crate, then you get the transmutes to and from byte arrays for free, and it also verifies that your layout has no padding bytes.

Since you are using #[repr(C)] for the layout, there shouldn't be any other issues with doing this.

BTW, what resource have you read that I have not read that gives you the knowledge to figure this out / answer this question ? I feel like I skipped something in my Rust education.

Something like this might be a good starting point to learn more, in case you haven't read it already: "What The Hardware Does" is not What Your Program Does: Uninitialized Memory

Like a lot of unsafe topics, I feel there is no central resource as there is no spec, and there are still a ton of open questions. That said, my first stop is often searching around on UCG (the Unsafe Code Guidelines repository). I would agree that knowing about these issues currently requires way too much searching and reading.

Something else I just recalled, you can use

#[derive(Copy, Clone, Eq, PartialEq, Ord, PartialOrd, Hash)]
#[repr(u8)]
pub enum Pad {
    Zero
}

#[repr(C)] 
pub struct T {
  x: u64,
  y: u8,
  _padding: [Pad; 7],
}

And then your struct will have a niche (though, there's no guarantee it will be utilized).

  1. What's a "niche" in this context ?

  2. Why is enum Pad better than just a u8 ?

A niche is a range of invalid values for a type that is known to the compiler. When a type with a niche is put in an enum, sometimes niches are exploited to store the enum tag, which can reduce the size of the enum. It's guaranteed in very particular situations.

u8 has no niche as all bitpatterns are valid.
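A small sketch of the difference, using `NonZeroU8` from the standard library as the comparison because its niche is a documented guarantee. The `option_pad_size` helper is a hypothetical name; its result is deliberately not asserted, since the thread notes that use of the `Pad` niche is not guaranteed.

```rust
use std::mem::size_of;
use std::num::NonZeroU8;

#[derive(Clone, Copy)]
#[repr(u8)]
pub enum Pad {
    Zero,
}

// Documented guarantee: the invalid value 0 of NonZeroU8 stores `None`,
// so the Option costs no extra space.
const _: () = assert!(size_of::<Option<NonZeroU8>>() == 1);

// u8 has 256 valid values, so Option<u8> needs 257 states and
// cannot fit in a single byte.
const _: () = assert!(size_of::<Option<u8>>() > 1);

/// Size of Option<Pad>; whether the niche is used is up to the compiler.
pub fn option_pad_size() -> usize {
    size_of::<Option<Pad>>()
}
```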

Here's an issue about zero-padding and niches.

I wouldn't use an enum like that. It makes it dangerous to transmute data received over the network since, if the padding bytes contain any other value than zero, then you get UB.
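If the enum padding is kept anyway, one way to avoid that UB is to validate the received bytes instead of transmuting them. This is a hedged sketch, not something proposed in the thread: `parse` is a hypothetical function, and it assumes a 16-byte native-endian wire format matching the repr(C) layout.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
#[repr(u8)]
pub enum Pad {
    Zero = 0,
}

#[derive(Clone, Copy, Debug, PartialEq)]
#[repr(C)]
pub struct T {
    pub x: u64,
    pub y: u8,
    pub _padding: [Pad; 7],
}

/// Build a T from untrusted bytes without ever creating an invalid Pad.
pub fn parse(bytes: &[u8; 16]) -> Option<T> {
    // Bytes 9..16 correspond to `_padding`; anything but 0 is not a valid Pad.
    if bytes[9..].iter().any(|&b| b != 0) {
        return None;
    }
    let mut x = [0u8; 8];
    x.copy_from_slice(&bytes[..8]);
    Some(T {
        x: u64::from_ne_bytes(x),
        y: bytes[8],
        _padding: [Pad::Zero; 7],
    })
}
```

Because the value is constructed field by field, a buffer with non-zero padding is rejected with `None` rather than being reinterpreted as a `T`.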
