Impossible to blackbox copy T -> unaligned [u8]?

Let us define the set of types we care about as:

/*

{i, u, f} x {8, 16, 32, 64} in Field

let X be a struct/enum/tuple
  if all the subparts of X in Field, then X also in Field

In particular, Field bans the following:
  references, things with Drop handlers, ...

 */

Even under this restriction, is it impossible in Rust to do a blackbox copy of T to an unaligned [u8] ?

The logic goes something like this:

  1. Because this is blackbox, we do not have access to the internals of T.
  2. Therefore, we either copy as *const T or *const u8
  3. Copying as *const T gets UB due to unalignment.
  4. Copying as *const u8 gets UB when there is padding.

Therefore, it seems impossible to do blackbox copy of T -> unaligned [u8]. (This conclusion is surprising to me, as I think we can do this safely in C/C++).

Is this correct?

There is UB only when you read the padding (and even then, it's not obvious when exactly there's UB and when not, but better be safe). If you copy the type but never read the padding bytes, you're fine.

1 Like

Can you elaborate on why this is UB?

I know that references have an invariant that they are always properly aligned, but as long as you aren't taking a &T reference to your unaligned [u8] before it gets copied back into a T variable it should be fine.

I was surprised that C++ doesn't have this same restriction, but after some reading it sounds like they assume padding bytes are initialized (but with an unspecified value) after the object has been fully initialized.

Like @chrefr mentioned, if you are just storing the bytes so you can reconstitute a (properly aligned) T at a later and won't be reading the memory otherwise then I would guess it's okay.

1 Like

Unless you use write_unaligned(), copying into pointers should be aligned too.

1 Like

Shouldn't something like this be fine because when we are copying it's done via a *const u8?

To give us something concrete to point at, I've tried to put together an example of @zeroexcuses's problem.

use std::mem::MaybeUninit;

#[derive(Debug, Copy, Clone)]
#[repr(C)]
struct Padded {
    byte: u8,
    integer: u32,
}

const SIZE: usize = std::mem::size_of::<Padded>();

fn main() {
    let mut buffer = [0; SIZE + 1];
    let original_value = Padded {
        byte: 1,
        integer: 2,
    };
    println!("{:?}", original_value);

    // copy the value into our buffer
    unsafe {
        let dest = buffer.as_mut_ptr().add(1);
        let src = &original_value as *const Padded as *const u8;
        std::ptr::copy(src, dest, SIZE);
    }

    // this would be UB and is detected by Miri
    // println!("{:?}", &buffer[1..4]);

    // turn it back into a T
    let reconstituted = unsafe {
        let mut dest = MaybeUninit::<Padded>::uninit();
        let src = buffer.as_ptr().add(1);
        std::ptr::copy(src, dest.as_mut_ptr() as *mut u8, SIZE);
        dest.assume_init()
    };

    println!("{:?}", reconstituted);
}

(playground)

For what it's worth, Miri seems to agree with my previous statement about it being okay to store padding bytes as long as you never access that part of the buffer. Uncommenting println!("{:?}", &buffer[1..4]) and running it with Miri triggers a "using uninitialized data, but this operation requires initialized memory" error.

1 Like

This is the code I had in mind:

    pub fn push_2<T: Field>(&mut self, t: &T) {
        let mut data = vec![0_u8; std::mem::size_of::<T>()];
        unsafe {
            *(data.as_mut_ptr() as *mut T) = t;}
        todo!()}

data might be unaligned for T, which would then be UB when we do the deref-copy

What about deliberately making your buffer oversized and using slice's align_to::<T>() method to get the correctly aligned bit?

1 Like
    pub fn push_2<T: Field>(&mut self, t: &T) {
        let mut data = vec![0_u8; std::mem::size_of::<T>()];
        unsafe {
            *(data.as_mut_ptr() as *mut T) = t;}
        self.buffer.extend(data); // EDIT: added this
        todo!()}

I think this is still UB; even if data is aligned, it contains PAD bytes. Then, when we do the self.buffer.extend, we read those PAD bytes.

Even worse yet, even if we got it into self.buffer safely, self.buffer is a Vec<u8> -- so at some point, when we hit capacity, it resizes and COPIES all the u8's over, at which point it is going to read the PAD u8 bytes, and BAM, UB again.

What are your thoughts on this updated implementation?

I adjusted it to be more like your push/pop deserializer and added a reserve operation to force a reallocation. Miri still errors out when you print the backing buffer but was perfectly happy to copy the padding bytes around.

I suspect the full answer depends on what constitutes a "read". For example, is it okay to copy uninitialized bytes around as long as their value is never involved in an IO operation or information about it otherwise escapes the program? We use roughly the same conditions for optimisations and if so, a blind memcpy() when you reallocate is fine but printing the bytes is not (the semantics implemented by Miri).

1 Like

Why do we use read_unaligned in pop, but not write_unaligned in push ?

I think you have nailed the core of the issue. I do not know the answer. Intuitively I agree with you for the following reason: suppose instead of doing a memcpy in Rust, we called the C FFI for memcpy -- surely that can't introduce UB despite copying uninitialized bytes ? On the other hand, with matters of UB, I don't trust my intuition.

Because I'm coying a *const u8 and byte arrays have an alignment of 1.

In general, calls to extern functions are conservatively assumed to have observable side-effects so I'm guessing the padding bytes in the byte array would need to be initialized to something.

Is there a ticket in the unsafe code guidelines issue tracker which might be relevant here? That's where I normally go when I have questions about the precise semantics around UB.

1 Like

Just to be pedantic, if we had T: Default, and we overwrite let mut t = T::default();, the pop could also be done without a read_unaligned right? If I'm understanding this correctly, the main reason for this 'lack of symmetry', is that we don't have a T::default() that we are overwriting ?

(I'm just paranoid about anything and everything remotely related to UB). :slight_smile:

It might be easier if you write up an example - English has ambiguities and isn't precise enough.

That said, I don't think you can get away with just std::ptr::read() because the pointer we are reading from won't necessarily be aligned correctly.

Safety

Behavior is undefined if any of the following conditions are violated:

  • src must be valid for reads.
  • src must be properly aligned. Use read_unaligned if this is not the case.
  • src must point to a properly initialized value of type T .

Note that even if T has size 0 , the pointer must be non-NULL and properly aligned.

read in std::ptr - Rust

1 Like

Good call.

    pub fn pop<T: Field>(&mut self) -> T {
        assert!(self.data.len() >= std::mem::size_of::<T>());
        let offset = self.data.len() - std::mem::size_of::<T>();
        let bytes = self.data.split_off(offset);

        let mut t = T::default();
        unsafe {
            let ptr = &mut t as *mut T as *mut u8;
            let mut out_bytes = std::slice::from_raw_parts_mut(ptr, std::mem::size_of::<T>());
            out_bytes.copy_from_slice(&bytes);}
        t}}

Here is my understanding:

  1. If we do an overwrite as a * T with 1 count, then we need read_unaligned, because we are reading an object of type T

  2. Suppose instead, we overwrite as * u8 of std::mem::sizeof::<T>() count, then it only needs to satisfy alignment of u8

  3. Now, the part I'm not sure about here is "what about the padding bytes" ?

This is a weird argument by reduction, but here is goes:

In your impl, you have: (for push)

            std::ptr::copy(value as *const T as *const u8, dest, size);

in mine above, I am considering: (for pop)

        unsafe {
            let ptr = &mut t as *mut T as *mut u8;
            let mut out_bytes = std::slice::from_raw_parts_mut(ptr, std::mem::size_of::<T>());
            out_bytes.copy_from_slice(&bytes);}

It seems to me that we both have the potential issue of "copying uninit bytes"; it could be that both our code are UB or both are safe. I don't know which it is.

In my current understanding, the core of the issue is:

Is copying uninitialized bytes UB ?

  1. If the answer is no, I expect the following to work: [1]

  2. If the answer is yes -- it causes UB, then even read_unaligned write_unaligned won't save us, because when the Vec<u8> resizes, it triggers a memcpy.

[1]

pub struct Buffer {
    data: Vec<u8>,}

impl Buffer {
    pub fn new() -> Buffer {
        Buffer { data: vec![]}}

    pub fn push<T: Field>(&mut self, t: &T) {
        unsafe {
            let ptr = t as *const T as *const u8;
            let bytes = std::slice::from_raw_parts(ptr, std::mem::size_of::<T>());
            println!("len: {:?}", bytes.len());
            self.data.extend(bytes);}}

    pub fn pop<T: Field>(&mut self) -> T {
        assert!(self.data.len() >= std::mem::size_of::<T>());
        let offset = self.data.len() - std::mem::size_of::<T>();
        let bytes = self.data.split_off(offset);

        let mut t = T::default();
        unsafe {
            let ptr = &mut t as *mut T as *mut u8;
            let mut out_bytes = std::slice::from_raw_parts_mut(ptr, std::mem::size_of::<T>());
            out_bytes.copy_from_slice(&bytes);}
        t}}

Generally, copying an uninitialized byte is not UB, however producing an invalid value is UB, and uninitialized memory is an invalid value for the type u8. It would be safe to do so if the type is instead MaybeUninit<u8>. (well, the pop method would be dangerous, as any uninitialized bytes in the array may not end up as non-padding parts of popped structs and so on)

3 Likes

I don't doubt the correctness of what you have stated. I don't see how it helps me. Can you produce sample code for push/pop T for Buffer ?

If I am understanding what you are saying correctly, it is still going to have UB when the Vec<u8> does a resize and triggers a memcpy.