Push / pop struts into Buffer, part 2

This is a followup of Push / pop structs into Buffer

I believe I have resolved all the alignment issues:

  1. when pushing, we convert to a *const u8, and copy the right # of bytes

  2. when popping, we use Default to construct a T, then we overwrite it

Interested to see if there is still potential for UB.

/*

{i, u, f} x {8, 16, 32, 64} in Field

let X be a struct/enum/tuple
  if all the subparts of X in Field, then X also in Field

In particular, Field bans the following:
  references, things with Drop handlers, ...

 */

pub trait Field: Default + Clone + Copy {}

impl Field for u8 {}
impl Field for i32 {}
impl Field for i64 {}

pub struct Buffer {
    data: Vec<u8>,}

impl Buffer {
    pub fn new() -> Buffer {
        Buffer { data: vec![]}}

    pub fn push<T: Field>(&mut self, t: &T) {
        unsafe {
            let ptr = t as *const T as *const u8;
            let bytes = std::slice::from_raw_parts(ptr, std::mem::size_of::<T>());
            println!("len: {:?}", bytes.len());
            self.data.extend(bytes);}}

    pub fn pop<T: Field>(&mut self) -> T {
        assert!(self.data.len() >= std::mem::size_of::<T>());
        let offset = self.data.len() - std::mem::size_of::<T>();
        let bytes = self.data.split_off(offset);

        let mut t = T::default();
        unsafe {
            let ptr = &mut t as *mut T as *mut u8;
            let mut out_bytes = std::slice::from_raw_parts_mut(ptr, std::mem::size_of::<T>());
            out_bytes.copy_from_slice(&bytes);}
        t}}

#[test]
fn test_00() {
    let mut b = Buffer::new();

    b.push::<i32>(&1_i32);
    b.push::<u8>(&2_u8);
    b.push::<i64>(&3_i64);

    assert_eq!(b.pop::<i64>(), 3_i64);
    assert_eq!(b.pop::<u8>(), 2_u8);
    assert_eq!(b.pop::<i32>(), 1_i32);}
1 Like

It looks pretty good, apart from the Field trait. It should be marked as unsafe so that the user has to guarantee their type has no padding or niches:

pub unsafe trait Field: Default + Copy + 'static { ... }

Also, having a Copy bound on the field trait implies Clone, so the Clone bound isn't needed.

Edit: Also, the field trait could do with a 'static bound so it can't be implemented for references.

1 Like
  1. don't understand: why does this break if there is padding? (Argument for why it should not break: we copy the padding over to the buffer, then we copy it back.) I intend to use this with things like
#[repr(C)]
struct Foo {
  a: u64,
  b: u8,
  c: u64,
  d: u8,
  e: u64
}

which I believe will have internal padding

  1. Also, in this context, what is a 'niche' ?
1 Like

Padding is uninitialised memory, and that will cause Undefined Behaviour if you copy it into your Vec<u8>.

https://rust-lang.github.io/unsafe-code-guidelines/glossary.html#padding

A niche is an invalid bit pattern of a type. It is Undefined Behaviour to create a type with an invalid bit pattern.

For example, if I push 0_i32, then pop a NonZeroI32, that constitutes UB, since the all zero bit pattern is not valid for NonZeroI32.

https://rust-lang.github.io/unsafe-code-guidelines/glossary.html#niche

2 Likes

With regards to Padding, is this what you are saying: ?

  1. #[repr(C)] struct Foo { a: u64, b: u8, c: u64} forces 8 bytes of u64, 1 byte of u8, 7 bytes of PAD, 8 bytes of u64

  2. the 7 bytes of PAD are uninitialized

  3. the moment I read one of those 7 bytes, I get UB

1 Like

Generally, Yes.

Technically, this is only UB if the type read into does not have uninitialised bits as one of its valid bit patterns. For example, MaybeUninit<u8> does allow uninitialised bits, so it would be ok to read into it. The problem will then come when you allow reading the uninitialised bytes back out as another type that does not allow being uninitialised.

1 Like

Yeah, Impossible to blackbox copy T -> unaligned [u8]? - #28 by alice also converges to MaybeUinit<u8>. This is one of those weird problems that is completely frustrating while working on it, but has a trivially simple solution once you see the solution.

1 Like

The solution is always obvious once you know it... XD

C semantics allow copying of padding bytes as an exception (so that you can memcpy whole structs), but it's UB to use these bytes for anything else. In LLVM you can't have a condition that uses an uninitialized byte, regardless of where it came from.

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.