Push / pop structs into Buffer

pub trait Field: Clone + Copy {}

impl Field for i32 {}
impl Field for u8 {}
impl Field for i64 {}

pub struct Buffer {
    data: Vec<u8>,}

impl Buffer {
    pub fn new() -> Buffer {
        Buffer { data: vec![]}}

    pub fn push<T: Field>(&mut self, t: &T) {
        let mut bytes = vec![0_u8; std::mem::size_of::<T>()];
        unsafe {
            *std::mem::transmute::<*mut u8, *mut T>(bytes.as_mut_ptr()) = *t;}
        self.data.extend(bytes);}

    pub fn pop<T: Field>(&mut self) -> T {
        assert!(self.data.len() > std::mem::size_of::<T>());
        let offset = self.data.len() - std::mem::size_of::<T>();
        let bytes = self.data.split_off(offset);
        unsafe { *std::mem::transmute::<*const u8, *const T>(bytes.as_ptr()) }}}

#[test]
fn test_00() {
    let mut b = Buffer::new();

    b.push::<i32>(&1_i32);
    b.push::<u8>(&2_u8);
    b.push::<i64>(&3_i64);

    assert_eq!(b.pop::<i64>(), 3);
    assert_eq!(b.pop::<u8>(), 2);
    assert_eq!(b.pop::<i32>(), 1);

Besides mismatched push/pop, is the above code in danger of UB ?

The gist is that for any T: Copy + Clone, we can push/pop it into/from a Vec<u8>.

===

If there is a term for this more specific than serialization/deserialization, I'm interested in that too.

For one, alignment of the Vec<u8> buffer might not match that of T.

1 Like

I broke it with 100% safe code!

#[derive(Debug, Copy, Clone)]
struct Reference<'a>(&'a i64);

fn add_local_variable(b: &mut Buffer) {
    let value = 42_i64;
    let reference = (Reference(&value));
    b.push(&reference);
}

impl<'a> Field for Reference<'a> {}

#[test]
fn use_after_free() {
    let mut b = Buffer::new();
    
    add_local_variable(&mut b);
    
    let got: Reference = b.pop();
    assert_eq!(*got.0, 42);
}

Fails with:

thread 'tests::use_after_free' panicked at 'assertion failed: `(left == right)`
  left: `140101426636416`,
 right: `42`', src/lib.rs:68:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

(playground)

You need to add a 'static bound to either T or Field.

Arguably, push() and pop() should be unsafe functions because I need to "remember" which order things were pushed/popped in in order to maintain memory safety. If my pushes/pops are out of order then it needs to be safe to transmute between what type the memory originally had (or types, if I'm actually popping the first half of one object and the second half of another) and the type I'm popping off.

It shouldn't be possible for the developer to invoke UB by accidentally doing things in the wrong order.


I'm not sure what would happen if you pass in a trait object (i.e. T = dyn Foo). What gets copied then?


Your push() method also feels unnecessarily convoluted and we can skip an allocation.

    pub fn push<T: Field>(&mut self, t: &T) {
        unsafe {
            let underlying_data = std::slice::from_raw_parts(
                 t as *const T as *const u8, 
                std::mem::size_of::<T>(),
            );
            self.data.extend_from_slice(underlying_data);
        }
    }

Miri also doesn't like your pointer transmute operation.

error: Undefined Behavior: accessing memory with alignment 1, but alignment 4 is required
  --> src/main.rs:24:13
   |
24 |             *std::mem::transmute::<*mut u8, *mut T>(bytes.as_mut_ptr()) = *t;
   |             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ accessing memory with alignment 1, but alignment 4 is required
   |
   = help: this indicates a bug in the program: it performed an invalid operation, and caused Undefined Behavior
   = help: see https://doc.rust-lang.org/nightly/reference/behavior-considered-undefined.html for further information
           
   = note: inside `Buffer::push::<i32>` at src/main.rs:24:13
note: inside `test_00` at src/main.rs:40:5
  --> src/main.rs:40:5
   |
40 |     b.push::<i32>(&1_i32);
   |     ^^^^^^^^^^^^^^^^^^^^^
note: inside `main` at src/main.rs:3:5
  --> src/main.rs:3:5
   |
3  |     test_00();
   |     ^^^^^^^^^
   = note: inside `<fn() as std::ops::FnOnce<()>>::call_once - shim(fn())` at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ops/function.rs:227:5
   = note: inside `std::sys_common::backtrace::__rust_begin_short_backtrace::<fn(), ()>` at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/sys_common/backtrace.rs:125:18
   = note: inside closure at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/rt.rs:63:18
   = note: inside `std::ops::function::impls::<impl std::ops::FnOnce<()> for &dyn std::ops::Fn() -> i32 + std::marker::Sync + std::panic::RefUnwindSafe>::call_once` at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ops/function.rs:259:13
   = note: inside `std::panicking::r#try::do_call::<&dyn std::ops::Fn() -> i32 + std::marker::Sync + std::panic::RefUnwindSafe, i32>` at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panicking.rs:401:40
   = note: inside `std::panicking::r#try::<i32, &dyn std::ops::Fn() -> i32 + std::marker::Sync + std::panic::RefUnwindSafe>` at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panicking.rs:365:19
   = note: inside `std::panic::catch_unwind::<&dyn std::ops::Fn() -> i32 + std::marker::Sync + std::panic::RefUnwindSafe, i32>` at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panic.rs:434:14
   = note: inside closure at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/rt.rs:45:48
   = note: inside `std::panicking::r#try::do_call::<[closure@std::rt::lang_start_internal::{closure#2}], isize>` at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panicking.rs:401:40
   = note: inside `std::panicking::r#try::<isize, [closure@std::rt::lang_start_internal::{closure#2}]>` at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panicking.rs:365:19
   = note: inside `std::panic::catch_unwind::<[closure@std::rt::lang_start_internal::{closure#2}], isize>` at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panic.rs:434:14
   = note: inside `std::rt::lang_start_internal` at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/rt.rs:45:20
   = note: inside `std::rt::lang_start::<()>` at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/rt.rs:62:5

error: aborting due to previous error

(playground)

2 Likes

That won't stop someone from pushing 0usize and then popping &'static Foo/NonZeroUsize, thus resulting in UB again. Another problem is writing Ts that contain padding bytes since they're not considered initialized.

Assuming I am willing to disallow: refs, pointers, 0 size fields

and only store "raw" data (and enums / structs composed of "raw data"). Can this be fixed?

My understanding with regards to alignment is that because pop() returns a T, not a &T, there is a copy involved, which should straight things out. If this is incorrect, I am interested in learning why.

The problem is that you need to do a copy at some point, and if your pointer isn't aligned then the act of reading the value (so it can be returned) will trigger an unaligned read.

This sounds like a pattern I've seen in C where you'll write a struct's bytes directly into a file then do a transmute to "parse" the data out. It's hacky and quite unsafe, but you have a lot of the same constraints (no pointers, not allowed to read padding bytes, etc.).

To avoid accidentally treating padding bytes as initialized data you would need to a) strictly enforce the correct push/pop order, or b) figure out a way to ensure your T doesn't contain any padding (e.g. with a proc macro that adds #[repr(packed)] and implements some sort of ReprPacked marker trait).

Yep, that's why I said push() and pop() need to be unsafe functions. It's up to the caller to ensure that you push and pop "compatible" data.

3 Likes

I think you need to disallow any type that contains a niche (this also includes NonZeroUxx and a lot of enums) and have padding bytes.

If this is supposed to be used in some kind of application I would suggest you to look at the bytemuck crate, it has a trait called Pod which is similar to your Field, offers derive macros to automatically implement it if it's safe to do so and also offers utilities to transmute types that implement it to/from bytes.

3 Likes

Ate you trying to write a https://github.com/TimelyDataflow/abomonation clone? :wink:

2 Likes

Here is the XY problem I am running into.

Define set S as follows:
{u8, i8, u16, i16, u32, i32, u64, i64, f32, f64} belong in S
tuples of elements in S belongs in S
structs of elements in S belong in S
enums of elements in S belong in S

At runtime, we get types (t1, t2, t3, ..., t_n) all in S and we want to construct an array of elements, where each element has type (t1, t2, t3, ..., t_n).

===

One concrete example of this problem is writing a in memory backend for Sql that stores tables in row order. At runtime, we get that there are $n$ columns, and that they have types t1, t2, ..., t_n. and we want to create an array of rows.

Another situation this problem shows up with is playing with entity-component-systems that are stored in row-order rather than col-order.

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.