Best way to work with structured "view" on a slice of u8

Hi there,

I'm currently trying to solve the following problem as elegantly as possible and with as few unsafe code blocks as possible:

From an interface I get a bunch of u8, represented either as [u8] or as Vec<u8>. However, the data has some kind of structure that depends on the first part of the data. In the C world I would use some pointer arithmetic and pointer type casts to get the structured representation of the slice of data in question. But how could I do this in Rust? I thought about Box::from_raw, but then I need to ensure that Drop is never called for the box and use mem::forget, which seems unsound and error-prone. Is there any "idiomatic" way to cast parts of the slice into a structure reference that does not try to deallocate the memory once it goes out of scope?

I tried several approaches but did not feel really confident with any of them ...
So I'm seeking some expert advice here :wink:

It's given that the data in question is always either a Vec<u8> or an array of u8, and that this data always outlives any reference used to project slices of the data into structures to read and manipulate the data.

Any hint is very welcome :slight_smile:

Requires unsafe: If you know the C-style layout, you could create a #[repr(C)] struct, check that the slice is long enough and suitably aligned, get the pointer from the slice using as_ptr, and cast that pointer to a *const MyStruct. Afterwards, you can make the function return a reference whose lifetime matches the slice's. (If you want to consume a Vec<u8> instead of borrowing from it, the result would have to be an owned MyStruct.)
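
A minimal sketch of that approach, assuming a made-up two-field Header whose fields are all u8 (so its alignment is 1 and every bit pattern is valid). The names Header, tag, len and view_header are hypothetical and only there for illustration:

```rust
use std::mem::{align_of, size_of};

// Hypothetical layout, for illustration only: all fields are `u8`, so the
// struct has alignment 1, no padding, and no invalid bit patterns.
#[repr(C)]
#[derive(Debug, PartialEq, Eq)]
struct Header {
    tag: u8,
    len: u8,
}

/// Reinterprets the first bytes of `bytes` as a `Header`.
/// The returned reference borrows from `bytes`, so it cannot outlive the
/// buffer and nothing is deallocated when it goes out of scope.
fn view_header(bytes: &[u8]) -> Option<&Header> {
    if bytes.len() < size_of::<Header>() {
        return None;
    }
    let ptr = bytes.as_ptr();
    // Always true for this struct, but needed for types with alignment > 1.
    if ptr as usize % align_of::<Header>() != 0 {
        return None;
    }
    // SAFETY: length and alignment were checked above, `Header` is
    // `#[repr(C)]` and made of `u8` fields only, and the lifetime of the
    // result is tied to `bytes` by the signature.
    Some(unsafe { &*ptr.cast::<Header>() })
}

fn main() {
    let data: Vec<u8> = vec![0x01, 0x04, 0xde, 0xad, 0xbe, 0xef];
    let header = view_header(&data).expect("buffer too short");
    assert_eq!(header, &Header { tag: 0x01, len: 0x04 });
    // A buffer that is too short is rejected instead of read out of bounds.
    assert!(view_header(&data[..1]).is_none());
}
```

Returning Option keeps the unsafe block in one small, checked place; callers only ever see an ordinary shared reference.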

The best tool out there, imho, for this kind of unsafe / dangerous cast is the ::zerocopy crate and its traits:

  1. Add a #[derive(FromBytes, Unaligned)] on your #[repr(C)] struct;

    • (you do not need Unaligned if you know the bytes that will be reinterpreted are already well-aligned),
  2. Call

    LayoutVerified::new(byte_slice)
        .expect("slice of incorrect length or alignment")
    

    to construct a LayoutVerified<&'_ [u8], YourStruct>, which is a wrapper type around the &'_ [u8] byte slice (i.e., a type-level annotation) expressing that the wrapped byte slice can be reinterpreted as a &'_ YourStruct.

example
use ::zerocopy::{
    FromBytes,
    LayoutVerified,
    Unaligned as OneAligned,
};

#[derive(
    Debug,
    Clone, Copy,
    PartialEq, Eq,
    FromBytes, OneAligned,
)]
#[repr(C)]
struct Foo {
    x: u8,
    y: u8,
}

fn main ()
{
    let byte_slice: &'static [u8] = b"hi";
    let foo: &'static Foo = LayoutVerified::new(byte_slice).unwrap().into_ref();
    assert_eq!(*foo, Foo { x: b'h', y: b'i' });
}

The owned variant, however, is more complex, mainly because of alignment. Here it is very highly advised that YourStruct be Copy, to ensure it has no drop glue, and it is imperative that the alignment of your struct be 1, i.e., that you put the Unaligned derive on your struct so that you get a compile error when that is not the case. Then, when the length of the vec is equal to mem::size_of::<YourStruct>(), you:

  1. .into_boxed_slice() the vec to drop any extra capacity allocator-wise.
    This yields a Box<[u8]>.

  2. Box::into_raw() it to get a (fat) raw pointer: *mut [u8].

  3. .thin() it down to a *mut u8, either by doing as *mut u8 (dangerous: little type safety there), or through some helper method:

    trait ThinYourPointersWithThisSimpleTrickCallNow {
        type Ret;
        fn thin (self) -> Self::Ret;
    }
    impl<T> ThinYourPointersWithThisSimpleTrickCallNow
        for *mut [T]
    {
        type Ret = *mut T;
    
        #[inline]
        fn thin (self: *mut [T]) -> *mut T
        {
            self as _
        }
    }
    
    // now you can do:
    fat_ptr.thin()
    
  4. .cast() that pointer to *mut YourStruct.

  5. Upgrade it to Box, through a(n unsafe) call to Box::from_raw.

example
use ::core::{mem, ops::Not as _};
use ::zerocopy::{
    FromBytes,
    Unaligned as OneAligned,
};

#[derive(
    Debug,
    Clone, Copy,
    PartialEq, Eq,
    FromBytes, OneAligned,
)]
#[repr(C)]
struct Foo {
    x: u8,
    y: u8,
}

fn main ()
{
    let vec: Vec<u8> = b"hi".to_vec();
    // Feel free to factor that out into its own function:
    let boxed_foo: Box<Foo> = {
        assert_eq!(1, mem::align_of::<Foo>());
        assert_eq!(vec.len(), mem::size_of::<Foo>());
        assert!(mem::needs_drop::<Foo>().not());
        let boxed_slice: Box<[u8]> = vec.into_boxed_slice();
        let fat_ptr: *mut [u8] = Box::into_raw(boxed_slice);
        // (or `fat_ptr.thin()` if the helper trait from step 3 is in scope)
        let thin_ptr: *mut u8 = fat_ptr as *mut u8;
        let foo_ptr: *mut Foo = thin_ptr.cast::<Foo>();
        unsafe { Box::from_raw(foo_ptr) }
    };
    assert_eq!(*boxed_foo, Foo { x: b'h', y: b'i' });
}
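
As the comment in the example suggests, the conversion can be factored out into its own function. Here is a rough sketch that only depends on std: the function name vec_into_boxed_foo is made up, and Foo is a stand-in without the zerocopy derives, so the alignment requirement is checked with a runtime assert instead of at compile time:

```rust
use std::mem;

// Hypothetical stand-in for the struct above, without the zerocopy derives,
// so that the sketch only depends on `std`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(C)]
struct Foo {
    x: u8,
    y: u8,
}

/// Converts the allocation of `vec` into a `Box<Foo>` without copying.
/// Gives the vector back unchanged when its length does not match.
fn vec_into_boxed_foo(vec: Vec<u8>) -> Result<Box<Foo>, Vec<u8>> {
    // Without the `Unaligned` derive these properties are checked by hand.
    assert_eq!(mem::align_of::<Foo>(), 1);
    assert!(!mem::needs_drop::<Foo>());
    if vec.len() != mem::size_of::<Foo>() {
        return Err(vec);
    }
    // Drops any extra capacity, so the allocation is exactly `len` bytes.
    let fat_ptr: *mut [u8] = Box::into_raw(vec.into_boxed_slice());
    let foo_ptr: *mut Foo = (fat_ptr as *mut u8).cast::<Foo>();
    // SAFETY: the allocation has the size and alignment of `Foo`, and any
    // two bytes form a valid `Foo` because both fields are `u8`.
    Ok(unsafe { Box::from_raw(foo_ptr) })
}

fn main() {
    let boxed_foo = vec_into_boxed_foo(b"hi".to_vec()).expect("wrong length");
    assert_eq!(*boxed_foo, Foo { x: b'h', y: b'i' });

    // A vector of the wrong length is handed back untouched.
    let rejected = vec_into_boxed_foo(vec![1, 2, 3]).unwrap_err();
    assert_eq!(rejected, vec![1, 2, 3]);
}
```

Returning the Vec in the error case means a caller can still fall back to another way of parsing the bytes.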

I would create a wrapper struct and then provide getter methods which do the appropriate pointer casting.

From what I've seen, this is what is done by a lot of low level crates that work with binary formats.

Example implementing a view over a TCP packet with getters to retrieve each field (collapsed because it's kinda long).
use std::mem::size_of;

pub struct TcpPacket<'a> {
    raw: &'a [u8],
}

impl<'a> TcpPacket<'a> {
    /// Reads a `T` stored `offset` bytes into the buffer.
    ///
    /// Kept private on purpose: only call this with plain integer types,
    /// for which every bit pattern is a valid value.
    fn item_at_offset<T>(&self, offset: usize) -> T
    where
        T: Copy,
    {
        // sanity check first: the whole value must lie inside our buffer
        let end = offset
            .checked_add(size_of::<T>())
            .expect("offset overflows usize");
        assert!(
            end <= self.raw.len(),
            "The value would lie outside our buffer"
        );

        unsafe {
            // do the pointer math and casts
            let ptr = self.raw.as_ptr().add(offset).cast::<T>();

            // a byte buffer gives no alignment guarantee for `T`,
            // so read the value without assuming alignment
            ptr.read_unaligned()
        }
    }

    pub fn source_port(&self) -> u16 {
        u16::from_be(self.item_at_offset(0))
    }

    pub fn dest_port(&self) -> u16 {
        u16::from_be(self.item_at_offset(2))
    }

    pub fn sequence_number(&self) -> u32 {
        u32::from_be(self.item_at_offset(4))
    }

    pub fn ack_number(&self) -> u32 {
        u32::from_be(self.item_at_offset(8))
    }
}

fn main() {
    // https://erg.abdn.ac.uk/users/gorry/course/inet-pages/packet-decode3.html
    let random_tcp_packet = vec![
        0x90, 0x05, 0x00, 0x17, 0x72, 0x14, 0xf1, 0x14, 0x00, 0x00, 0x00, 0x00, 0x60, 0x02, 0x22,
        0x38, 0xa9, 0x2c, 0x00, 0x00, 0x02, 0x04, 0x05, 0xb4,
    ];

    let packet = TcpPacket {
        raw: &random_tcp_packet,
    };

    assert_eq!(packet.source_port(), 36869);
    assert_eq!(packet.dest_port(), 23);
    assert_eq!(packet.sequence_number(), 1913975060);
    assert_eq!(packet.ack_number(), 0);
}
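
If avoiding unsafe entirely matters more than reinterpreting the bytes in place, the same kind of getters can also be sketched with safe slice indexing plus u16::from_be_bytes / u32::from_be_bytes. Note that this is a different technique from the pointer cast: it copies the few bytes of each field out of the buffer. SafeTcpPacket and field_bytes are hypothetical names:

```rust
use std::convert::TryFrom;

/// Hypothetical safe counterpart of the view above: same fields, no `unsafe`.
pub struct SafeTcpPacket<'a> {
    raw: &'a [u8],
}

impl<'a> SafeTcpPacket<'a> {
    /// Copies `N` bytes starting at `offset`; `None` if out of bounds.
    fn field_bytes<const N: usize>(&self, offset: usize) -> Option<[u8; N]> {
        let end = offset.checked_add(N)?;
        let bytes = self.raw.get(offset..end)?;
        <[u8; N]>::try_from(bytes).ok()
    }

    pub fn source_port(&self) -> Option<u16> {
        self.field_bytes::<2>(0).map(u16::from_be_bytes)
    }

    pub fn dest_port(&self) -> Option<u16> {
        self.field_bytes::<2>(2).map(u16::from_be_bytes)
    }

    pub fn sequence_number(&self) -> Option<u32> {
        self.field_bytes::<4>(4).map(u32::from_be_bytes)
    }

    pub fn ack_number(&self) -> Option<u32> {
        self.field_bytes::<4>(8).map(u32::from_be_bytes)
    }
}

fn main() {
    // Same bytes as the packet in the example above.
    let bytes = vec![
        0x90, 0x05, 0x00, 0x17, 0x72, 0x14, 0xf1, 0x14, 0x00, 0x00, 0x00, 0x00, 0x60, 0x02, 0x22,
        0x38, 0xa9, 0x2c, 0x00, 0x00, 0x02, 0x04, 0x05, 0xb4,
    ];
    let packet = SafeTcpPacket { raw: &bytes };

    assert_eq!(packet.source_port(), Some(36869));
    assert_eq!(packet.dest_port(), Some(23));
    assert_eq!(packet.sequence_number(), Some(1913975060));
    assert_eq!(packet.ack_number(), Some(0));

    // A truncated buffer yields `None` instead of reading out of bounds.
    let truncated = SafeTcpPacket { raw: &bytes[..3] };
    assert_eq!(truncated.dest_port(), None);
}
```

The trade-off is that each getter returns an Option (or would need a length check up front), but there is no alignment or validity reasoning left to get wrong.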
