Check for zero C struct memory

Yandros · October 20, 2019, 12:46pm

So, there are two questions at hand here:

How do we `memcmp` two "bags" of bytes (`[u8]`) to test for equality?

Easy, just use the `==` operator on these bytes: `<[u8] as PartialEq>::eq()` does use `memcmp`.
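As a small illustration (the helper names `same_bytes` and `all_zero` are made up for this sketch, they are not part of any library), comparing byte slices and checking for all-zero memory could look like this:

```rust
/// Compares two byte slices for equality.
fn same_bytes(a: &[u8], b: &[u8]) -> bool {
    // Slice equality checks the lengths first, then the contents.
    a == b
}

/// Returns `true` when every byte of the slice is zero.
fn all_zero(bytes: &[u8]) -> bool {
    bytes.iter().all(|&b| b == 0)
}
```

Slices of different lengths simply compare as unequal, so there is no need to check the lengths by hand.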

Can I see a struct as just a "bag of bytes"? If so, how?

Now, this is where things are subtle. The general answer is: "No, you cannot!"

Indeed, you can't go and feed a struct you know nothing about to the following function:

```
unsafe // Can be UB!
fn as_bag_of_bytes<T: ?Sized> (
    ptr: &'_ T,
) -> &'_ [u8]
{
    ::core::slice::from_raw_parts(
        // ptr
        ptr as *const T as *const u8,

        // len
        ::core::mem::size_of_val(ptr),
    )
}
```

The root cause of the UB is that T may have padding bytes, as @Hyeonu said: padding is uninitialized memory, and it must not be read as u8. (I didn't know about the zero-initialized "exception" to the rule; anyway, since you won't be calling that function on a statically known zero-initialized struct, that exception doesn't count in practice.) So, if your struct has padding, you cannot call as_bag_of_bytes() on it.
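To make the padding visible, here is a minimal sketch that compares a struct's size with the sum of its field sizes. The struct and function names are hypothetical, and the numbers assume a target where `u16` has an alignment of 2 (true of the usual platforms):

```rust
use core::mem::size_of;

#[allow(dead_code)]
#[repr(C)]
struct Padded {
    a: u8,
    // one padding byte goes here so that `b` is 2-byte aligned
    b: u16,
}

#[allow(dead_code)]
#[repr(C)]
struct Dense {
    a: u16,
    b: u8,
    c: u8,
}

/// Number of padding bytes in `Padded`: total size minus the field sizes.
fn padded_padding() -> usize {
    size_of::<Padded>() - (size_of::<u8>() + size_of::<u16>())
}

/// Number of padding bytes in `Dense`, computed the same way.
fn dense_padding() -> usize {
    size_of::<Dense>() - (size_of::<u16>() + size_of::<u8>() + size_of::<u8>())
}
```

Under that assumption `Padded` carries one padding byte and `Dense` carries none, so only `Dense` would be a candidate for a byte view.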

That being said, it would be nice to have the above function for padding-less types, such as primitives or structs that have been carefully crafted to ensure they do not contain any padding.

And this is indeed possible:

Define a new unsafe trait AsBytes (or NoPadding), that we will implement for types that have no padding. This way the above generic function can have a T : AsBytes bound and no longer be marked unsafe!

```
/// Unsafe marker trait for types that are valid to cast as a slice of bytes.
///
/// This is true of primitive types, and recursively for `#[repr(C)]`
/// compositions of such types (such as arrays), **as long as there is no
/// padding**.
///
/// The derive macro takes care of deriving this trait with the necessary
/// compile-time guards.
unsafe trait AsBytes {
    fn as_bytes (self: &'_ Self)
        -> &'_ [u8]
    {
        unsafe {
            // # Safety
            //
            //   - contract of the trait
            ::core::slice::from_raw_parts(
                self
                    as *const Self
                    as *const u8
                ,
                ::core::mem::size_of_val(self),
            )
        }
    }
}
```
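For comparison, the free-function flavour mentioned above (a bound on the generic function rather than a provided method) might be sketched as follows. The trait name `NoPadding` is the alternative name suggested earlier; the three impls are only illustrative:

```rust
use core::{mem, slice};

/// Marker trait: implementors promise that they contain no padding bytes.
unsafe trait NoPadding {}

// Illustrative impls only: plain integers have no padding.
unsafe impl NoPadding for u8 {}
unsafe impl NoPadding for u32 {}
unsafe impl<T: NoPadding> NoPadding for [T] {}

/// Safe to call: the `NoPadding` bound carries the proof obligation.
fn as_bag_of_bytes<T: NoPadding + ?Sized>(value: &T) -> &[u8] {
    unsafe {
        // Safety: `T: NoPadding` promises that every byte of `*value`
        // is initialized, so viewing it as `[u8]` is fine.
        slice::from_raw_parts(
            value as *const T as *const u8,
            mem::size_of_val(value),
        )
    }
}
```

The `unsafe` has moved from the call site to the `unsafe impl` lines, which is where the "no padding" claim actually gets made.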

unsafe impl AsBytes for:

primitive types:

```
(
    unsafe
    impl $Trait:path, for primitive_types!() $(;)?
) => (
    impl_macro!(@impl_for_all
        unsafe
        impl $Trait, for [
            u8,     i8,
            u16,    i16,
            u32,    i32,
            usize,  isize,
            u64,    i64,
            u128,   i128,
            f32,
            f64,
            {T : ?Sized} *const T,
            {T : ?Sized} *mut T,
            (),
            {T : ?Sized} ::core::marker::PhantomData<T>,

            // the following are only safe to **view** as bytes,
            // do not create them from bytes!
            bool,
            ::core::num::NonZeroU8,     ::core::num::NonZeroI8,
            ::core::num::NonZeroU16,    ::core::num::NonZeroI16,
            ::core::num::NonZeroU32,    ::core::num::NonZeroI32,
            ::core::num::NonZeroUsize,  ::core::num::NonZeroIsize,
            ::core::num::NonZeroU64,    ::core::num::NonZeroI64,
            ::core::num::NonZeroU128,   ::core::num::NonZeroI128,
            {'a, T : 'a + ?Sized} &'a T,
            {'a, T : 'a + ?Sized} &'a mut T,
            {T : ?Sized} ::core::ptr::NonNull<T>,
            str,
        ]
    );
);
```

composite types; tuples are #[repr(Rust)] structs, so I do not include them, since their layout is allowed to change; we end up with composite types being arrays and slices:

```
(
    unsafe
    impl $Trait:path, for array_types!() $(;)?
) => (
    impl_macro!(@impl_for_all
        unsafe
        impl $Trait, for [
            {T : $Trait} [T],
            {T         } [T;    0],
            {T : $Trait} [T;    1],
            {T : $Trait} [T;    2],
            {T : $Trait} [T;    3],
            {T : $Trait} [T;    4],
            {T : $Trait} [T;    5],
            {T : $Trait} [T;    6],
            {T : $Trait} [T;    7],
            {T : $Trait} [T;    8],
            {T : $Trait} [T;    9],
            {T : $Trait} [T;   10],
            {T : $Trait} [T;   11],
            {T : $Trait} [T;   12],
            {T : $Trait} [T;   13],
            {T : $Trait} [T;   14],
            {T : $Trait} [T;   15],
            {T : $Trait} [T;   16],
            {T : $Trait} [T;   17],
            {T : $Trait} [T;   18],
            {T : $Trait} [T;   19],
            {T : $Trait} [T;   20],
            {T : $Trait} [T;   21],
            {T : $Trait} [T;   22],
            {T : $Trait} [T;   23],
            {T : $Trait} [T;   24],
            {T : $Trait} [T;   25],
            {T : $Trait} [T;   26],
            {T : $Trait} [T;   27],
            {T : $Trait} [T;   28],
            {T : $Trait} [T;   29],
            {T : $Trait} [T;   30],
            {T : $Trait} [T;   31],
            {T : $Trait} [T;   32],
            {T : $Trait} [T;   64],
            {T : $Trait} [T;  128],
            {T : $Trait} [T;  256],
            {T : $Trait} [T;  512],
            {T : $Trait} [T; 1024],
            {T : $Trait} [T; 2048],
            {T : $Trait} [T; 4096],
        ]
    );
);
```

```
impl_macro! {
    unsafe
    impl AsBytes, for primitive_types!()
}
impl_macro! {
    unsafe
    impl AsBytes, for array_types!()
}
```
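The `impl_macro!` internals are not shown here, so the following is only a rough, self-contained sketch of the same idea and not the actual macro. It assumes a compiler with const generics (Rust 1.51 or later), which covers every array length with a single impl; `impl_as_bytes!` is a hypothetical helper and only a few integer types are listed:

```rust
unsafe trait AsBytes {
    fn as_bytes(&self) -> &[u8] {
        unsafe {
            // Safety: contract of the trait (no padding bytes).
            core::slice::from_raw_parts(
                self as *const Self as *const u8,
                core::mem::size_of_val(self),
            )
        }
    }
}

// Hypothetical helper: one `unsafe impl` per listed type.
macro_rules! impl_as_bytes {
    ( $( $ty:ty ),* $(,)? ) => {
        $( unsafe impl AsBytes for $ty {} )*
    };
}

impl_as_bytes!(u8, u16, u32, u64);

// Slices and arrays have no padding between their elements.
unsafe impl<T: AsBytes> AsBytes for [T] {}
unsafe impl<T: AsBytes, const N: usize> AsBytes for [T; N] {}
```

With the const-generic impl, the explicit list of array lengths is no longer needed.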

Generate a #[derive(AsBytes)] procedural macro (macro_rules! macro for the playground) that checks:
- that the struct is #[repr(C)] or #[repr(transparent)], since that is mandatory when relying on the layout of a struct (in the macro_rules! version for the playground, I have skipped the #[repr(transparent)] case);
- that each field of the struct is AsBytes on its own:
```
$(
    const_assert!(
        $field_ty : $crate::AsBytes,
    );
)*
```
- that there is no padding, by checking that the total size of the struct is equal to the sum of the sizes of its constituents:
```
const_assert!(
    ::core::mem::size_of::<$StructName>() ==
    (0 $(+ ::core::mem::size_of::<$field_ty>())*)
);
```
- so that it can soundly unsafe impl AsBytes for that struct:
```
unsafe impl $crate::AsBytes for $StructName {}
```
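Putting the three checks together, a stripped-down `macro_rules!` version could look like the sketch below. This is not the playground macro itself: `derive_as_bytes!`, `Header` and `is_zeroed` are names invented for this example, only `u8` and `u16` get primitive impls, and the `assert!` inside a `const` assumes Rust 1.57 or later.

```rust
unsafe trait AsBytes {
    fn as_bytes(&self) -> &[u8] {
        unsafe {
            // Safety: contract of the trait (no padding bytes).
            core::slice::from_raw_parts(
                self as *const Self as *const u8,
                core::mem::size_of_val(self),
            )
        }
    }
}
unsafe impl AsBytes for u8 {}
unsafe impl AsBytes for u16 {}

macro_rules! derive_as_bytes {
    (
        #[repr(C)]
        struct $Name:ident {
            $( $field:ident : $field_ty:ty ),* $(,)?
        }
    ) => {
        #[repr(C)]
        struct $Name {
            $( $field : $field_ty ),*
        }

        // Compile-time guards; nothing in here exists at runtime.
        const _: () = {
            #[allow(dead_code)]
            fn assert_as_bytes<T: ?Sized + AsBytes>() {}
            // Guard 1: every field type must itself be `AsBytes`.
            #[allow(dead_code)]
            fn check_fields() {
                $( assert_as_bytes::<$field_ty>(); )*
            }
            // Guard 2: no padding, i.e. size == sum of the field sizes.
            assert!(
                ::core::mem::size_of::<$Name>()
                    == 0 $( + ::core::mem::size_of::<$field_ty>() )*,
                "struct contains padding bytes"
            );
        };

        unsafe impl AsBytes for $Name {}
    };
}

derive_as_bytes! {
    #[repr(C)]
    struct Header {
        tag: u16,
        kind: u8,
        flags: u8,
    }
}

/// Checks whether the whole value consists of zero bytes.
fn is_zeroed<T: AsBytes + ?Sized>(value: &T) -> bool {
    value.as_bytes().iter().all(|&b| b == 0)
}
```

A struct with a `u8` followed by a `u16` would trip the size assertion at compile time, and a field whose type lacks an `AsBytes` impl would fail the bound check in `check_fields`.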

And now you can just call .as_bytes() on valid types and you'll get a zero-cost &[u8], that you can then == compare to get an efficient memcmp!

```
derive_AsBytes! {
    #[repr(C)]
    struct Ok {
        a: u16,
        b: u8,
        c: u8,
    }
}

#[cfg(FALSE)] // Comment out this line to get compilation errors
mod fails {
    derive_AsBytes! {
        #[repr(C)]
        struct InnerPadding {
            a: u8,
            // inner padding byte
            b: u16,
        }
    }

    derive_AsBytes! {
        #[repr(C)]
        struct TrailingPadding {
            a: u16,
            b: u8,
            // trailing padding byte
        }
    }
}

fn main ()
{
    dbg!(Ok { a: 10752, b: 27, c: 0 }.as_bytes());
}
```

yields (on a little-endian target, since 10752 is 0x2A00):

```
[src/main.rs:260] Ok{a: 10752, b: 27, c: 0,}.as_bytes() = [
    0,
    42,
    27,
    0,
]
```

Playground

If that sounds like a tedious macro to write, and you think a crate should be exporting such functionality, then you are right! There already is such a crate, from which I've taken this idea:

::zerocopy

There is another way to avoid padding: adding a #[repr(packed)] attribute to the struct. However, this opens a whole can of worms / bugs of its own, since now all the reads and writes on the fields of the struct need to be unaligned reads/writes using raw pointers, which is quite error-prone and requires unsafe. This solution is only easy when all the fields have an alignment of 1, such as when using ::zerocopy::byteorder integer types. But in that case #[repr(packed)] is not doing anything, and we are back to a #[repr(C)] struct carefully crafted without padding bytes.
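To give an idea of what the packed route involves, here is a minimal sketch (the `Packed` struct and `read_b` are hypothetical, and `addr_of!` assumes Rust 1.51 or later). The field `b` sits at offset 1, so it cannot be borrowed as a normal `&u16`:

```rust
use core::ptr;

#[allow(dead_code)]
#[repr(C, packed)]
struct Packed {
    a: u8,
    b: u16, // stored at offset 1, i.e. misaligned for a `u16`
}

/// Reads the misaligned field without ever creating a `&u16` to it.
fn read_b(p: &Packed) -> u16 {
    // `&p.b` would be a reference to a misaligned `u16`, which current
    // compilers reject; take a raw pointer and do an unaligned read.
    unsafe { ptr::addr_of!(p.b).read_unaligned() }
}
```

Copying a `Copy` field out by value (`let b = p.b;`) is still accepted by the compiler; it is references and pointers into the struct that need this kind of care.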
