So, there are two questions at hand here:
How do we memcmp two "bags" of bytes ([u8]) to test for equality?
Easy, just use the == operator on these bytes: <[u8] as PartialEq>::eq() does use memcmp.
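As a minimal sketch of that first point (the helper name bytes_eq is only illustrative, it is not a standard library function), comparing two byte slices needs nothing more than ==:

```rust
/// Compares two byte slices for equality.
/// `==` on `[u8]` compares the lengths as well as the contents.
fn bytes_eq(a: &[u8], b: &[u8]) -> bool {
    a == b
}

fn main() {
    let a = [1u8, 2, 3];
    let b = vec![1u8, 2, 3];
    let c = [1u8, 2, 4];

    assert!(bytes_eq(&a, &b)); // same contents, different containers
    assert!(!bytes_eq(&a, &c)); // last byte differs
    assert!(!bytes_eq(&a, &a[..2])); // different lengths
}
```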
Can I see a struct as just a "bag of bytes"? If so, how?
Now, this is where things are subtle. The general answer is: "No, you cannot!"
Indeed, you can't go and feed a struct you know nothing about to the following function:
unsafe // Can be UB!
fn as_bag_of_bytes<T: ?Sized> (
    ptr: &'_ T,
) -> &'_ [u8]
{
    ::core::slice::from_raw_parts(
        // ptr
        ptr as *const T as *const u8,
        // len
        ::core::mem::size_of_val(ptr),
    )
}
The root cause of that being UB is that T may have padding bytes, as @Hyeonu said. Padding bytes are uninitialized, and reading uninitialized memory as u8 is undefined behavior. (I didn't know about the zero-initialized "exception" to the rule; in any case, since you won't be calling that function on a statically known zero-initialized struct, that exception doesn't count in practice.) So, if your struct has padding, you cannot call as_bag_of_bytes() on it.
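To make the padding issue concrete, here is a small sketch with two made-up #[repr(C)] structs. It assumes a target where u16 has an alignment of 2, which holds on mainstream platforms but is an assumption nonetheless:

```rust
use std::mem::size_of;

#[allow(dead_code)]
#[repr(C)]
struct WithPadding {
    a: u8,
    // one padding byte goes here so that `b` is 2-byte aligned
    b: u16,
}

#[allow(dead_code)]
#[repr(C)]
struct NoPadding {
    a: u16,
    b: u8,
    c: u8,
}

/// Padding bytes in `WithPadding`: total size minus the sum of the field sizes.
fn padding_of_with_padding() -> usize {
    size_of::<WithPadding>() - (size_of::<u8>() + size_of::<u16>())
}

/// Same computation for `NoPadding`.
fn padding_of_no_padding() -> usize {
    size_of::<NoPadding>() - (size_of::<u16>() + 2 * size_of::<u8>())
}

fn main() {
    println!("WithPadding: {} padding byte(s)", padding_of_with_padding());
    println!("NoPadding:   {} padding byte(s)", padding_of_no_padding());
}
```

Only the second struct would be a valid argument for as_bag_of_bytes().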
That being said, it would be nice to have the above function for padding-less types, such as primitives or structs that have been carefully crafted to ensure they do not contain any padding.
And this is indeed possible:
- Define a new unsafe trait AsBytes (or NoPadding), that we will implement for types that have no padding. This way the above generic function can have a T : AsBytes bound and no longer be marked unsafe!
/// Unsafe marker trait for types that are valid to cast as a slice of bytes.
///
/// This is true of primitive types, and recursively for `#[repr(C)]`
/// compositions of such types, **as long as there is no padding** (such as
/// arrays).
///
/// The derive macro takes care of deriving this trait with the necessary
/// compile-time guards.
unsafe trait AsBytes {
    fn as_bytes (self: &'_ Self)
      -> &'_ [u8]
    {
        unsafe {
            // # Safety
            //
            //   - contract of the trait
            ::core::slice::from_raw_parts(
                self
                    as *const Self
                    as *const u8
                ,
                ::core::mem::size_of_val(self),
            )
        }
    }
}
- unsafe impl AsBytes for:
  - primitive types:
(
    unsafe
    impl $Trait:path, for primitive_types!() $(;)?
) => (
    impl_macro!(@impl_for_all
        unsafe
        impl $Trait, for [
            u8,     i8,
            u16,    i16,
            u32,    i32,
            usize,  isize,
            u64,    i64,
            u128,   i128,
            f32,
            f64,
            {T : ?Sized} *const T,
            {T : ?Sized} *mut T,
            (),
            {T : ?Sized} ::core::marker::PhantomData<T>,
            // the following are only safe to **view** as bytes,
            // do not create them from bytes!
            bool,
            ::core::num::NonZeroU8,     ::core::num::NonZeroI8,
            ::core::num::NonZeroU16,    ::core::num::NonZeroI16,
            ::core::num::NonZeroU32,    ::core::num::NonZeroI32,
            ::core::num::NonZeroUsize,  ::core::num::NonZeroIsize,
            ::core::num::NonZeroU64,    ::core::num::NonZeroI64,
            ::core::num::NonZeroU128,   ::core::num::NonZeroI128,
            {'a, T : 'a + ?Sized} &'a T,
            {'a, T : 'a + ?Sized} &'a mut T,
            {T : ?Sized} ::core::ptr::NonNull<T>,
            str,
        ]
    );
);
  - composite types; tuples are #[repr(Rust)] structs, so I do not include them, since their layout is allowed to change; we end up with composite types being arrays and slices:
(
    unsafe
    impl $Trait:path, for array_types!() $(;)?
) => (
    impl_macro!(@impl_for_all
        unsafe
        impl $Trait, for [
            {T : $Trait} [T],
            {T         } [T;    0],
            {T : $Trait} [T;    1],
            {T : $Trait} [T;    2],
            {T : $Trait} [T;    3],
            {T : $Trait} [T;    4],
            {T : $Trait} [T;    5],
            {T : $Trait} [T;    6],
            {T : $Trait} [T;    7],
            {T : $Trait} [T;    8],
            {T : $Trait} [T;    9],
            {T : $Trait} [T;   10],
            {T : $Trait} [T;   11],
            {T : $Trait} [T;   12],
            {T : $Trait} [T;   13],
            {T : $Trait} [T;   14],
            {T : $Trait} [T;   15],
            {T : $Trait} [T;   16],
            {T : $Trait} [T;   17],
            {T : $Trait} [T;   18],
            {T : $Trait} [T;   19],
            {T : $Trait} [T;   20],
            {T : $Trait} [T;   21],
            {T : $Trait} [T;   22],
            {T : $Trait} [T;   23],
            {T : $Trait} [T;   24],
            {T : $Trait} [T;   25],
            {T : $Trait} [T;   26],
            {T : $Trait} [T;   27],
            {T : $Trait} [T;   28],
            {T : $Trait} [T;   29],
            {T : $Trait} [T;   30],
            {T : $Trait} [T;   31],
            {T : $Trait} [T;   32],
            {T : $Trait} [T;   64],
            {T : $Trait} [T;  128],
            {T : $Trait} [T;  256],
            {T : $Trait} [T;  512],
            {T : $Trait} [T; 1024],
            {T : $Trait} [T; 2048],
            {T : $Trait} [T; 4096],
        ]
    );
);

impl_macro! {
    unsafe
    impl AsBytes, for primitive_types!()
}
impl_macro! {
    unsafe
    impl AsBytes, for array_types!()
}
- Generate a #[derive(AsBytes)] procedural macro (macro_rules! macro for the playground) that checks:
  - that the struct is #[repr(C)] or #[repr(transparent)], since that is mandatory when relying on the layout of a struct (in the macro_rules! macro for the playground, I have skipped the #[repr(transparent)] case);
  - that each field of the struct is AsBytes on its own:
$(
    const_assert!(
        $field_ty : $crate::AsBytes,
    );
)*
  - that there is no padding, by checking that the total size of the struct is equal to the sum of the sizes of its constituents:
const_assert!(
    ::core::mem::size_of::<$StructName>() ==
    (0 $(+ ::core::mem::size_of::<$field_ty>())*)
);
  - so that it can soundly unsafe impl AsBytes for that struct:
unsafe impl $crate::AsBytes for $StructName {}
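The checks listed above can also be written by hand, without any macro. The following is only a sketch of that idea, not the playground code: Header is a made-up struct, only u8 and u16 get an impl, and it assumes Rust 1.57 or later so that assert! can run in a const context.

```rust
use std::mem::{size_of, size_of_val};
use std::slice;

/// Marker trait: implementors promise that they contain no padding bytes.
unsafe trait AsBytes {
    fn as_bytes(&self) -> &[u8] {
        // Safety: by the trait contract, every byte of `self` is initialized.
        unsafe { slice::from_raw_parts(self as *const Self as *const u8, size_of_val(self)) }
    }
}

unsafe impl AsBytes for u8 {}
unsafe impl AsBytes for u16 {}

// Hypothetical example struct; `#[repr(C)]` fixes the field order and layout.
#[repr(C)]
struct Header {
    tag: u16,
    kind: u8,
    flags: u8,
}

// Guard 1: every field type is itself `AsBytes` (fails to compile otherwise).
const _: fn() = || {
    fn assert_as_bytes<T: AsBytes>() {}
    assert_as_bytes::<u16>();
    assert_as_bytes::<u8>();
};

// Guard 2: no padding, i.e. the struct is exactly as large as its fields.
const _: () = assert!(
    size_of::<Header>() == size_of::<u16>() + size_of::<u8>() + size_of::<u8>()
);

// Only sound because both guards above hold.
unsafe impl AsBytes for Header {}

fn main() {
    let h = Header { tag: 0x0102, kind: 3, flags: 4 };
    let bytes = h.as_bytes();
    assert_eq!(bytes.len(), 4);
    // The two single-byte fields come after the 2-byte `tag`.
    assert_eq!(bytes[2], 3);
    assert_eq!(bytes[3], 4);
}
```

Swapping the field order to u8, u16 makes the second guard fail at compile time, which is exactly the protection the derive is meant to provide.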
And now you can just call .as_bytes() on valid types and you'll get a zero-cost &[u8], that you can then == compare to get an efficient memcmp!
derive_AsBytes! {
    #[repr(C)]
    struct Ok {
        a: u16,
        b: u8,
        c: u8,
    }
}

#[cfg(FALSE)] // Comment out this line to get compilation errors
mod fails {
    derive_AsBytes! {
        #[repr(C)]
        struct InnerPadding {
            a: u8,
            // inner padding byte
            b: u16,
        }
    }

    derive_AsBytes! {
        #[repr(C)]
        struct TrailingPadding {
            a: u16,
            b: u8,
            // trailing padding byte
        }
    }
}

fn main ()
{
    dbg!(Ok { a: 10752, b: 27, c: 0 }.as_bytes());
}
- yields (on a little-endian target, where 10752 = 0x2A00 is stored as the bytes 0x00, 0x2A):
[src/main.rs:260] Ok{a: 10752, b: 27, c: 0,}.as_bytes() = [
    0,
    42,
    27,
    0,
]
- Playground
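To close the loop with the original memcmp question, here is a hedged sketch of a generic comparison helper built on such a trait. The name bytes_equal is made up for illustration, and the const-generic array impl (available since Rust 1.51) is an assumption that stands in for the per-length list above; it is not what the playground code does.

```rust
use std::{mem, slice};

/// Marker trait: implementors promise that they contain no padding bytes.
unsafe trait AsBytes {
    fn as_bytes(&self) -> &[u8] {
        // Safety: by the trait contract, every byte of `self` is initialized.
        unsafe {
            slice::from_raw_parts(self as *const Self as *const u8, mem::size_of_val(self))
        }
    }
}

unsafe impl AsBytes for u8 {}
unsafe impl AsBytes for u32 {}

// Arrays add no padding between elements, so one const-generic impl
// covers every length at once.
unsafe impl<T: AsBytes, const N: usize> AsBytes for [T; N] {}

/// Bytewise equality of two padding-free values, via `==` on `&[u8]`.
fn bytes_equal<T: AsBytes + ?Sized>(a: &T, b: &T) -> bool {
    a.as_bytes() == b.as_bytes()
}

fn main() {
    assert!(bytes_equal(&[1u32, 2, 3], &[1u32, 2, 3]));
    assert!(!bytes_equal(&[1u32, 2, 3], &[1u32, 2, 4]));
    assert!(bytes_equal(&7u8, &7u8));
}
```

Keep in mind that bytewise equality is not always the same as PartialEq: for floats, for instance, 0.0 and -0.0 compare equal with == but have different bytes.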
If that sounds like a tedious macro to write, and you think a crate should be exporting such functionality, then you are right! There already is such a crate, from which I've taken this idea: ::zerocopy.
There is another way to avoid padding, and that's by adding a #[repr(packed)] attribute on a struct. However, this opens a whole can of worms / bugs of its own, since the fields may now be misaligned: you can no longer take references to them, and any access through a pointer has to be an unaligned read/write using raw pointers, which is quite error-prone and thus unsafe. The only way this solution is easy to do is when all its fields have an alignment of 1, such as when using ::zerocopy::byteorder integer types. But in that case #[repr(packed)] is not doing anything, and we are back to a #[repr(C)] struct carefully crafted without padding bytes.
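For reference, this is roughly what the raw-pointer dance looks like on a packed struct. It is only a sketch: Packed and read_b are made-up names, and it relies on addr_of!, which is available since Rust 1.51.

```rust
use std::ptr;

#[repr(C, packed)]
struct Packed {
    a: u8,
    b: u32, // sits at offset 1, so it is not 4-byte aligned
}

/// Reads `b` without ever creating a `&u32` to the misaligned field.
fn read_b(p: &Packed) -> u32 {
    // `addr_of!` yields a raw pointer directly, with no intermediate reference.
    let field: *const u32 = ptr::addr_of!(p.b);
    // Safety: `field` points into a live `Packed`; only its alignment is
    // not guaranteed, which is what `read_unaligned` is for.
    unsafe { field.read_unaligned() }
}

fn main() {
    let p = Packed { a: 1, b: 0xDEAD_BEEF };
    // 1 + 4 bytes, with no padding in between or at the end.
    assert_eq!(std::mem::size_of::<Packed>(), 5);
    assert_eq!(read_b(&p), 0xDEAD_BEEF);
    // Copy the field out before comparing, so that no reference to it is taken.
    let a = p.a;
    assert_eq!(a, 1);
}
```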