I'm performing network packet capturing. The data returned by the capturing interface (pcap actually) is a piece of memory in &[u8]. I need to convert it to an object, the size of which is several kBytes.
So I wrote a from_raw function as
#[repr(C)]
pub struct DataFrame {
pub meta_data: MetaData,
pub payload: [Complex<f32>; NCH_PER_PKT],
}
//impl Default omitted here
impl DataFrame {
pub fn from_raw(src: &[u8]) -> Box<Self> {
let mut result = unsafe { Box::<DataFrame>::new_uninit().assume_init() };
let result_ptr = unsafe {
std::slice::from_raw_parts_mut(
result.as_mut() as *mut DataFrame as *mut u8,
std::mem::size_of::<DataFrame>(),
)
};
result.clone_from_slice(src);
result
}
}
I have two questions:
Shall I return Box<Self> or just return Self?
I though that Self is a big object, and if not boxed, it would be allocated on stack, and returning the object triggers copying. So I box it. Am I right?
Is there a better way to construct the object from a piece of raw data?
If you don't need a heap allocation, then you shouldn't create one. Just return Self if that's what you need.
No, not really. The compiler aggressively optimizes trivial memory copying. Anyway, a few kB worth of copying is certainly negligible compared to network communication, so it simply doesn't matter in most realistic cases.
I would avoid unsafe entirely. It is very unsafe to reinterpret arbitrary bytes (which in addition come from the network!) as another type, because Rust types often have several non-obvious invariants to be upheld, violating which results in undefined behavior. And even when this is not the case, in the case of computers separated by a network, cross-platform issues such as endianness mismatch can arise.
If you need to parse a network packet, write/use a real network packet parser instead.
You didn't share the definition for MetaData, but unless all fields are valid for all possible bit patterns this is instant UB.
Whenever creating some sort of MaybeUninit value, you should be initializing it with something before calling assume_init().
A more sound implementation would be something like this:
impl DataFrame {
unsafe fn from_raw(src: &[u8]) -> Self {
// Let's not accidentally have an out-of-bounds read or leave
// part of the data frame uninitialized
assert_eq!(src.len(), std::mem::size_of::<DataFrame>());
// Create an uninitialized data frame
let mut frame: MaybeUninit<DataFrame> = MaybeUninit::uninit();
// initialize it by copying the bytes across
frame
.as_mut_ptr()
.copy_from_nonoverlapping(src.as_ptr().cast(), 1);
// it's now initialized so we can return it
frame.assume_init()
}
}
I've chosen to store the MaybeUninit<DataFrame> as a value on the stack because I don't know how big it'll be. The steps would be almost identical if you used Box::new_uninit(), though.
Another big thing is that I've marked the from_raw() method as unsafe. Unless DataFrame is valid for all possible bit patterns, not marking it as unsafe would be unsound.
This is also important when you take into account the fact that std::ptr::copy_nonoverlapping() and std::ptr::copy() (Rust's equivalent of memset() and memcpy()) require both arguments to be aligned, so if bytes doesn't have the correct alignment we're gonna have a bad time.
Yes. Don't use unsafe and type punning.
You can get almost identical performance by parsing the frame like you would when dealing with any other binary format. A lot of the time, the optimiser will be able to see that things like f64::from_le_bytes() are just loading 8 bytes into a floating point register and will turn the whole thing into the same memcpy().
Writing safe code and relying on the optimiser to make it efficient is better in the long run because the optimiser's transformations only occur when they are 100% sound, whereas history has shown unsafe code that is checked by a human is considerably more error-prone.