Reading/writing a struct buffer

I want to implement a binary protocol over TCP. I created a struct to hold a simple "command header":

#[repr(C)]
struct CmdHdr {
   cmd: u32,
   len: u32
}

After constructing one of these, I want to write it (the raw buffer) to a socket. Since there's no void*, I knew there wouldn't be a socket.write(buf, len)-style call, but I assumed there would be some other way to write a struct directly to an I/O object of some kind (File, TcpStream, etc.). After a few quick searches I first found an (unsafe) transmute example, then another unsafe example. Then I found people suggesting serde and bincode, but afaict these imply copying the buffer (?), which I don't want to do: I want to send the raw buffer as-is (endianness and alignment issues will be handled later; this is just a proof of concept).

I eventually stumbled upon the zerocopy crate (thanks to some old posts on here), which appears to do what I want... but I gotta say I'm a little surprised that just wanting to write a struct buffer turned out to be a much larger adventure than I had anticipated.

Just to be sure -- there isn't some obvious way to Just Write A Struct(tm) to a file/socket in the standard library that I am missing? Is using something like the zerocopy crate the way to do it?

There's nothing in the standard library.

Some of the basic problems that arise:

  • Endianness. Integers are stored in the machine's native endianness, but most data-transfer protocols require a fixed endianness so that different machines can interoperate. Rust does have methods on integers to convert them to bytes (to_be_bytes, to_le_bytes, to_ne_bytes), but they force you to think explicitly about this problem.

  • Padding:

    #[repr(C)]
    struct Struct {
        foo: u8,
        bar: u32,
    }
    

    This struct has three bytes of padding in between foo and bar. These bytes are uninitialized, so reading them in order to write them to a socket would be undefined behavior.

    • In fact, merely casting &Struct to &[u8] is generally regarded today as undefined behavior. (You could, however, cast it to &[MaybeUninit<u8>].)
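The padding claim in the second bullet can be checked with std::mem::size_of and the std::mem::offset_of! macro (stable since Rust 1.77). This is a small sketch; the exact numbers depend on the target, but on common 64-bit targets such as x86_64 it reports a size of 8 bytes for only 5 bytes of fields, with bar starting at offset 4.

```rust
use std::mem::{align_of, size_of};

#[repr(C)]
#[allow(dead_code)] // the fields are never read; only the layout matters here
struct Struct {
    foo: u8,
    bar: u32,
}

fn main() {
    let fields = size_of::<u8>() + size_of::<u32>();
    let total = size_of::<Struct>();
    // Bytes the compiler added for alignment; they hold no field data
    // and are never initialized by constructing a Struct.
    let padding = total - fields;
    println!("size_of::<Struct>()  = {}", total);
    println!("sum of field sizes   = {}", fields);
    println!("padding bytes        = {}", padding);
    println!("offset of bar        = {}", std::mem::offset_of!(Struct, bar));
    println!("align_of::<Struct>() = {}", align_of::<Struct>());
}
```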

I've been thinking about this since I posted the original post yesterday.

The thing I was thinking was "But if I already am aware of endianness, alignment and padding issues, let's say I know the two end-points can handle these things (gracefully) -- why can't I just 'send a buffer'?"

But while reading your reply I realized that the Rust philosophy enforced by the borrow checker isn't limited to memory access: the standard library doesn't want to provide tools for making the same mistakes everyone else has made before, but rather better tools for doing what one should have been doing all along.

In my case, I was just trying to make a simple prototype, and the real code would do all the marshalling/unmarshalling properly. But I think I have managed to convince myself that Rust making it hard to write these broken-by-design prototypes, and instead making it easier to put together the proper code, is a Good Thing(tm).
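For comparison, "proper" marshalling of the CmdHdr from the original question can stay entirely in safe std code using the to_be_bytes/from_be_bytes methods mentioned earlier. This is a minimal sketch, not anything proposed in the thread: big-endian ("network byte order") is an assumed choice, the helper names to_bytes, from_bytes, write_to, read_from and WIRE_SIZE are hypothetical, and a Vec<u8> stands in for the TcpStream.

```rust
use std::io::{self, Read, Write};

// Hypothetical constant: size of the header on the wire (two u32s, no padding).
const WIRE_SIZE: usize = 8;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct CmdHdr {
    cmd: u32,
    len: u32,
}

// Hypothetical helpers; big-endian byte order is an assumption.
impl CmdHdr {
    fn to_bytes(&self) -> [u8; WIRE_SIZE] {
        let mut buf = [0u8; WIRE_SIZE];
        buf[..4].copy_from_slice(&self.cmd.to_be_bytes());
        buf[4..].copy_from_slice(&self.len.to_be_bytes());
        buf
    }

    fn from_bytes(buf: &[u8; WIRE_SIZE]) -> Self {
        CmdHdr {
            cmd: u32::from_be_bytes([buf[0], buf[1], buf[2], buf[3]]),
            len: u32::from_be_bytes([buf[4], buf[5], buf[6], buf[7]]),
        }
    }

    // Works with any writer: TcpStream, File, Vec<u8>, ...
    fn write_to<W: Write>(&self, w: &mut W) -> io::Result<()> {
        w.write_all(&self.to_bytes())
    }

    // Blocks until exactly WIRE_SIZE bytes have been read (or an error occurs).
    fn read_from<R: Read>(r: &mut R) -> io::Result<Self> {
        let mut buf = [0u8; WIRE_SIZE];
        r.read_exact(&mut buf)?;
        Ok(CmdHdr::from_bytes(&buf))
    }
}

fn main() -> io::Result<()> {
    let hdr = CmdHdr { cmd: 1, len: 0x0102_0304 };

    // A Vec<u8> stands in for the socket here.
    let mut wire: Vec<u8> = Vec::new();
    hdr.write_to(&mut wire)?;
    assert_eq!(wire, [0, 0, 0, 1, 1, 2, 3, 4]);

    let decoded = CmdHdr::read_from(&mut &wire[..])?;
    assert_eq!(decoded, hdr);
    Ok(())
}
```

The cost is one small copy into an 8-byte array, but in exchange the bytes on the wire no longer depend on the compiler's struct layout or the host's byte order.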

... at least until next time I get stuck on something that's easy to do in C that I'm struggling with in Rust. :stuck_out_tongue:


The padding issue is an even bigger threat. The behavior of owning or having a reference to an uninitialized T (of size > 0 bytes) is not implementation-defined; it is undefined. If rustc can deduce that a snippet of code reads an uninitialized byte into a u8, then rustc reserves the right to "miscompile" that snippet by making optimizations which assume that the snippet is never executed at runtime.
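One common way to keep padding bytes from ever being uninitialized, offered here as an assumption rather than something suggested in the thread, is to declare the padding as an ordinary field that every constructor fills in. The type name Padded and its new constructor are hypothetical; a compile-time assertion checks that the declared fields now cover every byte of the struct.

```rust
use std::mem::size_of;

// Hypothetical variant of `Struct` with its padding spelled out as a real field.
#[repr(C)]
#[derive(Debug, Clone, Copy, PartialEq)]
struct Padded {
    foo: u8,
    _pad: [u8; 3], // occupies the bytes the compiler would otherwise leave uninitialized
    bar: u32,
}

// Compile-time check: the fields account for every byte, so there is no hidden padding.
const _: () = assert!(size_of::<Padded>() == 1 + 3 + 4);

impl Padded {
    // Hypothetical constructor: `_pad` is always zeroed, so no byte is ever uninitialized.
    fn new(foo: u8, bar: u32) -> Self {
        Padded { foo, _pad: [0; 3], bar }
    }
}

fn main() {
    let p = Padded::new(0xAB, 7);
    println!("{:?} occupies {} bytes", p, size_of::<Padded>());
}
```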


Assuming you have fixed endianness and your structure has a fixed layout without any padding, then AFAIK casting &T to &[u8; mem::size_of::<T>()] is safe. Casting in the reverse direction cannot be made safe in general because of alignment: a byte buffer is not guaranteed to be suitably aligned for T. You can use ptr::read_unaligned instead, and if the buffer happens to be sufficiently aligned, the compiler may optimize out the copy. Of course, such an operation is only sound if every bit pattern is a valid T. Another option is to cast &mut T to &mut [u8; N], read data from the socket/file into that array, and use the value once the array reference has gone out of scope.
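A minimal sketch of the three operations described above, applied to the CmdHdr from the original question. It assumes native endianness is acceptable (as in the proof of concept); the helper names hdr_as_bytes, hdr_from_bytes and read_into are hypothetical, and a Vec<u8> stands in for the socket. A compile-time assertion guards the "no padding" precondition that every unsafe block relies on.

```rust
use std::io::{self, Read, Write};
use std::mem::size_of;
use std::ptr;

#[repr(C)]
#[derive(Debug, Clone, Copy, PartialEq)]
struct CmdHdr {
    cmd: u32,
    len: u32,
}

const HDR_SIZE: usize = size_of::<CmdHdr>();

// Refuse to compile if the layout ever gains padding.
const _: () = assert!(HDR_SIZE == 2 * size_of::<u32>());

// Hypothetical helper: view a header as its raw, native-endian bytes.
fn hdr_as_bytes(hdr: &CmdHdr) -> &[u8; HDR_SIZE] {
    // SAFETY: CmdHdr is repr(C) with no padding (checked above), so all
    // HDR_SIZE bytes are initialized, and [u8; N] has alignment 1.
    unsafe { &*(hdr as *const CmdHdr as *const [u8; HDR_SIZE]) }
}

// Hypothetical helper: rebuild a header from bytes of any alignment.
fn hdr_from_bytes(buf: &[u8; HDR_SIZE]) -> CmdHdr {
    // SAFETY: buf holds exactly HDR_SIZE bytes, every bit pattern is a valid
    // u32 (hence a valid CmdHdr), and read_unaligned tolerates any alignment.
    unsafe { ptr::read_unaligned(buf.as_ptr() as *const CmdHdr) }
}

// Hypothetical helper: read straight into an existing header via &mut [u8; N].
fn read_into<R: Read>(hdr: &mut CmdHdr, r: &mut R) -> io::Result<()> {
    // SAFETY: same layout argument; whatever bytes are written, the
    // result is still a valid CmdHdr.
    let bytes = unsafe { &mut *(hdr as *mut CmdHdr as *mut [u8; HDR_SIZE]) };
    r.read_exact(bytes)
}

fn main() -> io::Result<()> {
    let hdr = CmdHdr { cmd: 1, len: 16 };

    // A Vec<u8> stands in for a TcpStream or File.
    let mut wire: Vec<u8> = Vec::new();
    wire.write_all(hdr_as_bytes(&hdr))?;

    let mut buf = [0u8; HDR_SIZE];
    (&wire[..]).read_exact(&mut buf)?;
    assert_eq!(hdr_from_bytes(&buf), hdr);

    let mut hdr2 = CmdHdr { cmd: 0, len: 0 };
    read_into(&mut hdr2, &mut &wire[..])?;
    assert_eq!(hdr2, hdr);
    Ok(())
}
```

Note that the bytes written this way are in the sender's native byte order, so this only interoperates with peers of the same endianness.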

So it's possible to do, but you must be really careful and knowledgeable about unsafe code restrictions.

