Help understanding libc call

Hi,

I'm trying to call libc::recvfrom but I'm a little lost. By looking at other repos I've come up with two methods that compile, but I'm not sure whether they are correct, and if not, why they are incorrect. Note that the code is incomplete (the methods should return something, I should check the return value of recvfrom, etc.); I just minimized it to the portion I'm struggling with.

    extern crate libc;
    use std::os::unix::io::RawFd;
    use std::ptr;
    
    fn main() {}
    
    pub struct Socket(RawFd);
    
    impl Socket {
        fn recv_from_simple(&self, buf: &mut [u8]) {
            let _ = unsafe {
                let addr: *mut libc::sockaddr = ptr::null_mut();
                let addrlen: *mut libc::socklen_t = ptr::null_mut();
                // I'm not 100% sure why this cast is safe.
                //
                // As far as I understand, buf.as_mut_ptr() is a *mut u8, and on a 64-bit platform
                // libc::c_void is a *mut usize, so I guess this is fine as long as there's no pointer
                // arithmetic being done in recvfrom?
                let bufptr = buf.as_mut_ptr() as *mut libc::c_void;
                let buflen = buf.len() as libc::size_t;
    
                libc::recvfrom(self.0, bufptr, buflen, 0, addr, addrlen);
            };
            unimplemented!();
        }
    
        // This seems more complicated and I don't understand why we need the extra "&mut T as *mut _ as *mut _"
        // But that's how it's done in the rust repo: https://github.com/rust-lang/rust/blob/8ff4b42064b374bb62043f7729f84b6d979c7667/src/libstd/sys/unix/net.rs#L251
        fn recv_from_like_in_the_rust_repo(&self, buf: &mut [u8]) {
            let _ = unsafe {
                let mut addr: *mut libc::sockaddr = ptr::null_mut();
                let mut addrlen: *mut libc::socklen_t = ptr::null_mut();
                let mut bufptr = buf.as_mut_ptr() as *mut libc::c_void;
                let buflen = buf.len() as libc::size_t;
    
                libc::recvfrom(
                    self.0,
                    bufptr,
                    buflen,
                    0,
                    &mut addr as *mut _ as *mut _,
                    addrlen,
                );
            };
            unimplemented!();
        }
    }

The second one is kind of taken from the rust repo, where it looks like:

    fn recv_from_with_flags(&self, buf: &mut [u8], flags: c_int)
                            -> io::Result<(usize, SocketAddr)> {
        let mut storage: libc::sockaddr_storage = unsafe { mem::zeroed() };
        let mut addrlen = mem::size_of_val(&storage) as libc::socklen_t;

        let n = cvt(unsafe {
            libc::recvfrom(self.0.raw(),
                        buf.as_mut_ptr() as *mut c_void,
                        buf.len(),
                        flags,
                        &mut storage as *mut _ as *mut _,
                        &mut addrlen)
        })?;
        Ok((n as usize, sockaddr_to_addr(&storage, addrlen as usize)?))
    }
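For comparison, on Unix the public, safe `std::net::UdpSocket::recv_from` is (as far as I can tell from the linked source) built on a function like the one above, so the sockaddr_storage/addrlen bookkeeping never reaches your code. Here is a minimal sketch, assuming loopback networking is available. `echo_once` is a hypothetical helper that sends a datagram to its own socket and reads it back:

```rust
use std::io;
use std::net::UdpSocket;

/// Hypothetical helper: sends `msg` to a loopback socket and reads it back with
/// `recv_from`. Returns the received bytes and whether the reported sender is us.
fn echo_once(msg: &[u8]) -> io::Result<(Vec<u8>, bool)> {
    // Port 0 asks the OS to pick a free ephemeral port.
    let socket = UdpSocket::bind("127.0.0.1:0")?;
    let local = socket.local_addr()?;
    socket.send_to(msg, local)?;

    // std allocates the sockaddr_storage and addrlen internally and converts
    // the filled-in address into a SocketAddr for us.
    let mut buf = [0u8; 1500];
    let (n, src) = socket.recv_from(&mut buf)?;
    Ok((buf[..n].to_vec(), src == local))
}

fn main() -> io::Result<()> {
    let (data, from_self) = echo_once(b"hello")?;
    println!("received {:?}, sender is us: {}", data, from_self);
    Ok(())
}
```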

Could someone explain if my solutions work, and if not, why not? I could just copy/paste from the Rust repo but I'd like to understand why this works.

let addr: *mut libc::sockaddr = ptr::null_mut();

This is not what std does - it creates a zeroed sockaddr_storage value on the stack and passes a raw ptr to it to the libc call; recvfrom will then fill it out. If you pass it a null ptr, it won’t - different semantics.
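To make the difference concrete, here is a sketch with a hypothetical `fake_recvfrom` and `FakeAddr` (neither is a libc item). They mimic the documented contract for the address out-parameter: a null pointer means "don't report the source address", while a pointer to storage you own gets filled in.

```rust
use std::ptr;

/// Hypothetical stand-in for a C-style address: a family plus 4 bytes of data.
#[repr(C)]
#[derive(Debug, Default, Clone, Copy, PartialEq)]
struct FakeAddr {
    family: u16,
    data: [u8; 4],
}

/// Hypothetical function mimicking recvfrom's address out-parameter:
/// if `addr` is null the source address is simply not reported.
/// Returns true if it wrote an address.
unsafe fn fake_recvfrom(addr: *mut FakeAddr) -> bool {
    if addr.is_null() {
        return false; // no storage to write into
    }
    // Write through the caller-provided pointer into the caller's storage.
    *addr = FakeAddr { family: 2, data: [127, 0, 0, 1] };
    true
}

fn main() {
    // 1. Null pointer: there is no storage, so nothing can be written.
    let wrote_null = unsafe { fake_recvfrom(ptr::null_mut()) };

    // 2. Pointer to zeroed storage on our stack: the callee fills it in.
    let mut storage = FakeAddr::default(); // all fields zero
    let wrote = unsafe { fake_recvfrom(&mut storage) };

    println!("null: wrote={}; storage: wrote={} -> {:?}", wrote_null, wrote, storage);
}
```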

&mut storage as *mut _ as *mut _

The first part creates a normal Rust mutable borrow; the first cast converts it to a raw ptr to sockaddr_storage. The second casts it to a raw ptr to sockaddr, which is what libc::recvfrom is expecting.
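The next sketch shows that two-step cast using only std, with hypothetical `#[repr(C)]` stand-ins for `sockaddr_storage` (`Storage`) and `sockaddr` (`Generic`). Both casts keep the same address and only change the pointee type. In `&mut storage as *mut _ as *mut _`, each `_` asks the compiler to infer a type: the first from `storage`, the second from the parameter type of `libc::recvfrom`. As far as I know, a single `&mut storage as *mut sockaddr` is rejected by the compiler (a reference can only be cast to a raw pointer to its own type), which is why the intermediate step is there.

```rust
use std::mem::size_of;

/// Hypothetical stand-in for `sockaddr`: a family field plus some data.
#[repr(C)]
struct Generic {
    family: u16,
    data: [u8; 14],
}

/// Hypothetical stand-in for `sockaddr_storage`: the same leading `family`
/// field, followed by enough padding to hold any concrete address.
#[repr(C)]
struct Storage {
    family: u16,
    padding: [u8; 126],
}

/// Casts `&mut Storage` to `*mut Generic` in two steps and reads `family`
/// through the generic pointer. Returns (addresses equal?, family).
fn family_via_generic_ptr(storage: &mut Storage) -> (bool, u16) {
    // Step 1: reference -> raw pointer of the same type.
    let raw = storage as *mut Storage;
    // Step 2: raw pointer -> raw pointer to a different type, same address.
    let generic = raw as *mut Generic;
    let same_address = raw as usize == generic as usize;
    // Sound here: both types are #[repr(C)] with a u16 at offset 0, Storage is
    // at least as large as Generic, and Generic's alignment is no stricter.
    let family = unsafe { (*generic).family };
    (same_address, family)
}

fn main() {
    let mut storage = Storage { family: 10, padding: [0; 126] };
    println!("{:?}", family_via_generic_ptr(&mut storage));
    assert!(size_of::<Storage>() >= size_of::<Generic>());
}
```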

libc::c_void is not a *mut usize. It’s in fact an opaque enum with a repr u8.

To clarify –
*mut libc::c_void is equivalent to C's void *; it's just an opaque pointer type that could point to anything, and needs to be cast to another pointer type before being dereferenced. In this case it points directly to the packet data buffer, and the size of the buffer is passed as the next argument to recvfrom.

You might be mixing it up with the double pointer *mut *mut libc::c_void, which would be equivalent in some sense to *mut usize (since a pointer itself has the same size as usize). But that's not used here.
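Here is a small sketch of that opaque-pointer round trip. A hypothetical C-style `fill_opaque` stands in for recvfrom, and `std::ffi::c_void` (std's equivalent of `libc::c_void`) is used so the example runs without the libc crate. The caller converts `*mut u8` to `*mut c_void`, which changes only the pointer's type and touches no data, and the callee decides to treat it as bytes again. The size checks at the end illustrate the point above: any thin pointer, even a pointer to a pointer, is the size of a `usize`.

```rust
use std::ffi::c_void;
use std::mem::size_of;
use std::ptr;

/// Hypothetical C-style API: writes up to `len` bytes through an opaque pointer,
/// much like recvfrom writes packet data through its `*mut c_void` argument.
/// Returns how many bytes were written.
unsafe fn fill_opaque(buf: *mut c_void, len: usize) -> usize {
    let payload = b"packet";
    let n = len.min(payload.len());
    // The callee chooses how to interpret the opaque pointer: here, as bytes.
    ptr::copy_nonoverlapping(payload.as_ptr(), buf as *mut u8, n);
    n
}

fn main() {
    let mut buf = [0u8; 16];
    // *mut u8 -> *mut c_void: only the pointer's type changes.
    let n = unsafe { fill_opaque(buf.as_mut_ptr() as *mut c_void, buf.len()) };
    println!("{:?}", &buf[..n]);

    // Pointers (including double pointers) have the same size as usize.
    assert_eq!(size_of::<*mut c_void>(), size_of::<usize>());
    assert_eq!(size_of::<*mut *mut c_void>(), size_of::<usize>());
}
```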


recvfrom is difficult to use, even in C. Rather than use libc directly, I suggest you use the wrapper in nix:

let addr: *mut libc::sockaddr = ptr::null_mut();

This is not what std does - it creates a zeroed sockaddr_storage value on the stack and passes a raw ptr to it to the libc call; recvfrom will then fill it out. If you pass it a null ptr, it won’t - different semantics.

Thanks, I had assumed "null pointer" and "pointer to zeroed memory" were the same thing... I feel dumb for that, sorry 😛

&mut storage as *mut _ as *mut _

The first part creates a normal Rust mutable borrow; the first cast converts it to a raw ptr to sockaddr_storage. The second casts it to a raw ptr to sockaddr, which is what libc::recvfrom is expecting.

But then isn't that equivalent to a simple:

storage as *mut libc::sockaddr

Also (and this is not really a Rust question), if recvfrom takes a pointer to sockaddr, how does it handle IPv6? I guess it internally casts the pointer back into a pointer to sockaddr_storage? It does not seem very safe to cast between types that have different sizes.
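One way to picture the caller's side (the thread gets to the details further down): the buffer is a sockaddr_storage, you read its family field, check addrlen, and only then reinterpret the bytes as the concrete type. The kernel itself doesn't cast anything to sockaddr_storage. It writes at most `addrlen` bytes into whatever buffer you pass and reports the real length, so the sizes never have to match. The sketch below uses simplified, hypothetical stand-ins for `sockaddr_storage`, `sockaddr_in` and `sockaddr_in6`. The family constants use the Linux values (`AF_INET` = 2, `AF_INET6` = 10), which differ on other systems.

```rust
use std::mem::size_of;

// Hypothetical constants mirroring Linux's AF_INET / AF_INET6 values.
const AF_V4: u16 = 2;
const AF_V6: u16 = 10;

/// Like sockaddr_storage: big and aligned enough for any concrete address type.
#[repr(C, align(8))]
struct Storage {
    family: u16,
    rest: [u8; 126],
}

/// Like sockaddr_in (16 bytes).
#[repr(C)]
struct V4 {
    family: u16,
    port: u16,
    addr: [u8; 4],
    zero: [u8; 8],
}

/// Like sockaddr_in6 (28 bytes).
#[repr(C)]
struct V6 {
    family: u16,
    port: u16,
    flowinfo: u32,
    addr: [u8; 16],
    scope_id: u32,
}

#[derive(Debug, PartialEq)]
enum Parsed {
    V4([u8; 4]),
    V6([u8; 16]),
    Unknown,
}

/// Reads the common `family` field first, then reinterprets the storage as the
/// matching concrete type, but only if `addrlen` says enough bytes were written.
fn parse(storage: &Storage, addrlen: usize) -> Parsed {
    let p = storage as *const Storage;
    match storage.family {
        AF_V4 if addrlen >= size_of::<V4>() => Parsed::V4(unsafe { (*(p as *const V4)).addr }),
        AF_V6 if addrlen >= size_of::<V6>() => Parsed::V6(unsafe { (*(p as *const V6)).addr }),
        _ => Parsed::Unknown,
    }
}

fn main() {
    let mut storage = Storage { family: 0, rest: [0; 126] };
    // Simulate recvfrom writing an IPv6 address into the caller's storage.
    let v6 = V6 { family: AF_V6, port: 0, flowinfo: 0, addr: [0xfe; 16], scope_id: 0 };
    unsafe { (&mut storage as *mut Storage as *mut V6).write(v6) };

    println!("{:?}", parse(&storage, size_of::<V6>()));
    println!("{:?}", parse(&storage, 2)); // addrlen too short: Unknown
}
```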

Thanks! Do you know why recvfrom takes a void * pointer? A buffer is an array of bytes, so why not take a char * directly?

storage is a value - you can't cast it to a ptr like that directly. You want a pointer to it.
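In other words, you need an address before any pointer cast applies. The sketch below uses a hypothetical `fill` callee shaped like recvfrom's last two arguments. It also shows why the std snippet can pass `&mut addrlen` with no cast at all: a `&mut T` coerces implicitly to a `*mut T` of the same type.

```rust
/// Hypothetical stand-in for an address buffer such as sockaddr_storage.
#[repr(C)]
#[derive(Debug, Default)]
struct Storage {
    family: u16,
    data: [u8; 14],
}

/// Hypothetical callee taking out-pointers, shaped like recvfrom's
/// `*mut sockaddr` and `*mut socklen_t` arguments.
unsafe fn fill(addr: *mut Storage, addrlen: *mut u32) {
    (*addr).family = 2;
    *addrlen = 16;
}

fn main() {
    let mut storage = Storage::default();
    let mut addrlen: u32 = 16;

    // `storage as *mut Storage` would not compile: `storage` is a struct value,
    // not a pointer. `&mut storage` is a reference, and `&mut T` coerces
    // implicitly to `*mut T`, so neither argument needs an `as` cast here.
    unsafe { fill(&mut storage, &mut addrlen) };

    println!("{:?}, addrlen = {}", storage, addrlen);
}
```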

This SO thread has decent coverage on this. Welcome to C APIs 🙂.


Ahh! It feels like the last piece of the puzzle just found its place.

Thank you!

Here is the corrected and annotated version!

    extern crate libc;
    use libc::{c_int, c_void, size_t, socklen_t, sockaddr, sockaddr_nl, recvfrom};
    use std::os::unix::io::RawFd;
    use std::mem;

    fn main() {}

    pub struct Socket(RawFd);

    impl Socket {
        fn recv_from(&self, buf: &mut [u8], flags: c_int) {
            let _ = unsafe {
                // Create an empty storage for the address. Note that the Rust standard library
                // creates a sockaddr_storage so that it works for any address family, but here we
                // already know that we'll get a Netlink address, so we can create the appropriate
                // storage directly.
                let mut addr = mem::zeroed::<sockaddr_nl>();

                // recvfrom takes a *sockaddr as parameter so that it can accept any kind of address
                // storage, so we need to create such a pointer for the sockaddr_nl we just
                // initialized.
                //
                //              Create a raw pointer to     Cast our raw pointer to a
                //              our storage. We cannot      generic pointer to *sockaddr
                //              pass it to recvfrom yet.    that recvfrom can use
                //                           ^                       ^
                //                           |                       |
                //              +------------+-----------+    +------+-----+
                //             /                          \  /              \
                let addr_ptr = &mut addr as *mut sockaddr_nl as *mut sockaddr;

                // Why do we need to pass the address length? recvfrom takes a generic *sockaddr as
                // argument, not a *sockaddr_nl, so we cast our *sockaddr_nl into a *sockaddr when
                // calling recvfrom. However, recvfrom somehow needs to make sure that the address of
                // the received packet fits into the actual type that is behind the pointer: there
                // could be a sockaddr_nl but also a sockaddr_in, a sockaddr_in6, or even the generic
                // sockaddr_storage that can store any address.
                //
                // socklen_t is a u32 on Linux, not a usize, so declare addrlen with that type
                // directly. Casting a *mut usize to *mut socklen_t only works by accident on
                // little-endian or 32-bit targets.
                let mut addrlen = mem::size_of_val(&addr) as socklen_t;
                let addrlen_ptr = &mut addrlen as *mut socklen_t;

                //                     Cast the *mut u8 into *mut void.
                //                  This is equivalent to casting a *char into *void
                //             I'm not sure why recvfrom does not take a *char directly though
                //                                       ^
                //           Create a *mut u8            |
                //                   ^                   |
                //                   |                   |
                //             +-----+-----+    +--------+-------+
                //            /             \  /                  \
                let buf_ptr = buf.as_mut_ptr() as *mut c_void;
                let buf_len = buf.len() as size_t;

                recvfrom(self.0, buf_ptr, buf_len, flags, addr_ptr, addrlen_ptr)
            };
        }
    }

I still have two questions, but they are about the C API, so we're getting a bit off-topic. I hope it's OK to ask here.

  • Why does recvfrom take the buffer as void * and not char *? A buffer is always an array of bytes, so it would make sense to make this explicit in the API.
  • Why does recvfrom take the address length as a pointer instead of a value? The only reason would be if recvfrom were setting the address length, but we actually set that value, and all recvfrom needs to do is read it. Edit: well, one situation where we would like recvfrom to set the address length based on the message is when we don't know whether we received a packet from an IPv4 or an IPv6 address. But still, in that case, I think we could give a sockaddr_storage to recvfrom and then read the address family to find out the address size.

For your first question, I'm not sure of the specific motivation for void * vs char *, but I can take a guess: because some details of char are implementation-defined, and this is (in C) the way to define a completely generic "pointer to a blob of application defined data" parameter. char is defined to be 1 byte, but its signedness is implementation defined. This was a common source of issues porting code from x86 (where char is signed -- at least with gcc!) to powerpc (where char was unsigned). Better for the generic API to avoid making claims about the format of arbitrary binary data.
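Rust exposes the same platform dependence: `std::os::raw::c_char` is `i8` on some targets (e.g. x86_64 Linux) and `u8` on others (e.g. aarch64 or PowerPC Linux). In the small sketch below, `byte_as_c_char` is just an illustrative helper. It shows that the same byte widens to different values depending on the target:

```rust
use std::os::raw::c_char;

/// Illustrative helper: reinterprets a raw byte as a C `char` and widens it
/// to i32, as C code would when doing arithmetic on a `char`.
fn byte_as_c_char(b: u8) -> i32 {
    (b as c_char) as i32
}

fn main() {
    // -56 where c_char is i8, 200 where c_char is u8.
    let v = byte_as_c_char(200);
    println!("200 as c_char as i32 = {}", v);
    assert!(v == -56 || v == 200);
}
```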

For your second question -- your edit is correct -- it's because the parameter is both an input and an output parameter:

     If address is not a null pointer and the socket is not connection-oriented, the source address of the message is filled in.  The address_len argument is a
     value-result argument, initialized to the size of the buffer associated with address, and modified on return to indicate the actual size of the address
     stored there.

I suppose, depending on the protocol, that the length of the address may not be deducible from its format alone, so an explicit length needs to be communicated to the caller.
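Here is a sketch of that value-result contract. A hypothetical `fake_fill_address` stands in for the kernel side. The real call works on a `*mut sockaddr` and a `*mut socklen_t`, but safe slices are used here for clarity. On entry, `*addrlen` is read as the buffer size. On return, it is overwritten with the real address size. A too-small buffer receives a truncated address while addrlen still reports the full length, as the Linux man page describes.

```rust
/// Hypothetical stand-in for the kernel side of recvfrom's address handling.
/// On entry `*addrlen` is the size of the caller's buffer; on return it holds
/// the actual size of the source address, which may exceed what was copied.
fn fake_fill_address(src_addr: &[u8], addr: &mut [u8], addrlen: &mut u32) {
    let capacity = (*addrlen as usize).min(addr.len());
    // Copy as much of the address as fits; a too-small buffer is truncated.
    let n = capacity.min(src_addr.len());
    addr[..n].copy_from_slice(&src_addr[..n]);
    // Report the real length so the caller can detect truncation.
    *addrlen = src_addr.len() as u32;
}

fn main() {
    // Pretend the sender's address is 16 bytes, the size of a sockaddr_in.
    let source = [0xABu8; 16];

    // Large buffer (128 bytes, like sockaddr_storage): addrlen goes in as 128.
    let mut storage = [0u8; 128];
    let mut addrlen = storage.len() as u32;
    fake_fill_address(&source, &mut storage, &mut addrlen);
    println!("addrlen after call: {}", addrlen); // 16, the actual size

    // Small buffer: only 8 bytes are copied, but addrlen still reports 16.
    let mut small = [0u8; 8];
    let mut small_len = small.len() as u32;
    fake_fill_address(&source, &mut small, &mut small_len);
    println!("truncated; addrlen after call: {}", small_len);
}
```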


Why does recvfrom take the buffer as void * and not char *? A buffer is always an array of bytes, so it would make sense to make this explicit in the API.

The thing is, at least traditionally, C kind of sees everything as an array of bytes. For example, to allocate an instance of struct foo on the heap, you have to write:

struct foo *my_foo = (struct foo *)malloc(sizeof(struct foo));

malloc takes a size in bytes and returns void *, which you're expected to cast to the type you really want. It's a pretty awkward API, but it's sort of the best you can do without using generics (which C doesn't have) or macros, and it's a sign of C's low-level roots.

(Actually, the above sample is slightly more verbose than strictly necessary, but hopefully it makes it clear what's going on.)
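Rust's raw allocator API in `std::alloc` has the same shape: `alloc` hands back an untyped `*mut u8` that you cast to the type you want. The difference is that it takes a `Layout` (size and alignment) instead of a bare byte count. The sketch below uses a hypothetical `Foo`. Ordinary Rust code would simply use `Box::new(Foo { .. })`.

```rust
use std::alloc::{alloc, dealloc, Layout};

/// Hypothetical struct standing in for the C `struct foo`.
#[derive(Debug, PartialEq)]
struct Foo {
    x: u32,
    y: u64,
}

/// Allocates room for one Foo with the raw allocator, writes a value into it,
/// reads it back out, and frees the memory.
fn roundtrip_foo() -> Foo {
    // Like malloc(sizeof(struct foo)), except the alignment is recorded too.
    let layout = Layout::new::<Foo>();
    unsafe {
        // `alloc` returns untyped memory as *mut u8 (the analogue of void *).
        let raw = alloc(layout);
        assert!(!raw.is_null(), "allocation failed");
        // Cast to the type we actually want, as with (struct foo *)malloc(...).
        let foo = raw as *mut Foo;
        foo.write(Foo { x: 1, y: 2 });
        let value = foo.read();
        dealloc(raw, layout);
        value
    }
}

fn main() {
    println!("{:?}", roundtrip_foo());
}
```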

Similarly, if you want to heap-allocate an array, the most common way to do so is to manually multiply the element count by the size of each element:

size_t count = 1000;
int *my_int_array = (int *)malloc(count * sizeof(int));
my_int_array[999] = 12345;
// ...

Incidentally, this is horrible from a security perspective. If there's an integer overflow in the calculation count * sizeof(int), you'll get a buffer that's too small; you can then end up going out of bounds when you write to the n'th element, even if n < count. And it's hard to check for this even if you're willing to pay the cost of a runtime sanitizer, because malloc(huge * huge) doesn't actually execute any undefined behavior: sizes are usually unsigned, and in C, unsigned integer overflow is perfectly legal and well-defined; it's only signed overflow that is undefined.
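Rust release builds also wrap on integer overflow by default, because overflow checks are only enabled in debug builds, so an explicit check is the way to avoid this bug. The sketch below uses `usize::checked_mul`, together with `Layout::array`, which performs the same check when computing an allocation size. `array_bytes` is a hypothetical helper mirroring the C snippet's `count * sizeof(int)`:

```rust
use std::alloc::Layout;
use std::mem::size_of;

/// Hypothetical helper: byte size of `count` ints, or None if
/// `count * sizeof(int)` would overflow instead of silently wrapping.
fn array_bytes(count: usize) -> Option<usize> {
    count.checked_mul(size_of::<i32>())
}

fn main() {
    println!("{:?}", array_bytes(1000)); // Some(4000)
    println!("{:?}", array_bytes(usize::MAX)); // None

    // Layout::array applies the same kind of check when sizing an allocation.
    println!("{}", Layout::array::<i32>(usize::MAX).is_err()); // true
}
```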

Anyway, for a third example, if you want to copy an array from one place to another, you could just do it element-by-element:

void copy_int_array(int *dest, int *source, size_t n) {
    for (size_t i = 0; i < n; i++) {
        dest[i] = source[i];
    }
}

…but a common approach in C is to use memcpy, which copies a specified number of bytes:

void copy_int_array(int *dest, int *source, size_t n) {
    memcpy(dest, source, n * sizeof(int));
}

memcpy, incidentally, has the signature

void *memcpy(void *restrict dst, const void *restrict src, size_t n);

Similar to recvfrom, it takes void * even though it treats the arguments as arrays of bytes.
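Rust's closest equivalent, `std::ptr::copy_nonoverlapping`, is documented as semantically equivalent to memcpy, but with the source and destination arguments swapped. Its count is in elements of `T`, not bytes, so the `n * sizeof(int)` step disappears. The sketch below implements the same `copy_int_array` and compares it with the safe `copy_from_slice`:

```rust
use std::ptr;

/// Element-count based copy: the count passed to `copy_nonoverlapping`
/// is a number of i32s, not a number of bytes.
fn copy_int_array(dest: &mut [i32], source: &[i32]) {
    assert!(dest.len() >= source.len(), "destination too small");
    // A `&mut` and a `&` slice cannot alias, so the regions never overlap.
    unsafe {
        ptr::copy_nonoverlapping(source.as_ptr(), dest.as_mut_ptr(), source.len());
    }
}

fn main() {
    let src = [1, 2, 3];
    let mut dst = [0; 3];
    copy_int_array(&mut dst, &src);

    // In safe code, copy_from_slice does the same, with a length check.
    let mut dst2 = [0; 3];
    dst2.copy_from_slice(&src);
    println!("{:?} {:?}", dst, dst2);
}
```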

You might say – "Sure, C likes to treat typed data as bytes, but recvfrom is different. Packet buffers are actually arrays of bytes; why not at least type those as char *?"

But that's not necessarily true. In C, it's very common to use native types to describe on-wire or on-disk formats:

struct some_packet_header {
    uint32_t a;
    uint32_t b;
};

struct some_packet_header header;
recvfrom(sock, &header, sizeof(header), /*... flags etc.*/);

// Use `header.a`, `header.b`

Note that in C, every object pointer type implicitly converts to void * and back, so this doesn't require an explicit cast, whereas it would require one if recvfrom's argument were declared as char *.

In theory, C struct layout is implementation-dependent, but it's not typically an issue in practice if you stick to fixed-size types (and there are compiler-specific ways to force packed layout).
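The same trick works from Rust, with one difference: Rust never coerces `*mut T` to `*mut c_void` implicitly, so the cast has to be spelled out. The sketch below uses a hypothetical `fake_recv` in place of recvfrom and a `#[repr(C)]` version of `some_packet_header`. It accepts the header only if every byte of it was filled in.

```rust
use std::ffi::c_void;
use std::mem::size_of;
use std::ptr;

/// Rust version of the C `some_packet_header`; #[repr(C)] gives a C-compatible layout.
#[repr(C)]
#[derive(Debug, Default)]
struct SomePacketHeader {
    a: u32,
    b: u32,
}

/// Hypothetical stand-in for recvfrom: copies up to `len` "received" bytes
/// from `wire` into the opaque buffer and returns how many were copied.
unsafe fn fake_recv(buf: *mut c_void, len: usize, wire: &[u8]) -> usize {
    let n = len.min(wire.len());
    ptr::copy_nonoverlapping(wire.as_ptr(), buf as *mut u8, n);
    n
}

/// Receives directly into a header struct, with no intermediate byte buffer.
fn receive_header(wire: &[u8]) -> Option<SomePacketHeader> {
    let mut header = SomePacketHeader::default();
    let size = size_of::<SomePacketHeader>();
    // Reference -> typed raw pointer -> opaque pointer, all explicit.
    let buf = &mut header as *mut SomePacketHeader as *mut c_void;
    let n = unsafe { fake_recv(buf, size, wire) };
    // A short read would leave part of the header unfilled, so reject it.
    if n == size {
        Some(header)
    } else {
        None
    }
}

fn main() {
    // Eight bytes off the "wire": a = 00 00 00 01, b = 00 00 00 02.
    let wire = [0, 0, 0, 1, 0, 0, 0, 2];
    // The fields hold these bytes in native byte order (the endianness caveat applies).
    println!("{:?}", receive_header(&wire));
}
```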

This approach to reading binary data is less common in Rust, but it's possible there too, and it's typically more efficient than explicitly deserializing. The main problem (in both languages) is that accessing the struct fields will read them with the native endianness, so if you want the data layout to be independent of the machine architecture, you have to use functions like C's ntohl, which will swap between big-endian and native-endian (i.e. a no-op if you're on a big-endian machine, and a byte swap if you're on a little-endian machine).
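For the endianness step, Rust's integer types provide the conversions directly. `u32::from_be` behaves like ntohl, and `u32::from_be_bytes` decodes from raw bytes without going through a struct at all. In the sketch below, `ntohl` is a hypothetical helper, not a std function:

```rust
/// Hypothetical helper named after C's ntohl: interprets a value that was read
/// in native order from network (big-endian) bytes as the intended integer.
fn ntohl(n: u32) -> u32 {
    u32::from_be(n) // no-op on big-endian targets, byte swap on little-endian ones
}

fn main() {
    // Four bytes as they arrive on the wire, in network byte order.
    let wire = [0x00, 0x00, 0x01, 0x02];

    // Reading them natively (as a struct field would) is machine-dependent...
    let native = u32::from_ne_bytes(wire);
    // ...and ntohl turns that into the intended value 0x0102 on any machine.
    assert_eq!(ntohl(native), 0x0102);

    // Alternatively, decode explicitly from the bytes.
    assert_eq!(u32::from_be_bytes(wire), 0x0102);
    println!("native = {:#x}, decoded = {:#x}", native, ntohl(native));
}
```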

Anyway… this has been a somewhat long post; I have no idea whether you already know some or all of what I wrote. But hopefully it provides context for the difference between char * and void *.
