Issue with parsing data in `__IncompleteArray` generated by `bindgen`

Bindings

I am generating bindings for ethtool.h.

#include <linux/ethtool.h>

Actual Results

I am getting erroneous results while trying to deal with __IncompleteArray.

This is the struct generated by bindgen:

pub struct ethtool_sset_info {
    pub cmd: __u32,
    pub reserved: __u32,
    pub sset_mask: __u64,
    pub data: __IncompleteArrayField<__u32>,
}

My code:

fn gsset_info(&self) -> Result<usize, Errno> {
    let mut sset_info = ethtool_sset_info {
        cmd: ETHTOOL_GSSET_INFO,
        reserved: 1,
        sset_mask: 1 << ETH_SS_STATS,
        data: __IncompleteArrayField::<u32>::new(),
    };

    // https://github.com/mmynk/rust-ethtool/blob/44189de0d9c6f79fd48a6c30e5a5d58d57342992/src/ethtool.rs#L42-L60
    match _ioctl(
        &self.sock_fd,
        self.if_name,
        &mut sset_info as *mut ethtool_sset_info as usize,
    ) {
        Ok(_) => {
            let data = unsafe { sset_info.data.as_ptr().read() } as usize;
            println!("length={}", data);
            Ok(data)
        },
        Err(errno) => Err(errno),
    }
}

It works where sset_info.data has some non-zero value but returns garbage when the value should be zero. The same code when I use this struct. I am guessing the issue is with how I'm reading the value sset_info.data.as_ptr().read() but I am not sure.

Similarly, struct generated by bindgen:

pub struct ethtool_gstrings {
    pub cmd: __u32,
    pub string_set: __u32,
    pub len: __u32,
    pub data: __IncompleteArrayField<__u8>,
}

My code:

fn gstrings(&self, length: usize) -> Result<Vec<String>, Error> {
    let mut gstrings = ethtool_gstrings {
        cmd: ETHTOOL_GSTRINGS,
        string_set: ETH_SS_STATS,
        len: length as u32,
        data: __IncompleteArrayField::<u8>::new(),
    };

    match _ioctl(
        &self.sock_fd,
        self.if_name,
        &mut gstrings as *mut ethtool_gstrings as usize,
    ) {
        Ok(_) => {
            let data = unsafe { gstrings.data.as_slice(length) };
            let some_data = unsafe { gstrings.data.as_ptr().read() };
            let data_ptr = gstrings.data.as_ptr();
            for i in 0..length {
                print!("debug: ");
                for j in 0..ETH_GSTRINGS_LEN {
                    print!("{} ", unsafe { data_ptr.add(i * ETH_GSTRINGS_LEN + j).read() });
                }
                println!();
            }
            // https://github.com/mmynk/rust-ethtool/blob/44189de0d9c6f79fd48a6c30e5a5d58d57342992/src/ethtool.rs#L62-L83
            return parse_names(data, length)
        },
        Err(errno) => Err(Error::GStringsReadError(errno)),
    }
}

The data array is quite different from what I'd expect while having expected values at certain positions. The data also has arbitrary null values which makes parsing it impossible. Again, the issue could be with how I'm reading the data but I'm not sure: let data = unsafe { gstrings.data.as_slice(length) };.

I also get (signal: 11, SIGSEGV: invalid memory reference) errors while trying to parse which could again indicate issue with my code.

Expected Results

Example of working code with custom structs: https://github.com/mmynk/rust-ethtool/blob/main/src/ethtool.rs

the __IncompleteArrayField is just a placeholder, it's there indicating the struct is dynamically sized, you cannot construct a struct with a trailing __IncompleteArrayField, you have to do some dirty pointer tricks. see Examples to use bindgen generated __IncompleteArrayField<u8> · Issue #1680 · rust-lang/rust-bindgen · GitHub

4 Likes

Thanks that helped. I had looked at it but assumed it doesn't fit into my use-case but I was wrong.

However, I'm stuck in another issue now which seems related to how I'm reading the bytes returned.
I have added these snippets to my code:

extern "C" fn new_ethtool_gstrings(length: usize) -> *mut ethtool_gstrings {
    let layout = Layout::from_size_align(
        mem::size_of::<ethtool_gstrings>(),
        mem::align_of::<ethtool_gstrings>(),
    ).unwrap();
    let value = unsafe { alloc(layout) };
    let mut values = Vec::<u8>::new();

    let cmd = ETHTOOL_GSTRINGS.to_ne_bytes();
    values.append(cmd.to_vec().as_mut());
    let string_set = ETH_SS_STATS.to_ne_bytes();
    values.append(string_set.to_vec().as_mut());
    let len = length.to_ne_bytes();
    values.append(len.to_vec().as_mut());
    let mut data = vec![0u8; length * ETH_GSTRINGS_LEN];
    values.append(&mut data);

    for i in 0..values.len() {
        unsafe {
            value.add(i).write(values[i]);
        }
    }
    value as *mut ethtool_gstrings
}

This struct pointer : *mut ethtool_gstrings is passed to another function and data field is supposed to be overwritten. I am then parsing it as follows:

let data = unsafe { gstrings.as_mut().unwrap().data.as_slice(length * ETH_GSTRINGS_LEN) };
println!("data: {:?}", data);  // this prints correctly (the data I expect)

// issue is here
data
    .chunks(ETH_GSTRINGS_LEN)
    .map(|chunk| {
        println!("debug: {:?}", chunk);
    }

The above correctly prints the first chunk, then panics with the below error:

 malloc.c:4302: _int_malloc: Assertion `(unsigned long) (size) >= (unsigned long) (nb)' failed.

I am guessing I need to somehow free some memory? But I'm not sure how.

Nah. Not freeing memory doesn't in itself cause errors. Leaks are perfectly safe. (Unless you leak so much that you exhaust memory, but that's still not going to corrupt the allocator's internal structures like the quoted error message indicates.)

You are still overwriting something you shouldn't be.

size_of only knows the statically known size, i.e. the “header” size, you need to calculate the runtime memory size required to hold the header PLUS the payload data which should co-locate right after the "header" part. it's the same trick used in C, just in C you use malloc.

1 Like

Thanks, @nerditation. I also needed to call mem::forget.

I have modified the code to something like below. It works, I hope this is an acceptable way to do it.

struct GStrings {
    length: usize,
    ptr: *mut ethtool_gstrings,
}

impl GStrings {
    /// Allocates memory for ethtool_gstrings struct and initializes it.
    fn new(length: usize) -> Self {
        let data_length = length * ETH_GSTRING_LEN as usize;
        let total_size = mem::size_of::<ethtool_gstrings>() + data_length;

        let mut value = Vec::<u8>::with_capacity(total_size);

        // Initialize the ethtool_gstrings struct
        let cmd = ETHTOOL_GSTRINGS.to_ne_bytes();
        let string_set = ETH_SS_STATS.to_ne_bytes();
        let len = length.to_ne_bytes();

        value.extend_from_slice(&cmd);
        value.extend_from_slice(&string_set);
        value.extend_from_slice(&len);

        // Initialize the data field with zeros
        value.extend(vec![0u8; data_length]);

        // Get a pointer to the allocated memory
        let ptr = value.as_mut_ptr() as *mut ethtool_gstrings;

        // Prevent the Rust Vec from deallocating the memory
        mem::forget(value);

        GStrings { length, ptr }
    }

    fn data(&self) -> &[u8] {
        unsafe {
            self.ptr
                .as_mut()
                .unwrap()
                .data
                .as_slice(self.length * ETH_GSTRINGS_LEN)
        }
    }
}

impl Drop for GStrings {
    fn drop(&mut self) {
        unsafe {
            let value = Vec::<u8>::from_raw_parts(
                self.ptr as *mut u8,
                mem::size_of::<ethtool_gstrings>(),
                mem::size_of::<ethtool_gstrings>(),
            );
            mem::drop(value);
        }
    }
}

There are still multiple issues with that code, some of which are severe:

  • You are not taking into consideration the potential padding in ethtool_gstrings when constructing it field-by-field, byte-by-byte (nor when calculating the total size).
  • You are allocating a Vec<u8> and casting its pointer type to *ethtool_gstrings, which results in a pointer that's potentially not sufficiently aligned. While that's technically not an immediate problem for a raw pointer, it can quickly turn into surprise-UB, eg. as soon as you convert it to a reference or simply dereference it. Even the ptr.write() calls are UB if your pointers are unaligned.
  • value.extend(vec![0u8; data_length]) allocates unnecessarily. You can just extend from a lazy iterator such as iter::repeat(0).take(data_length), or even do value.extend(total_size, 0).
  • value.as_mut_ptr() followed by mem::forget(value) is an anti-pattern, because it can mess with the provenance of the pointer (you are passing ownership the value by-value, hence pointers derived from it may no longer by considered valid, but AFAIK the exact rules are still unclear around this). There's an unstable method on Vec for correctly relinquishing ownership and converting to a raw pointer, .into_raw_parts(); on stable, you can use .into_boxed_slice().into_raw() instead.
  • In the destructor, you are dropping a vector constructed with the wrong length and the wrong capacity, which is immediate UB. You can't just wing it when it comes to the size of an allocation – you have to free it with the same size as it was allocated for.
2 Likes

Thanks a lot for your detailed comment, @H2CO3. I am trying to address these one-by-one.
I believe the following will give the correct size of the struct, is that right?

let data_length = length * ETH_GSTRING_LEN as usize;
let total_size = mem::size_of::<ethtool_gstrings>() + data_length * mem::size_of::<u8>();

Becaue mem::size_of for a struct should take care of the padding.

Here's the updated version of my code, have tried not to use Vec and be aware of the padding in general:

struct GStrings {
    length: usize,
    layout: alloc::Layout,
    ptr: *mut ethtool_gstrings,
}

impl GStrings {
    /// Allocates memory for ethtool_gstrings struct and initializes it.
    fn new(length: usize) -> Result<Self, EthtoolError> {
        // Calculate the size of the data field
        let data_length = length * ETH_GSTRINGS_LEN;

        // Calculate the layout with proper alignment
        let layout = alloc::Layout::from_size_align(
            // should this be: mem::size_of::<ethtool_gstrings>() + data_length * mem::size_of<u8>(),
            mem::size_of::<ethtool_gstrings>() + data_length,
            cmp::max(mem::align_of::<u32>(), mem::align_of::<u8>()),
        )
        .map_err(|err| EthtoolError::CStructError(err))?;

        // Allocate memory for the struct
        let gstrings_ptr = unsafe { alloc::alloc(layout) } as *mut ethtool_gstrings;

        // Initialize the fields of the struct
        unsafe {
            ptr::write(&mut (*gstrings_ptr).cmd, ETHTOOL_GSTRINGS);
            ptr::write(&mut (*gstrings_ptr).string_set, ETH_SS_STATS);
            ptr::write(&mut (*gstrings_ptr).len, length as u32);

            // Initialize the data field with zeros
            // should I instead do:
            // let zeros = iter::repeat(0).take(data_length); let zeros = &zeros[..];
            let zeros: &[u8] = &vec![0u8; data_length];
            ptr::copy_nonoverlapping(zeros.as_ptr(), (*gstrings_ptr).data.as_mut_ptr(), data_length);
        }

        Ok(GStrings {
            length,
            layout,
            ptr: gstrings_ptr
        })
    }

    fn data(&self) -> &[u8] {
        unsafe {
            self.ptr
                .as_ref()
                .unwrap()
                .data
                .as_slice(self.length * ETH_GSTRINGS_LEN)
        }
    }
}

impl Drop for GStrings {
    fn drop(&mut self) {
        unsafe {
            alloc::dealloc(self.ptr as *mut u8, self.layout);
        }
    }
}

That seems better, but now there are new issues in that code:

  • The calculation of the alignment is weird. What's that dance with u32 and u8? Why aren't you simply using the alignment of the header struct itself?
  • Creating a regular reference (not a raw pointer!) to uninitialized data is immediate UB, so you can't use references to write to the fields of the freshly allocated struct. You must use addr_of_mut!() and raw pointers for that.
  • Again, the zeros vector is allocated unnecessarily. I could find write_bytes in the std::ptr module's documentation without having had previous knowledge about it. You should look for these utility functions instead of allocating giant vectors of zeros or doing similarly unnecessary things.
1 Like

Thank you, @H2CO3. You have been super helpful, I am sorry for so many back-and-forth. I'm new to Rust (and even to some of these C concepts) and still learning a lot of things. I hope the following would be the final draft (fingers crossed), I have tried to address the comments, let's see:

struct GStrings {
    length: usize,
    layout: alloc::Layout,
    ptr: *mut ethtool_gstrings,
}

impl GStrings {
    fn new(length: usize) -> Result<Self, EthtoolError> {
        let data_length = length * ETH_GSTRINGS_LEN;

        // Calculate the layout with proper alignment based on the struct itself
        let layout = alloc::Layout::from_size_align(
            mem::size_of::<ethtool_gstrings>() + data_length * mem::size_of::<u8>(),
            mem::align_of::<ethtool_gstrings>(),
        )
        .map_err(|err| EthtoolError::CStructInitError(err))?;

        // Allocate memory for the struct
        let gstrings_ptr = unsafe { alloc::alloc(layout) } as *mut ethtool_gstrings;

        // Initialize the fields of the struct using raw pointers
        unsafe {
            let cmd_ptr = ptr::addr_of_mut!((*gstrings_ptr).cmd);
            let ss_ptr = ptr::addr_of_mut!((*gstrings_ptr).string_set);
            let len_ptr = ptr::addr_of_mut!((*gstrings_ptr).len);
            let data_ptr = ptr::addr_of_mut!((*gstrings_ptr).data);

            cmd_ptr.write(ETHTOOL_GSTRINGS);
            ss_ptr.write(ETH_SS_STATS);
            len_ptr.write(length as u32);
            // Initialize the data field with zeros
            ptr::write_bytes(data_ptr, 0u8, data_length);
        }

        Ok(GStrings {
            length,
            layout,
            ptr: gstrings_ptr,
        })
    }

    fn data(&self) -> Result<&[u8], EthtoolError> {
        unsafe {
            let value = self.ptr.as_ref().ok_or(EthtoolError::CStructReadError())?;
            Ok(value.data.as_slice(self.length * ETH_GSTRINGS_LEN))
        }
    }
}

impl Drop for GStrings {
    fn drop(&mut self) {
        unsafe {
            alloc::dealloc(self.ptr as *mut u8, self.layout);
        }
    }
}

I think you did address my observations; you should write some test cases and run the code through Miri to check if it finds any UB.

1 Like

Awesome, I will do that. Glad to see this message. :slight_smile:

Thanks a ton!!! Especially for guiding me through it and not just giving away the solution.

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.