Enforce non-zero struct size

I'm trying to print out the contents of structures that are from C. They're not null-terminated, and I need to deal with the possibility of any value in their fields and sanitize them before printing. So I made a function (using unsafe) to take a structure and print out any bytes that are alphanumeric:

use std::{slice, mem};
use bytemuck::{Pod, Zeroable};

#[derive(Clone, Copy, Debug)]
#[repr(C, packed(1))]
struct Tester {
    my_first: [u8; 3],
    my_second: u8,
}

unsafe impl Zeroable for Tester {}
unsafe impl Pod for Tester {}

fn is_alphanumeric(cur: u8) -> bool {
    let is_zero_to_9 = cur >= 48 && cur <= 57;
    let is_A_to_Z = cur >= 65 && cur <= 90;
    let is_a_to_z = cur >= 97 && cur <= 122;
    return is_zero_to_9 || is_A_to_Z || is_a_to_z;
}

fn pod_to_string<T: Pod>(input: &T) -> String {
    let buffer = unsafe {
        slice::from_raw_parts(
            std::ptr::addr_of!(*input) as *const u8,
            mem::size_of::<T>(),
        )
    };
    // Copy and sanitize
    let mut sanitized = [0; mem::size_of::<T>()];
    sanitized.copy_from_slice(buffer);
    sanitized.iter_mut().for_each(|x| {
        if !is_alphanumeric(*x) {
            *x = b"*"[0];
        }
    });
    let c_str =
        std::ffi::CString::new(sanitized).expect("Problem parsing POD data");
    format!("{}", c_str.to_str().unwrap())
}

fn main() {
    let test = Tester { my_first: [ 55, 50, 57], my_second: 50};
    println!("test: {}", pod_to_string(&test));
    let test = Tester { my_first: [ 13, 50, 100], my_second: 50};
    println!("test: {}", pod_to_string(&test));
}

I'm sure there are easier ways to do the alphanumeric check and such, but the problem is this line:

let mut sanitized = [0; mem::size_of::<T>()];

I get the error "constant expression depends on a generic parameter". OK, I can understand that zero-size might be bad.

So how do I enforce that I want the size to be 1 or higher, to hopefully make this work? I'm OK with it being rejected on zero-sized structures, but I want it to work on other arbitrary structures.

I don't think this is an issue with zero-sized structs. In fact, if you replace both instances of mem::size_of::<T>() with 0, your code compiles and runs just fine.

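This is really a const generics limitation. An array length must be a constant expression, and on stable Rust such an expression may not depend on a generic type parameter like T (that needs the unstable generic_const_exprs feature). It's similar to using an associated constant from a trait as an array length, which fails with the same error.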

I can see two immediate solutions. One is to use dynamic sizes and Vec<u8>. That's probably the simplest and most ergonomic. Alternatively, you can add a const generic parameter N for the size. But I don't see any easy way to guarantee mem::size_of::<T>() == N by construction. The code is also not very ergonomic.
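
To make the two options concrete, here is a minimal sketch of both. It avoids the bytemuck dependency by using a hypothetical PlainBytes marker trait as a stand-in for Pod; the helper names (raw_bytes, star_out, pod_to_string_vec, pod_to_string_array) are mine and not from any library. Note that the const generic version only checks N against the real size at run time, which is the ergonomics problem mentioned above.

```rust
use std::{mem, slice};

/// Hypothetical stand-in for `bytemuck::Pod` so the sketch needs no external crate.
/// Safety: implementors must have no padding and must be valid for any bit pattern.
unsafe trait PlainBytes: Copy + 'static {}

#[allow(dead_code)]
#[derive(Clone, Copy)]
#[repr(C, packed(1))]
struct Tester {
    my_first: [u8; 3],
    my_second: u8,
}

unsafe impl PlainBytes for Tester {}

/// View a value as its raw bytes (what `bytemuck::bytes_of` does for `Pod` types).
fn raw_bytes<T: PlainBytes>(input: &T) -> &[u8] {
    // SAFETY: `PlainBytes` promises there are no padding or uninitialized bytes,
    // and the reference is valid for `size_of::<T>()` bytes.
    unsafe { slice::from_raw_parts(input as *const T as *const u8, mem::size_of::<T>()) }
}

/// Replace every byte that is not ASCII alphanumeric with `*`.
fn star_out(bytes: &mut [u8]) {
    for b in bytes.iter_mut() {
        if !b.is_ascii_alphanumeric() {
            *b = b'*';
        }
    }
}

/// Option 1: a dynamically sized copy in a `Vec<u8>`.
fn pod_to_string_vec<T: PlainBytes>(input: &T) -> String {
    let mut sanitized = raw_bytes(input).to_vec();
    star_out(&mut sanitized);
    // Every byte is ASCII at this point, so the buffer is valid UTF-8.
    String::from_utf8(sanitized).expect("sanitized bytes are ASCII")
}

/// Option 2: the caller passes the size as a const generic parameter `N`.
/// Nothing ties `N` to `T` by construction, so the match is asserted at run time.
fn pod_to_string_array<T: PlainBytes, const N: usize>(input: &T) -> String {
    assert_eq!(mem::size_of::<T>(), N, "N must equal size_of::<T>()");
    let mut sanitized = [0u8; N];
    sanitized.copy_from_slice(raw_bytes(input));
    star_out(&mut sanitized);
    String::from_utf8_lossy(&sanitized).into_owned()
}
```

The array version has to be called with the size spelled out, for example pod_to_string_array::<Tester, 4>(&test), while the Vec version infers everything from the argument.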

Maybe someone else can come up with a better solution.

Edit:

Unrelated comment. I don't think addr_of! is necessary in your code. From the documentation:

Create a const raw pointer to a place, without creating an intermediate reference.

But you already have a reference, so it's unnecessary. Replacing it with input as *const T as *const u8 and running Miri yields no errors.
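
For illustration, a small sketch of the difference. The Packed struct and both function names are hypothetical. The first function shows that the macro and the plain cast give the same address when you start from a reference; the second shows the case where addr_of! is genuinely useful, namely taking the address of a possibly unaligned field of a packed struct, where creating a reference is not allowed.

```rust
use std::ptr;

/// Hypothetical packed struct: `value` sits at offset 1, so it may be unaligned.
#[allow(dead_code)]
#[repr(C, packed(1))]
struct Packed {
    tag: u8,
    value: u32,
}

/// Starting from a reference, the macro and the cast yield the same pointer.
fn same_address<T>(input: &T) -> bool {
    let via_macro = ptr::addr_of!(*input) as *const u8;
    let via_cast = input as *const T as *const u8;
    via_macro == via_cast
}

/// Read a packed field through a raw pointer, without ever forming `&p.value`.
fn read_value(p: &Packed) -> u32 {
    let field_ptr = ptr::addr_of!(p.value);
    // SAFETY: the pointer targets a live field of `p`, and `read_unaligned`
    // does not require the pointer to be aligned.
    unsafe { field_ptr.read_unaligned() }
}
```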

Are you sure you don’t just want to use bytemuck::bytes_of? It looks like that’s the basic operation you are performing at the top of the function and you already are using bytemuck::Pod

"I don't think addr_of! is necessary in your code. [snip] But you already have a reference, so it's unnecessary. Replacing it with input as *const T as *const u8 and running Miri yields no errors."

@bradleyharden - I appreciate this suggestion. It works, but I find the macro clearer as to what I'm doing. I'll definitely consider it though, as it may be more idiomatic. I've repeated to others before the old wisdom that "an engineer can write FORTRAN in any language", and I try to break out of that and write idiomatically in each new language I learn, which is definitely the case for Rust for me right now.

@drewkett - yes, that's actually exactly what I should be using for the first part of this instead of my other unsafe usage. Thanks for pointing it out. But it doesn't help the 2nd part of my problem where I want to allocate an array so I can mess with the contents without affecting the original allocation.

And for both: I have since just allocated a Vec by doing let mut sanitized = Vec::from(buffer); but that's a workaround, and I wondered if there was a way to enforce non-zero sized structs. So the original problem has a workaround for what I'm doing, but still wondering about enforcing at compile-time more things about the structs I'm using.

Since you are using bytemuck anyway, there's no need to perform that unsafe cast yourself; the bytes_of() function does exactly that. I also don't understand what the point of copying the bytes into a different buffer is, nor what role CString plays here. The re-implementation of the is_alphanumeric() function is superfluous, too: both u8 and char have a matching is_ascii_alphanumeric() method.

Your whole function can be rewritten with no unsafe, no copying, no conversion from/to CString, using only a single allocation, and no unwrap()ping whatsoever:

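use bytemuck::{bytes_of, Pod};

fn pod_to_string<T: Pod>(input: &T) -> String {
    // bytes_of() safely views the value as &[u8]
    bytes_of(input)
        .iter()
        .copied()
        .map(char::from)
        // keep ASCII letters and digits, replace everything else with '*'
        .map(|c| if c.is_ascii_alphanumeric() { c } else { '*' })
        .collect()
}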
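
As a quick sanity check that the hand-written is_alphanumeric() really is redundant, here is a sketch that compares it against the standard library method for every possible byte. The checks_agree helper is a name I made up for this illustration; the ranges are the ones from the original question.

```rust
/// The hand-written check from the question, with the same numeric ranges.
fn is_alphanumeric(cur: u8) -> bool {
    let is_zero_to_9 = (48..=57).contains(&cur);
    let is_upper_a_to_z = (65..=90).contains(&cur);
    let is_lower_a_to_z = (97..=122).contains(&cur);
    is_zero_to_9 || is_upper_a_to_z || is_lower_a_to_z
}

/// Returns true if the hand-written check and `u8::is_ascii_alphanumeric`
/// give the same answer for all 256 byte values.
fn checks_agree() -> bool {
    (0..=u8::MAX).all(|b| is_alphanumeric(b) == b.is_ascii_alphanumeric())
}
```

One thing to watch: after mapping bytes through char::from, the ASCII variant is the one to use. char::is_alphanumeric is Unicode-aware, so a byte such as 0xE9 (which char::from turns into 'é') would count as alphanumeric there.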

Better yet, write a newtype wrapper for Pods that has a Display impl (and let the ToString blanket impl handle actual string conversion). This saves you from having to allocate at all unless you specifically need a String:

use std::fmt;

use bytemuck::{bytes_of, Pod};

struct PodDisplay<'a, T>(&'a T);

impl<'a, T: Pod> fmt::Display for PodDisplay<'a, T> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        use fmt::Write;

        // Write one sanitized character at a time; no intermediate buffer needed
        for c in bytes_of(self.0).iter().copied().map(char::from) {
            f.write_char(if c.is_ascii_alphanumeric() { c } else { '*' })?;
        }

        Ok(())
    }
}
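
To show how such a wrapper is used, here is a std-only sketch of the same idea. SanitizedBytes and append_sanitized are hypothetical names; the wrapper holds a plain byte slice instead of a Pod reference so the example has no external dependency, but the Display logic is the same.

```rust
use std::fmt::{self, Write};

/// Hypothetical std-only analogue of `PodDisplay` that wraps a byte slice.
struct SanitizedBytes<'a>(&'a [u8]);

impl fmt::Display for SanitizedBytes<'_> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        for c in self.0.iter().copied().map(char::from) {
            f.write_char(if c.is_ascii_alphanumeric() { c } else { '*' })?;
        }
        Ok(())
    }
}

/// Format straight into an existing buffer; no temporary `String` is created
/// for the sanitized text.
fn append_sanitized(out: &mut String, bytes: &[u8]) -> fmt::Result {
    write!(out, "test: {}", SanitizedBytes(bytes))
}
```

Because the type implements Display, it also gets to_string() through the blanket ToString impl, and it can be passed directly to println!("{}", ...) when no String is needed.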

Much better and rustier than what I had, @H2CO3. Thank you very much. I'm not sure whether I'll go with the original answer or the wrapper, but I like the options you've put out there.

My "original" question still stands about structures and stuff, but I like this better for what I'm actually trying to do. The structure stuff at compile-time was always something that I could work around, but still wanted to know.
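
For the compile-time part of the question, one possible sketch (not something proposed in this thread) is a const assertion inside an associated constant of a generic helper type. NonZeroSized and checked_size_of are hypothetical names, and this assumes a compiler new enough to allow assert! in const contexts (Rust 1.57 or later, as far as I know). The check runs when the function is instantiated for a concrete T, so it does not make the original array-length expression compile; it only rejects zero-sized types.

```rust
use std::marker::PhantomData;
use std::mem;

/// Hypothetical helper type that carries the compile-time check for `T`.
#[allow(dead_code)]
struct NonZeroSized<T>(PhantomData<T>);

impl<T> NonZeroSized<T> {
    /// Evaluated at compile time for each concrete `T` that references it.
    const CHECK: () = assert!(
        mem::size_of::<T>() > 0,
        "zero-sized types are not supported"
    );
}

/// Returns `size_of::<T>()`, but only builds for types with a non-zero size.
fn checked_size_of<T>() -> usize {
    // Referencing the constant forces the assertion to be evaluated for this `T`.
    let () = NonZeroSized::<T>::CHECK;
    mem::size_of::<T>()
}
```

With this, a call like checked_size_of::<()>() should be rejected when the crate is built. Since the error is raised during code generation for the concrete type, it may not show up under cargo check.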
