Generate aligned structs from repr(packed)

I'm reading in packed structs from binary files generated by C++ code. I cannot change the format of the structures on disk, but I have access to the .h files. I have figured out how to do this with #[repr(C, packed(1))] on my struct declarations (and I'm looking into bindgen to do this automatically from the .h files too).

But now the structures have multiple fields which are unaligned, and as shown in Issue #82523, you can't take references to unaligned fields. So this makes doing math and such on the actual data awkward. What I'd really like to do is run a macro (or something) to take a packed C-struct and generate a "regular" Rust struct from it. Preferably the From trait will be implemented both ways so it'll copy it back and forth. And have it work for the entire tree of types.

Something like this, but think a much larger tree of types:

#[repr(C, packed(1))]
struct SubStructRaw {
  first: [u8; 3],
  second: [u16; 5]
}

#[repr(C, packed(1))]
struct TopStructRaw {
  first: u8,
  nested_unaligned: SubStructRaw
}

// Make an aligned struct set: (makes TopStruct_Rusty, or whatever)
convert_to_rust_repr!(TopStructRaw);

Any way to do this? macro_attr looks promising, but I'm not experienced enough to utilize it, and I have to believe somebody has already solved this problem; I just can't find it.
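To make the goal concrete, here is a hand-written sketch of what such a macro would need to generate for the two structs above. The aligned names SubStruct and TopStruct and the field name nested are made up for illustration, and the derives on the raw structs are an added assumption so the values can be copied freely.

```rust
// Raw on-disk layout: no padding, so `second` may sit at an odd offset.
#[repr(C, packed(1))]
#[derive(Clone, Copy)]
struct SubStructRaw {
    first: [u8; 3],
    second: [u16; 5],
}

#[repr(C, packed(1))]
#[derive(Clone, Copy)]
struct TopStructRaw {
    first: u8,
    nested_unaligned: SubStructRaw,
}

// Hypothetical aligned counterparts (names are assumptions).
#[derive(Debug, Clone, PartialEq)]
struct SubStruct {
    first: [u8; 3],
    second: [u16; 5],
}

#[derive(Debug, Clone, PartialEq)]
struct TopStruct {
    first: u8,
    nested: SubStruct,
}

impl From<SubStructRaw> for SubStruct {
    fn from(raw: SubStructRaw) -> Self {
        // Using a packed field by value copies it out; no reference is created.
        SubStruct { first: raw.first, second: raw.second }
    }
}

impl From<SubStruct> for SubStructRaw {
    fn from(s: SubStruct) -> Self {
        SubStructRaw { first: s.first, second: s.second }
    }
}

impl From<TopStructRaw> for TopStruct {
    fn from(raw: TopStructRaw) -> Self {
        TopStruct { first: raw.first, nested: SubStruct::from(raw.nested_unaligned) }
    }
}

impl From<TopStruct> for TopStructRaw {
    fn from(t: TopStruct) -> Self {
        TopStructRaw { first: t.first, nested_unaligned: SubStructRaw::from(t.nested) }
    }
}

fn main() {
    let raw = TopStructRaw {
        first: 7,
        nested_unaligned: SubStructRaw { first: [1, 2, 3], second: [10, 20, 30, 40, 50] },
    };
    let top = TopStruct::from(raw);
    println!("{:?}", top);
    let _back = TopStructRaw::from(top);
}
```

The conversions recurse through the nested type, which is the "entire tree" part of the question.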

You can read these fields using ptr::read_unaligned and the addr_of! macro.

Perhaps you could generate getters and setters for this on these types, instead of generating new ones? Rust's derive macros should be able to do this.
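As a rough sketch of what one such getter/setter pair could look like when written by hand (Record, tag and value are made-up names, not anything from your headers):

```rust
// Hypothetical packed record: `value` sits at offset 1, so it is unaligned.
#[repr(C, packed(1))]
struct Record {
    tag: u8,
    value: u32,
}

impl Record {
    /// Getter: copies the field out through a raw pointer, never a reference.
    fn value(&self) -> u32 {
        // SAFETY: the pointer comes from a valid `&self`, and
        // `read_unaligned` has no alignment requirement.
        unsafe { std::ptr::read_unaligned(std::ptr::addr_of!(self.value)) }
    }

    /// Setter: writes the field through a raw pointer.
    fn set_value(&mut self, v: u32) {
        // SAFETY: the pointer comes from a valid `&mut self`, and
        // `write_unaligned` has no alignment requirement.
        unsafe { std::ptr::write_unaligned(std::ptr::addr_of_mut!(self.value), v) }
    }
}

fn main() {
    let mut r = Record { tag: 1, value: 0xDEAD_BEEF };
    assert_eq!(r.value(), 0xDEAD_BEEF);
    r.set_value(42);
    assert_eq!(r.value(), 42);
    // Copy the field into a local first; `assert_eq!` takes references.
    let tag = r.tag;
    assert_eq!(tag, 1);
}
```

A derive macro would emit one such pair per field.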


Sure, I can read_unaligned and such, but on a higher level: what's the damned point? What use is bindgen long-term if what's generated from it still requires tons of manual steps to be used at all? You can't treat them like Rust types, you must convert them to really use the data there, but that isn't part of the generation for packed types? Half of it is automatic, great, you've halved my work... probably. But it's literally a half-done job if you can't get it into usable types (or out, back to the binary).

Now, is this whole thing another demonstration as to why standard or pseudo-standard serialization formats like json, yaml, etc, are good things so you can interchange between languages? Sure it is! But if I already have it in C (which happens in a lot of real-world circumstances, especially with embedded stuff), then it should be straightforward to get it into a usable form, but it really isn't right now. And this kind of thing slows adoption IMO. Maybe I should invade the bindgen project and give them a piece of my mind. It feels like what they have is seriously a half-measure. You get the data in, but then it's a PITA to use.

Sorry for this getting a bit rant-ish. I appreciate you trying, @kornel. I just think I have to learn a lot more about Rust and macros to solve this long-term, and for others as well, because this isn't the 1st (or 2nd, or 5th) time that I've been brought into a project with a "fixed" binary format that nobody can change for "reasons" and you have to work with it.

I suspect it would be more productive to use a local fork of bindgen patched to generate the definitions you want. Generating the code should be easier in an external tool such as bindgen than in a macro. Then, of course, you can offer the patches to bindgen upstream in the usual manner, once you're satisfied with your solution and less likely to come in like your fire-breathing namesake. :wink:


You're running into an unfortunate crossing of multiple things:

  • Rust's safe references guarantee they're properly aligned. Unfortunately, in Rust it's too easy to access a struct field via a reference. Rust tried to ignore the problem, but alignment is exploited by the optimizer, so unaligned references really are UB. Misaligned pointers to packed members are UB in C too, actually; GCC and Clang warn about them with -Waddress-of-packed-member.

  • Using a packed C struct as a file format is a C-specific trick, and comes with its own headaches like endian-dependence. It's a bit alien in Rust. In Rust the convention would be to use something like serde + bincode to have a portable binary format with 1 line of code, or perhaps explicitly parse it with nom.

  • Rust isn't concerned with syntax sugar as much as with safety and locally explicit code. The problem of needing to write something twice is left to user macros.

  • Custom derive macros are an advanced feature.
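To illustrate the "explicitly parse it" route without pulling in any crates, here is a minimal std-only sketch for the TopStructRaw layout from the question. It assumes the file is little-endian, which the thread doesn't state; parse_top, TOP_SIZE and the aligned struct names are invented for the example.

```rust
// Hypothetical aligned types that the parser produces.
#[derive(Debug, PartialEq)]
struct SubStruct {
    first: [u8; 3],
    second: [u16; 5],
}

#[derive(Debug, PartialEq)]
struct TopStruct {
    first: u8,
    nested: SubStruct,
}

// 1 (first) + 3 (nested.first) + 5 * 2 (nested.second) bytes on disk.
const TOP_SIZE: usize = 14;

/// Parses one record from the start of `bytes`.
/// ASSUMPTION: multi-byte fields are stored little-endian.
fn parse_top(bytes: &[u8]) -> Option<TopStruct> {
    if bytes.len() < TOP_SIZE {
        return None;
    }
    let mut second = [0u16; 5];
    for (i, slot) in second.iter_mut().enumerate() {
        let off = 4 + 2 * i; // `second` starts right after the 4 leading bytes
        *slot = u16::from_le_bytes([bytes[off], bytes[off + 1]]);
    }
    Some(TopStruct {
        first: bytes[0],
        nested: SubStruct { first: [bytes[1], bytes[2], bytes[3]], second },
    })
}

fn main() {
    let bytes = [7u8, 1, 2, 3, 10, 0, 20, 0, 30, 0, 40, 0, 0x34, 0x12];
    let top = parse_top(&bytes).expect("buffer holds one full record");
    assert_eq!(top.nested.second[4], 0x1234);
    println!("{:?}", top);
}
```

Because the bytes are decoded one field at a time, no packed struct or unaligned access is involved, and the result doesn't depend on the host's endianness.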

I've noticed that there is a crate for this:

It's more focused on bitfields, but I think it can work for structs too.

I think it would also be doable with built-in macro_rules. If you wrote a macro that takes:

make_it_packed! {
   struct Foo { field: u8 }
}

then you could match on the fields, and generate struct definition and getters for each field.


macro_rules! make_it_packed {
    (struct $struct_name:ident { $($field_name:ident: $field_type:ty ),* $(,)? }) => {
        #[repr(C, packed)]
        #[derive(Copy, Clone)]
        struct $struct_name {
            $( $field_name: $field_type, )*
        }

        impl $struct_name {
            $(
                fn $field_name(&self) -> $field_type {
                    unsafe {
                        std::ptr::read_unaligned(std::ptr::addr_of!(self.$field_name))
                    }
                }
            )*
        }
    }
}

make_it_packed! {
    struct Foo {
        bar: i8,
        quz: u32,
    }
}

fn main() {
    let f = Foo { bar: 1, quz: 2 };
    assert_eq!(5, std::mem::size_of::<Foo>());
    assert_eq!(1, f.bar());
    assert_eq!(2, f.quz());
}

I think you should do self.$field_name instead of std::ptr::read_unaligned(std::ptr::addr_of!(self.$field_name)). The former fails to compile for non-Copy field types. The latter compiles for them and can cause UB, because it makes a bitwise duplicate of the value.
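For reference, this is roughly how the macro from the previous post looks with that change applied (same names as before; only the getter body differs):

```rust
macro_rules! make_it_packed {
    (struct $struct_name:ident { $($field_name:ident: $field_type:ty),* $(,)? }) => {
        #[repr(C, packed)]
        #[derive(Copy, Clone)]
        struct $struct_name {
            $( $field_name: $field_type, )*
        }

        impl $struct_name {
            $(
                // Plain by-value field access: the compiler copies the field
                // out without creating a reference, and rejects non-Copy types.
                fn $field_name(&self) -> $field_type {
                    self.$field_name
                }
            )*
        }
    }
}

make_it_packed! {
    struct Foo {
        bar: i8,
        quz: u32,
    }
}

fn main() {
    let f = Foo { bar: 1, quz: 2 };
    assert_eq!(5, std::mem::size_of::<Foo>());
    assert_eq!(1, f.bar());
    assert_eq!(2, f.quz());
}
```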


I appreciate all the feedback here. I also raised it on the tracking issue for unaligned reference warnings and got a similar macro-based suggestion there, though that one generated new structs that you could convert between (which is more like what I originally wanted). Unfortunately nobody has a "tree-based" solution to this yet (packed structs of packed structs), but that may be a "real effort" in the end anyway.
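One possible direction for the nested case, as a sketch only: this is not the macro from the tracking issue. The name packed_pair! and its `Raw => Aligned` syntax are invented here. The idea is that every field names both its raw type and its aligned type, and the generated From impls convert field by field, so a nested struct just reuses its own generated impls.

```rust
// Hypothetical macro: generates a packed struct, an aligned twin,
// and `From` conversions in both directions.
macro_rules! packed_pair {
    ($raw:ident => $aligned:ident {
        $( $field:ident : $raw_ty:ty => $aligned_ty:ty ),* $(,)?
    }) => {
        #[repr(C, packed)]
        #[derive(Copy, Clone)]
        struct $raw { $( $field: $raw_ty, )* }

        #[derive(Debug, Clone, PartialEq)]
        struct $aligned { $( $field: $aligned_ty, )* }

        impl From<$raw> for $aligned {
            fn from(raw: $raw) -> Self {
                // Fields are passed by value, so no unaligned reference exists.
                // Primitives and arrays go through the reflexive `From<T> for T`.
                $aligned { $( $field: <$aligned_ty>::from(raw.$field), )* }
            }
        }

        impl From<$aligned> for $raw {
            fn from(aligned: $aligned) -> Self {
                $raw { $( $field: <$raw_ty>::from(aligned.$field), )* }
            }
        }
    };
}

packed_pair! {
    SubStructRaw => SubStruct {
        first: [u8; 3] => [u8; 3],
        second: [u16; 5] => [u16; 5],
    }
}

packed_pair! {
    TopStructRaw => TopStruct {
        first: u8 => u8,
        nested_unaligned: SubStructRaw => SubStruct,
    }
}

fn main() {
    let raw = TopStructRaw {
        first: 9,
        nested_unaligned: SubStructRaw { first: [1, 2, 3], second: [1, 2, 3, 4, 5] },
    };
    let top: TopStruct = raw.into();
    println!("{:?}", top);
    let _back: TopStructRaw = top.into();
}
```

The obvious cost is that each struct has to be declared through the macro instead of coming straight out of bindgen, so it only covers part of what was asked for.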

The one persistent insight here is about how this is UB in C as well, which is interesting, and probably true! This will be funny (not quite "haha", the other kind) if it ends up that this whole tree of packed stuff actually has this error in C, and that's a source of bugs. Completely possible with how the original project was written/defined, so it wouldn't surprise me.

In the end, though, this is more of a bindgen feature problem. If you generate structs, give the user a way to get the information out without getters and setters; otherwise, why have a struct at all? It may as well be a higher-level data structure (a map, etc.) if you can't "use it like a struct" anyway. Hence generating a Rusty struct that converts to and from the packed one with into().