Can bindgen map tagged C unions to Rust enums?

C unions are not discriminated. Without some way to know which member is active, a union isn't useful even to C developers, so they almost always provide a hint. Sometimes that hint is contextual, as in a protocol header. Sometimes it's a wrapper struct that pairs the union with a discriminator field, which looks a lot like a Rust enum. Since reading a Rust union is unsafe, it'd be nice if bindgen could be told to treat specific C struct+union pairs as a single Rust enum.

C:

typedef enum {
  A,
  B
} CTAG;

typedef union {
  SOME_T a;
  SOME_OTHER_T b;
} CUNION;

typedef struct {
  CTAG discriminator;
  CUNION data;
} CSTRUCT;

Rust:

#[repr(u8, C)]
enum CSTRUCT {
  A(SOME_T),
  B(SOME_OTHER_T)
}

I gave a quick skim through the bindgen docs but I didn't find any obvious indication that this is already supported. Initially I thought it might be because Rust's enum's in-memory structure is not defined for the sake of FFI, but with some research I found that RFC 2195 actually defines #[repr(u8,C)]. So I guess it just comes down to the difficulty of recognizing the intention of the C developer? There are a lot of slight variations on the above pattern and I could see how even telling bindgen what to do could get complicated.

tl;dr - Should I blacklist and hand-create Rust enums, or can bindgen help me out?

I'm also realizing that only fieldless enums in Rust can have specified discriminants. That throws a further wrench into my plan to use these for FFI. The ordinal value for CTAG::A needs to match CSTRUCT::A to be able to pass these back and forth to C.

I found this blog post about #[rustc_layout(debug)] useful for getting detailed information about the layout of Rust enums and was able to confirm that #[repr(u32,C)] can do what I want.

I'd feel uncomfortable with bindgen automatically converting a tagged union from C into a Rust enum. There's no way to guarantee that some C object with a particular tag will have actually initialized the corresponding union variant, so any code doing this sort of conversion should require the caller to explicitly invoke an unsafe conversion method.
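To make that concrete, here is a minimal sketch of an explicit unsafe conversion. Everything in it is an assumption for illustration: SomeT and SomeOtherT are made-up stand-ins for SOME_T and SOME_OTHER_T, the raw types only approximate what bindgen might emit when it represents the C enum as integer constants, and to_enum is a hypothetical helper, not a bindgen feature.

```rust
use std::os::raw::{c_int, c_uint};

// Hypothetical payload types standing in for SOME_T and SOME_OTHER_T.
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct SomeT {
    pub x: c_int,
}

#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct SomeOtherT {
    pub y: f64,
}

// Assumed raw mirror of the C declarations, with the tag kept as a plain integer.
pub const CTAG_A: c_uint = 0;
pub const CTAG_B: c_uint = 1;

#[repr(C)]
#[derive(Clone, Copy)]
pub union CUnion {
    pub a: SomeT,
    pub b: SomeOtherT,
}

#[repr(C)]
#[derive(Clone, Copy)]
pub struct CStruct {
    pub discriminator: c_uint,
    pub data: CUnion,
}

// Idiomatic Rust view of the same data. Its layout is irrelevant to C
// because it never crosses the FFI boundary.
#[derive(Debug, PartialEq)]
pub enum Safe {
    A(SomeT),
    B(SomeOtherT),
}

impl CStruct {
    /// Returns `None` for a tag value that has no matching variant.
    ///
    /// # Safety
    /// The caller must guarantee that `data` really holds the member
    /// named by `discriminator`; Rust cannot check what C initialized.
    pub unsafe fn to_enum(&self) -> Option<Safe> {
        match self.discriminator {
            CTAG_A => Some(Safe::A(unsafe { self.data.a })),
            CTAG_B => Some(Safe::B(unsafe { self.data.b })),
            _ => None,
        }
    }
}
```

The raw struct stays the FFI type, and the caller takes responsibility for the tag/payload agreement at the one place where the conversion happens.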

Another issue you may run into is that the backing integer for a C enum isn't really defined. In practice it'll use the smallest integer type that can hold all the variants, but the spec doesn't stop a C compiler from using larger types.

An enum is only guaranteed to be large enough to hold int values. The compiler is free to choose the actual type used based on the enumeration constants defined so it can choose a smaller type if it can represent the values you define. If you need enumeration constants that don't fit into an int you will need to use compiler-specific extensions to do so.

-- What is the size of an enum in C?

That means your #[repr(u8, C)] may not be identical to the CSTRUCT definition. I think it'd work out on a little-endian machine: the equivalent layout (#[repr(C)] struct Layout { tag: u8, value: Union }) would have the right amount of padding to put the value in the right spot, and we should be able to read the tag without accidentally touching padding bytes (UB) because little-endian integers store their least significant byte first. That doesn't inspire confidence, though, and it would probably fall over on big-endian machines.
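A rough way to see the padding and byte-order argument is sketched below. It assumes the C compiler picked a 4-byte type for the enum (common, but not guaranteed) and uses made-up 4-byte payloads; CLayout, ByteTagged, WordTagged, sizes and first_tag_byte are names invented for this example.

```rust
use std::mem::size_of;

// Made-up 4-byte payloads, just to have something with 4-byte alignment.
#[repr(C)]
#[derive(Clone, Copy)]
pub union Payload {
    pub a: i32,
    pub b: u32,
}

// Assumed C-side layout: a 4-byte tag followed by the union.
#[repr(C)]
pub struct CLayout {
    pub tag: u32,
    pub data: Payload,
}

// Enum with a one-byte tag, as in the question.
#[repr(C, u8)]
pub enum ByteTagged {
    A(i32),
    B(u32),
}

// Same enum with a four-byte tag, matching the assumed C tag width.
#[repr(C, u32)]
pub enum WordTagged {
    A(i32),
    B(u32),
}

/// Sizes of the three layouts, in declaration order.
pub fn sizes() -> [usize; 3] {
    [
        size_of::<CLayout>(),
        size_of::<ByteTagged>(),
        size_of::<WordTagged>(),
    ]
}

/// The byte stored at the lowest address of a 4-byte tag under
/// little-endian and big-endian byte order, in that order.
pub fn first_tag_byte(tag: u32) -> (u8, u8) {
    (tag.to_le_bytes()[0], tag.to_be_bytes()[0])
}
```

On typical targets where u32 is 4-byte aligned, all three sizes come out as 8, so the payload lands at the same offset either way. The difference is in the tag: first_tag_byte(1) gives (1, 0), meaning a one-byte read of a 4-byte tag sees the real value only under little-endian order.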

The problem doesn't lie with Rust, it's the loose wording of the C standard.

Oh yeah, this is definitely not a problem with Rust. I was just hoping it'd be a problem for Rust. Your explanation makes sense. Upon closer reading of the Rust RFC, it is carefully worded. I guess I will have to just use a fallible conversion like TryFrom for now. Really makes you wonder how C ever worked in the first place.
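For reference, a fallible tag conversion along those lines could look like the sketch below. Tag and UnknownTag are hypothetical names, and the raw tag is assumed to arrive as a u32, which depends on what the C compiler actually chose for the enum.

```rust
use std::convert::TryFrom;

// Fieldless mirror of CTAG with explicit discriminants.
#[repr(u32)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Tag {
    A = 0,
    B = 1,
}

// Error carrying the raw value that matched no known variant.
#[derive(Debug, PartialEq, Eq)]
pub struct UnknownTag(pub u32);

impl TryFrom<u32> for Tag {
    type Error = UnknownTag;

    fn try_from(raw: u32) -> Result<Self, Self::Error> {
        match raw {
            0 => Ok(Tag::A),
            1 => Ok(Tag::B),
            other => Err(UnknownTag(other)),
        }
    }
}
```

This only validates the tag. Reading the matching union member afterwards is still unsafe, since nothing proves C initialized the member the tag names.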