Help with enum of Box through FFI

Hello everyone!

I am having trouble understanding how to share an enum, where some variants hold a Box, across the FFI boundary.
The enum looks like this:

#[derive(Debug)]
#[repr(C)]
pub enum SchemeType {
    Ip,
    Bytes,
    Int,
    Bool,
    Array(Box<SchemeType>),
    Map(Box<SchemeType>),
}

At first I thought it would not be possible to share this type with C directly, but then I found RFC 2195, "really tagged unions" (I can't include the link because of the two-link limit).
After reading that RFC, I thought the equivalent C type would be something like:

typedef enum {
    SCHEME_TYPE_TAG_IP,
    SCHEME_TYPE_TAG_BYTES,
    SCHEME_TYPE_TAG_INT,
    SCHEME_TYPE_TAG_BOOL,
    SCHEME_TYPE_TAG_ARRAY,
    SCHEME_TYPE_TAG_MAP,
} scheme_type_tag_t;

typedef struct {
    scheme_type_tag_t tag;
    void *data;
} scheme_type_t;

You can find the code here: https://github.com/marmeladema/rust-ffi-enum-box/commit/8d614500edd8413fea27b7ff5cccd47f8f658687

This did not work well: the test functions crashed with Illegal instruction on x86_64, but worked fine on aarch64.
I also read issue #52976, but it's still not clear at all whether Box<T> can be represented directly as a pointer. It's currently not #[repr(transparent)], so I guess that until it is, this cannot be done safely.

After trying different things, I found that adding 8 bytes of padding to the end of scheme_type_t worked, at least on x86_64: https://github.com/marmeladema/rust-ffi-enum-box/commit/2bc030e806ce8cf2beba8c852ed97e95485afd85 but this does not work at all on aarch64.

The question is then, how should I approach this issue to make it work on all architectures?

Thank you in advance for your help!

The only way I can see that works reliably with stable Rust is converting to another type.

Something like:

//Rust
const VARIANT1: u8 =  XYZ;
const VARIANT2: u8 =  XYZ;
#[repr(C)]
struct CSchemeType {
   variant: u8,
   data: CSchemeTypeData,
}

#[repr(C)]
union CSchemeTypeData {
...
}
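
A minimal sketch of how the elided pieces might be filled in, assuming the union only needs to carry a pointer for the Array and Map variants. The tag values, the `ptr` field, and the `to_c` / `free_c` helpers are hypothetical names chosen for illustration, not something given in the post above.

```rust
// Hypothetical tag values; the post above leaves them unspecified.
const TAG_IP: u8 = 0;
const TAG_BYTES: u8 = 1;
const TAG_INT: u8 = 2;
const TAG_BOOL: u8 = 3;
const TAG_ARRAY: u8 = 4;
const TAG_MAP: u8 = 5;

#[derive(Debug)]
pub enum SchemeType {
    Ip,
    Bytes,
    Int,
    Bool,
    Array(Box<SchemeType>),
    Map(Box<SchemeType>),
}

// Assumed payload: a single pointer, null for the data-less variants.
#[repr(C)]
#[derive(Clone, Copy)]
pub union CSchemeTypeData {
    pub ptr: *mut CSchemeType,
}

#[repr(C)]
pub struct CSchemeType {
    pub variant: u8,
    pub data: CSchemeTypeData,
}

/// Converts the Rust enum into the FFI struct, handing nested values
/// over as heap-allocated raw pointers.
pub fn to_c(ty: SchemeType) -> CSchemeType {
    let (variant, ptr) = match ty {
        SchemeType::Ip => (TAG_IP, std::ptr::null_mut()),
        SchemeType::Bytes => (TAG_BYTES, std::ptr::null_mut()),
        SchemeType::Int => (TAG_INT, std::ptr::null_mut()),
        SchemeType::Bool => (TAG_BOOL, std::ptr::null_mut()),
        SchemeType::Array(inner) => (TAG_ARRAY, Box::into_raw(Box::new(to_c(*inner)))),
        SchemeType::Map(inner) => (TAG_MAP, Box::into_raw(Box::new(to_c(*inner)))),
    };
    CSchemeType { variant, data: CSchemeTypeData { ptr } }
}

/// Frees the nested allocations created by `to_c`.
///
/// # Safety
/// `ty` must have been produced by `to_c` and not freed before.
pub unsafe fn free_c(ty: CSchemeType) {
    if ty.variant == TAG_ARRAY || ty.variant == TAG_MAP {
        // SAFETY: for these tags `to_c` stored a pointer from Box::into_raw.
        let inner = unsafe { Box::from_raw(ty.data.ptr) };
        unsafe { free_c(*inner) };
    }
}
```

With this shape the C side only ever sees a fixed-size tag and a pointer, and Rust stays responsible for allocating and freeing the nested values.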

@Dushistov is right. If I recall correctly, Rust enums (even ones without any data in them) are not compatible with C enums, even if declared #[repr(C)]. Using a struct like the one @Dushistov showed is probably the best solution, and the corresponding C code becomes:

typedef union {
    //...
} CSchemeTypeData;

typedef struct {
    uint8_t variant;  // matches the Rust-side u8 (needs <stdint.h>)
    CSchemeTypeData data;
} CSchemeType;

I figured the best way might be to add an FFI-compatible CSchemeType wrapper (storing a raw pointer instead of a Box<T>) and convert back and forth between it and the real Rust SchemeType type:

#[repr(C)]
pub enum CSchemeType {
    Ip,
    Bytes,
    Int,
    Bool,
    Array(*mut SchemeType),
    Map(*mut SchemeType),
}

impl From<CSchemeType> for SchemeType {
    fn from(ty: CSchemeType) -> Self {
        match ty {
            CSchemeType::Ip => SchemeType::Ip,
            CSchemeType::Bytes => SchemeType::Bytes,
            CSchemeType::Int => SchemeType::Int,
            CSchemeType::Bool => SchemeType::Bool,
            CSchemeType::Array(arr) => SchemeType::Array(unsafe { Box::from_raw(arr) }),
            CSchemeType::Map(map) => SchemeType::Map(unsafe { Box::from_raw(map) }),
        }
    }
}
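
The opposite direction could be sketched as follows, assuming ownership of the boxed value passes to the C side until the value is handed back. The `From<SchemeType>` impl is an illustration that mirrors the conversion above with `Box::into_raw`; it is not taken from the linked commits.

```rust
#[derive(Debug, PartialEq)]
pub enum SchemeType {
    Ip,
    Bytes,
    Int,
    Bool,
    Array(Box<SchemeType>),
    Map(Box<SchemeType>),
}

#[repr(C)]
pub enum CSchemeType {
    Ip,
    Bytes,
    Int,
    Bool,
    Array(*mut SchemeType),
    Map(*mut SchemeType),
}

impl From<SchemeType> for CSchemeType {
    fn from(ty: SchemeType) -> Self {
        match ty {
            SchemeType::Ip => CSchemeType::Ip,
            SchemeType::Bytes => CSchemeType::Bytes,
            SchemeType::Int => CSchemeType::Int,
            SchemeType::Bool => CSchemeType::Bool,
            // into_raw gives up ownership: the allocation is only freed
            // once the pointer is turned back into a Box.
            SchemeType::Array(arr) => CSchemeType::Array(Box::into_raw(arr)),
            SchemeType::Map(map) => CSchemeType::Map(Box::into_raw(map)),
        }
    }
}

impl From<CSchemeType> for SchemeType {
    fn from(ty: CSchemeType) -> Self {
        match ty {
            CSchemeType::Ip => SchemeType::Ip,
            CSchemeType::Bytes => SchemeType::Bytes,
            CSchemeType::Int => SchemeType::Int,
            CSchemeType::Bool => SchemeType::Bool,
            // Only sound if the pointer came from Box::into_raw and is
            // converted back exactly once.
            CSchemeType::Array(arr) => SchemeType::Array(unsafe { Box::from_raw(arr) }),
            CSchemeType::Map(map) => SchemeType::Map(unsafe { Box::from_raw(map) }),
        }
    }
}
```

Each pointer has to make the round trip exactly once: converting the same CSchemeType value back twice would free the inner box twice.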

I tried it here: https://github.com/marmeladema/rust-ffi-enum-box/commit/f83993ad9f90adae322666aafc5e82fb69edb8f8 and it seems to work on both x86_64 and aarch64.
But then I removed the 8 bytes of padding that I had introduced before, because it should not be needed anymore: https://github.com/marmeladema/rust-ffi-enum-box/commit/9b6d10acd37e8f7637d4e6523f76acaa01555a26 and it crashed again on x86_64 while working properly on aarch64.

It seems pretty obvious that C and Rust are using different sizes for the discriminant.

From what I am reading, C apparently has no way to specify the size of an enum, so you can't use a C enum for this. Use a fixed-width integer for the tag on both sides instead:

typedef int32_t scheme_type_tag_t;

const scheme_type_tag_t SCHEME_TYPE_TAG_IP = 0;
const scheme_type_tag_t SCHEME_TYPE_TAG_BYTES = 1;
...

#[derive(Debug)]
#[repr(C, i32)]
pub enum SchemeType {
    Ip,
    Bytes,
    Int,
    Bool,
    Array(*mut SchemeType),
    Map(*mut SchemeType),
}
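
One way to check what Rust actually lays out for such an enum is to compare it against a hand-written tag-plus-union struct. The sketch below assumes the layout that RFC 2195 and the Rust Reference describe for `#[repr(C, i32)]`; the `Manual*` types and the two helper functions are hypothetical names.

```rust
use std::mem::{align_of, size_of};

#[derive(Debug)]
#[repr(C, i32)]
pub enum SchemeType {
    Ip,
    Bytes,
    Int,
    Bool,
    Array(*mut SchemeType),
    Map(*mut SchemeType),
}

// Hand-written equivalent: an i32 tag followed by a union of payloads.
#[repr(C)]
#[derive(Clone, Copy)]
pub union ManualPayload {
    pub array: *mut SchemeType,
    pub map: *mut SchemeType,
}

#[repr(C)]
pub struct ManualSchemeType {
    pub tag: i32,
    pub payload: ManualPayload,
}

/// True when the enum and the manual struct agree on size and alignment.
pub fn layouts_match() -> bool {
    size_of::<SchemeType>() == size_of::<ManualSchemeType>()
        && align_of::<SchemeType>() == align_of::<ManualSchemeType>()
}

/// Reads the discriminant the way C code would: as the leading i32.
pub fn raw_tag(ty: &SchemeType) -> i32 {
    // SAFETY: with #[repr(C, i32)] the tag is an i32 stored at offset 0.
    unsafe { *(ty as *const SchemeType as *const i32) }
}
```

Printing `size_of::<SchemeType>()` next to the C compiler's `sizeof(scheme_type_t)` on each target should show quickly whether the two sides disagree.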

But #[repr(C)] enums are just dangerous in general. If the C code supplies an invalid value for the discriminant (e.g. compiled against a different version of your library with more variants), you have UB.
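
One way to sidestep that undefined behavior is to accept the tag from C as a plain integer and validate it before building any Rust enum from it. A minimal sketch, where `SchemeTypeTag` and `tag_from_raw` are hypothetical names that do not appear elsewhere in this thread:

```rust
#[repr(i32)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SchemeTypeTag {
    Ip = 0,
    Bytes = 1,
    Int = 2,
    Bool = 3,
    Array = 4,
    Map = 5,
}

/// Validates a raw tag received from C.
/// Unknown values yield None instead of an invalid enum value.
pub fn tag_from_raw(raw: i32) -> Option<SchemeTypeTag> {
    match raw {
        0 => Some(SchemeTypeTag::Ip),
        1 => Some(SchemeTypeTag::Bytes),
        2 => Some(SchemeTypeTag::Int),
        3 => Some(SchemeTypeTag::Bool),
        4 => Some(SchemeTypeTag::Array),
        5 => Some(SchemeTypeTag::Map),
        _ => None,
    }
}
```

The FFI entry point would then take an `i32` (plus the payload) rather than the enum itself, and return an error code when `tag_from_raw` yields `None`.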


Thank you for your answers!

I get that enums whose variants hold data do not exist in the C world, but RFC 2195 https://github.com/rust-lang/rfcs/blob/master/text/2195-really-tagged-unions.md is supposed to provide a compatibility layer through #[repr(C)]. So are you saying that this RFC has never been implemented, or that there are bugs in the actual implementation?
How can I find out how much of what this RFC describes is actually supported?

This RFC was never implemented as far as I can tell.

Oh ok then! It makes a lot more sense if that's the case. Maybe it would be safer if Rust gave an error for a #[repr(C)] enum with data in its variants?


After digging a bit deeper, I'm not sure anymore. Someone more familiar with this RFC will need to comment on its validity.

I am with @ExpHP on this one:

  • with a repr(C, DiscriminantIntegerBackingType) enum, you should be able to work with it from C with a tagged union:

    // hack for `void` within a struct
    typedef uint8_t empty_t[0];
    #define empty ((empty_t) {})
    
    /* C equivalent to Rust's
        #[repr(C, u32)]
        enum SchemeType {
            Ip,
            Bytes,
            Int,
            Bool,
            Array(Box<SchemeType>),
            Map(Box<SchemeType>),
        }
    */
            
    typedef struct scheme_type scheme_type_t;
    
    typedef uint32_t scheme_type_tag_t;
    
    #define SCHEME_TYPE_TAG_IP    ((scheme_type_tag_t) 0)
    #define SCHEME_TYPE_TAG_BYTES ((scheme_type_tag_t) 1)
    #define SCHEME_TYPE_TAG_INT   ((scheme_type_tag_t) 2)
    #define SCHEME_TYPE_TAG_BOOL  ((scheme_type_tag_t) 3)
    #define SCHEME_TYPE_TAG_ARRAY ((scheme_type_tag_t) 4)
    #define SCHEME_TYPE_TAG_MAP   ((scheme_type_tag_t) 5)
    
    typedef union scheme_type_payload {
        empty_t          ip;
        empty_t          bytes;
        empty_t          int_;   // `int` is a C keyword, hence the underscore
        empty_t          bool_;  // `bool` would clash with <stdbool.h>
        scheme_type_t *  array;
        scheme_type_t *  map;
    } scheme_type_payload_t;
    
    struct scheme_type {
        scheme_type_tag_t      tag;
        scheme_type_payload_t  payload;
    };
    
    #define SCHEME_TYPE_IP ( \
        (scheme_type_t) { \
            .tag = SCHEME_TYPE_TAG_IP, \
            .payload = (scheme_type_payload_t) { .ip = empty } \
        } \
    )
    
    #define SCHEME_TYPE_BYTES ( \
        (scheme_type_t) { \
            .tag = SCHEME_TYPE_TAG_BYTES, \
            .payload = (scheme_type_payload_t) { .bytes = empty } \
        } \
    )
    
    #define SCHEME_TYPE_INT ( \
        (scheme_type_t) { \
            .tag = SCHEME_TYPE_TAG_INT, \
            .payload = (scheme_type_payload_t) { .int_ = empty } \
        } \
    )
    
    #define SCHEME_TYPE_BOOL ( \
        (scheme_type_t) { \
            .tag = SCHEME_TYPE_TAG_BOOL, \
            .payload = (scheme_type_payload_t) { .bool_ = empty } \
        } \
    )
    
    #define SCHEME_TYPE_ARRAY(ptr) ( \
        (scheme_type_t) { \
            .tag = SCHEME_TYPE_TAG_ARRAY, \
            .payload = (scheme_type_payload_t) { .array = (ptr) } \
        } \
    )
    
    #define SCHEME_TYPE_MAP(ptr) ( \
        (scheme_type_t) { \
            .tag = SCHEME_TYPE_TAG_MAP, \
            .payload = (scheme_type_payload_t) { .map = (ptr) } \
        } \
    )
    
    // + getters, etc.
    
  • defining the enum in such a C compatible fashion would only let me sleep at night if instead of a #[repr(C, u32)] enum we had a procedural macro (or a build.rs script) explicitly defining the Rust struct as the Rust translation of the C code above; and a possible build.rs script to generate this C code.

I am concerned though; I can't help but notice that #[repr(C)] was accepted on enums with data even as early as Rust 1.0, when the RFC clearly could not have been implemented. Together with the fact that I can't find a tracking issue, this makes it hard to tell whether it was implemented.


Yes, I think I would currently prefer to rely on a (procedural) macro / build script to do the expected job for me. Here is a quick and dirty PoC:

repr_C_enum! {
    #[repr(C, u32)] as ListPayload for C_List
    pub
    enum List {
        Empty,
        Node { head: i32, tail: *const C_List },
    }
}

which expands to:

// private / internal enum
enum List {
    Empty,
    Node { head: i32, tail: *const C_List },
}

// === Exported to FFI ===

#[repr(C)]
pub struct C_List {
    pub tag: u32,
    pub payload: ListPayload,
}

#[repr(C)]
#[allow(bad_style)] // with a procedural macro we could get lowercase fields
pub union ListPayload {
    pub Empty: Empty,
    pub Node: Node,
}

// with a procedural macro we could manually count
#[allow(dead_code)]
#[repr(u32)]
enum Helper {
    Empty,
    Node,
}

#[allow(bad_style)]
pub const Empty: u32 = Helper::Empty as u32;

#[repr(C)]
#[derive(Clone, Copy)]
pub struct Empty {}

#[allow(bad_style)]
pub const Node: u32 = Helper::Node as u32;

#[repr(C)]
#[derive(Clone, Copy)]
pub struct Node {
    pub head: i32,
    pub tail: *const C_List,
}

which "leads" (I had to manually feed 0 and 1 rather than the Helper::* constants, but again, Helper is not needed within a procedural macro) to cbindgen generating:

#include <stdarg.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

// These names would obviously be mangled with the type name and a _tag suffix
// As it is, it currently breaks the following definitions xD
#define Empty 0
#define Node 1

typedef struct {
} Empty;

typedef struct {
  int32_t head;
  const C_List *tail;
} Node;

typedef union {
  Empty Empty;
  Node Node;
} ListPayload;

typedef struct {
  uint32_t tag;
  ListPayload payload;
} C_List;
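
On the Rust side, turning the exported struct back into the internal enum comes down to checking the tag before touching the union. A sketch under the assumption that tags 0 and 1 stand for `Empty` and `Node`, as in the generated header; `from_c`, the `TAG_*` constants, and the renamed payload structs are hypothetical.

```rust
#[derive(Debug, PartialEq)]
pub enum List {
    Empty,
    Node { head: i32, tail: *const C_List },
}

#[repr(C)]
#[derive(Clone, Copy)]
pub struct EmptyPayload {}

#[repr(C)]
#[derive(Clone, Copy)]
pub struct NodePayload {
    pub head: i32,
    pub tail: *const C_List,
}

#[repr(C)]
#[derive(Clone, Copy)]
pub union ListPayload {
    pub empty: EmptyPayload,
    pub node: NodePayload,
}

#[repr(C)]
#[derive(Clone, Copy)]
#[allow(non_camel_case_types)]
pub struct C_List {
    pub tag: u32,
    pub payload: ListPayload,
}

// Assumed to match the values in the generated header.
pub const TAG_EMPTY: u32 = 0;
pub const TAG_NODE: u32 = 1;

/// Rebuilds the internal enum from the FFI struct.
/// Returns None for an unknown tag instead of reading the wrong union field.
pub fn from_c(list: &C_List) -> Option<List> {
    match list.tag {
        TAG_EMPTY => Some(List::Empty),
        TAG_NODE => {
            // SAFETY: the tag says `node` is the active union field.
            let node = unsafe { list.payload.node };
            Some(List::Node { head: node.head, tail: node.tail })
        }
        _ => None,
    }
}
```

Keeping the tag a plain `u32` means a value that C gets wrong surfaces as `None` rather than as an invalid enum discriminant.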