Guarding against invalid enum values arriving via FFI

phg · October 11, 2019, 11:08am

When exposing Rust code to C through FFI I would like a runtime
check for valid enum values. E.g., given a Rust enum and function:

#[repr(C)]
enum Foo {
    Bar = 0,
    Baz = 42,
}

#[no_mangle]
pub extern "C" fn call_me_from_c(foo: Foo) -> bool {
    match foo {
        Foo::Bar => whatevs(),
        Foo::Baz => dostuff(),
        _        => false,       /* can actually happen! */
    }
}

This results in an unreachable pattern warning from rustc:

warning: unreachable pattern
   --> src/client/lib.rs:288:13
    |
288 |             _ => return None
    |             ^
    |
    = note: #[warn(unreachable_patterns)] on by default

Of course I provide a header with C definitions for enum Foo etc.
and a description of the API contract. However I would like to
ensure that the catchall match arm above actually exists in the
resulting binary and is not optimized out.

What would be the most rustic approach here?

HadrienG · October 11, 2019, 12:01pm

Rust enums are not an appropriate tool for modeling C enums that are to be received as inputs, because they have to be exhaustive. Please model these as integer newtypes with associated constants.

(From this perspective, repr(C) Rust enums are arguably an FFI footgun and a language wart, but I guess they can make sense as long as the corresponding data type is only sent as output from the Rust code to the C code)
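For illustration, here is a minimal sketch of the "integer newtype with associated constants" approach, reusing the Foo, Bar and Baz names from the question. The constant names BAR and BAZ and the true/false return values are placeholders chosen for this sketch (standing in for whatevs() and dostuff()), and a real export would also carry the no_mangle attribute:

```rust
use std::os::raw::c_int;

/// Transparent wrapper around the C integer. Every bit pattern is a
/// valid `Foo`, so nothing C passes in can be undefined behavior.
#[repr(transparent)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Foo(pub c_int);

impl Foo {
    // The named values from the C header, as associated constants.
    pub const BAR: Foo = Foo(0);
    pub const BAZ: Foo = Foo(42);
}

// A real library would also mark this with the no_mangle attribute so
// that C code can link against the symbol; omitted in this sketch.
pub extern "C" fn call_me_from_c(foo: Foo) -> bool {
    match foo {
        Foo::BAR => true, // placeholder for whatevs()
        Foo::BAZ => true, // placeholder for dostuff()
        _ => false,       // reachable: rustc requires this arm
    }
}
```

Because Foo is a struct over an integer rather than an enum, the compiler does not treat the match as exhaustive without the catch-all arm, so that arm cannot be reported as unreachable.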

kornel · October 11, 2019, 12:25pm

You can't do this. An invalid enum value is Undefined Behavior, and it does cause dangerous misbehavior of the code. I've had a match on an invalid value cause memory corruption and crash the program.

You have to model this enum as a set of constants (like bindgen does) or take a c_int and check if the value is valid before casting it to a Rust enum.
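As a rough sketch of the second option (take a c_int and validate it before it ever becomes a Rust enum), something along these lines could work. The TryFrom implementation, the choice of c_int as the error type, and the true/false return values are assumptions made for this example, not code from the thread:

```rust
use std::convert::TryFrom;
use std::os::raw::c_int;

#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Foo {
    Bar = 0,
    Baz = 42,
}

impl TryFrom<c_int> for Foo {
    /// The rejected raw value is handed back to the caller.
    type Error = c_int;

    fn try_from(raw: c_int) -> Result<Self, Self::Error> {
        match raw {
            0 => Ok(Foo::Bar),
            42 => Ok(Foo::Baz),
            other => Err(other),
        }
    }
}

// The FFI boundary takes the plain integer, never the enum itself.
// A real export would also carry the no_mangle attribute.
pub extern "C" fn call_me_from_c(raw: c_int) -> bool {
    match Foo::try_from(raw) {
        Ok(Foo::Bar) => true, // placeholder for whatevs()
        Ok(Foo::Baz) => true, // placeholder for dostuff()
        Err(_) => false,      // invalid value from C, handled safely
    }
}
```

The enum is only ever constructed from values that were checked first, so the exhaustiveness assumption holds for all Rust code past the boundary.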

phg · October 11, 2019, 12:57pm

> Please model these as integer newtypes with associated constants.

That’s what I wanted to avoid though. Looks like there is no way
around it after all.

> (From this perspective, repr(C) Rust enums are arguably an FFI
> footgun and a language wart, but I guess they can make sense as
> long as the corresponding data type is only sent as output from
> the Rust code to the C code)

Where they can’t be used safely at all (e.g. const enum Foo
in function arguments) a compile-time error would be more helpful
IMO.

> You can't do this. An invalid enum value is Undefined Behavior,
> and it does cause dangerous misbehavior of the code. I've had a
> match on an invalid value cause memory corruption and crash the
> program.
>
> You have to model this enum as a set of constants (like bindgen
> does) or take a c_int and check if the value is valid before
> casting it to a Rust enum.

It would be nice if I could define an “open” enum like so:

#[repr(C)]
enum Foo {
    Bar = 0,
    Baz = 42,
    Invalid = _, /* any other integer not in the set of defined tags */
}

which would make Rustc insist on a default match clause and
generate the code for it.

HadrienG · October 11, 2019, 1:51pm

Note that the pain can be eased somewhat by automating the boilerplate using suitably designed macros (here's an example) or more sophisticated code generators like bindgen.

There have been various proposals of that sort before, ranging from providing dedicated syntax to disable rustc's enum exhaustiveness assumption (as you suggest later in your post) to deprecating repr(C) enums entirely.

But so far, these proposals were dismissed as overly strong and preventing valid if uncommon usage (e.g. when the C code is part of the same package as the Rust code and can be trusted not to use invalid enum values).

I personally proposed the compromise of having repr(C) enums be linted by clippy, since they are usually the wrong choice but not always. But even that was felt to be too aggressive against legitimate usage.

Overall, there seem to be strong differences of opinion about how big a problem this is and how common valid usage is relative to invalid usage, and that must be resolved before this particular language design discussion can move forward.

Hyeonu · October 11, 2019, 2:05pm

In short, call_me_from_c(1); in C code is UB, which means it effectively tells the compiler that you don't care about the correctness of this code path. The compiler will then happily elide all the code on that path from the resulting binary to produce a faster program, ideally one that finishes in a single clock cycle.

ZiCog · October 11, 2019, 2:18pm

Tangential anecdote:

Years ago I was involved in integration testing a Fly By Wire system. It was written in Ada. As you probably know, Ada is very fussy about detecting out-of-bounds accesses, overflow errors and suchlike. As with Rust, actually writing a program that compiles is hard, but the results are far more predictable than in other languages. Just what you need for a safety-critical avionics system.

I found a nice bug where the PFC would trip out when it read a sensor input I had driven out of range. Those sensors in the real world don't care what valid ranges you have specified for your variables in Ada.

I see the same going on with this C enum / Rust enum problem. Rust should not trust what it is given and one should not blindly jam it into place with "unsafe". Read the input, however many bits it is, if it's a valid enum value then make one.

newpavlov · October 11, 2019, 3:31pm

I have encountered another relevant example when working with the WASI Core API (but I guess this pattern is very common for C libs). It has types like this:

pub type __wasi_eventtype_t = u8;

pub struct __wasi_subscription_t {
    pub userdata: __wasi_userdata_t,
    pub type_: __wasi_eventtype_t,
    pub u: __wasi_subscription_u,
}

pub union __wasi_subscription_u {
    pub clock: __wasi_subscription_u_clock_t,
    pub fd_readwrite: __wasi_subscription_u_fd_readwrite_t,
    _bindgen_union_align: [u64; 5],
}

The type_ field contains the tag which is used for interpreting the data stored inside __wasi_subscription_u. Right now users have to write unsafe code to work with such types, but I believe that ideally Rust should support writing code like this:

// this type should have exactly the same layout as combination of
// `type_` and `u` fields in the earlier snippet
#[repr(u8, C)]
#[non_exhaustive]
enum Subscription {
    Clock { .. } = EVENTTYPE_CLOCK, // Clock variant contains 5 u64 fields
    FdRead { fd: WasiFd } = EVENTTYPE_FD_READ,
    FdWrite { fd: WasiFd } = EVENTTYPE_FD_WRITE,
}

It would be safe to use, less verbose and much more convenient and idiomatic.

chrisd · October 11, 2019, 3:34pm

I think the keyword enum can be confusing when thinking about different languages because in C an enum is simply an int (or perhaps other integer type) with some globally defined constants.

So in C, an enum like this:

typedef enum {
    Bar = 0,
    Baz = 42,
} Foo;

Is roughly equivalent to:

typedef int Foo;
const int Bar = 0;
const int Baz = 42;

In C an enum is just an integer so that is what has to be used for C FFI. On the other hand a Rust enum is a distinct type with strict rules. This allows for improved static analysis of your code (to spot errors) and potentially allows more aggressive optimizations. Nothing in C maps directly to it. You probably get this but I wanted to be very clear about why this isn't trivial.

On a more philosophical note, I don't think it's helpful to try and force Rust enums to be something they are not solely to make C FFI more intuitive. In general I find fighting language features to be a frustrating experience and a losing battle.

Don't get me wrong. I'd love FFI to be a lot more painless in general, which may require better documentation, supporting crates and language features. But I believe we have to think very carefully before undermining the Rust type system for the sake of C compatibility. And honestly I dislike overloading the enum keyword to support fundamentally very different concepts unless there's no acceptable alternative.

tl;dr C integers and Rust enums are very very different types.

Yandros · October 11, 2019, 4:19pm

See the ::num_enum crate to automate the generation of the checked conversion:

use ::num_enum::TryFromPrimitive;

#[derive(Debug, TryFromPrimitive)]
#[repr(u8)] // an explicit primitive integer type is required
pub
enum Foo {
    Bar = 0,
    Baz = 42,
}

#[no_mangle]
pub
extern "C"
fn print_foo (foo: u8) // or foo: <Foo as TryFromPrimitive>::Primitive
{
    use ::core::convert::TryFrom;
    match Foo::try_from(foo) {
        | Ok(foo) => {
            let _: Foo = foo; // we have a value of the valid enum type
            println!("{:?}", foo);
        },

        | Err(err) => {
            eprintln!("print_foo() error: {}", err);
        },
    }
}

fn main ()
{
        print_foo(0);
        print_foo(1);
        print_foo(42);
}

outputs

Bar
print_foo() error: No discriminant in enum `Foo` matches the value `1`
Baz

ZiCog · October 11, 2019, 5:09pm

chrisd,

Hear, hear!

The other day I was watching a presentation from some Rust conference on YouTube, about embedded Rust, FFI and such. Sorry I forget who or what exactly. The presenter is on some embedded Rust working group or such so I was interested.

But he kept going on about "parity with C". Which seemed to be some notion of making Rust capable of doing anything that you can do in C. "Oh my God no", I thought, "how do we stop this guy?"

You see, at face value "parity with C" says to me that one should:

Bend, overload, Rust features so as to make them work like C.
Or start adding more features to the Rust language syntax/semantics to make it work like C.
In the extreme it would imply supporting all the UB and other dumb ass features of C.

Maybe I had the wrong end of the stick but I find all of the above abhorrent. If you want C do it in C for goodness sake.

This discussion about how to "fix" exchanging enums with C seems to be a fine case in point. C enums are not anything like Rust enums. As far as I know the size of an enum in C is not even defined in the standard.

HadrienG · October 11, 2019, 5:23pm

Are you thinking about this presentation by the Rust lang team's new co-leader, perhaps?

ZiCog · October 11, 2019, 5:42pm

Yes, that presentation.

I have to watch again to check that "...how Intel is working to bring Rust to full parity with C" is not how I first took it.

It sounds like big-corporation-led feature creep. I hate feature creep: the ever-growing complexity of a thing until it becomes unlearnable and unusable. Cough, C++, cough, CORBA... etc., etc. Hey, why not have a "class" keyword in Rust like the JS guys got? That will attract the Java refugees.

I hope I'm overreacting.

chrisd · October 11, 2019, 5:55pm

Hmm... I would want to draw a distinction depending on what's meant. I definitely agree that compatibility with C simply for the sake of compatibility would undermine the point of Rust.

However, if "parity with C" means being able to use Rust in more situations where C is currently used, then I am for it. But that can mean doing things differently to how C does them, even if the end result is the same. Which is fine, I think.

That said, I would reiterate that there are things Rust can do to make FFI easier without massively altering the language. E.g. more thorough FFI documentation, some standard FFI utility crates, extern types, etc.

ZiCog · October 11, 2019, 6:02pm

I will watch again and see if my interpretation changes.

"Parity with C" is just such an unfortunate way to put it. Normally people want parity with something better not worse!

I'm all for more documentation and crates to get the job done of course.

federicomenaquintero · October 14, 2019, 10:49pm

Gtk-rs auto-generates more or less this:

/* In the original C code, this is 
 * typedef enum
 * {
 *   GTK_BASELINE_POSITION_TOP,
 *   GTK_BASELINE_POSITION_CENTER,
 *   GTK_BASELINE_POSITION_BOTTOM
 * } GtkBaselinePosition;
 */
mod ffi {
    use std::os::raw::c_int;

    pub type GtkBaselinePosition = c_int;
    pub const GTK_BASELINE_POSITION_TOP: GtkBaselinePosition = 0;
    pub const GTK_BASELINE_POSITION_CENTER: GtkBaselinePosition = 1;
    pub const GTK_BASELINE_POSITION_BOTTOM: GtkBaselinePosition = 2;
}

mod gtk {
    use super::ffi;

    pub enum BaselinePosition {
        Top,
        Center,
        Bottom,
        #[doc(hidden)]
        __Unknown(i32),
    }

    impl ToGlib for BaselinePosition {
        type GlibType = ffi::GtkBaselinePosition;
    
        fn to_glib(&self) -> ffi::GtkBaselinePosition {
            match *self {
                BaselinePosition::Top => ffi::GTK_BASELINE_POSITION_TOP,
                BaselinePosition::Center => ffi::GTK_BASELINE_POSITION_CENTER,
                BaselinePosition::Bottom => ffi::GTK_BASELINE_POSITION_BOTTOM,
                BaselinePosition::__Unknown(value) => value,
            }
        }
    }
    
    impl FromGlib<ffi::GtkBaselinePosition> for BaselinePosition {
        fn from_glib(value: ffi::GtkBaselinePosition) -> Self {
            match value {
                0 => BaselinePosition::Top,
                1 => BaselinePosition::Center,
                2 => BaselinePosition::Bottom,
                value => BaselinePosition::__Unknown(value),
            }
        }
    }
}

ToGlib and FromGlib are just traits which mean "convert to a C type" and "convert from a C type".
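To show the idea without depending on the gtk-rs traits, here is a self-contained sketch of the same pattern: a catch-all variant that keeps the raw value, so conversions from and to C never lose information and never produce an invalid enum. The from_raw and to_raw method names and the public Unknown variant are hypothetical stand-ins for FromGlib, ToGlib and __Unknown:

```rust
use std::os::raw::c_int;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum BaselinePosition {
    Top,
    Center,
    Bottom,
    /// Any raw value without a named variant; the value is preserved.
    Unknown(c_int),
}

impl BaselinePosition {
    /// Total conversion from the C integer: every input maps to a variant.
    pub fn from_raw(value: c_int) -> Self {
        match value {
            0 => BaselinePosition::Top,
            1 => BaselinePosition::Center,
            2 => BaselinePosition::Bottom,
            other => BaselinePosition::Unknown(other),
        }
    }

    /// Conversion back to the C integer, returning unknown values unchanged.
    pub fn to_raw(self) -> c_int {
        match self {
            BaselinePosition::Top => 0,
            BaselinePosition::Center => 1,
            BaselinePosition::Bottom => 2,
            BaselinePosition::Unknown(value) => value,
        }
    }
}
```

With this shape, a value added to the C enum by a newer library version passes through the Rust side unchanged instead of triggering undefined behavior.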

scottmcm · October 15, 2019, 12:36am

Note that calling things from C with the wrong arguments is regularly UB; enums aren't that unusual here. For example, it's just as illegal to call foo((void*)1) if that function is fn foo(x: Box<u32>) on the Rust side.
