Interfacing C code with bindgen: #define and types

If you care that much about portability being checked by the compiler (which is a very legitimate stance, don't get me wrong! I don't remember where, but I recall that the meta-level of #[cfg]s was called out as one of Rust's weak points, since it's hard to ensure a lack of compiler errors, or code coverage, when code has been #[cfg]-elided by some sort of pre-processor), I do highly advise that you define transparent newtype wrappers:

mod strict {
    use ::core::convert::{TryFrom, TryInto}; // not in the prelude before the 2021 edition

    #[repr(transparent)]
    pub
    struct c_int(::std::os::raw::c_int);

    match_! {( i8, i16, i32, isize, i64, u8, u16, u32, usize, u64 ) {
        ( $($T:tt),* ) => ($(
            impl TryFrom<$T> for c_int {
                type Error = <::std::os::raw::c_int as TryFrom<$T>>::Error; // error of the `$T -> c_int` direction

                fn try_from (x: $T)
                  -> Result<Self, Self::Error>
                {
                    x.try_into().map(Self)
                }
            }

            /* maybe add saturating and wrapping conversions as well */
        )*);
    }}
}
  • with match_! being a tool (ideally named match!) for ad-hoc macro invocations.

And then tell bindgen to use crate::strict:: as the prefix for the integer types.
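For instance (a build.rs sketch; ctypes_prefix is the relevant builder method, and wrapper.h stands for whatever header you feed bindgen):

// make the generated bindings say `crate::strict::c_int` (etc.)
// wherever they would otherwise say `::std::os::raw::c_int`
let bindings = bindgen::Builder::default()
    .header("wrapper.h")
    .ctypes_prefix("crate::strict")
    .generate()
    .expect("unable to generate bindings");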

Thanks to #[repr(transparent)], the ABI will be the same as that of the external C libs, and you'll have guaranteed type safety, similar to usize never being the same type as u32 or u64.

The only annoyance will be that you'll have to use .try_into()s in many places (you won't have the "ergonomics" of as, although when you think about it, the footguns of as make it a rather bitter-sweet sugar…).
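For instance, a call-site would look like this (sketch, where ffi::set_len is a hypothetical binding taking a strict::c_int):

let buf = vec![0_u8; 1024];
// explicit, checked conversion rather than a lossy `as` cast:
let c_len: strict::c_int = buf.len().try_into().expect("length exceeds C's `int` range");
unsafe { ffi::set_len(c_len); }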


Also, I agree with the macro approach for the constants, that would be the most sensible translation of #defines, but the !() noise and, more generally, macros as a whole, don't get that much love in the ecosystem and are avoided when possible.

I don't fully understand your code above.

This is a newtype wrapper for c_int.
Is it intended to shadow std::os::raw::c_int?

What is match_!? You write:

with match_! being a tool (ideally named match!) for ad-hoc macro invocations.

What kind of tool? I don't understand.

Yes, I agree using as should be avoided due to problems/bugs at runtime on overflow. When I have a value where I don't know whether it will fit, I should use .try_into(). But in case of constants, that should rather be decided at compile time anyway, right?

Sorry, I wrote the post at a "conceptual" level trying not to delve too much into the details. The idea was to see a bunch of TryFrom implementations thanks to "macros" (in a hand-wavy fashion), while also showcasing a potential syntactic improvement that has been on my mind for a while.

match_! can be written and implemented as:

macro_rules! match_ {(
    ( $($input:tt)* ) $rules:tt
) => (
    macro_rules! recurse $rules
    recurse! { $($input)* }
)}

and is basically a potential shorthand for doing things like:

macro_rules! try_from_impls {( $($T:tt),* $(,)? ) => ($(
    impl TryFrom...
)*)}
try_from_impls! { i8, i16, ..., u64 }
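In other words, the match_! invocation from the first snippet expands to exactly that shape (expansion sketch; recurse! is the hidden helper that match_! defines):

macro_rules! recurse {( $($T:tt),* ) => ($(
    impl TryFrom<$T> for c_int { /* … */ }
)*)}
recurse! { i8, i16, i32, isize, i64, u8, u16, u32, usize, u64 }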
rationale

I've never been super fond of ordering the items like that (macro def before macro call), since macros are generally hard to read, and when defined in this order, the inputs to the macro are only known later on; which is information that would help the readability of the macro definition.

I thus prefer doing:

try_from_impls! { i8, i16, ..., u64 }
// where
macro_rules! try_from_impls {( $($T:tt),* $(,)? ) => (
                            // ^^^^^^^^^^ 
                            // we know here that `$T` stands for `i8`, `i16`, etc.
    ...
)}
use try_from_impls;

but then we have a try_from_impls identifier being repeated three times for just one invocation of the macro. Hence the match_! shorthand.

aside

And if that were blessed to be a language construct (match! syntax), then it could also lend itself to chaining macros / eager expansion, in a rather controlled fashion:

match! concat_idents!(foo, bar) {( $foobar:ident ) => (
    fn $foobar() { ... }
)}
// or
match! include_str!("...") {( $contents:tt ) => (
    my_proc_macro! { $contents } /* problem of proc-macro accessing the fs solved. */
)}

Yes.

It is intended to shadow that type w.r.t. bindgen's generated definitions:

  • Here is the relevant part of ::bindgen's API that allows doing just that.

Basically you have, in C, an int type, and the question that you and/or bindgen ask is which type, in Rust, should represent it. And it turns out we are free to choose whichever we want, provided it has the same ABI as a C integer: same layout, and same calling convention.

By construction, ::std::os::raw::c_int or ::libc::c_int are both types which satisfy these conditions, but they have the huge caveat of being defined as type aliases of one of the Rust primitive integer types (e.g., i32), which makes the conversion between such a type and these aliases too lenient / unchecked on most platforms, while failing to compile on the platforms where the alias points to something else.

  • That is, these are the same problems that we'd have if usize had been defined as:

    #[cfg(target_pointer_width = "32")]
    type usize = u32;
    
    #[cfg(target_pointer_width = "64")]
    type usize = u64;
    

    Luckily usize is a type that is always distinct from both u32 and u64, which forces code to explicitly perform some kind of transformation between them: in this instance, often the overused as casts, or preferably the checked .try_into() casts; and for more fine-grained control over the cast's behavior, third-party crates out there feature quite nifty APIs, such as ::az (see the sketch below).
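    For instance (sketch; the last two lines assume the third-party ::az crate and its CheckedAs / SaturatingAs extension traits):

    use ::az::{CheckedAs, SaturatingAs};

    let n: usize = 70_000;
    let a = n as u16;                    // lossy: silently wraps to 4_464
    let b = u16::try_from(n);            // checked: `Err(_)` here
    let c: Option<u16> = n.checked_as(); // checked, via `az`: `None` here
    let d: u16 = n.saturating_as();      // fine-grained, via `az`: `u16::MAX`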

So the question then becomes: could we have a type

  • with the same ABI as C's int type (i.e., same ABI as ::std::os::raw::c_int),

  • but distinct from the primitive integer types?

Achieving the latter is trivial: just define a new type (e.g., a struct). And for the former, we just mark that newtype as being #[repr(transparent)] over a type with the desired ABI. Hence:

#[repr(transparent)]
pub struct strict_c_int(::std::os::raw::c_int);
  • Aside: feel free to slap #[derive(Debug, Clone, Copy, PartialOrd, Ord, PartialEq, Eq, Hash)] onto it

At that point, thanks to having an actually distinct type, we can use privacy to encapsulate its construction, i.e., to only offer explicitly checked casts!
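For instance, one could offer (a hypothetical convenience helper, on top of the TryFrom impls from earlier):

impl c_int {
    /// Outside of `strict`, the field being private makes this helper
    /// (and the `TryFrom` impls) the only ways to obtain a `c_int`.
    pub fn new (x: impl ::core::convert::TryInto<::std::os::raw::c_int>)
      -> Option<Self>
    {
        x.try_into().ok().map(Self)
    }
}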

And finally, to reduce the boilerplate with bindgen, as well as to avoid the "redundancy" of strict::strict_c_int, I chose to rename it the same as c_int.

  • But if you find yourself in a codebase already using the "fuzzy" c_int from stdlib, and this new one, then shadowing may not be that well suited (although you could always "unshadow" as needed: use strict::c_int as strict_c_int;).

  • I personally, on the contrary, love shadowing the ill-designed types with these more robust ones as often as possible, to reduce the chances of accidentally using a "lenient" c_int, and so on.

Yes, the missing part is the case of constants (or at least literals). This is a typical problem in the Rust language, since we'd like the same for, for instance, &'static CStr (we'd like b"Hello, World!\0" to be somehow convertible to a &'static CStr without runtime errors, and b"Ill F\0rmed" to trigger a compilation error). The same goes for NonZeroU64 and other similar types featuring a const fn construction, etc.

There is currently no automated tooling for these things, although const fn can get you quite far.

Let's start with NonZeroI32::new(), for instance:

macro_rules! nz_i32 {( $N:expr $(,)? ) => (
    {
        const __N: ::core::primitive::i32 = $N;
        {
            use ::core::{prelude::v1::*, num::NonZeroI32};
            const NZ: NonZeroI32 = match NonZeroI32::new(__N) {
                | Some(nz) => nz,
                | None => {
                    #[forbid(const_err)]
                    let _panic = 0 / __N; // division by zero: a hard const error iff `__N == 0`
                    [][0] // unreachable; out-of-bounds indexing that type-checks as `NonZeroI32`
                },
            };
            NZ
        }
    }
)} pub(in crate) use nz_i32;
  • Playground

  • The key idea is that through helper const blocks, we can ensure some const fn fallible computation (such as NonZero::new()) is computed at compile-time, so as to make the failures become always-caught compile-time errors rather than runtime failures that may dodge unit tests.

    Since a const block is cumbersome to write (until we get inline const { ... } blocks), and "compile-time panics" require getting creative (although we'll soon have proper compile-time panics!), the current solution to alleviate this pattern / feature a lightweight construction is to use macros.
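For instance (usage sketch):

let answer = nz_i32!(42); // OK: a `NonZeroI32`
// nz_i32!(0) // would fail to *compile*, thanks to the `0 / __N` division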

Another option is to use a procedural macro (or a build.rs script, but that's awfully cumbersome to write!) to inspect the contents of a given literal (≠ a const), and to only emit an unchecked construction if no invalid input is detected.

  • But that is brittle (what if the given const is not a literal, but the name / path to another constant (e.g., N); what if the given const is the output of some other macro?), so it should be avoided when a const fn-based implementation can be featured.

A final solution, which does have the advantage of not requiring to think too hard about const fns nor proc-macros, is, for the case of literals, again, to generate an anonymous test (thus a doctest) that will eagerly check that the input does not cause failures. It does have the issue that the input will still be checked at runtime each time, but it guarantees a lack of runtime panics, provided the doctests have been run before releasing the code.

macro_rules! doctested_nz_i32 {( $N:literal $(,)? ) => ({
    /// ```rust
    #[doc = stringify! {
        let _ =
            ::core::num::NonZeroI32::new($N)
                .unwrap_or_else(|| panic!(
                    "`doctested_nz_i32!({:?})` failed", $N,
                ))
        ;
    }]
    /// ```
    extern {}

    ::core::num::NonZeroI32::new($N).unwrap()
})} use doctested_nz_i32;
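For instance (sketch):

let fine = doctested_nz_i32!(42); // unwraps fine at runtime; the doctest passes
// `doctested_nz_i32!(0)` would still compile, but the generated doctest
// would fail under `cargo test`, flagging the bad literal before release.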

In the case of CStrings, there are several crates out there which feature this compiletime-checked & runtime-infallible CStr constants. I'll mention the case of

since it features a "modern" const-friendly implementation (feature-gated for MSRV reasons), as well as a historical proc-macro implementation

From all these examples it shouldn't be that difficult to feature similar macros for runtime-infallible construction of strict::c_ints out of numeric constants (of any type, for what it's worth):

#[macro_export]
macro_rules! const_c_int {( $N:expr $(,)? ) => ({
    #[forbid(const_err)]
    const __C_INT: $crate::strict::c_int = {
        const MIN: ::core::primitive::i128 = ::std::os::raw::c_int::MIN as _;
        const MAX: ::core::primitive::i128 = ::std::os::raw::c_int::MAX as _;
        match $N as i128 {
            // sound: `strict::c_int` is `#[repr(transparent)]` over the raw `c_int`
            ok @ MIN ..= MAX => unsafe { ::core::mem::transmute(ok as ::std::os::raw::c_int) },
            _ => [][0], // out-of-bounds indexing: turns an overflow into a compile error
        }
    };
    __C_INT
})}

but then such a construction would feature the same portability issues as we initially had: a const_c_int!(42) that passes is not that different from a let it: c_int = 42;: for a big enough constant, both forms may pass compilation on some platforms but fail on others.

So the right thing to do here would be to either remain open to the possibility that even literal numbers can have a fallible conversion (since, even if the input number is constant, the target size is not, portability-wise!), or to choose to support integers only up to a specific integer size which you know shall fit the choice of ::std::os::raw::c_int for all platforms (is that i16?).

The macro above can then be amended to use such lower bounds for MIN and MAX (rather than those of c_int). Assuming i16 is OK:

- const MIN: ::core::primitive::i128 = ::std::os::raw::c_int::MIN as _;
+ const MIN: ::core::primitive::i128 = ::core::primitive::i16::MIN as _;
  // etc.
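Usage then stays the same (sketch, assuming the i16 bounds):

let small = const_c_int!(1_000); // compiles on every platform
// const_c_int!(40_000) // out of `i16` range: a compile error, everywhere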

Understood now. It was just a bit tough for me to digest all of that at one time :sweat_smile:

Basically you create a newtype for c_int (same name as but distinct from std::os::raw::c_int) with a bunch of TryFrom implementations for all primitive integer types.

To keep it concise, this is done via a macro (match_!) that defines an inner macro (recurse!) from the given $rules, and then invokes it on the $input.

:woozy_face:

(feeling double wrapped and innerly recursed now)

Thanks for sharing this interesting macro concept though. I might use it.

Ah okay, so it isn't really shadowing, but explicitly telling bindgen which prefix to use.

Yes, I totally agree, and I think it's dangerous to define c_int as i16 or i32 (in contrast to usize, where it's deliberately not done), because it will make code that compiles without problem on one platform fail to compile on different platforms.

So actually the problem isn't just limited to #defined constants but a more general issue. However, IMHO it would still be wrong to assume a type strict::c_int for a #defined constant.

Perhaps this could be solved if we created a distinct type like c_literal that then somehow uses const fns for conversion to other types? Maybe that could be an alternative to my proposal to use macros for constants in C header files that are "defined" through #define. I understand too little about the implications yet to judge whether that's worth the effort (Edit: or even feasible).

Not sure if I understand the rest of your post already, but:

But the target size is known at compile-time. It is okay to encounter an error (at compile time!) when porting a program that relies on a certain value fitting in a (platform dependent) type. But what I would not like is:

  • reporting the error at runtime (when it could have been reported at compile time), or
  • causing errors (whether at compile time or runtime) just because a (non-failing) conversion wasn't made (because the programmer wasn't warned about it being necessary).

If the error happens because someone tries to fit a large constant into a c_int, then I see no problem if this causes trouble when trying to port the program. But in my originally viewed case, this also happens with small numbers like 1 or 2 or 3.

(But like I said, I might not have fully understood the second part of your post yet. Maybe you did mean failures at compile-time?)

AFAIK an int in C must be at least 16 bits wide, so yes to your question.


Rethinking about that, there might be some scenarios where it would be better to throw a runtime error when a literal is used in a place where it doesn't fit a particular C type (as the literal might only be executed in certain cases). But I don't think it makes sense in the general case.

I don't have opinions on which strategy is better, but I have written my post thinking that a checked runtime error (not an implicit-overflow-induced logic error: we both agree that that's the worst! :grinning_face_with_smiling_eyes:) was better than a compile error where the whole project would not compile. Hence what I mentioned.

But I can definitely see the rationale behind preferring a compile-error to a runtime error, in which case the const_c_int! that checked against ::std::os::raw::c_int at compile-time would be a more sensible default :+1:

The more difficult case is when a value is returned and I want to compare it to a #defined constant. This would be a very bad idea:

// bad, because casting can truncate
if ffi_mod::some_c_func() as u32 == ffi_mod::SOME_CONST {
}

So I have to write it on the right side:

if ffi_mod::some_c_func() == ffi_mod::SOME_CONST as _ {
}

The second code should be safe if some_c_func's return type is big enough to contain the expected constant return value (which seems to be a reasonable assumption for most APIs).

But when I go for the second approach, I cannot use match, but I have to use a really ugly syntax (x if x == SOMECONST as _):

fn main() {
    const MYCONST: u32 = 2;
    let i: std::os::raw::c_int = 2;
    match i {
        //x as _ => println!("MYCONST found"),
        x if x == MYCONST as _ => println!("MYCONST found"),
        _ => println!("other value")
    }
}

(Playground)

Now you could say: Don't use casts anyway. They are dangerous. Okay, let me try a clean approach with try_into:

fn main() {
    const MYCONST: u32 = 2;
    let i: std::os::raw::c_int = -1;
    match i.try_into().unwrap() {
        MYCONST => println!("MYCONST found"),
        _ => println!("other value")
    }
}

(Playground)

But the code above will create a runtime error!

What if I go for:

fn main() {
    const MYCONST: u32 = 2;
    let i: std::os::raw::c_int = -1;
    match i {
        x if x == MYCONST.try_into().unwrap() => println!("MYCONST found"),
        _ => println!("other value")
    }
}

(Playground)

This seems to work. I assume the last approach is what I should do? It should never panic as long as we can assume that the constant fits into the type of the tested variable (which is a reasonable assumption, as mentioned above). But under that assumption, I could also just write x if x == MYCONST as _, which is at least a bit shorter.

So I'm not really sure what to do. match will force me to use an ugly x if x == … syntax, whatever I do. Or is there any other way?

If MYCONST was a macro, things would be much more easy:

macro_rules! MYCONST { () => { 2 }; }

fn main() {
    let i: std::os::raw::c_int = 2;
    match i {
        MYCONST!() => println!("MYCONST found"),
        _ => println!("other value")
    }
}

(Playground)

I hope this explains better why I want #defined constants being emitted as macros by bindgen.

How do other people deal with these issues?

  • Do you ever use match statements in your FFI code?
  • How do you deal with type mismatches, given the current behavior of bindgen?

Regarding that, I think the correct code would be:

    const MY_CONST: u32 = 2;
    let i: std::os::raw::c_int = -1;
-   match i.try_into().unwrap() {
+   match i.try_into() {
-       MY_CONST => println!("MY_CONST found"),
+       Ok(MY_CONST) => println!("MY_CONST found"),
        _ => println!("other value")
    }

Since the idea is that if, mathematically, i has the same numeric value as MY_CONST, then that i instance must fit / be losslessly convertible to the type of MY_CONST. The contraposition thus justifies that second match arm :slightly_smiling_face:.
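In full, a runnable version of the amended example (sketch):

use std::convert::TryInto; // redundant on edition 2021, harmless otherwise

fn main() {
    const MY_CONST: u32 = 2;
    let i: std::os::raw::c_int = -1;
    match i.try_into() {
        Ok(MY_CONST) => println!("MY_CONST found"),
        // `Err(_)` also lands here: an `i` that doesn't fit in `u32`
        // cannot possibly be equal to `MY_CONST`.
        _ => println!("other value"),
    }
}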

  • (I don't disagree with the rest of your point, though, about the usefulness of transposing #defines to Rust macros)

Very nice idea. After reading your post, I was considering using that in my code. Then I realized this only works if all constants have the same type. The following example will not compile:

fn main() {
    const MYCONST: u32 = 2;
    const OTHERCONST: i32 = -1;
    let i: std::os::raw::c_int = -1;
    match i.try_into() {
        Ok(MYCONST) => println!("MYCONST found"),
        Ok(OTHERCONST) => println!("OTHERCONST found"),
        _ => println!("other value")
    }
}

(Playground)

Errors:

   Compiling playground v0.0.1 (/playground)
error[E0308]: mismatched types
 --> src/main.rs:7:12
  |
3 |     const OTHERCONST: i32 = -1;
  |     --------------------------- constant defined here
4 |     let i: std::os::raw::c_int = -1;
5 |     match i.try_into() {
  |           ------------ this expression has type `Result<u32, TryFromIntError>`
6 |         Ok(MYCONST) => println!("MYCONST found"),
7 |         Ok(OTHERCONST) => println!("OTHERCONST found"),
  |            ^^^^^^^^^^
  |            |
  |            expected `u32`, found `i32`
  |            `OTHERCONST` is interpreted as a constant, not a new binding
  |            help: introduce a new binding instead: `other_otherconst`

For more information about this error, try `rustc --explain E0308`.
error: could not compile `playground` due to previous error

Note that bindgen by default will mix i32 and u32 (and use u64 or i64 where needed):

/* automatically generated by rust-bindgen 0.59.1 */
/* […] */
pub const __SCHAR_MAX: u32 = 127;
pub const __SCHAR_MIN: i32 = -128;
/* […] */
pub const __ULONG_MAX: i32 = -1;
pub const __LONG_MAX: u64 = 9223372036854775807;
pub const __LONG_MIN: i64 = -9223372036854775808;
/* […] */

I could force bindgen to always use u64, but then I would have an issue with negative numbers. Or I force it to use i64, but then I have a problem with values greater than 2^63-1. i128 to the rescue? :see_no_evil:

Is there some way to write a macro (as a workaround) that converts a constant into a literal? Such that I could write:

match status {
    literal!(MYCONST) => /* … */
    /* … */
}

I thought of something like

macro_rules! literal {
    ($x:expr) => {
        if $x == 0 { 0 }
        else if $x == 1 { 1 }
        else if $x == -1 { -1 }
        else if $x == 2 { 2 }
        else if $x == -2 { -2 }
        else if $x == 3 { 3 }
        else if $x == -3 { -3 }
        else if $x == 4 { 4 }
        else if $x == -4 { -4 }
        /* ... */
    }
}

The above macro won't work, as it would actually insert the if-else clauses into the match pattern. I assume I'd need a procedural macro here, and I'm not even sure whether that's possible (and feasible).

You could have:

macro_rules! switch {(
    $scrutinee:expr, {
    $(
        $(|)? $($const:ident)::+ => $rhs:expr,
    )*
        $(|)? _ => $default:expr $(,)?
    }
) => (
    match $scrutinee {
    $(
        | scrutinee if scrutinee.try_into() == ::core::result::Result::Ok($($const)::+) => $rhs,
    )*
        | _ => $default,
    }
)} pub(in crate) use switch;

so as to:

switch!(i, {
    MYCONST => println!("MYCONST found"),
    OTHERCONST => println!("OTHERCONST found"),
    _ => println!("Neither"),
})

I assume writing ::core::convert::TryInto::try_into(scrutinee) instead of scrutinee.try_into() is better for macro hygiene?

Just out of curiosity (not that I would actually need this), is it possible to extend the $($const:ident)::+ such that absolute paths (beginning with ::) are supported as well?

:100:

It is, and that's the one I generally use (see the diff below), but:

  • I didn't wanna pollute the thread too much;

  • Not that many people use absolute paths (I'm actually pleasantly surprised that somebody brought it up!)
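Concretely, that hygiene fix would be the following diff to the switch! macro above:

-         | scrutinee if scrutinee.try_into() == ::core::result::Result::Ok($($const)::+) => $rhs,
+         | scrutinee if ::core::convert::TryInto::try_into(scrutinee) == ::core::result::Result::Ok($($const)::+) => $rhs,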

The solution would thus be:

  1. to write $(::)? $($segment:ident)::+, except that we need a name / a capture inside that leading $(::)? if we want to be able to emit it.

  2. we thus use the :empty capture groups which are designed precisely for this purpose:

    $($leading:empty ::)? $($segment:ident)::+
    

    except those don't exist :sob:

  3. So we use $(@ $_:tt)? as a hack to polyfill the :empty matcher:

    $($(@$leading:tt)? ::)? $($segment:ident)::+
    

Bonus: I was just complaining about yet another disappointment of :paths (in this instance, that they're ambiguous with _), and came up with a bunch of memes about it:

[image of the memes elided]

  • (there is only one instance where using $:path is warranted: to refer to a trait with its generic parameters already fed —since to refer to a type with its generic parameters already fed we have the more general $:type. I'd personally advocate for, in a future edition, renaming $:path to $:trait, and offering a $:simple_path renamed as $:path that would have the semantics of the ::-delimited repetition of identifiers :woman_shrugging:)

Okay, again a lot to digest, but I tried to put it together: Playground.

Some things that aren't clear to me yet:

  • Why exactly doesn't $_:path work?

I looked into the definition of what a TypePath is:

TypePath :
::? TypePathSegment ( :: TypePathSegment )*

TypePathSegment :
PathIdentSegment ::? ( GenericArgs | TypePathFn )?

TypePathFn :
( TypePathFnInputs ? ) ( -> Type )?

TypePathFnInputs :
Type ( , Type )* ,?

And a PathIdentSegment is:

PathIdentSegment :
IDENTIFIER | super | self | Self | crate | $crate

Isn't that exactly what I need? When I naively try to use $_:path (Playground), I get:

error: local ambiguity when calling macro `switch`: multiple parsing options: built-in NTs path ('r#const') or 1 other option.
  --> src/main.rs:10:13
   |
10 |             _ => println!("Neither"),
   |             ^

I assume this problem is because of the optional

::? ( GenericArgs | TypePathFn )?

which leads to an ambiguity during parsing? (Just a guess, I don't really understand this yet.)

  • Is it correct that the @ is just a random character that is (hopefully) not matched?
  • Can I omit the whitespace between ? and ::?

So if I understand this right, $_:path isn't well suited in macros (except for some particular cases, but not for matching a scoped identifier in general).

Now the memes make sense to me. :sweat_smile:

Yep, that's yet another disappointing thing about $:path, hence:

just expect :path to be unusable / a trap, act as if it didn't exist and you'll be better off it. In this instance, the issue is that the parser conservatively refuses to consider that _ cannot be a valid start of a :path, hence the ambiguity error…

:100:

You can, although I'd advise against doing that: highly sigil-heavy code such as Rust macros can quickly become very-hardly-readable sigil soup if one doesn't whitespace enough. Remember: you can add as much whitespace as you want between Rust elements:

fn foo (
    & v : & '_ :: core :: option :: Option < i32 > ,
) -> // only combined sigils, such as `::`, `->`,`=>`, `+=`, `..`, _etc._, are (inner-)whitespace-sensitive
    ()
{}

It may be a bit annoying to read source macro code written like that, but it's a "flat" difficulty with macros that one can get used to.

Compare that to $([$(($(…),* $(,)?)),* $(,)?])* which does not scale well at all: the chances of getting a typo in there tend to 1 as the level of nesting slowly increases. Heck, I can't even be sure myself, and I've just written it. Let me copy-paste it and add whitespace:

$([$(($(…),* $(,)?)),* $(,)?])*

$(
    [
        $(
            (
                $( … ),* $(,)?
            )
        ),* $(,)?
    ]
)*

Much better.

It would have been suitable to handle a const name, but for its ambiguity with _. And for that, we already have :pat as a pattern to match against. For an evaluated constant, we also have :expr.

Similarly, it can be used for something like ::anyhow::Result<()> (note the generics), but, then again, another matcher would have been better: $:ty (to e.g., support (), or <Ty as Trait>::Assoc).

So, as I mentioned, the only use case of :path that no other matcher is able to replicate is trying to take a potentially fully qualified (and with generics) trait "by name". So it should have been named :trait, restricted to be used with traits, and be done with it.

In any case, one typical pain point when writing macros is to go and write:

macro_rules! feed {(
    $T:ty => $GenericTy:path $(,)?
) => (
    $GenericTy<$T>
)}

so that feed!( () => ::anyhow::Result ) would work, but that doesn't work since a path cannot be fed generics (path segments can, so as to form a path, but then the path as a whole cannot: once a macro_rules! macro captures something as a matcher / metavariable / transcriber other than $:tt or $:ident, it becomes an "opaque token" / it gets to be wrapped in "invisible parenthesis" that make the parser be unable to consider its individual :tts anymore).
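Capturing the path as a ::-separated repetition of :idents, on the other hand, keeps the tokens transparent, so the expansion remains free to append generics afterwards (sketch; using core::option::Option rather than ::anyhow::Result so as to stay dependency-free):

macro_rules! feed {(
    $T:ty => $($segment:ident)::+ $(,)?
) => (
    $($segment)::+ <$T>
)}

type R = feed!( i32 => core::option::Option ); // `Option<i32>`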

I think that having a $:simple_path as an alias for:

$($leading:empty ::)? $( $segment:ident )::+

would have led to a way smoother macro experience:

  • Not ambiguous with _;

  • Can be followed by : (≠ ::), (…), +, and ! (e.g., $some_macro:simple_path ! would Just Work™)

  • Could be treated transparently so as to be allowed to be fed generics afterwards (or the grammar could be adjusted for that);

  • would not accept inputs such as foo() (yes, foo(x, y) is currently a valid $:path since it could be an alias for one of the Fn… traits).


I really enjoyed this whole discussion, and especially this suggestion. It's a clever solution to one of the nastiest impedance mismatches between C and Rust.

But there are still plenty of others: inline functions, function-like macros, etc. And the problem of a local build, where the user has provided extra CCFLAGS that changed how a C library was built in a way that Rust has no way to discover, is probably impossible to solve.

To my mind, we just have to accept that bindgen-generated bindings are going to be imperfect. The more you labor over them, the better they get. But where is the line between a *-sys crate and its high-level companion, as described in the Cargo book?

It is common to have a companion package without the -sys suffix that provides safe, high-level abstractions on top of the sys package. For example, the git2 crate provides a high-level interface to the libgit2-sys crate.

I've been tinkering with high-level bindings to Apple Core Audio for a while. Do I really have to be constrained by the design choices made by coreaudio-sys? It's a fine crate, and I'm using it heavily for unit testing, but there are certain aspects of it which don't work for my use case.

For instance, I would like to use newtype wrappers for certain constructs (bitflags, carefully curated integer repr Rust enums guarded with try_into, transparent repr newtype structs around pointers to add semantics) but the struct declarations made in coreaudio-sys know nothing about my new types — they only use raw pointers and integer types. And so I've accepted that the bindings I'm working on won't be compatible with the corresponding *-sys library.

The dedication I see you applying here suggests to me that you are really itching to write comprehensive high-level bindings for Lua — surely an ambitious undertaking, and maybe not something you want to sign up for now. The thing is, I question whether you'll be able to write a declaration-only *-sys library to a level you're happy with until you've actually done most of the design work on a higher-level library and fully explored the project requirements for those *-sys declarations.

Can you accept a *-sys crate that has imperfections, such as being broken on some fraction of platforms, or missing functionality? If not, how much work are you willing to put in working around the limitations of a bindgen-based approach, as you edge towards writing comprehensive high-level bindings?

There's nothing stopping you from writing a *-sys crate by hand and writing all the declarations yourself that bindgen would have made. Consider what bindgen would have done as inspiration, or a point of departure.

I particularly had problems with function-like macros too. They just won't be emitted by bindgen, and Lua uses them a lot. These are the function-like macros in the main lua.h of Lua 5.4 (there are some more compatibility macros and more macros in the auxiliary lauxlib.h):

#define lua_upvalueindex(i)     (LUA_REGISTRYINDEX - (i))
#define lua_call(L,n,r)         lua_callk(L, (n), (r), 0, NULL)
#define lua_pcall(L,n,r,f)      lua_pcallk(L, (n), (r), (f), 0, NULL)
#define lua_yield(L,n)          lua_yieldk(L, (n), 0, NULL)
#define lua_getextraspace(L)    ((void *)((char *)(L) - LUA_EXTRASPACE))
#define lua_tonumber(L,i)       lua_tonumberx(L,(i),NULL)
#define lua_tointeger(L,i)      lua_tointegerx(L,(i),NULL)
#define lua_pop(L,n)            lua_settop(L, -(n)-1)
#define lua_newtable(L)         lua_createtable(L, 0, 0)
#define lua_register(L,n,f)     (lua_pushcfunction(L, (f)), lua_setglobal(L, (n)))
#define lua_pushcfunction(L,f)  lua_pushcclosure(L, (f), 0)
#define lua_isfunction(L,n)     (lua_type(L, (n)) == LUA_TFUNCTION)
#define lua_istable(L,n)        (lua_type(L, (n)) == LUA_TTABLE)
#define lua_islightuserdata(L,n) (lua_type(L, (n)) == LUA_TLIGHTUSERDATA)
#define lua_isnil(L,n)          (lua_type(L, (n)) == LUA_TNIL)
#define lua_isboolean(L,n)      (lua_type(L, (n)) == LUA_TBOOLEAN)
#define lua_isthread(L,n)       (lua_type(L, (n)) == LUA_TTHREAD)
#define lua_isnone(L,n)         (lua_type(L, (n)) == LUA_TNONE)
#define lua_isnoneornil(L, n)   (lua_type(L, (n)) <= 0)
#define lua_pushliteral(L, s)   lua_pushstring(L, "" s)
#define lua_pushglobaltable(L)  \
        ((void)lua_rawgeti(L, LUA_REGISTRYINDEX, LUA_RIDX_GLOBALS))
#define lua_tostring(L,i)       lua_tolstring(L, (i), NULL)
#define lua_insert(L,idx)       lua_rotate(L, (idx), 1)
#define lua_remove(L,idx)       (lua_rotate(L, (idx), -1), lua_pop(L, 1))
#define lua_replace(L,idx)      (lua_copy(L, -1, (idx)), lua_pop(L, 1))

I didn't write or use a -sys crate, so what I did for now is to make two modules: one created by bindgen, and one extension module which contains those features which bindgen fails to provide and which I really need (note that even basic operations such as lua_pop or lua_call are defined as macros by Lua).

My solution to just fix things where needed isn't particularly beautiful, but it seems to avoid some work, at least.

You are partially right with your guess: I want to write a (very) high-level library, but it won't be comprehensive. My particular goal is making a crate, which allows execution of scripts in a secure sandbox (while providing protection against memory exhaustion, non-terminating scripts, etc). I likely won't support all of Lua's features for two reasons:

  • Some features of Lua require extra effort to be made safe in a sandbox. For example, if I include support for Lua coroutines, I'll have to provide an error-handling interface for the sandboxed Lua code. That is problematic because I use Lua errors to abort a machine which has exhausted its assigned execution instruction count. I'd have to write wrappers which make it impossible to catch and ignore these errors.
  • My crate is intended to support more scripting languages eventually, not just Lua. So I currently focus on an interface that may be of use for other languages as well and won't be very Lua-specific. Of course, I still can provide features which only work when you use Lua as a scripting language, but that's not my main focus for now.

Making the sandboxing secure is a non-trivial task, even if – at first glance – Lua seems easy to be sandboxed. But the tricky parts are:

  • limiting the memory acquired by the Lua machine,
  • terminating the Lua machine after a set amount of instructions executed,
  • providing an API that allows injecting callbacks into the machine (see also my other thread "Multiple closures with mutable references to the same variable?", where I try to convert Rust closures to Lua closures),
  • dealing with Lua API C functions, which cause a longjmp :scream: on error (or memory exhaustion, which may likely happen because of limited memory, see first point), e.g. because they trigger garbage collection within the Lua machine, which may execute user-provided destructors that throw a Lua error.

To work around these problems, I need to write some wrappers for some of the functions in the Lua API. In some way, these are pretty low-level, but they are also specific to my problem of sandboxing.

I feel like having an "exhaustive" Lua low-level -sys crate won't help me much. And writing one would be a lot of work that doesn't pay off. At least that's my impression for now. Instead, I will focus on making those functions work (safely) which I need, and on providing a very abstract and clean interface to the users of my crate.

I hope that explained my motivation and goals a bit.


Crates based on bindgen ordinarily build according to headers they find on the target machine. Considering the purpose of your crate, would the uncertainty that introduces cause problems? How are you binding to lua?

I took a long time to figure out how to do it. This is what I (currently) ended up with:

Cargo.toml:

# …

[features]
default = ["Lua-5.4"]
Lua = []
"Lua-5.4" = ["Lua"]

[dependencies]

[build-dependencies]
cc = "1.0"
bindgen = "0.59"

build.rs:

fn get_env(key: &str) -> Option<String> {
    println!("cargo:rerun-if-env-changed={}", key);
    match std::env::var(key) {
        Ok(val) => Some(val),
        Err(std::env::VarError::NotPresent) => None,
        Err(err) => panic!("could not get env \"{}\": {}", key, err),
    }
}

fn get_env_default<'a>(key: &'_ str, default: &'a str) -> std::borrow::Cow<'a, str> {
    match get_env(key) {
        Some(val) => std::borrow::Cow::Owned(val),
        None => std::borrow::Cow::Borrowed(default),
    }
}

fn main() {
    if cfg!(feature = "Lua") {
        // get configuration from environment
        let lua_include = get_env("LUA_INCLUDE");
        let lua_lib = get_env("LUA_LIB");
        let lua_libname = get_env_default("LUA_LIBNAME", "lua");

        // create automatic bindings
        {
            let mut builder = bindgen::Builder::default();
            if let Some(dir) = &lua_include {
                let mut option = "-I".to_string();
                option.push_str(dir);
                builder = builder.clang_arg(option);
            }
            builder = builder.header("src/lua/cmach.c");
            builder = builder.parse_callbacks(Box::new(bindgen::CargoCallbacks));
            //builder = builder.default_macro_constant_type(bindgen::MacroTypeVariation::Signed);
            let bindings = builder.generate().expect("unable to generate bindings");
            let out_path = std::path::PathBuf::from(std::env::var("OUT_DIR").unwrap());
            bindings
                .write_to_file(out_path.join("ffi_cmach.rs"))
                .expect("unable to write bindings");
        }

        // build own C lib
        {
            println!("cargo:rerun-if-changed=src/lua/cmach.c");
            let mut config = cc::Build::new();
            if let Some(dir) = &lua_include {
                config.include(dir);
            }
            config.file("src/lua/cmach.c");
            config.compile("libffi_cmach.a");
        }

        // link with Lua
        if let Some(dir) = lua_lib {
            println!("cargo:rustc-link-search=native={}", dir);
        }
        println!("cargo:rustc-link-lib={}", lua_libname);
    }
}

src/lua/cmach.c:

This file includes lua.h, lualib.h, lauxlib.h, and also contains some own extensions for Lua to make sandboxing easier. I found it easier to write these in C, as I basically only do C calls here.

src/lua/cmach.rs:

//! Automatically generated FFI for C part of implementation

#![allow(non_upper_case_globals)]
#![allow(non_camel_case_types)]
#![allow(non_snake_case)]
#![allow(dead_code)]
#![allow(improper_ctypes)]
#![allow(deref_nullptr)]
#![allow(missing_docs)]
include!(concat!(env!("OUT_DIR"), "/ffi_cmach.rs"));

src/lua/cmach_ext.rs:

This is to provide replacements for the missing function-like macros. Maybe I should just include these in cmach.rs, but I somehow wanted to keep the automatically generated code and my manual code in separate modules. Maybe I'll merge them though, or re-export them in the cmach module (see the sketch after the code below). Not sure yet.

//! Extensions for [`cmach`] module where certain macros are missing

#![allow(dead_code)]
#![allow(missing_docs)]

use super::cmach;
use std::os::raw::*;

pub unsafe fn lua_pop(l: *mut cmach::lua_State, n: c_int) {
    cmach::lua_settop(l, -n - 1);
}

pub unsafe fn lua_tointeger(l: *mut cmach::lua_State, index: c_int) -> cmach::lua_Integer {
    cmach::lua_tointegerx(l, index, 0 as *mut _)
}

pub unsafe fn lua_tonumber(l: *mut cmach::lua_State, index: c_int) -> cmach::lua_Number {
    cmach::lua_tonumberx(l, index, 0 as *mut _)
}

pub unsafe fn lua_pcall(
    l: *mut cmach::lua_State,
    nargs: c_int,
    nresults: c_int,
    msgh: c_int,
) -> c_int {
    cmach::lua_pcallk(l, nargs, nresults, msgh, 0, None)
}

pub unsafe fn lua_pushcfunction(l: *mut cmach::lua_State, func: cmach::lua_CFunction) {
    cmach::lua_pushcclosure(l, func, 0);
}
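Regarding the re-export option mentioned above, it could be as simple as (sketch, e.g., in src/lua/mod.rs):

/// One unified view over both the generated and the hand-written items.
pub(crate) mod ffi {
    pub use super::cmach::*;
    pub use super::cmach_ext::*;
}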

When I compile the crate with the Lua feature enabled, I have to set the environment variables LUA_INCLUDE to /usr/local/include/lua54 and LUA_LIBNAME to lua-5.4 on my system (FreeBSD). This will make the build script use the correct include path and linking options.

Does that answer your question?
