[Bindgen] Handling C code that produces un-ergonomic Rust?

I'm currently working on building an Nginx module in Rust, and the existing C code is resulting in some awkward Rust code when run through Bindgen. For example:

#define  NGX_OK          0
#define  NGX_ERROR      -1

produces

pub const NGX_OK: i32 = 0;
pub const NGX_ERROR: i32 = -1;

But it's used in functions with a return type of

ngx_int_t function_name(...)

Where ngx_int_t is defined as

typedef intptr_t            ngx_int_t;
typedef __darwin_intptr_t   intptr_t;
typedef long                __darwin_intptr_t;

Or, in Rust:

pub type ngx_int_t = isize;

Which results in type errors when trying to return NGX_OK from a function.

Another fun example is:

#define NGX_CONF_OK          NULL
#define NGX_CONF_ERROR       (void *) -1

Which, for reasons that are unclear to me, is returned as char * function_name(...), or function_name(...) -> *mut ::std::os::raw::c_char in Rust.

Perhaps unsurprisingly, neither of those #define statements results in a Rust constant.

Another one that's annoying is #defines like this:

#define NGX_MODULE_SIGNATURE_1   "1"

Which, quite reasonably, translate into

pub const NGX_MODULE_SIGNATURE_1: &[u8; 2usize] = b"1\0";

In Rust, but these particular constants are meant to be consumed by other #define directives that build the actual string constant, so I wind up doing something like this to get the data I need from them:

NGX_MODULE_SIGNATURE_1[0] - 48
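For illustration, here is a minimal sketch of how the byte-string constants could be handled without the magic number 48. The helper names join_signature and flag_value are hypothetical, and NGX_MODULE_SIGNATURE_2 is a made-up stand-in that is only there to show concatenation; neither the helpers nor that value come from Nginx or Bindgen.

```rust
use std::ffi::CStr;

// Stand-in for the Bindgen output quoted above.
pub const NGX_MODULE_SIGNATURE_1: &[u8; 2] = b"1\0";
// Hypothetical second constant, only here to demonstrate joining.
pub const NGX_MODULE_SIGNATURE_2: &[u8; 2] = b"0\0";

/// Joins NUL-terminated byte-string constants into one owned String,
/// dropping each trailing NUL. Returns None if any part is not a
/// valid NUL-terminated UTF-8 string.
pub fn join_signature(parts: &[&[u8]]) -> Option<String> {
    let mut out = String::new();
    for part in parts {
        let s = CStr::from_bytes_with_nul(part).ok()?.to_str().ok()?;
        out.push_str(s);
    }
    Some(out)
}

/// Reads a one-digit flag such as b"1\0" as a number.
/// Equivalent to `flag[0] - 48`, but rejects anything that is not
/// exactly one ASCII digit followed by a NUL.
pub fn flag_value(flag: &[u8]) -> Option<u8> {
    match flag {
        [d @ b'0'..=b'9', 0] => Some(*d - b'0'),
        _ => None,
    }
}
```

Calling flag_value(NGX_MODULE_SIGNATURE_1) gives the same number as the subtraction above, and join_signature mimics what the C preprocessor does when it pastes adjacent string literals together.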

What do people do in similar situations? I can see a few options:

  1. Exclude the constants from automatic generation, and add them in manually with the right types. Easy to do, but doesn't scale if there's a lot of these situations (which I suspect is the case)
  2. Use the C2Rust refactoring tool (or sed or something similar) as part of the build process to automatically refactor this in the generated sources
  3. Leave it as-is and accept that it's going to be a pain to use
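To make option 1 concrete, this is roughly what the hand-written replacements might look like. It is only a sketch: it assumes the generated NGX_OK and NGX_ERROR have been excluded from the bindings, and the handler function is a hypothetical example, not part of Nginx.

```rust
use std::os::raw::c_char;

// Same alias that Bindgen generates for the typedef chain above.
#[allow(non_camel_case_types)]
pub type ngx_int_t = isize;

// Hand-written with the type the C API actually uses them as.
pub const NGX_OK: ngx_int_t = 0;
pub const NGX_ERROR: ngx_int_t = -1;

// The pointer-valued defines that Bindgen skips entirely.
// `(void *) -1` is a pointer with every bit set.
pub const NGX_CONF_OK: *mut c_char = std::ptr::null_mut();
pub const NGX_CONF_ERROR: *mut c_char = usize::MAX as *mut c_char;

/// Hypothetical handler: returns the constants with no casts.
pub fn handler(fail: bool) -> ngx_int_t {
    if fail {
        NGX_ERROR
    } else {
        NGX_OK
    }
}
```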

Well, the translation is correct. The type of a symbolic constant in C has to be "guessed", because a #define carries no type information to infer it from. Defining ngx_int_t as isize is also correct, since isize is the pointer-sized signed integer that intptr_t corresponds to.
You can do something like return NGX_OK as isize; to resolve the type error.
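Spelled out, the cast looks like this. The constants and the alias are copied from the generated bindings quoted earlier; the function names are hypothetical.

```rust
// Copied from the generated bindings quoted in the question.
#[allow(non_camel_case_types)]
pub type ngx_int_t = isize;
pub const NGX_OK: i32 = 0;
pub const NGX_ERROR: i32 = -1;

/// Explicit cast to the alias, so the intent survives if the
/// typedef ever changes.
pub fn returns_ok() -> ngx_int_t {
    NGX_OK as ngx_int_t
}

/// `as _` lets the compiler pick the target type from the
/// function's return type. The i32 is sign-extended.
pub fn returns_error() -> ngx_int_t {
    NGX_ERROR as _
}
```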

Sure, but my point is that doing that everywhere isn't really a nice solution. Presumably I'm not the first person who's run into this and I was wondering what other people have done in this situation. If the typical advice is "leave the generated bindings alone and write a wrapper crate", then fair enough.

I disagree. A human translator would almost certainly assign isize as the type for these constants, not i32. From my perspective, bindgen is clearly in the wrong here (though the cause of its mistake is quite understandable).

I was referring to mathematical correctness. A human translator has the advantage of prior knowledge of semantics, which would allow them to choose a better option.

Language translation is fundamentally a semantic operation, though. It can only be "correct" if the meaning is preserved, which bindgen fails to do in this instance. In particular, it fails to preserve the connection between constants like NGX_OK and the typedef ngx_int_t.

This statement is debatable and is fairly nuanced, but okay.

More importantly, #define NGX_OK 0 conveys nothing about the specific integer type. It can be any integer type, and as far as C is concerned, the type depends on where it is substituted.
I guess the correct bindgen reaction to such statements would be to say "screw you, I can't infer the type, so I'm going to quit", but that doesn't improve things, does it?

Rust has a construct that behaves this way as well: a macro that expands to an untagged integer literal.

Can you invoke a macro without the trailing brackets?

It seems not, but there's always some name mangling that happens in this sort of tool. Converting NGX_OK into NGX_OK!() isn't great, but it isn't terrible either.
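A sketch of what that could look like, assuming the tool emitted macro_rules! definitions instead of typed constants (Bindgen does not do this today, as far as I know). The function names are hypothetical.

```rust
// Each macro expands to an unsuffixed literal, so the integer type
// is inferred at the use site, much like the C #define.
macro_rules! NGX_OK {
    () => {
        0
    };
}

macro_rules! NGX_ERROR {
    () => {
        -1
    };
}

/// The literal becomes an isize here...
pub fn status_as_isize() -> isize {
    NGX_ERROR!()
}

/// ...and an i32 here, with no cast in either place.
pub fn status_as_i32() -> i32 {
    NGX_OK!()
}
```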

And neither is NGX_OK as _, but beauty lies in the eye of the beholder.

This is how Rust works, but I don't think that's true of C; my understanding is that integer literals in C are only a different type than int if they are too big to fit in a regular C int (which 0 is not). So it's not that bindgen is missing context here, it's translating the code as understood by the C compiler; it's just that C also does implicit casting all over the place, whereas you need to write those casts explicitly in Rust.

If you want a different type for your defines, you can implement ParseCallbacks::int_macro to choose the integer type used for each one.

Oh that's really interesting! Thanks, I'll dig into that.

That's the typical approach.

The idea is that your foo-sys crate contains function declarations and all that for using the C library directly, and you have a foo crate that gives it a more ergonomic interface.

In theory, this approach sounds like more maintenance work, but in practice it's actually better because you only need to switch from C-style usage to Rust-style usage once (in the foo crate) instead of in every place that interacts with the C code. It also gives you a nice place to paper over any "quirks" in the generated bindings (e.g. the integer type used for #define constants), introduce higher-level abstractions, and enforce requirements imposed by the C library so our unsafe code is sound.
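As a rough sketch of that split, with a module standing in for the generated foo-sys crate: the Status enum and its from_raw method are hypothetical names, and a real wrapper would cover far more than return codes.

```rust
/// Stand-in for the generated `foo-sys` crate, left exactly as
/// Bindgen produced it (i32 constants and all).
#[allow(non_camel_case_types, dead_code)]
mod sys {
    pub type ngx_int_t = isize;
    pub const NGX_OK: i32 = 0;
    pub const NGX_ERROR: i32 = -1;
}

/// What the ergonomic `foo` crate would expose instead.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Status {
    Ok,
    Error,
}

impl Status {
    /// Interprets a raw return code; None for codes this sketch
    /// does not model.
    pub fn from_raw(rc: isize) -> Option<Status> {
        match rc {
            x if x == sys::NGX_OK as sys::ngx_int_t => Some(Status::Ok),
            x if x == sys::NGX_ERROR as sys::ngx_int_t => Some(Status::Error),
            _ => None,
        }
    }
}

/// The only place where the i32-to-isize cast has to be written.
impl From<Status> for sys::ngx_int_t {
    fn from(status: Status) -> Self {
        match status {
            Status::Ok => sys::NGX_OK as sys::ngx_int_t,
            Status::Error => sys::NGX_ERROR as sys::ngx_int_t,
        }
    }
}
```

A handler exported to C can then end with Status::Ok.into(), and the rest of the Rust code never touches the raw constants.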
