Where can I learn to read C headers for writing Rust sys wrappers?


#1

Hey all. I’m ready to start writing sys wrappers for C code, but I’m not a C programmer. I wrote a tic-tac-toe program in C++ once, another lifetime ago. There are a good number of how-tos on the Rust side of things for wrapping C code, function calls, and callbacks, and for creating safe wrappers that handle freeing memory. What I haven’t found is how a Rust developer who isn’t a C programmer can work out what he/she needs to know from reading a C header file.

Where’s the “What you need to know when reading C header files for Rust”?

You have #ifdef, #ifndef, #define, and then you have things like libzip, where things are conditional on ZIP_EXTERN and all the function declarations at the end of the file are prefixed with it. See source. I can guess that #ifdef means “if defined” and #ifndef means “if not defined”. But what should I be on the lookout for, and what can I take from a header to understand what to write for it in Rust? Am I right in guessing that ZIP_EXTERN is a macro here? Does that affect my task?

I’m not looking to use anything like bindgen to generate the sys wrapper for me. No. What I want is to learn, and be a better programmer for it.

Thanks for any help!


#2

Those are preprocessor directives. If you are manually writing bindings, these are your worst nightmare.

If you see them in a header file meant for public consumption, then they usually serve one of two purposes:

  • They paper over differences in nonstandard extensions offered by different compilers.

  • They paper over differences in the environment: operating system, pointer size, etc.

The first kind generally have little bearing on your bindings (you just need to learn how to identify them!), though they might occasionally complicate the linking stage.

Meanwhile, the second kind are why you absolutely need bindgen, so that the correct bindings for each individual user can be generated at compile time.
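To make the second kind concrete, here is a minimal sketch of a hypothetical header fragment (mylib_word and mylib_word_bytes are made up for illustration, not taken from libzip) where a type’s width depends on the target. A single hand-written Rust binding has to commit to one branch, and it will be wrong on the other kind of machine.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical header fragment (not from libzip): the width of
   mylib_word follows the target's pointer size. The preprocessor picks
   the branch separately on each machine, so a Rust binding would need
   u32 on one target and u64 on the other. */
#if UINTPTR_MAX == 0xFFFFFFFFu
typedef uint32_t mylib_word;   /* 32-bit targets */
#else
typedef uint64_t mylib_word;   /* 64-bit targets */
#endif

/* Hypothetical API function using the platform-dependent type:
   reports how many bytes mylib_word occupies on this target. */
mylib_word mylib_word_bytes(void)
{
    return (mylib_word)sizeof(mylib_word);
}
```

The Rust-side mirror image would be a pair of type aliases guarded by #[cfg(target_pointer_width = "32")] and #[cfg(target_pointer_width = "64")]. bindgen spares you from writing these by hand because, when run from a build script, it preprocesses the header for the target actually being built.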


ZIP_EXTERN is the former kind. Typically, you can tell this just by looking at all the double underscores (in C, identifiers that begin with two underscores, or with an underscore followed by an uppercase letter like _WIN32, are reserved for use by the implementation, so they frequently show up in nonstandard extensions).

I’ll reproduce the part where they define it, adding some whitespace to help visually parse it:

#ifndef ZIP_EXTERN
#
#    ifdef _WIN32
#        define ZIP_EXTERN __declspec(dllimport)
#
#    elif defined(__GNUC__) && __GNUC__ >= 4
#        define ZIP_EXTERN __attribute__ ((visibility ("default")))
#
#    else
#        define ZIP_EXTERN
#
#    endif
#endif

C preprocessor directives are imperative, not declarative, so think of this code as a little program that runs at compile time, before the compiler proper sees anything:

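  • If ZIP_EXTERN is already defined (e.g. if you ran gcc -DZIP_EXTERN=), they keep the existing definition.
    (probably for debugging, or to let whoever builds the library override it)
  • On Windows (_WIN32, which both MSVC and MinGW define), every function is declared with __declspec(dllimport).
    (this tells the compiler the function lives in a DLL, so it can call it through the import table directly instead of through a stub)
  • With GCC 4 or later (and Clang, which also defines __GNUC__), they use __attribute__ ((visibility ("default"))).
    (this marks the function as exported from the shared library, which matters when the library is built with -fvisibility=hidden)
  • Otherwise, ZIP_EXTERN expands to nothing at all.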
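Here is a minimal sketch of the same pattern for a hypothetical library (MYLIB_EXTERN and mylib_add are invented for illustration; the Windows dllimport branch is left out because this file also defines the function, and MSVC rejects definitions of dllimport functions). Whichever branch is taken, the macro only adds an attribute or nothing, so the prototype you translate to Rust is the same.

```c
#include <assert.h>

/* Hypothetical MYLIB_EXTERN, shaped like ZIP_EXTERN.
   If someone already defined it (e.g. gcc -DMYLIB_EXTERN=), keep theirs. */
#ifndef MYLIB_EXTERN
#    if defined(__GNUC__) && __GNUC__ >= 4
#        define MYLIB_EXTERN __attribute__ ((visibility ("default")))
#    else
#        define MYLIB_EXTERN
#    endif
#endif

/* What the header declares. Delete MYLIB_EXTERN and what is left is an
   ordinary prototype; in Rust it would become, roughly,
   fn mylib_add(a: c_int, b: c_int) -> c_int;  inside an extern "C" block. */
MYLIB_EXTERN int mylib_add(int a, int b);

/* Stand-in definition so this sketch compiles and links on its own. */
int mylib_add(int a, int b)
{
    return a + b;
}
```

Compiling this with gcc -DMYLIB_EXTERN= makes the outer #ifndef skip the whole block, so the macro expands to nothing, which is the first bullet above in action.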

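#define makes a macro. The preprocessor substitutes the macro’s body wherever its name appears. Macros can be defined with or without function-like syntax (#define NAME stuff vs #define NAME(a, b) stuff), and there’s no special syntax for interpolation: each parameter name in the body is simply replaced by the corresponding argument. Apart from the # (stringize) and ## (token-paste) operators, it’s plain token substitution on identifiers all the way down.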
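A minimal sketch of that substitution (MYLIB_BUFSIZE, SQUARE_NAIVE and SQUARE are hypothetical names) shows the classic trap: arguments are pasted in as tokens, not evaluated first, so missing parentheses change the arithmetic.

```c
#include <assert.h>

/* Hypothetical macros illustrating plain token substitution. */
#define MYLIB_BUFSIZE 4096          /* object-like: just a constant */
#define SQUARE_NAIVE(x) x * x       /* function-like, no parentheses */
#define SQUARE(x) ((x) * (x))       /* function-like, parenthesized */

/* SQUARE_NAIVE(1 + 2) expands to 1 + 2 * 1 + 2, which is 5, not 9. */
int naive_square_of_sum(void) { return SQUARE_NAIVE(1 + 2); }

/* SQUARE(1 + 2) expands to ((1 + 2) * (1 + 2)), which is 9. */
int square_of_sum(void) { return SQUARE(1 + 2); }
```

For bindings, object-like macros such as MYLIB_BUFSIZE usually become Rust const items, while function-like macros have no symbol in the compiled library at all, so you have to reimplement them in Rust by reading their exact expansion.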

And don’t ask what happens if you use a macro inside the definition of another macro, or inside a macro argument. All I can really say is that the simple cases work… and the rest are not pretty.

(no, really, that’s all I can say! I still have no goddamn clue how libraries like Boost.Preprocessor work around the horrifyingly insane semantics of macro expansion)
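For a small taste of those semantics, here is a sketch using the common two-level stringizing idiom (MYLIB_VERSION, STR and XSTR are hypothetical names): an argument used with # is not macro-expanded first, while an argument passed through an extra macro layer is.

```c
#include <assert.h>
#include <string.h>

#define MYLIB_VERSION 3     /* hypothetical version macro */
#define STR(x) #x           /* # turns the argument's tokens into a string literal */
#define XSTR(x) STR(x)      /* extra layer: x is fully expanded before STR sees it */

/* The operand of # is not expanded, so this yields "MYLIB_VERSION". */
const char *str_version(void) { return STR(MYLIB_VERSION); }

/* MYLIB_VERSION becomes 3 first, then STR(3) yields "3". */
const char *xstr_version(void) { return XSTR(MYLIB_VERSION); }
```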


You can look at the preprocessed form of your source like this:

gcc -E lib/zip.h

This site is a very good resource, though it uses some amount of C/C++ lingo. Besides the syntax, it also covers some common usage patterns, such as the “include guard” used in zip.h (the first two lines plus the final #endif).
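A sketch of why the include guard works, using a hypothetical mylib.h whose body is pasted twice here to simulate it being included twice (MYLIB_H, mylib_point and mylib_point_sum are invented names):

```c
#include <assert.h>

/* First "inclusion" of the hypothetical mylib.h. */
#ifndef MYLIB_H
#define MYLIB_H
typedef struct mylib_point { int x; int y; } mylib_point;
#endif

/* Second "inclusion": MYLIB_H is already defined, so everything up to
   #endif is skipped. Without the guard, struct mylib_point would be
   defined twice, which is a compile error. */
#ifndef MYLIB_H
#define MYLIB_H
typedef struct mylib_point { int x; int y; } mylib_point;
#endif

/* Hypothetical function using the guarded type. */
int mylib_point_sum(mylib_point p)
{
    return p.x + p.y;
}
```

When writing bindings, the guard macro itself needs no Rust counterpart; only the declarations between #ifndef and #endif matter.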