I want to make -sys crates for a few C libraries, however I’m stuck due to the number of different ways of handling various aspects of it.
First, it’s pretty clear that I want to be using bindgen to automate most if not all of the code creation. Should I call it from build.rs of my crate, or should I pre-generate bindings myself and publish the complete set of .rs files to save users from a libclang dependency that isn’t yet of the required version on some Linux distributions (see this issue) or that could be an additional pain to install on Windows?
How should I handle versioning? Should I depend on a specific version of the library and fail if the version is wrong, should I always support and generate bindings to whatever version is present on the user’s system exposing the differences through Cargo features (like ffmpeg-sys), or maybe create different versions of the crate for different library versions and publish them as separate crate versions (like llvm-sys)? I know that some distributions have older versions of the library than I do so support for that is desired.
How should I find the library sources and the object file to link against? pkg-config is an obvious choice, but that doesn’t work on Windows, and a way of manually specifying the library folder is often desired (an example is older versions of the openssl-sys crate needing a manual override because the system library is too new or vice versa). Should I fall back to downloading and building the library from source if all other methods fail? Perhaps only when the user specifically asks for it (if they just don’t have the library installed yet they might want to get a heads up and use their package manager instead of the crate deciding to compile the library right away)? How do I handle library compile dependencies and different possible library feature flags in this case?
Should the crate link to the library statically or dynamically? Static linking often means having to build the library from source to get the required object files. What if some user of my crate wants to dynamically load the library at runtime and obtain the function pointers this way, as is frequently done with OpenGL? Is there a way to support that use case without having to write two separate crates with essentially the same code?
I wish there was a single good way when it comes to building -sys crates but so far it seems to me like everyone’s doing it in their own ways with various advantages or disadvantages to each.
If the library has a stable ABI (new versions only add new functions, everything is backwards compatible), then you can pre-generate. It will be faster to build, and easier for you to tweak the interface.
If the library is changing a lot and users may have various incompatible versions installed, then you need to call bindgen from build.rs.
For users it's easiest if you have one version of your library and make a best effort to support whatever is installed (and even build from source if it isn't).
pkg-config is OK on Linux. It's OK-ish on macOS to get something to compile, but if mixed with Homebrew it produces brittle non-redistributable executables. So always have an option to build and link statically.
On Windows the whole packaging situation is such a mess… I've given up on everything and just bundle C sources with my crate and build everything using the cc crate.
BTW: it's so much easier to just chuck C sources in a vendor folder in your Rust crate rather than try to download at build time. It's going to be faster and leaner than obtaining non-broken TLS libraries, compiling curl, tar and handling failures of all of them. Let Cargo just have the files ready for you.
This is a place where good guidance on the way to do things is seriously lacking right now unfortunately. Some thoughts:
I would not recommend running bindgen at build time due to the libclang dependency. It sounds like bindgen might be able to bundle that in the future which would make this approach more reasonable.
Running bindgen locally and checking the generated code in is a reasonable option, but you should be aware of some things to watch out for. In particular, bindgen will generate bindings that are correct for the target you built them for, but they may not be correct for other targets. This can be a particular problem for C libraries that use platform-specific type definitions or conditional compilation. Even if you're going to go the bindgen route, you should read over the headers of the library to make sure that they're "well behaved".
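To make the target-dependence point concrete: C's long is 64 bits on 64-bit Linux and macOS but 32 bits on 64-bit Windows, so a pre-generated binding that hard-codes i64 is only right on some targets. Here is a minimal sketch, assuming a made-up C struct `struct foo_config { long timeout; int flags; }` (the struct, its fields, and the helper function are invented for illustration):

```rust
use std::mem::size_of;
use std::os::raw::{c_int, c_long};

/// Hypothetical mirror of `struct foo_config { long timeout; int flags; }`.
/// The `c_long`/`c_int` aliases follow the target's C ABI; a hard-coded
/// `i64` would only match targets where `long` is 64 bits wide.
#[repr(C)]
pub struct FooConfig {
    pub timeout: c_long,
    pub flags: c_int,
}

/// Size the struct should have on the current target: two fields,
/// padded up to the alignment of the wider one (`long`).
pub fn expected_foo_config_size() -> usize {
    2 * size_of::<c_long>()
}
```

A layout check like `size_of::<FooConfig>() == expected_foo_config_size()` is roughly the kind of test bindgen emits alongside its structs, and it is worth running on every target you claim to support.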
If you decide to go the route of manually writing the -sys crate, I highly recommend using the ctest crate to make sure that those bindings are actually correct. Every time I've set up ctest on a repo it's found some issue.
Version support will depend a lot on the use case. If the library you're wrapping is "well behaved" and the API surface simply expands over time, this is pretty straightforward. Otherwise, you'll have to detect the version you're building against and conditionally compile things as appropriate. rust-openssl's setup is the most complex version of this that I'm aware of. libgit2-sys takes a different approach of supporting basically one version of the C library, and building it from source rather than tracking down whatever version's installed on the system. This can make sense when there's a huge variety of versions installed on various systems that aren't ABI compatible.
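As a rough sketch of "detect the version and conditionally compile", a build script could pull a version number out of the installed header and turn it into cfg flags. Everything named here is an assumption: the `FOO_VERSION_NUMBER` macro, the `0xMMmmpp` encoding, and the `foo_1_1`/`foo_1_2` cfg names are hypothetical. A plain text scan is also a simplification; running the header through the C preprocessor is more robust.

```rust
/// Scan header text for `#define FOO_VERSION_NUMBER 0x...` (hypothetical macro)
/// and return the parsed number, or `None` if it is missing or malformed.
pub fn parse_version(header: &str) -> Option<u32> {
    header.lines().find_map(|line| {
        let mut words = line.split_whitespace();
        if words.next()? != "#define" || words.next()? != "FOO_VERSION_NUMBER" {
            return None;
        }
        let hex = words.next()?.strip_prefix("0x")?;
        u32::from_str_radix(hex, 16).ok()
    })
}

/// Lines a build script would print so the bindings can use `#[cfg(foo_1_1)]`
/// and similar. The cfg names are made up for this sketch.
pub fn cfg_directives(version: u32) -> Vec<String> {
    let mut out = Vec::new();
    if version >= 0x010100 {
        out.push("cargo:rustc-cfg=foo_1_1".to_string());
    }
    if version >= 0x010200 {
        out.push("cargo:rustc-cfg=foo_1_2".to_string());
    }
    out
}
```

In build.rs you would read the header, call `parse_version`, and `println!` each returned line; the bindings then gate newer declarations with `#[cfg(foo_1_1)]`.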
The story around linkage is probably the most underspecified here. You'll want to use pkg-config on Unices, maybe vcpkg on Windows, and then allow for overrides. Lots of libraries offer the ability to build a copy of the C library from source through either a Cargo feature or an environment variable as well.
If you're linking against a copy of the C library that your build script built from source, you'll want to link statically, but otherwise it can be up to the user. pkg-config-rs allows this to be configured via an environment variable, for example.
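If you are not going through pkg-config-rs, the same idea is easy to hand-roll. Below is a minimal sketch, assuming a hypothetical `FOO_STATIC` environment variable and a library named `foo`; it is only loosely modeled on the pkg-config-rs behavior and is not that crate's code.

```rust
use std::env;

/// Decide the link kind from the value of a hypothetical `FOO_STATIC` variable.
/// Unset, empty, or "0" means dynamic; anything else means static.
pub fn link_kind(foo_static: Option<&str>) -> &'static str {
    match foo_static {
        None | Some("") | Some("0") => "dylib",
        Some(_) => "static",
    }
}

/// The line a build script would print for library `name`.
pub fn link_directive(name: &str, foo_static: Option<&str>) -> String {
    format!("cargo:rustc-link-lib={}={}", link_kind(foo_static), name)
}

/// Reads the real environment; this is what `build.rs` would call.
pub fn link_directive_from_env(name: &str) -> String {
    let value = env::var("FOO_STATIC").ok();
    link_directive(name, value.as_deref())
}
```

The build script should also print `cargo:rerun-if-env-changed=FOO_STATIC` so that flipping the variable triggers a rebuild.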
If you end up vendoring the C sources, and the C source is also in git, don't forget to learn about git submodules, which is git's way of nesting repositories. This means that you (and your users) can see exactly where your C comes from, and exactly which commit you wrap. This makes future upgrades easier on yourself, because you can see exactly what changed since your vendoring.
Thanks for the responses. I haven't got around to trying it yet (busy with other things). I also found out that clang-sys supports all three linking types (static, dynamic, runtime). It does so by wrapping all function declarations in a macro that expands to the usual extern fn declarations for static and dynamic linking, and to function-loading code for runtime linking.
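For anyone curious what such a macro could look like, here is a small sketch of the idea. It is not clang-sys's actual macro: the `ffi_functions!` name, the `linked:`/`runtime:` selectors, and `foo_add` are all invented. A real crate would choose the arm with a Cargo feature and back the resolver with dlsym/GetProcAddress (or a crate such as libloading); here a fake resolver stands in so the example is self-contained.

```rust
use std::os::raw::{c_int, c_void};

macro_rules! ffi_functions {
    // Link-time mode: plain extern declarations, resolved by the linker.
    (linked: $(fn $name:ident($($arg:ident: $ty:ty),*) -> $ret:ty;)*) => {
        extern "C" {
            $(pub fn $name($($arg: $ty),*) -> $ret;)*
        }
    };
    // Runtime mode: a table of function pointers filled in by a resolver.
    (runtime: $(fn $name:ident($($arg:ident: $ty:ty),*) -> $ret:ty;)*) => {
        pub struct Functions {
            $(pub $name: unsafe extern "C" fn($($ty),*) -> $ret,)*
        }
        impl Functions {
            /// # Safety
            /// Every pointer returned by `resolve` must really be a function
            /// with the declared signature.
            pub unsafe fn load(
                mut resolve: impl FnMut(&str) -> Option<*const ::std::os::raw::c_void>,
            ) -> Option<Self> {
                Some(Functions {
                    $($name: unsafe {
                        ::std::mem::transmute::<
                            *const ::std::os::raw::c_void,
                            unsafe extern "C" fn($($ty),*) -> $ret,
                        >(resolve(stringify!($name))?)
                    },)*
                })
            }
        }
    };
}

// One declaration list; only the selector in front would change per mode.
ffi_functions! {
    runtime:
    fn foo_add(a: c_int, b: c_int) -> c_int;
}

// Stand-in for a symbol that would normally come from the loaded library.
extern "C" fn fake_foo_add(a: c_int, b: c_int) -> c_int {
    a + b
}

/// Loads the table through the fake resolver and calls through it.
pub fn demo() -> c_int {
    let fns = unsafe {
        Functions::load(|name| match name {
            "foo_add" => Some(fake_foo_add as *const c_void),
            _ => None,
        })
    }
    .expect("foo_add should resolve");
    unsafe { (fns.foo_add)(2, 3) }
}
```

The point of the design is that the declaration list is written once, so the static, dynamic, and runtime flavors cannot drift apart.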
Is a vendor folder some special functionality? I don't think I've seen it anywhere.
It's a convention for the folder for "stuff we copied from other people, and don't plan on ever changing ourselves, but may be upgraded". It's similar in role to a "lib/" or "dependencies/", with the connotation that it's "not our problem/external source".
Basically, it's one big workaround from the times before proper dependency management and semantic versioning existed.
Sometimes it is still used if the upstream only provides tarballs, or uses a different versioning system than your own project. (Which happens a lot in the C world)
I'm trying to wrap a "well behaved" library (that most people already have installed) in a -sys library. Therefore, the actual symbols provided by the library could vary greatly across consumers. How should I control this? Provide all the symbols I can, and if they use one that doesn't actually exist, let it fail at link time? Provide cargo features to add in the symbols, which could allow consumers to rely on features that might not exist on their machine (and then will fail at link time)? Or should I conditionally compile the extern functions based on what version of the library is present?
Additionally, how can I notify consumers which version is actually installed (and which symbols are available) so they don't have to duplicate my build script logic? Should I report the library version in a DEP_FOO_KEY-style variable?
For my crate I ended up providing features for different API versions (of which there weren't too many). So the potential users of the crate would target no features (minimal supported API level) by default and if newer API functions were needed they could enable the respective features (and subsequently lose compatibility with earlier library versions). Due to the specifics of the library, some of those API version mismatches are "caught" at link time (newer functions not existing in older versions of the library), while others are "caught" at runtime via a panic saying the library version is too old.
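Roughly, the feature-plus-runtime-check pattern looks like the sketch below. The `api_v2` feature name, the `newer_call` wrapper, and the `(major, minor)` version tuple are placeholders rather than my crate's real names.

```rust
/// Panics with a readable message if the installed library is older than
/// `needed`. `installed` would come from the library's own version query.
pub fn require_version(installed: (u32, u32), needed: (u32, u32), function: &str) {
    // Tuples compare lexicographically: major first, then minor.
    if installed < needed {
        panic!(
            "{} requires library version {}.{} or newer, found {}.{}",
            function, needed.0, needed.1, installed.0, installed.1
        );
    }
}

/// Only compiled when the (hypothetical) `api_v2` Cargo feature is enabled,
/// so default builds never reference symbols missing from older libraries.
#[cfg(feature = "api_v2")]
pub fn newer_call(installed: (u32, u32)) {
    require_version(installed, (2, 0), "newer_call");
    // ... call into the v2-only C function here ...
}
```

Functions that simply do not exist in the older library fail at link time; the runtime check covers the cases where the symbol exists but its behavior depends on the version.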
As for the other points I mentioned in the OP, I went with manual bindings generation (as opposed to running bindgen in build.rs). This allowed me to customize the bindings slightly to improve support for all three OSes (and not depend on libclang at build time).
For linking I went with simply outputting cargo:rustc-link-lib= from build.rs, with the ability to set the library path with an environment variable (no pkg-config) and with the default path on Windows set to where the library installer installs the libraries by default. The way it's currently set up only dynamic linking is supported. There's no support for building the library from build.rs either as its build process is quite complicated (and I don't think there's a need for it). I did plan on implementing runtime linking support in the future though.
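In sketch form, the linking part of such a build script boils down to something like this. The `FOO_LIB_DIR` variable, the library name `foo`, and the Windows install path are invented for the example.

```rust
use std::env;

/// Build the Cargo directives for linking `lib`.
/// `override_dir` is the value of a hypothetical `FOO_LIB_DIR` variable;
/// `default_dir` is a fallback such as the installer's default location.
pub fn link_directives(
    lib: &str,
    override_dir: Option<&str>,
    default_dir: Option<&str>,
) -> Vec<String> {
    let mut out = Vec::new();
    if let Some(dir) = override_dir.or(default_dir) {
        out.push(format!("cargo:rustc-link-search=native={}", dir));
    }
    out.push(format!("cargo:rustc-link-lib={}", lib));
    out
}

/// What `build.rs` would call: reads the environment and picks a default.
pub fn directives_from_env() -> Vec<String> {
    // Build scripts run on the host, so the target OS comes from Cargo's
    // CARGO_CFG_TARGET_OS variable rather than from `cfg!(windows)`.
    let target_os = env::var("CARGO_CFG_TARGET_OS").unwrap_or_default();
    let default_dir = if target_os == "windows" {
        Some(r"C:\Program Files\Foo\lib") // invented path, for illustration only
    } else {
        None
    };
    let override_dir = env::var("FOO_LIB_DIR").ok();
    let mut out = link_directives("foo", override_dir.as_deref(), default_dir);
    out.push("cargo:rerun-if-env-changed=FOO_LIB_DIR".to_string());
    out
}
```

`main` in build.rs then just prints each returned line with `println!`.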
That's the situation openssl-sys is in as well. There, we have the build script look at the OpenSSL headers to figure out what version is installed and what features were enabled/disabled when it was built:
That information is then reported for downstream consumers in the DEP_FOO_KEY-style. The openssl crate exposes whatever functionality exists in the OpenSSL it was linked against. Downstream code can use the DEP_FOO_KEY setup to conditionally compile if needed. Here's an example of a downstream crate doing that
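For readers who have not used this mechanism: when a -sys crate declares `links = "foo"` in its Cargo.toml and its build script prints `cargo:KEY=value`, Cargo hands that value to the build scripts of direct dependents as `DEP_FOO_KEY`. The following is a minimal sketch of both sides, not the openssl-sys code; the `foo` name, the hex `version_number` key, and the `foo_1_1` cfg are assumptions for illustration.

```rust
use std::env;

/// Line the -sys build script prints. With `links = "foo"` in its manifest,
/// dependents see the value as `DEP_FOO_VERSION_NUMBER`.
pub fn metadata_line(version: u64) -> String {
    format!("cargo:version_number={:x}", version)
}

/// Downstream side: turn the hex value into cfg directives.
/// An unset or unparsable value yields no directives.
pub fn downstream_cfgs(dep_value: Option<&str>) -> Vec<String> {
    let version = match dep_value.and_then(|v| u64::from_str_radix(v, 16).ok()) {
        Some(v) => v,
        None => return Vec::new(),
    };
    let mut out = Vec::new();
    if version >= 0x010100 {
        out.push("cargo:rustc-cfg=foo_1_1".to_string());
    }
    out
}

/// What the downstream `build.rs` would call.
pub fn downstream_cfgs_from_env() -> Vec<String> {
    downstream_cfgs(env::var("DEP_FOO_VERSION_NUMBER").ok().as_deref())
}
```

The downstream crate prints the returned lines from its own build.rs and can then write `#[cfg(foo_1_1)]` in its source without repeating the detection logic.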