Outdated *-sys crates

Continuing the discussion from Current solutions for key value stores:

This made me wonder whether it's good (and/or common) practice to copy source code of other libraries (e.g. C libraries) into a Rust crate, or whether it's better to automatically search for installed "native" libraries when building the crate.

I found this post by @kornel, which suggests both variants are used. This part explains how a crate searches for an already installed library and links with it during building:

To make a sys crate:

[…]

  1. In Cargo.toml add links = <library name>. This informs Cargo that this crate links with the given C library, and Cargo will ensure that only one copy of the library is linked. Use names without any prefix/suffix (e.g. florp, not libflorp.so). Note that links is only informational and it does not actually link to anything.
  2. Create build.rs file in the root of the project (or specify build = "<path to build.rs>" in Cargo.toml).

For the build.rs script the general approach is:

  1. Find the library.
  2. Select static or dynamic linking.
  3. Optionally build the library from source.
  4. Expose C headers.
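For illustration, the "find the library" and "select static or dynamic linking" steps could be sketched with only the standard library as below. This is a rough sketch, not what any particular sys crate does: `find_library`, the `Found` enum, the search directories, and the library name `florp` are all made up for the example, and it only knows Unix file naming. Real build scripts often delegate this step to a helper such as the pkg-config crate.

```rust
use std::path::PathBuf;

/// How a library that was found on disk can be linked.
#[derive(Debug, PartialEq)]
enum Found {
    Static(PathBuf),
    Dynamic(PathBuf),
}

/// Look for `lib<name>.a` or `lib<name>.so` in `dirs` (Unix naming only).
/// The preferred kind is searched in every directory first; the other
/// kind is only considered if the preferred one is found nowhere.
fn find_library(name: &str, dirs: &[PathBuf], prefer_static: bool) -> Option<Found> {
    let order = if prefer_static { [true, false] } else { [false, true] };
    for want_static in order {
        let ext = if want_static { "a" } else { "so" };
        for dir in dirs {
            let candidate = dir.join(format!("lib{}.{}", name, ext));
            if candidate.is_file() {
                return Some(if want_static {
                    Found::Static(candidate)
                } else {
                    Found::Dynamic(candidate)
                });
            }
        }
    }
    None
}

fn main() {
    // Hypothetical search path; a real script would make this configurable.
    let dirs = [
        PathBuf::from("/usr/lib/x86_64-linux-gnu"),
        PathBuf::from("/usr/lib"),
        PathBuf::from("/usr/local/lib"),
    ];
    match find_library("florp", &dirs, true) {
        Some(Found::Static(p)) => println!("static archive at {}", p.display()),
        Some(Found::Dynamic(p)) => println!("shared object at {}", p.display()),
        None => println!("not installed; could fall back to building bundled sources"),
    }
}
```

The fallback branch is where the "optionally build the library from source" step would go.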

Later the post also mentions a different practice where the source code is copied into the crate and the (native) library is then built from source:

If the library is unlikely to be installed on the system by default, especially when you support Windows, it's nice to automatically build it from source (and link statically).

It's a massive usability improvement for users, because they can always run cargo build and get something working, instead of getting errors, searching for packages, installing dependencies, setting up search paths, etc.

Downloading of the source is tricky, so it's best to avoid it. The build script would need to depend on libraries for HTTP + TLS + unarchiving, which itself may be bigger and more problematic to build than just bundling sources with the sys crate. Some users require builds to work off-line. So unless the library's sources are really huge, just copy them to the Rust crate (and make sure to use a compatible license for your sys crate).

Which of these variants do you prefer?

I've been quite frustrated to see the lmdb-sys and liblmdb-sys crates including a copy of the original source code that is now several years old and doesn't get updated. (It does get updated in the upstream, but these updates aren't merged into the crate!)

This actually reminds me of the bad practice of using outdated Docker images with software that is years old. Well, of course I can choose not to use these crates.


P.S.: This isn't supposed to be criticizing the authors/maintainers. Nobody is really required to update their published software forever. I'm always happy for anything published as Open Source, also when it is published "as is" or doesn't get updated.


I strongly prefer compiling native dependencies from source and statically linking over relying on it being installed on the system. Having everything statically compiled makes distribution super easy, and tracking down UB/crashes because of ABI changes (e.g. rearranging fields in a public type or changing function arguments) is a nightmare I never want to subject myself to again.
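To make the "rearranging fields in a public type" case concrete, here is a small sketch. The two structs are invented for the example and stand for the same C struct as declared by two versions of a hypothetical library. If the Rust bindings were generated against version 1 but the program ends up running against version 2, every field access reads the wrong bytes, and nothing at compile or link time complains.

```rust
use std::mem::size_of;

/// Layout the bindings were generated for (hypothetical library version 1).
#[repr(C)]
struct OptionsV1 {
    flags: u8,
    timeout_ms: u32,
    max_readers: u16,
}

/// Layout after upstream reordered the fields (hypothetical version 2).
#[repr(C)]
struct OptionsV2 {
    timeout_ms: u32,
    max_readers: u16,
    flags: u8,
}

/// Byte offsets of (flags, timeout_ms, max_readers) in the version 1 layout.
fn offsets_v1() -> [usize; 3] {
    let v = OptionsV1 { flags: 0, timeout_ms: 0, max_readers: 0 };
    let base = &v as *const OptionsV1 as usize;
    [
        &v.flags as *const u8 as usize - base,
        &v.timeout_ms as *const u32 as usize - base,
        &v.max_readers as *const u16 as usize - base,
    ]
}

/// Byte offsets of (flags, timeout_ms, max_readers) in the version 2 layout.
fn offsets_v2() -> [usize; 3] {
    let v = OptionsV2 { timeout_ms: 0, max_readers: 0, flags: 0 };
    let base = &v as *const OptionsV2 as usize;
    [
        &v.flags as *const u8 as usize - base,
        &v.timeout_ms as *const u32 as usize - base,
        &v.max_readers as *const u16 as usize - base,
    ]
}

fn main() {
    println!("v1 offsets {:?}, size {}", offsets_v1(), size_of::<OptionsV1>());
    println!("v2 offsets {:?}, size {}", offsets_v2(), size_of::<OptionsV2>());
}
```

On targets where `u32` is 4-byte aligned, none of the three offsets agree between the two layouts and even the struct size differs, which is the kind of mismatch that shows up as memory corruption rather than as an error message.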

I know people argue that you should dynamically link to a system library so package managers can roll out security patches transparently, but I haven't really seen that help much.

For example, I doubt you can update lmdb-sys with its several-years-out-of-date lmdb without needing to rewrite a bunch of code due to API changes. The only time swapping in a newer version transparently actually helps is for a library that is very stable and has a strong commitment to backwards compatibility.


That'd be my concern too: Assuming there is a security vulnerability in libpng, for example. Any Rust programs which (through their dependency tree) use a *-sys crate with an old version of libpng won't be fixed when I run apt-get update && apt-get upgrade on a Debian system, for example. This doesn't sound good from a security p.o.v., particularly as we're usually talking about C libraries which do not benefit from Rust's safety guarantees.

I can still statically link with an installed library and create bindings from installed header files during building, right? Doesn't that fix the issues you're describing? (At least in most cases… It won't help if two function arguments of the same type are switched, of course.)

Or maybe that's just a Linux/Unix way of thinking (as I have /usr/include/*.h and /usr/lib/lib*.a files) and this won't work with Windows at all? (I have no experiences regarding FFI with Windows.)

At least there should be some review, so you're probably right that it wouldn't fix my problem automatically (in all cases / reliably).

Linux distros almost always ship only dynamic libraries and not static libraries. Some libraries are also provided as static libraries, but those are generally libraries like libc. You can't statically link dynamic libraries.

Are you sure? Just checked Ubuntu and Debian, and there is a directory /usr/lib/x86_64-linux-gnu with a lot of lib*.a files. On my FreeBSD, these are in /usr/lib and /usr/local/lib.

Anyway, re-thinking about it:

  • Binding statically bears the same problem that security updates aren't considered. (Edit: I mean security issues aren't instantly fixed after updating a system through the package manager.)
  • Yet, binding statically is still better from a security p.o.v. than copying old code and never fixing it, because at least (re-)building an old crate on a patched system would include a security fix if it uses a patched (static) library from the system.
  • Aren't there version number suffixes for *.so files? Shouldn't this prohibit UB/crashes when using dynamic libraries?

Maybe the problem (or rather solution) here is making good build.rs scripts. I would like to hypothesize that copying vendor C code into crates is tempting because

  • it makes providing a good build.rs script easier,
  • it makes installation on many platforms easier,
  • it just works (or seems to, while under the hood you could have a huge security nightmare).

Those are quite harsh words, and I'm not sure if I'm fair here (hence why I said "hypothesize"), because in some cases it can really make things easier when a particular version of the source is shipped with a crate. However, maybe this is actually a symptom of issues/difficulties/complexities with the build system and integration with the OS? I feel like there is a high price paid to work around these problems.

That assumes your system library installs *.a files. Often you only get shared libraries, in which case any computer you move the resulting executable to will also need to have those shared libraries installed.

This also means that your package manager is the one deciding which version of a dependency you end up using, which isn't ideal when you are building in CI or outside an open-source/hobbyist environment. If your business depends on an app, and that app uses a library, you would prefer to use a version that is known to work as expected rather than the latest version.

The Windows solution is to either statically link or make sure your installer drops a copy of any necessary DLLs in your program directory.

Windows has no package manager, so the idea of "system libraries" doesn't exist outside the Windows API.

For what it's worth, you almost never do a literal copy. Most of the time you will add the native library as a git submodule so updating to the latest and greatest is just a case of git submodule update --remote then fixing any compilation issues.

I think we are actually talking about 3 separate concepts here:

  1. Dynamically linking to a library installed by your package manager
  2. Statically linking to a library installed by your package manager
  3. Statically linking to a library you build from source as part of your crate's build.rs script
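From the build script's point of view, the three options mostly differ in which `cargo:` directives it prints. A minimal sketch, assuming a library called `florp`; the `Strategy` enum and `cargo_directives` function are invented for the example, while the `rustc-link-lib` and `rustc-link-search` directives themselves are standard Cargo build-script output:

```rust
/// The three linking strategies from the list above.
enum Strategy {
    /// 1. Dynamically link to the system-installed library.
    SystemDynamic,
    /// 2. Statically link to an archive installed on the system.
    SystemStatic { search_dir: String },
    /// 3. Statically link to an archive the build script compiled itself.
    BuildFromSource { out_dir: String },
}

/// Lines a build.rs would print to tell Cargo how to link `lib`.
fn cargo_directives(lib: &str, strategy: &Strategy) -> Vec<String> {
    match strategy {
        Strategy::SystemDynamic => vec![format!("cargo:rustc-link-lib=dylib={}", lib)],
        Strategy::SystemStatic { search_dir } => vec![
            format!("cargo:rustc-link-search=native={}", search_dir),
            format!("cargo:rustc-link-lib=static={}", lib),
        ],
        // A real script would first compile the bundled C sources into
        // `out_dir`; that step is omitted in this sketch.
        Strategy::BuildFromSource { out_dir } => vec![
            format!("cargo:rustc-link-search=native={}", out_dir),
            format!("cargo:rustc-link-lib=static={}", lib),
        ],
    }
}

fn main() {
    let strategies = [
        Strategy::SystemDynamic,
        Strategy::SystemStatic { search_dir: "/usr/lib".to_string() },
        Strategy::BuildFromSource { out_dir: "target/hypothetical-out-dir".to_string() },
    ];
    for strategy in &strategies {
        for line in cargo_directives("florp", strategy) {
            println!("{}", line);
        }
    }
}
```

Options 2 and 3 end up with the same directives; what differs is who produced the archive and therefore who controls its version.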

I'm in favour of #3 because, as someone who writes Rust code for work, I want full control over my dependencies and want distributing my app to Just Work.

Sometimes an ABI-breaking change is missed. These can be very subtle. With PIE, even changing the size of a global is an ABI-breaking change. In addition, ld always links against the version symlinked without a version number (e.g. libfoo.so). This is often the latest version, even if the bindings in the foo-sys crate weren't made for this version.
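One possible mitigation, sketched here under assumptions: a build script could look at the versioned file name that the unversioned `libfoo.so` symlink resolves to and refuse to continue if the major version is not the one the bindings were written for. `parse_so_name` and `major_matches` are hypothetical helpers, and the check only catches breaks that upstream actually reflected in the version number, so it does not help with the missed ones.

```rust
/// Split a shared-object file name such as `libfoo.so.1.2.3` into the
/// library name and its numeric version components.
/// An unversioned name like `libfoo.so` yields an empty version list.
fn parse_so_name(file_name: &str) -> Option<(String, Vec<u32>)> {
    let rest = file_name.strip_prefix("lib")?;
    let idx = rest.find(".so")?;
    let name = &rest[..idx];
    let suffix = &rest[idx + 3..];
    if name.is_empty() {
        return None;
    }
    if suffix.is_empty() {
        return Some((name.to_string(), Vec::new()));
    }
    // Anything after ".so" must be a dot-separated list of numbers.
    let suffix = suffix.strip_prefix('.')?;
    let mut version = Vec::new();
    for part in suffix.split('.') {
        version.push(part.parse::<u32>().ok()?);
    }
    Some((name.to_string(), version))
}

/// True only if the file name carries a major version equal to `expected_major`.
fn major_matches(file_name: &str, expected_major: u32) -> bool {
    match parse_so_name(file_name) {
        Some((_, version)) => version.first() == Some(&expected_major),
        None => false,
    }
}

fn main() {
    // In a real script the name would come from resolving the symlink,
    // for example with std::fs::canonicalize.
    for name in ["libfoo.so.1.2.3", "libfoo.so.2", "libfoo.so"] {
        println!("{} matches major 1: {}", name, major_matches(name, 1));
    }
}
```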

Using old versions usually does require backporting security fixes, I'd say. There aren't always resources for that. I know Debian is pretty conservative about updating packages, but they also invest a lot in backporting security fixes.

So you'd copy the DLL of an already installed library in your own program directory? Isn't that contrary to the concept of a DLL?

Yeah, I have seen that. Given the crates are properly maintained, this would work well. But do crate maintainers really respond in time to critical bug reports in the upstream? I would trust the Debian security team (without knowing it personally) more than a random crate maintainer… but maybe I'm too pessimistic here.

I guess variant 3 is okay when/if the crates are regularly updated and well maintained.

Hmmm, I see.

That's why I would say that when you use an already installed library, you should dynamically generate the bindings when building your crate (and not before shipping it). I would only use statically created bindings in a crate if I also include (and build from) the source it was generated from/for.

I'm curious if it's possible to write a bot that scans crates.io for crates which incorporate unpatched vendor source code with security vulnerabilities. Has someone ever attempted to build something like that?


P.S.: There is https://rustsec.org/, but I'm not sure whether "copied source" is also considered in the analysis.

Haha, yes. Yes, it is.

This pulls in LLVM as a dependency, though, often turning a 10-second build into multiple minutes and opening your users up to issues where one *-sys dependency uses a bindgen which links to llvm-sys 57.0 while a different dependency links to llvm-sys 54.0.

We had that exact same issue when trying to use wasm3 (a WebAssembly interpreter) and another library in the same application and it sucks. Cargo downright refuses to compile your crate, so we were forced to install the bindgen CLI and make wasm3-sys shell out to it at compile time. We made PRs to bump version numbers upstream, but wasm3 merged the PR while the other library didn't.


Unfortunately, this whole area is a mish-mash of multiple build systems and package managers (or lack thereof) and conventions, with no one true answer.

For example, businesses don't want to be at the mercy of the Debian team. Meanwhile, sysadmins want to roll out security updates without recompiling anything, and users want a stable system that Just Works.

Often what works for one use case is terrible for another, and a solution that works well for one type of library (e.g. a C library) is impractical for another (e.g. imagine having a system package for everything on NPM).
