RFC: version pinning in llvm-sys

With the release of LLVM 3.8, there have been some backwards-incompatible changes to the C API that llvm-sys binds to. Since the C API can change in incompatible ways across LLVM versions, I would like to make a way for llvm-sys to enforce that a compatible C API is used.

My initial feeling is that the crate's own version number should indicate the target version of LLVM, such that llvm-sys ~3.7 enforces that the version of LLVM in use is part of the 3.7 series. This allows easy tracking of version requirements, but could complicate setup for users of the library.

To avoid forcing users to be careful about what version of LLVM they have available when building applications that use llvm-sys, I have two ideas right now:

  • Add support for downloading and compiling a copy of LLVM matching the version of llvm-sys in use. This is not a small download, nor is it quick to compile. It does however allow a user to skip any LLVM setup on their system, which is especially useful for platforms where there are no LLVM library binaries available (like Windows). Failures could be difficult to debug for users.
  • Have a database available at build-time tracking which functions are available for a given version of LLVM, so llvm-sys can cause a compile-time error if a user attempts to use a function which is known to be unsupported with the linked version of LLVM. This would likely require build-time code generation for the bindings, and could be confusing since functions could appear in documentation which are not available, with no capability to provide a very meaningful error message.

So, I'm interested to hear what others think about this problem. Do you think version pinning is important? Which approach (or one of your own) to easing the pain do you think is nicer, or is it not a concern?

I'd be interested in this as well for openssl-sys, since OpenSSL 1.1 is introducing a set of breaking API changes that won't necessarily be easy to work around in a compatible way.

My first instinct would be to err on the side of maintaining a consistent API for consumers of your crate. I.e., llvm-sys 38.0.0 always exposes the API of LLVM 3.8, and because we've promoted the LLVM "minor" digit to a semver-major position, consumers that aren't expecting 3.8 won't get it by surprise.
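To make the numbering concrete, here is a minimal sketch of that mapping. Both function names are made up for illustration, and it assumes the LLVM minor number stays a single digit (true for 3.7 through 4.0, not guaranteed in general):

```rust
/// Hypothetical helper: promote the LLVM minor digit into the crate's
/// semver-major position, so LLVM 3.8 maps to crate major 38.
/// Assumes `llvm_minor` is a single digit; 3.10 would collide with 4.0.
fn crate_major_for_llvm(llvm_major: u32, llvm_minor: u32) -> u32 {
    llvm_major * 10 + llvm_minor
}

/// For crate versions >= 1.0.0, a default (caret) requirement such as
/// `llvm-sys = "38"` only matches releases with the same major number.
fn caret_accepts(required_major: u32, candidate_major: u32) -> bool {
    required_major == candidate_major
}
```

So a consumer that asked for `"37"` never resolves to a 38.x release, which is the "no surprises" property described above.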

This leaves the question of implementation. I'm a fan of "support for downloading and compiling"; I'd rather not have the API surface of a fixed version of llvm-sys change depending on the system where it is compiled, especially not by default.

Good point.

That's what I've been leaning toward as well. There's less room for confusion and it's easier to implement.

I don't have a horse in this particular race, but I'd just like to say that as a Windows user, I loathe crates that compile anything more complicated than a single C-file that depends on nothing but C89's standard headers.

If LLVM 3.7 isn't compatible with 3.8, I don't see why they shouldn't just get different major version numbers. Yes, it means you have to pick a particular version of LLVM (which may or may not be supported in a given package repository), but isn't that going to be a problem no matter which way you swing it?

On a general note, I'm beginning to feel like it's high time everyone got together and sorted out a standard platform for native dependencies. The status quo seems rather messy and haphazard.

What if it were to not build anything automatically by default? We can try to invoke llvm-config and check the version it reports as is currently done, then look at a feature flag/environment variable/something else. Only do the download+compile if that flag is set, where its default value would be "give up and report an error".

[quote="DanielKeep, post:5, topic:4915"]
Yes, it means you have to pick a particular version of LLVM (which may or may not be supported in a given package repository), but isn't that going to be a problem no matter which way you swing it?
[/quote]
Yes. I'm concerned about end-users (trying to compile a package that pulls in llvm-sys as a dependency), since building a copy of LLVM and configuring the environment so llvm-sys can find it is not exactly an easy task for those unfamiliar with it. If we can offer a way (not necessarily used by default) to simplify that process, I think it's worthwhile.
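Roughly, the build-script logic I have in mind would look something like the sketch below. This is not what llvm-sys currently does; the `LLVM_SYS_ALLOW_BUILD` variable and all of the function and type names are hypothetical. The only external thing it relies on is `llvm-config --version` printing something like `3.8.0`.

```rust
use std::process::Command;

/// Parse the leading "major.minor" from output such as "3.8.0\n".
fn parse_major_minor(version: &str) -> Option<(u32, u32)> {
    let mut parts = version.trim().split('.');
    let major = parts.next()?.parse().ok()?;
    let minor = parts.next()?.parse().ok()?;
    Some((major, minor))
}

/// Ask the system's llvm-config for its version, if it can be run at all.
fn system_llvm_version() -> Option<(u32, u32)> {
    let out = Command::new("llvm-config").arg("--version").output().ok()?;
    if !out.status.success() {
        return None;
    }
    parse_major_minor(&String::from_utf8_lossy(&out.stdout))
}

/// Hypothetical opt-in switch; unset means "give up and report an error".
fn allow_build_from_env() -> bool {
    std::env::var_os("LLVM_SYS_ALLOW_BUILD").is_some()
}

#[derive(Debug, PartialEq)]
enum Action {
    UseSystem,
    BuildFromSource,
    Fail(String),
}

/// Decide what to do given what was found and whether building is allowed.
fn decide(found: Option<(u32, u32)>, wanted: (u32, u32), allow_build: bool) -> Action {
    match found {
        // A matching system LLVM always wins, even if building is allowed.
        Some(v) if v == wanted => Action::UseSystem,
        // Wrong or missing LLVM: only build when explicitly opted in.
        _ if allow_build => Action::BuildFromSource,
        Some((major, minor)) => Action::Fail(format!(
            "found LLVM {}.{}, but {}.{} is required",
            major, minor, wanted.0, wanted.1
        )),
        None => Action::Fail(format!(
            "llvm-config not found; LLVM {}.{} is required",
            wanted.0, wanted.1
        )),
    }
}
```

A build script would then call `decide(system_llvm_version(), (3, 8), allow_build_from_env())` and act on the result.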

I'm in agreement with DanielKeep regarding compiling C (honestly, I think I'm a bit more extreme - I'm not especially impressed with how even some crates that just compile a single C file handle cross-compilation).

In addition, using Cargo features won't work as a mechanism for the end user to control it, because Cargo features are both additive and automatic - if any dependent requests the feature, Cargo turns it on, and the user is out of luck as far as preventing the download goes.

Finally, with my distro packager hat on, build systems that contact the network during build inspire unreasonable amounts of anger. Packages that do that pretty much never handle being denied network access properly, and thus require much more effort to build in any kind of sandboxed environment - be that distro packaging, CI, or otherwise.

So if you do build LLVM yourself, please at least just pin a commit of the git mirror as a submodule, and bundle the files in the crates.io package.

More and more, I feel like crates.io needs a way to have optional, out-of-band binary blobs. Downloading the entire LLVM source every time you update the bindings? No... plz no.

I'll take larger downloads over arbitrary, optionally-fetched content any day. One's an efficiency loss; I've personally dealt with the other causing problems of compatibility and such. ("Oh, I'll just ship a prebuilt version of the codegen thing I need. Nobody will EVER run this on an architecture I didn't anticipate!")

TBH, I personally have a strong dislike of build scripts as well - people do not understand the complexities of correct build handling well enough to do it right, which results in blind calls to "gcc" and packages that are fundamentally incapable of cross-compiling or using an alternate C compiler. I have seen this firsthand in building Rust stuff, partially because my distro (Exherbo) handles all compilation as cross compilation, and so has no unprefixed "gcc" binary.
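For illustration, the bare minimum a build script could do instead of hardcoding "gcc" is something like this sketch. The function name is made up. Cargo does set `HOST` and `TARGET` for build scripts, and honoring `CC` is a common convention, but the prefixed-gcc fallback assumes the Rust target triple matches the toolchain's prefix, which is often not the case, so treat that branch as a guess rather than a rule:

```rust
/// Hypothetical compiler selection for a build script.
/// `cc` is the value of the `CC` environment variable, if set;
/// `host` and `target` are the triples Cargo passes to build scripts.
fn pick_c_compiler(cc: Option<&str>, host: &str, target: &str) -> String {
    // An explicit, non-empty CC always wins, so packagers can override.
    if let Some(cc) = cc {
        if !cc.is_empty() {
            return cc.to_string();
        }
    }
    if host == target {
        // Native build: fall back to the generic "cc" name, not "gcc".
        "cc".to_string()
    } else {
        // Cross build: assume a target-prefixed gcc exists (an assumption).
        format!("{}-gcc", target)
    }
}
```

In a real build script the arguments would come from `std::env::var("CC")`, `std::env::var("HOST")`, and `std::env::var("TARGET")`. On a system with no unprefixed compiler at all, setting `CC` is the only path that works here, which is exactly why it has to be honored.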

Fundamentally, I am confident the average crate developer that would use this does not have the knowledge to use it safely.

Don't forget that, for some people, downloading costs money in addition to time.

And, as a Windows user, I hate build scripts too (no, you can't just call a shell script like that, I don't have pkg-config, and even if I did, this library doesn't have a .pc file in the Windows download AARRGGHHGHglbllblb foams).

Wouldn't this all be best done using feature flags for the underlying version of LLVM? This way you could control the functions and enums being wrapped by hiding them in cfg sections.

This way you wouldn't need to maintain multiple branches in parallel.
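Something like the following sketch is what I mean. The feature name `llvm-3-8` and the function names are invented for illustration; in real bindings the gated items would be `extern` declarations rather than plain functions:

```rust
/// Present for every supported LLVM version in this sketch.
pub fn always_available() -> &'static str {
    "always"
}

/// Only compiled when the hypothetical `llvm-3-8` Cargo feature is enabled,
/// so code calling it against an older LLVM fails at compile time.
#[cfg(feature = "llvm-3-8")]
pub fn added_in_3_8() -> &'static str {
    "3.8 only"
}

/// Lets callers check at runtime which API surface was compiled in.
pub fn has_3_8_api() -> bool {
    cfg!(feature = "llvm-3-8")
}
```

The feature would be declared in the crate's `Cargo.toml` and selected by the dependent crate. The cost is that every added or removed item needs its own `cfg` annotation.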

clang-sys uses Cargo features to select the desired version. This seems to work fine.

BTW, clang-sys does not use llvm-sys. It seems like it should. I needed both, and had to fiddle with env vars.

It could be done that way, but I decided it would be easier to have a branch for each release rather than enumerate versions for each function (or definition) that is added or removed.

I submitted CI fixes for llvm-sys for 3.{7,8,9} and 4.0. This meant 4 PRs. That was a bit tedious, imo, and there are still discrepancies between the files since I had to make silly fixes for 3.7 that I didn't bother bringing back to 4.0.

I guess llvm-sys is more or less stable now, so it would be a dubious amount of value for a nontrivial amount of work. But if there are any refactoring projects on the horizon, maybe it would be a good idea to flatten the project into a single branch at that point.