[clib 0.1.0] Generate Rust bindings for C libraries, using .toml rather than .rs

This is the repository

Purpose

Use toml files to generate bindings to C libraries without writing any code.

  1. It can be used as a replacement for -sys crates, each of which would otherwise generate bindings separately for a single C library.

  2. It can help a standalone -sys crate to generate bindings. The users do not need to learn the usage of pkg-config and bindgen crates.

Requirements

  1. C libraries must provide a pkg-config file.

  2. C libraries must work with bindgen's default configuration.

Bundled C libraries

The toml files of the following libraries are bundled in the clib_spec/ directory: libcurl, liblzma, sqlite3, tcl86, tk86, x11, zlib. This list can grow in the future.

These are also crate features.

Example of generating "sqlite3" bindings

Add this crate to Cargo.toml, and enable the "sqlite3" feature.

clib = { version = "0.1", features = ["sqlite3"] }

All generated functions, types and constants are in the root namespace of this crate. You can prefix them with clib::, e.g. clib::sqlite3_open(), or
use clib::*; and use sqlite3_open() directly.

Extra C libraries

Use the environment variable CLIB_EXTRA_LIBS to specify a whitespace-separated list of names of C libraries that are not provided as crate features. Their toml files are looked up in the directories listed in CLIB_SPEC_DIRS; see the section below.

Note: the library name must match the ".pc" file name.
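
As an illustration of the format, here is a minimal Rust sketch of how a whitespace-separated value such as CLIB_EXTRA_LIBS could be split into library names. The function names parse_extra_libs and extra_libs_from_env are made up for this example and are not part of clib's API.

```rust
/// Split the value of `CLIB_EXTRA_LIBS` into library names.
/// Names are separated by any run of whitespace; empty entries are dropped.
/// (Hypothetical helper, not taken from clib.)
fn parse_extra_libs(value: &str) -> Vec<String> {
    value.split_whitespace().map(str::to_owned).collect()
}

/// Read the list from the environment.
/// An unset (or non-Unicode) variable is treated as "no extra libraries".
fn extra_libs_from_env() -> Vec<String> {
    std::env::var("CLIB_EXTRA_LIBS")
        .map(|value| parse_extra_libs(&value))
        .unwrap_or_default()
}
```

With this sketch, a value such as "libpng  freetype2" would yield the two names libpng and freetype2, each of which would then have to match a ".pc" file of the same name.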

Extra directories of toml files

Use the environment variable CLIB_SPEC_DIRS to assign a semicolon-separated list of extra search paths for toml files.

This list can

  1. provide the locations of toml files which are not bundled by this crate.

  2. override bundled toml files. For example, to override sqlite3.toml, put your modified file in the downstream crate's clib_spec/ directory.

Note: Absolute paths are preferred over relative ones, because the latter are resolved relative to THIS CRATE, not to downstream crates.
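
The lookup described in this section could be sketched roughly as follows. This is not clib's actual code: parse_spec_dirs, find_spec and the bundled_dir parameter are hypothetical, and trying the extra directories before the bundled one is an assumption inferred from the override behaviour.

```rust
use std::path::{Path, PathBuf};

/// Split the value of `CLIB_SPEC_DIRS` on semicolons, skipping empty entries.
fn parse_spec_dirs(value: &str) -> Vec<PathBuf> {
    value
        .split(';')
        .map(str::trim)
        .filter(|entry| !entry.is_empty())
        .map(PathBuf::from)
        .collect()
}

/// Return the first `<lib>.toml` that exists.
/// Assumption: extra directories are tried before the bundled directory,
/// so a user-supplied file overrides the bundled copy.
fn find_spec(extra_dirs: &[PathBuf], bundled_dir: &Path, lib: &str) -> Option<PathBuf> {
    let file_name = format!("{lib}.toml");
    extra_dirs
        .iter()
        .map(|dir| dir.as_path())
        .chain(std::iter::once(bundled_dir))
        .map(|dir| dir.join(&file_name))
        .find(|candidate| candidate.is_file())
}
```

Because relative entries would be joined as-is, the working directory of the build decides what they resolve to, which is one way to see why absolute paths are the safer choice.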

Set minimum version requirement

Use the environment variable CLIB_{}_MIN_VER to set a minimum version requirement for the library {}. If the variable is absent, any version is accepted.
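
A small Rust sketch of how such a variable might be looked up is shown here. It assumes that the library name is substituted for {} verbatim, without any case conversion, which the text above does not confirm; min_ver_var and min_version are hypothetical names.

```rust
/// Build the variable name for a library.
/// Assumption: the library name replaces `{}` verbatim (no upper-casing).
fn min_ver_var(lib: &str) -> String {
    format!("CLIB_{lib}_MIN_VER")
}

/// Look up the minimum version through `lookup`, for example
/// `|key| std::env::var(key).ok()` in a real build script.
/// `None` means the variable is absent, i.e. any version is acceptable.
fn min_version(lib: &str, lookup: impl Fn(&str) -> Option<String>) -> Option<String> {
    lookup(&min_ver_var(lib))
}
```

Passing the lookup as a closure keeps the sketch easy to try out without touching the real process environment.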

The toml configuration file's syntax

The "header" section

Currently this is the only supported section in the toml configuration file.

  1. files: a list of C headers to generate bindings for.

  2. import: a list of names of other C libraries that this library depends on. It is optional.

  3. import_dir: a list of names of C libraries whose headers are included by the current library's header(s). It is optional.

Take tk86.toml for example:

[header]
files = [ "tk.h" ]
import = [ "tcl86" ]
import_dir = [ "x11" ]

The value of files indicates that tk.h is the public header for tk86.

The value of import indicates that tcl86 is an upstream library for which bindings must also be generated.

The value of import_dir indicates that x11's include directory must be added to the header search path of the current library (tk86). This is required because tk86's headers include x11's headers.
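
To make the roles of the three keys concrete, here is a hedged Rust sketch that models the [header] section and walks the import lists. The Header struct, tk86_spec, binding_order and include_only are illustrative names rather than clib's internals, and generating bindings for upstream libraries first is an assumption.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// In-memory form of the `[header]` section of one toml file.
#[derive(Debug, Default, Clone)]
struct Header {
    files: Vec<String>,      // headers to generate bindings for
    import: Vec<String>,     // upstream libraries that also get bindings
    import_dir: Vec<String>, // libraries that only contribute include paths
}

/// The tk86.toml example from above, written out by hand.
fn tk86_spec() -> Header {
    Header {
        files: vec!["tk.h".to_owned()],
        import: vec!["tcl86".to_owned()],
        import_dir: vec!["x11".to_owned()],
    }
}

/// Libraries to generate bindings for, upstream libraries first.
/// Libraries without a known spec are still listed; cycles are ignored.
fn binding_order(specs: &BTreeMap<String, Header>, root: &str) -> Vec<String> {
    fn visit(
        specs: &BTreeMap<String, Header>,
        lib: &str,
        seen: &mut BTreeSet<String>,
        out: &mut Vec<String>,
    ) {
        if !seen.insert(lib.to_owned()) {
            return; // already visited
        }
        if let Some(header) = specs.get(lib) {
            for dep in &header.import {
                visit(specs, dep, seen, out);
            }
        }
        out.push(lib.to_owned());
    }
    let mut seen = BTreeSet::new();
    let mut out = Vec::new();
    visit(specs, root, &mut seen, &mut out);
    out
}

/// Libraries whose include directories must be on the search path for `lib`.
fn include_only<'a>(specs: &'a BTreeMap<String, Header>, lib: &str) -> &'a [String] {
    specs.get(lib).map(|h| h.import_dir.as_slice()).unwrap_or(&[])
}
```

For tk86 this gives tcl86 followed by tk86 as the libraries to bind, while x11 shows up only as a source of include paths.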

License

Licensed under Apache License 2.0 or MIT License, at your option.

cc @josh

@oooutlk this looks exciting! Are you familiar with a "metabuild" Cargo RFC from a while ago?

The TL;DR is that, at the moment, external tooling (like dist-specific packages) doesn't know the list of C libraries a Rust crate depends on, because this list is specified imperatively in the build.rs. The rough idea to fix this is to replace a fully custom build.rs with a more constrained metabuild, which can build only a specific list of C dependencies, which are specified in Cargo.toml. The metabuild implementation stalled as far as I know, but your clib crate looks like it might fill the niche?

Specifically, I think if you move the info from env-vars like CLIB_EXTRA_LIBS or CLIB_{}_MIN_VER to a [meta.clib.libraries] section of Cargo.toml, then external tools should be able to get a precise list of the C deps of a crate.

I started writing high-level tcl and tk API bindings in Rust several months ago, and decided to use bindgen to help generate the low-level APIs. The tk library is based on the tcl library, and its APIs use some types defined in tcl. I found that the common types in the tcl-sys and tk-sys crates are defined in different namespaces (tcl-sys:: vs tk-sys::). This led to conflicts and compile errors.
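
The conflict can be reproduced in miniature. In the sketch below, the modules tcl_sys and tk_sys are hand-written stand-ins for two independently generated -sys crates (not the real crates), each carrying its own copy of the opaque C type Tcl_Interp; same_type and to_tk are helper names invented for the example.

```rust
use std::any::TypeId;

// Stand-in for bindings generated from tcl.h.
#[allow(non_camel_case_types)]
mod tcl_sys {
    #[repr(C)]
    pub struct Tcl_Interp {
        _private: [u8; 0],
    }
}

// Stand-in for bindings generated from tk.h, which pulls in tcl.h
// and therefore gets a second, separate definition of the same C type.
#[allow(non_camel_case_types)]
mod tk_sys {
    #[repr(C)]
    pub struct Tcl_Interp {
        _private: [u8; 0],
    }
}

/// Reports whether Rust treats the two definitions as one type.
fn same_type() -> bool {
    TypeId::of::<tcl_sys::Tcl_Interp>() == TypeId::of::<tk_sys::Tcl_Interp>()
}

/// Handing a pointer from one set of bindings to the other needs an explicit cast;
/// without it the compiler reports mismatched types.
fn to_tk(interp: *mut tcl_sys::Tcl_Interp) -> *mut tk_sys::Tcl_Interp {
    interp.cast()
}
```

Here same_type() returns false: although both structs describe the same C type, Rust sees two unrelated types. Generating everything into a single namespace avoids the duplicate definition in the first place.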

An idea came to me, "why not use one crate to provide the utilities of both tcl-sys and tk-sys, with tcl/tk crate features? Generally, more C libraries can be supported this way, as long as they provide configurations."

  1. Namespace issue. All functions, types and constants are in the same namespace, clib::. Since C does not have namespaces and uses library names as prefixes to avoid naming conflicts, this does not seem to be an issue in practice.

  2. Configuration format. The pkg-config format is popular and was chosen to help collect include paths and link arguments. The toml format is favored in Rust and can describe the dependencies between C libraries. Textual configurations are more constrained than an arbitrary build.rs.

  3. Centralization. All low-level bindings are generated in one single crate (clib). Feature-per-library provides a clear list of packages. However, if your target library is not included in the clib crate, you must provide its name and its toml configuration file. That's why environment variables are needed.

I was not familiar with the metabuild RFC, but I read through it several times after you mentioned it. I am pleased that this crate may be of benefit.

The following is my understanding of what can be done to make progress on metabuild (correct me if I am wrong):

We still need various -sys crates, one crate per C library. But these -sys crates must not have a build.rs. Instead, their Cargo.toml files contain metabuild information, which is the equivalent of clib's toml configuration files under clib_spec/.

Take tk86-sys for example.

[package]
name = "tk86-sys"

# Note: metabuild is expected to be a string, not a list of strings.
metabuild = "tk86"

[meta.clib.header]
files = ["tk.h"]
import = ["tcl86"]
import_dir = ["x11"]

This seems roughly correct, if we follow the metabuild approach exactly. But I think it's more valuable to think in terms of goals, rather than specific implementation. The properties of metabuild we want are:

  • the list of C libraries required to build a crate is known statically, from reading a Cargo.toml
  • there's a common API for external build systems to say "don't build C code yourself, link to this .so/.a instead"

I can see several ways this can be implemented:

  • metabuild approach, where we add new Cargo functionality to declaratively specify a build.rs which would then look at Cargo.toml
  • metabuild-lite approach, where there is a convention that build.rs is just fn main() { metabuild::main() } (btw, this is something which can be implemented today, without any support from Cargo).
  • clib approach, where there's just a single crate for the global C namespace, with some declarative configuration of what gets included.

I think if clib's config is moved from env vars into Cargo.toml's meta section (not sure if it is feasible!), it'll be able to solve original metabuild's problem, using a different mechanism.

Oh, I think the Cargo.toml you specify should read

metabuild = "clib"

The trick is that all crates use the same metabuild, which naturally enforces common interface.

After some testing I found that metabuild might be infeasible for clib as the provider of a global C namespace.

  1. Downstream crates provide [package.metadata.clib] in their Cargo.toml files and specify metabuild="clib" to call clib::metabuild().

  2. Invocations of clib::metabuild() collect metadata and generate bindings of C libraries in a single file bindings.rs.

  3. Finally bindings.rs is complete, but we would need a time machine to send it back to before clib was built (clib has to be built first in order to provide clib::metabuild()).

However, I believe the following are still feasible for clib 0.2.0:

  1. no "CLIB_*" environment variables any more.

  2. [package.metadata.clib] in downstream Cargo.toml files, rather than some-pkg.toml files in clib_spec directory.