Pre-RFC: generating assets/products during Cargo build

Rust programs may want to generate assets/products which aren't for the build itself, but are products to be used along with the program, such as:

  • bash_completion files
  • manpages and other auto-generated documentation
  • .service files for setting up servers under systemd
  • icon/resource files compiled from raw images
  • etc.

Currently Cargo runs build.rs with an OUT_DIR variable, but the location of OUT_DIR is private to the build script, and it's unpredictable. It's fine as a temporary directory to be used during the build, but it doesn't seem appropriate for files to be used after the build is done.

These files usually need to be present in a predictable location, so that post-build tools can find them and package them along with other Cargo products.

What would be the right solution for generating and making these files available?

Current options are:

  • Making build.rs place files in OUT_DIR and then searching for this directory. This is obviously hacky, slow, and may break when Cargo starts placing temporary files in dedicated system directories rather than dumping everything in ./target.

  • Making build.rs place files in ./target/…/somewhere. This is trickier than it seems, because paths in target are different during cross-compilation, and the build.rs script isn't told where exactly the target dir is, so scripts have to replicate Cargo's implementation details. Also, this means that all crates write to the same shared directory, so they could overwrite each other's files.

  • Generating the files using a different build system external to Cargo, using custom locations. While this works, it misses out on integration with the Cargo ecosystem. For example, cargo-deb won't know how to build such crates, and would have to require extra configuration or invent its own non-standard solution for this.
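To make the fragility of the first two options concrete, here is a minimal sketch of how a build script might guess the profile directory from OUT_DIR. It assumes the undocumented layout <target-dir>/[<triple>/]<profile>/build/<pkg>-<hash>/out, which Cargo does not promise to keep; the helper name guess_profile_dir is made up for illustration.

```rust
use std::path::{Path, PathBuf};

/// Guess the profile directory (e.g. `target/debug`) from OUT_DIR by walking
/// up: out -> <pkg>-<hash> -> build -> <profile>.
/// Hypothetical helper; it relies on an undocumented directory layout.
fn guess_profile_dir(out_dir: &Path) -> Option<PathBuf> {
    let build_dir = out_dir.parent()?.parent()?;
    if build_dir.file_name()? != "build" {
        // The layout is not what we assumed; give up instead of guessing.
        return None;
    }
    build_dir.parent().map(Path::to_path_buf)
}

fn main() {
    // Cargo sets OUT_DIR only while it runs a build script.
    if let Some(out_dir) = std::env::var_os("OUT_DIR") {
        match guess_profile_dir(Path::new(&out_dir)) {
            Some(dir) => println!("cargo:warning=guessed profile dir: {}", dir.display()),
            None => println!("cargo:warning=could not guess the profile dir"),
        }
    }
}
```

Note that the guessed directory includes the target triple only when cross-compiling, which is exactly the inconsistency post-build tools have to work around.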


Based on discussion in the Cargo GitHub issue, I suggest the following:

  1. Add a new directive supported in build scripts: cargo:asset=<path> which tells Cargo that the given path is a build product from the script, and this file should be kept.

  2. Cargo will move/copy/hardlink the asset to a well-defined location, such as target/assets/<crate-name>/

It's in two steps in order to let Cargo track build script outputs and be able to manage the asset directories.

It could be extended, e.g. cargo:asset=kind=<path> with different kinds for guiding cargo install and letting it place the assets in appropriate system directories.

Another possible extension is cargo:asset=<absolute source path>:<relative dest path>, to generate assets with a directory structure instead of a "flat" list of files (or the source path could just be a directory in the first place).
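As a rough sketch of what a build script could look like under this proposal: the cargo:asset directive is only proposed here and is not implemented by Cargo, and the file name myprog.bash, the destination path, and the helper asset_directive are all invented for illustration.

```rust
use std::path::{Path, PathBuf};

/// Format the *proposed* (not implemented) directive, optionally with a
/// relative destination path as in the `<source>:<dest>` extension.
fn asset_directive(src: &Path, dest: Option<&str>) -> String {
    match dest {
        Some(dest) => format!("cargo:asset={}:{}", src.display(), dest),
        None => format!("cargo:asset={}", src.display()),
    }
}

fn main() -> std::io::Result<()> {
    // OUT_DIR is set by Cargo for build scripts; fall back to the system
    // temp dir so this sketch also runs outside of Cargo.
    let out_dir = std::env::var_os("OUT_DIR")
        .map(PathBuf::from)
        .unwrap_or_else(std::env::temp_dir);

    // Step 1: generate the file in the private, unpredictable OUT_DIR.
    let completions = out_dir.join("myprog.bash");
    std::fs::write(&completions, "# generated completions\n")?;

    // Step 2: tell Cargo about it; under the proposal Cargo would then
    // copy it to a well-defined location such as target/assets/<crate-name>/.
    println!("{}", asset_directive(&completions, Some("completions/myprog.bash")));
    Ok(())
}
```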

What do you think?

I would prefer not to conflate "stuff you need to do to produce a rust binary artifact" and "arbitrary additional stuff you want to do to build your application". Using build.rs to, say, produce man pages rubs me the wrong way.

I think the right solution here is something like cargo workflows, which are stalled. However, I think everything except UI integration with cargo is there, so perhaps we can try to play with some "unofficial standard"?

For example:

If there's a ./tasks directory, the packaging tool would run cargo run --manifest-path ./tasks/Cargo.toml postbuild after the usual cargo build, and would fetch artifacts from ./target/assets.
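From the packaging tool's side, this convention could be sketched roughly as follows. The function name postbuild_command is hypothetical; the sketch only builds the command, it does not run it.

```rust
use std::path::Path;
use std::process::Command;

/// Build the post-build command if the crate follows the `./tasks`
/// convention suggested above; return None otherwise.
fn postbuild_command(crate_root: &Path) -> Option<Command> {
    let manifest = crate_root.join("tasks").join("Cargo.toml");
    if !manifest.is_file() {
        return None;
    }
    let mut cmd = Command::new("cargo");
    cmd.arg("run")
        .arg("--manifest-path")
        .arg(&manifest)
        .arg("postbuild") // passed through to the tasks binary as its argument
        .current_dir(crate_root);
    Some(cmd)
}

fn main() {
    let root = std::env::current_dir().expect("cannot read the current directory");
    match postbuild_command(&root) {
        Some(cmd) => println!("would run: {:?}", cmd),
        None => println!("no ./tasks/Cargo.toml found, nothing to do"),
    }
}
```

A real tool would call status() on the returned command after cargo build succeeds, then collect whatever the task left in ./target/assets.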

I'm happy to see this attempt to standardize the building of assets! One approach that hasn't been mentioned yet is setting a different environment variable (CARGO_ASSET_OUT?). I'm not sure which is preferable, and I'd be happy to hear good arguments for or against. It seems to me that adding a new directive would make writing tools like cargo-deb more difficult, but I've never done anything like that, so maybe I'm missing something obvious.

I like the idea of marking asset kinds or using a relative dest path. This would make creating smart tools easier. OTOH, it requires standardizing the naming. Using a directory structure bypasses this, but allows the tool to do the right thing - the crate doesn't have to. This could also be resolved by making a library for handling this stuff, but how would it work with different kinds of systems? Would it have to be rebuilt if two systems use different directories? I guess yes. One way to resolve it would be to standardize several common kinds and allow crates to use an x- prefix if they want something special (as is done for HTTP headers and MIME types).
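To illustrate the "standard kinds plus x- prefix" idea, here is a small sketch of how a tool might parse the value of the proposed cargo:asset=kind=<path> form. The kind names man, completion and service are assumptions picked for the example, not an agreed list.

```rust
/// Asset kinds a tool might understand. The concrete names are hypothetical.
#[derive(Debug, PartialEq)]
enum AssetKind {
    Man,
    Completion,
    Service,
    /// Non-standard kind, written as `x-<name>` by the crate.
    Custom(String),
}

fn parse_kind(kind: &str) -> Result<AssetKind, String> {
    match kind {
        "man" => Ok(AssetKind::Man),
        "completion" => Ok(AssetKind::Completion),
        "service" => Ok(AssetKind::Service),
        other => match other.strip_prefix("x-") {
            Some(name) if !name.is_empty() => Ok(AssetKind::Custom(name.to_string())),
            _ => Err(format!(
                "unknown asset kind `{}`; non-standard kinds need an `x-` prefix",
                other
            )),
        },
    }
}

/// Split a `kind=path` value into its kind and path.
fn parse_asset(value: &str) -> Result<(AssetKind, &str), String> {
    let (kind, path) = value
        .split_once('=')
        .ok_or_else(|| format!("expected `kind=path`, got `{}`", value))?;
    Ok((parse_kind(kind)?, path))
}

fn main() {
    for value in ["man=/out/myprog.1", "x-icon=/out/myprog.png", "icon=/out/myprog.png"] {
        println!("{:?}", parse_asset(value));
    }
}
```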

Another thing to consider is that a crate could produce both architecture-dependent and architecture-independent files. Architecture-dependent files should go to an architecture-specific directory. Architecture-independent files should go to a shared directory. Since architecture-independent files only need to be generated once, this could save time when generating multiple packages for different architectures. Tools like cargo-deb could even leverage it to create *-data packages - that'd be neat! Additionally, there could be some kind of detection that issues a warning if a crate built for a different architecture generates files that are considered shared but aren't identical.
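A possible shape for this split, as a sketch only: the directory layout (assets/shared/<crate> versus assets/<triple>/<crate>) and both function names are assumptions, since nothing in the thread fixes them.

```rust
use std::path::{Path, PathBuf};

/// Whether an asset is shared between architectures or tied to one target.
enum Scope<'a> {
    Shared,
    Arch(&'a str), // target triple
}

/// Hypothetical layout: shared assets and per-triple assets live side by side.
fn asset_dir(target_dir: &Path, krate: &str, scope: Scope<'_>) -> PathBuf {
    let base = target_dir.join("assets");
    match scope {
        Scope::Shared => base.join("shared").join(krate),
        Scope::Arch(triple) => base.join(triple).join(krate),
    }
}

/// Returns true when a shared asset already exists with different contents,
/// which is the situation that would deserve a warning.
fn shared_conflict(existing: &Path, new_contents: &[u8]) -> std::io::Result<bool> {
    match std::fs::read(existing) {
        Ok(old) => Ok(old != new_contents),
        Err(e) if e.kind() == std::io::ErrorKind::NotFound => Ok(false),
        Err(e) => Err(e),
    }
}

fn main() -> std::io::Result<()> {
    let target = Path::new("target");
    let shared = asset_dir(target, "myprog", Scope::Shared).join("myprog.1");
    let per_arch = asset_dir(target, "myprog", Scope::Arch("x86_64-unknown-linux-gnu"));
    println!("shared asset: {}", shared.display());
    println!("per-arch dir: {}", per_arch.display());
    if shared_conflict(&shared, b".TH MYPROG 1\n")? {
        println!("warning: shared asset differs between builds");
    }
    Ok(())
}
```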

@matklad Yeah, it feels kinda strange to me too, but I think there's an interesting reason to allow this: if the build script must calculate some data from its inputs, it can do so once and use the result for both the build and the assets. If asset generation were separate, it would have to recalculate the data or use some kind of caching. So I think the possible performance gains are worth a slightly uneasy feeling. :)

Another thing I forgot to mention is that this might also be useful if allowed in derive macros or other proc macros.

Very good point about overuse of build.rs.

What are the chances of "official" Cargo tasks going anywhere? @aturon

Currently the most popular task runner seems to be cargo-make, but it's quite extensive and general purpose, so even with it it's unclear what would be the standard way to, e.g., generate a manpage.

Even if we assume there is a task runner, the other point still stands — where and how should the task place its output?

Should dependencies be able to run tasks and generate files?

I think that the best person to direct the question to nowadays would be @ehuss

> Currently the most popular task runner seems to be cargo-make, but it's quite extensive and general purpose, so even with it it's unclear what would be the standard way to, e.g., generate a manpage.

The critical difference between cargo-make and what I am proposing is that cargo-make requires that the builder has the cargo-make binary in the path, while cargo run --manifest-path ./tasks/Cargo.toml is self-bootstrapping: it doesn't need anything beyond Cargo itself, today. Note also that we don't have to specify how the manpage is built, we only need to specify where to put the result. Likewise, we don't specify how C code should be built (although cc is the de facto standard), we only specify that the result should be put into OUT_DIR, and that certain strings on stdout have a certain meaning.

> Even if we assume there is a task runner, the other point still stands: where and how should the task place its output?

If we restrict this facility to the top-level crate (and I think we should, since it's a pretty core property of Cargo that the build process of dependencies is not really customizable), then it seems reasonable to put it in $CARGO_TARGET_DIR/assets (where the name assets is bikesheddable, and could be resources or extra). That is, the assets should probably not depend on the current profile, and should be namespaced inside the target dir.
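On the task side, a postbuild binary following this suggestion might look roughly like the sketch below. It assumes that falling back to ./target (relative to the working directory) is acceptable when CARGO_TARGET_DIR is not set; the function name assets_dir and the file myprog.1 are placeholders.

```rust
use std::ffi::OsString;
use std::path::PathBuf;

/// Resolve the profile-independent assets directory:
/// `$CARGO_TARGET_DIR/assets`, or `target/assets` when the variable is unset.
fn assets_dir(cargo_target_dir: Option<OsString>) -> PathBuf {
    cargo_target_dir
        .map(PathBuf::from)
        .unwrap_or_else(|| PathBuf::from("target"))
        .join("assets")
}

fn main() -> std::io::Result<()> {
    let dir = assets_dir(std::env::var_os("CARGO_TARGET_DIR"));
    std::fs::create_dir_all(&dir)?;
    // Placeholder content; how the manpage is produced is up to the task.
    std::fs::write(dir.join("myprog.1"), ".TH MYPROG 1\n")?;
    println!("wrote assets to {}", dir.display());
    Ok(())
}
```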

I've stumbled upon your "make": https://matklad.github.io/2018/01/03/make-your-own-make.html That pattern makes sense to me. The .cargo/config/tools/Cargo.toml could be considered a polyfill until Cargo implements support for this pattern natively.

As for the output dir, I wonder if limiting it to the top level will be problematic for some use cases. What if a library requires a lookup table that's too large to embed via include_bytes? It could generate lookup.bin as an asset, and instruct the main executable to make it available at runtime.

That seems like a pretty narrow use-case to me. However, the library can export the function to generate the string and require that the bin manually call this function from postbuild.

I just realized another possible issue: I think we should allow proc macros to generate such files too. It might happen that configure_me will be moved to use proc macro instead (this isn't certain yet) and it'd suck if it would make man page generation impossible.

I think filesystem access from proc macros is a controversial issue. There were suggestions to completely sandbox them.

I'd like to be able to generate *.rs files from specification files. For example, I've implemented an LALR(1) parser generator in Rust (previous versions were in Modula2 and D) that takes a yacc/bison like specification of a grammar and generates a Rust file that implements a parser trait for a user specified type that can thereafter be used to parse input that purports to be an instance of the grammar. I'd like a way to tell Cargo to generate (including telling it how) that file if it's missing or older than the specification file. In the other versions of this program (Modula2 and D) I've used "make" to achieve this and could probably use it in conjunction with cargo to do the job (I currently just do it manually) but it would be preferable if it could all be left to cargo.
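The make-style "regenerate if missing or older than the specification" check can be approximated in a build script today. The sketch below compares modification times and also emits Cargo's existing cargo:rerun-if-changed directive; the file names grammar.spec and parser.rs and the helper needs_regen are placeholders, and the actual generator call is left out.

```rust
use std::path::Path;
use std::{fs, io};

/// True when `generated` is missing or older than `spec`.
fn needs_regen(spec: &Path, generated: &Path) -> io::Result<bool> {
    let spec_time = fs::metadata(spec)?.modified()?;
    match fs::metadata(generated) {
        Ok(meta) => Ok(meta.modified()? < spec_time),
        Err(e) if e.kind() == io::ErrorKind::NotFound => Ok(true),
        Err(e) => Err(e),
    }
}

fn main() -> io::Result<()> {
    // Hypothetical file names, for illustration only.
    let spec = Path::new("grammar.spec");
    let generated = Path::new("parser.rs");

    // Existing Cargo directive: rerun this build script when the spec changes.
    println!("cargo:rerun-if-changed={}", spec.display());

    if spec.exists() && needs_regen(spec, generated)? {
        // A real script would invoke the parser generator here.
        println!("cargo:warning={} is stale and should be regenerated", generated.display());
    }
    Ok(())
}
```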

May I suggest taking a look at this: https://people.gnome.org/~federico/blog/rust-build-scripts.html - the "Code generation" section will be relevant. (Please think of it not only as generating code, but generating any kind of file that may be installed post-build.)

I'd like to see a scheme where build systems that embed Cargo can pass the location of generated files. Something where the build system can call

cargo build --path-for-foo=/foo --path-for-bar=/bar other-options-here

And then the build.rs can do something like

// `cargo::path_for` is the API being proposed here; it does not exist today.
let foo_path = cargo::path_for("foo");
write_a_generated_file(&foo_path);

let bar_path = cargo::path_for("bar");
write_another_generated_file(&bar_path);

Then the build system can take /foo and /bar and place them in their final location.

For Cargo-only situations, where Cargo is not being run by such a build system, maybe it would generate the paths for a reasonable place? concat($DESTDIR, $PREFIX, name), perhaps?
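One way to read concat($DESTDIR, $PREFIX, name) is sketched below. It assumes DESTDIR and PREFIX are read from the environment in the usual make-style sense, that PREFIX defaults to /usr/local, and that name is a relative path; the function default_path_for is invented for the example.

```rust
use std::path::{Path, PathBuf};

/// Compute a default location as DESTDIR + PREFIX + name.
/// `name` is expected to be a relative path.
fn default_path_for(destdir: &str, prefix: &str, name: &str) -> PathBuf {
    let mut path = if destdir.is_empty() {
        PathBuf::from(prefix)
    } else {
        // Joining an absolute path would replace DESTDIR entirely,
        // so strip the leading separator from the prefix first.
        Path::new(destdir).join(prefix.trim_start_matches('/'))
    };
    path.push(name);
    path
}

fn main() {
    let destdir = std::env::var("DESTDIR").unwrap_or_default();
    let prefix = std::env::var("PREFIX").unwrap_or_else(|_| "/usr/local".to_string());
    let path = default_path_for(&destdir, &prefix, "share/man/man1/myprog.1");
    println!("{}", path.display());
}
```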

I think this means that build scripts would actually start being able to consume a Cargo-provided API to figure out file paths and other stuff. The current scheme of guessing everything from environment variables is pretty ad-hoc, as I detail in my post.

I'd actually agree with sandboxing if there was some standard way to safely perform additional output generation. E.g. by using another context argument to proc macro.

I think this is off-topic (but an interesting topic, I'd love to see discussion elsewhere!)