External code generation, still use cargo?

I know Rust has macros/proc-macros/generics but for a variety of reasons I've been thinking about "external" code generation, meaning taking some sort of input files and passing them to a binary/script that isn't rustc that then creates source .rs files on disk. There are some practical concerns:

  • I need to rerun the code generation step when the input files change.
  • The generated code might have dependencies on other crates.
  • If I generate enough code, I might start to care about compile time and actually want to generate multiple compilation units.
  • I may need to generate code that uses other, earlier generated code. In other words a generated crate may need to use another generated crate.
  • If I end up generating a crate that's identical to one I've compiled in the past, I'd like to reuse that compilation.
  • Other things I'm probably not thinking of....

Are there any good examples out there I could look at where people have set up this sort of thing already? I know I can have build.rs scripts that specify rebuilds should happen on input file changes, but in general I don't know if I should be bypassing cargo and using rustc directly if I need this kind of control, and it would be useful to hear from anybody who has already gone down this rabbit hole.
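For reference, the build.rs approach mentioned above might look roughly like the sketch below. The generator, the `run` helper, the input path `schema/types.txt`, and the output name `types.rs` are all made up for illustration; only the `cargo:rerun-if-changed` directive and the `OUT_DIR` variable come from Cargo itself.

```rust
use std::{fs, io, path::Path};

/// Hypothetical generator: one unit struct per non-empty input line.
fn generate(input: &str) -> String {
    input
        .lines()
        .map(str::trim)
        .filter(|line| !line.is_empty())
        .map(|name| format!("pub struct {name};\n"))
        .collect()
}

/// Body of the build script: reads `input`, writes `types.rs` into `out_dir`,
/// and returns the directive line that Cargo expects on stdout.
fn run(input: &Path, out_dir: &Path) -> io::Result<String> {
    let source = fs::read_to_string(input)?;
    fs::write(out_dir.join("types.rs"), generate(&source))?;
    // With this directive, Cargo re-runs the script only when `input` changes.
    Ok(format!("cargo:rerun-if-changed={}", input.display()))
}

// In a real build.rs, `main` would be along these lines (assumed paths):
//
//     fn main() {
//         let out_dir = std::env::var("OUT_DIR").unwrap(); // set by Cargo
//         let directive =
//             run(Path::new("schema/types.txt"), Path::new(&out_dir)).unwrap();
//         println!("{directive}");
//     }
//
// and the crate would pull in the result with
// `include!(concat!(env!("OUT_DIR"), "/types.rs"));`
```

This covers the first concern in the list (rerunning on input changes) but keeps everything inside a single crate, so it does not by itself address splitting the output into multiple compilation units.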

In the C++ world, where there's no standard package manager to begin with, I would probably write my own custom build script in Python and combine it with ccache and icecc.

One trick I've shamelessly stolen from rust-analyzer is the following....

First, put all of your code generation logic and dependencies in a codegen crate. Because it lives in its own crate, its build dependencies stay separate from the rest of the workspace's dependencies, meaning anyone using your generated code (e.g. from crates.io) won't need to compile codegen or its dependencies.

It also means you avoid the chicken-and-egg situation where building crate foo requires the generated code to be up to date, but because that code is generated as part of foo you can't compile and run the generation code.

Next, add a test to your codegen crate which will generate the code in memory, compare it to the version on disk, then either let the test pass or update the file with the new generated code and fail the test with some sort of "code for path/to/file.rs was outdated. Please commit the changes and re-run this test".

Here is a concrete example:
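A minimal sketch of that check, written for this post rather than copied from rust-analyzer. The names `generate_source` and `ensure_file_contents`, the toy generator, and the `foo/src/generated.rs` path are assumptions for illustration only.

```rust
use std::{fs, path::Path};

/// Hypothetical generator: builds the whole file in memory.
fn generate_source(names: &[&str]) -> String {
    let mut out = String::from("// @generated by the codegen crate; do not edit by hand.\n");
    for name in names {
        out.push_str(&format!("pub struct {name};\n"));
    }
    out
}

/// Compares `expected` with the file at `path`.
/// Returns `Ok(())` if the file is already up to date. Otherwise it rewrites
/// the file and returns an error message, so the calling test fails once.
fn ensure_file_contents(path: &Path, expected: &str) -> Result<(), String> {
    if let Ok(actual) = fs::read_to_string(path) {
        if actual == expected {
            return Ok(());
        }
    }
    if let Some(parent) = path.parent() {
        fs::create_dir_all(parent).map_err(|e| e.to_string())?;
    }
    fs::write(path, expected).map_err(|e| e.to_string())?;
    Err(format!(
        "code for {} was outdated. Please commit the changes and re-run this test",
        path.display()
    ))
}

// Inside the codegen crate, the test wrapper would be along these lines
// (assumes `codegen/` and `foo/` sit side by side in the workspace):
//
//     #[test]
//     fn generated_code_is_up_to_date() {
//         let target = Path::new(env!("CARGO_MANIFEST_DIR"))
//             .join("../foo/src/generated.rs");
//         let code = generate_source(&["Alpha", "Beta"]);
//         if let Err(msg) = ensure_file_contents(&target, &code) {
//             panic!("{msg}");
//         }
//     }
```

The first run after a generator change rewrites the file and fails; the second run, with the updated file committed, passes. A real generator would probably also want to normalise line endings or run the output through rustfmt before comparing, which this sketch leaves out.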

There are a few benefits from using this approach instead of build.rs or proc-macros:

  • Having code generation done by a crate that is outside the "normal" dependency graph means downstream users never have to compile the generator or its dependencies
  • Unlike build scripts, the generated code can be placed inside the src/ folder
  • Unlike proc-macros, we save the generated files to disk and commit them, so downstream users can read and debug the source code just like any other *.rs file
  • Unlike build scripts, we don't run the code generation step every time we build the crate, so it never slows down the build for downstream users; it only runs when we run the tests for our codegen crate
  • Similar to the previous point, unlike proc-macros, we don't "block" the build pipeline by requiring the code generation code to have been compiled and executed before we can start compiling the crate
  • The code can always be regenerated manually using cargo test --package codegen if you accidentally bork things, otherwise it'll be done as part of a normal cargo test --workspace and CI will enforce that the generated code is always up to date