Generating custom output files with Cargo

fish shell uses gettext (and, in the future, Fluent) for storing translations.
To be able to check and auto-update translation files,
we use a proc-macro to extract the set of translatable strings.

It looks like the easiest way to do this is by having each macro invocation like localizable_string!("foo") write to a temporary file, and finally concat all files.
Writing to the same file would break parallel compilation.
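To make the idea concrete, here is a minimal sketch of the one-file-per-string scheme using only the standard library. The function names write_message and collect_messages are hypothetical, and in practice the first would be called from inside the proc macro. Naming each file after a hash of the message is my assumption: identical strings then land in the same file, but two different strings with colliding hashes would overwrite each other, and DefaultHasher's algorithm is not guaranteed to stay the same across Rust releases.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeSet;
use std::fs;
use std::hash::{Hash, Hasher};
use std::io;
use std::path::Path;

/// Write one translatable string to its own file inside `dir`.
/// The file name is a hash of the message (an assumption of this sketch),
/// so repeated invocations with the same string reuse the same file.
pub fn write_message(dir: &Path, msg: &str) -> io::Result<()> {
    fs::create_dir_all(dir)?;
    let mut hasher = DefaultHasher::new();
    msg.hash(&mut hasher);
    let name = format!("{:016x}", hasher.finish());
    fs::write(dir.join(name), msg)
}

/// Read every file in `dir` and return the messages sorted and deduplicated.
/// This is the "concat all files" step, run once after compilation finishes.
pub fn collect_messages(dir: &Path) -> io::Result<Vec<String>> {
    let mut messages = BTreeSet::new();
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        messages.insert(fs::read_to_string(path)?);
    }
    Ok(messages.into_iter().collect())
}
```

Since no two macro invocations need to touch a shared file, nothing here requires locking; the cost is one small file per distinct string.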

Writing a new file for each translatable string literal seems fine,
but I'm wondering if there is a better way, or if anyone else does anything like this.
It looks like Cargo/rustc are only meant to produce object files, not custom text files.

Prior to the extraction macro, we used to parse the output of cargo-expand with regexes, but that was quite brittle (since cargo-expand doesn't give structured data).
Maybe there is a way to write a "compiler plugin". I'm thinking of something like clang-query, which allows collecting all interesting identifiers inside specific function calls.

(For our use case, we might be able to get away with a different approach: using the translation files themselves as the source of truth, rather than the string literals in Rust code. But that's a bit worse.)

Build scripts have an OUT_DIR, an isolated directory they can write to without concern for parallel compilation. We are designing a way to declare which of these files should be final artifacts (see "Allow build scripts to stage final artifacts", issue #13663 in rust-lang/cargo on GitHub).

Interesting thought on how to do something similar for proc-macros:

  • As mentioned, proc-macros can run in parallel, and I'm unsure of a good way to coordinate on that
  • We don't have a way for proc-macros to communicate back to cargo to say what files should be staged

Another approach would be to not treat these as outputs, but as either auto-updating source (run to overwrite, maybe with a check mode for CI) or snapshot tests (check mode by default, require an environment variable to overwrite). You'll still need individual files, though.
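A rough sketch of the snapshot-test variant, assuming the extracted strings have already been rendered into a single expected string. The names Outcome and check_snapshot are hypothetical, as is the idea of deriving the overwrite flag from an environment variable such as UPDATE_SNAPSHOTS (for example via std::env::var_os("UPDATE_SNAPSHOTS").is_some()).

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Result of comparing the checked-in file with the freshly generated content.
#[derive(Debug, PartialEq)]
pub enum Outcome {
    /// The file already matches the expected content.
    UpToDate,
    /// The file differed (or was missing) and has been rewritten.
    Updated,
    /// The file differs and overwriting was not requested.
    Stale,
}

/// Check mode by default; only rewrite the file when `overwrite` is true.
pub fn check_snapshot(path: &Path, expected: &str, overwrite: bool) -> io::Result<Outcome> {
    // A missing file is treated as "differs" rather than as an error.
    let current = match fs::read_to_string(path) {
        Ok(content) => Some(content),
        Err(e) if e.kind() == io::ErrorKind::NotFound => None,
        Err(e) => return Err(e),
    };
    if current.as_deref() == Some(expected) {
        return Ok(Outcome::UpToDate);
    }
    if overwrite {
        fs::write(path, expected)?;
        Ok(Outcome::Updated)
    } else {
        Ok(Outcome::Stale)
    }
}
```

A CI job would fail on Stale, while a developer would rerun with the overwrite flag set to refresh the file.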

Ah, so if we define the main crate (which contains the translatable strings) as a build dependency for crates/extractor/build.rs, then we could compile the main crate with EXTRACT=$OUT_DIR/extracted set,
which would cause the proc macro to append to that file,
and crates/extractor/Cargo.toml would declare $OUT_DIR/extracted as an output artifact.
That would make things slightly more convenient for us (no longer a need to inject an environment variable; a Cargo feature would be enough).

Treating the translation files as build inputs instead is an option.
That'd be sufficient for checking correctness of related Rust code,
but for checking whether things are fully consistent (i.e. there are no extra lines in the translation files), we need the extraction.
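The full consistency check boils down to a set comparison in both directions. A minimal sketch, assuming both sides have already been reduced to lists of message IDs (parsing the translation files is out of scope here); Drift and compare are hypothetical names.

```rust
use std::collections::BTreeSet;

/// Differences between the strings found in Rust code and those in a translation file.
#[derive(Debug)]
pub struct Drift {
    /// Extracted from the source, but absent from the translation file.
    pub missing: Vec<String>,
    /// Present in the translation file, but no longer in the source.
    pub extra: Vec<String>,
}

/// Compare the two sets of message IDs; both result lists come out sorted.
pub fn compare(extracted: &[&str], translated: &[&str]) -> Drift {
    let extracted: BTreeSet<&str> = extracted.iter().copied().collect();
    let translated: BTreeSet<&str> = translated.iter().copied().collect();
    Drift {
        missing: extracted
            .difference(&translated)
            .map(|s| (*s).to_owned())
            .collect(),
        extra: translated
            .difference(&extracted)
            .map(|s| (*s).to_owned())
            .collect(),
    }
}
```

Using the translation files as build inputs can only ever detect the first kind of mismatch; the extra list is the part that needs the extraction step.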

Either way, all of this is only for checks and translation updates (when strings are added, removed), so it's not really on the hot path of incremental compilation.
It would be nice to share caches, not sure if that's easily possible though.

I was considering suggesting using OUT_DIR within the build script but I think that directory should be treated as writeable from the build script itself and read only from anything else:

  • There can be multiple build targets that depend on a build script and they can build in parallel
  • We've been looking at changing our locking scheme for build artifacts, which might delineate read and write operations and allow multiple cargo processes to read concurrently (think r-a and your terminal)
  • As we work towards a shared build cache, we'll be making assumptions about who writes to and reads from various directories.

Even if you do that, you still have the problem of appending to that file from concurrent proc macros within one compilation unit.

That would be nice; I sometimes stop rust-analyzer when it would block my cargo b && run-system-test.

you still have the problem of appending to that file from concurrent proc macros within one compilation unit.

I don't know much about how Rust compilation works but I guess each compilation unit is translated by a rustc process which spawns multiple concurrent threads that can expand proc macros. That could explain why using the same file is not safe.

I believe proc-macros are expanded in the frontend, which is currently single-threaded, but there is unstable support for a parallel frontend, which would benefit compile times.