Binary rlib dependencies / isolated dependency builds

Hi!

I have two crates, A and B. Crate B depends on crate A. I want to compile crate B in such a way that proc macros and build scripts in A (and its dependencies) have no access to the source code of B.

Naively, you could try something like

$ ( cd a/; cargo build )
$ ( cd b/; cargo build )

relying on the fact that Cargo reuses build artifacts. Unfortunately, feature unification means that this wouldn’t work in most cases: if A depends on C with feature X and B depends on C with feature Y, then A built as a dependency of B will depend on C with both features.
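To make the unification problem concrete, here is a minimal sketch in Rust. It is a toy model, not Cargo's resolver: the `Request` type, the `unify` function, and the crate and feature names (`a`, `b`, `c`, `x`, `y`) are all hypothetical and only mirror the example above. It also ignores the separate handling of build and proc-macro dependencies under resolver = "2".

```rust
use std::collections::{BTreeMap, BTreeSet};

/// One edge of a hypothetical dependency graph:
/// `dependent` requests `features` of `dependency`.
struct Request {
    dependent: &'static str,
    dependency: &'static str,
    features: &'static [&'static str],
}

/// The example from the post: A needs C with feature X, B needs C with feature Y,
/// and B depends on A.
fn sample_requests() -> Vec<Request> {
    vec![
        Request { dependent: "a", dependency: "c", features: &["x"] },
        Request { dependent: "b", dependency: "c", features: &["y"] },
        Request { dependent: "b", dependency: "a", features: &[] },
    ]
}

/// For every dependency, take the union of the features requested by any
/// crate that takes part in the build (`built`).
fn unify(
    requests: &[Request],
    built: &[&str],
) -> BTreeMap<&'static str, BTreeSet<&'static str>> {
    let mut out: BTreeMap<&'static str, BTreeSet<&'static str>> = BTreeMap::new();
    for r in requests {
        if built.iter().any(|b| *b == r.dependent) {
            out.entry(r.dependency)
                .or_default()
                .extend(r.features.iter().copied());
        }
    }
    out
}

fn main() {
    let requests = sample_requests();
    // Building A on its own: only A's requests count.
    println!("a alone:     {:?}", unify(&requests, &["a"]));
    // Building B: requests from both A and B are merged.
    println!("a as b's dep: {:?}", unify(&requests, &["a", "b"]));
}
```

In this model C gets a different feature set in the two builds, so the artifacts for C (and therefore for A) from the standalone build cannot simply be reused when building B.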

I tried to determine the complete set of features by parsing the output of cargo tree or cargo metadata, but neither source contains enough information: cargo tree unifies dependencies of proc macros with normal dependencies (which is not what resolver = "2" does), and cargo metadata doesn’t seem to annotate features properly, since it unifies the features needed across every crate in the workspace.

I’m not sure how to solve this problem properly. Any input would be appreciated.

They formally don't — that is, if they're doing it then they are reaching outside the Cargo compilation model. Dependents should only affect dependencies via the feature selection mechanism.

Is your goal sandboxing — preventing rule-breaking code from succeeding at breaking the rules? Or are you observing some unexpected interactions and wanting to make the build relevantly deterministic? More information will help us tell whether the thing you want is possible and how to accomplish it.

It might only be possible by modifying (or entirely replacing) cargo, but perhaps not.

Is your goal sandboxing — preventing rule-breaking code from succeeding at breaking the rules? Or are you observing some unexpected interactions and wanting to make the build relevantly deterministic? More information will help us tell whether the thing you want is possible and how to accomplish it.

Yes, I’m trying to sandbox untrusted code so it can’t access the source of crate B during compilation.

In that case you're in a lot of trouble, because Rust and Cargo fully trust the code they compile and give it countless ways to execute arbitrary code. All isolation that Cargo provides is only for well-behaved cooperative code. Malicious actors can easily bypass all of it.

You will have to sandbox the entire machine including all of Cargo, and treat all code compiled on it as tainted and potentially malicious.

There is an attempt by PL/Rust to eliminate loopholes that allow arbitrary code execution during compilation, but I have doubts whether that is enough to actually sandbox it, because Rust and lower-level C infrastructure it uses were never designed to withstand this type of attack:

https://tcdi.github.io/plrust/#general-safety-by-rust

Sandboxing the machine running the compiler is half the problem (and is quite easily done). The other half is that (with the current Cargo architecture) my “valuable” code must be present on the sandboxed machine while the untrusted code is compiling.

If the dependency is a leaf dep, and the only thing it needs from your valuable code is unification of dependencies, then you could replace your source code with an empty src/lib.rs file, keeping Cargo.toml and Cargo.lock. This way you can compile deps' .rlib in isolation, and later link it with the rest of the precious project. You will still need to ensure the code of the dependency doesn't do anything tricky, like replacing symbols it shouldn't touch or injecting linker flags, running build.rs, proc-macros, etc.
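A minimal sketch of the stubbing step, assuming a single crate with the default layout (a `Cargo.toml`, an optional `Cargo.lock`, and a library at `src/lib.rs`). The function name `write_stub_crate` is hypothetical. Manifests that name other targets or files (for example a custom `[lib]` path, `[[bin]]` entries, or a `build` script) would need matching stub files, which this sketch does not create.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Copy only the manifest (and the lockfile, if present) of `crate_dir`
/// into `out_dir`, and write an empty `src/lib.rs` there, so that none of
/// the real source is present in the copy.
fn write_stub_crate(crate_dir: &Path, out_dir: &Path) -> io::Result<()> {
    fs::create_dir_all(out_dir.join("src"))?;
    fs::copy(crate_dir.join("Cargo.toml"), out_dir.join("Cargo.toml"))?;

    // Keep Cargo.lock so the stub resolves to the same dependency versions.
    let lock = crate_dir.join("Cargo.lock");
    if lock.exists() {
        fs::copy(&lock, out_dir.join("Cargo.lock"))?;
    }

    // The empty library stands in for the valuable code.
    fs::write(out_dir.join("src").join("lib.rs"), "")?;
    Ok(())
}

fn main() -> io::Result<()> {
    let mut args = std::env::args().skip(1);
    match (args.next(), args.next()) {
        (Some(src), Some(dst)) => write_stub_crate(Path::new(&src), Path::new(&dst)),
        _ => {
            eprintln!("usage: stub <crate-dir> <out-dir>");
            Ok(())
        }
    }
}
```

The stub directory is what would be copied into the sandbox and built there. Whether Cargo then reuses the resulting dependency artifacts in the real build is not something this sketch guarantees.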

Alternatively, compile the dependency to WASM/WASI and run it via sandboxed interpreter, exposing API for your precious code in a very controlled way. This will be the most secure option.

The valuable code is a big workspace of crates, but replacing all of them with dummy files seems possible, thanks. I’ll check whether this prevents recompilation. Runtime behaviour of the untrusted code is less worrisome; the primary thing we want to protect is the source code.

WASM (or just dynamic linking) is obviously the safest option, but it requires making the API surface extern "C", which is not feasible at the moment.

If crate b depends on crate a and crate a uses a proc macro, this proc macro will be able to access the source code of crate b even if it is never used by crate b. The moment you pull in crate a, rustc will load the proc macro which can run a global constructor that has code accessing the source code of crate b.

Could you airgap the machine doing the builds? Or maybe just build inside a Docker container that you've set up to only allow access to (for example) crates.io?

If you are trying to protect your intellectual property, a malicious dependency then matters much less, because it has no network path through which to exfiltrate your valuable project's source code.

Yeah, so the goal is to make sure that when rustc is called for A, source code for B is not accessible.

Yeah, that’s a good idea, thanks. A can try embedding the source code into the binary and then sending it at runtime though,