Make Cargo share dependencies for different projects?

I am wondering whether there is a way to make Cargo reuse dependencies globally.
Currently it's really annoying that every project downloads and compiles all of its dependencies separately. (Correct me if I am wrong.) For example, libc is a widely used crate, and most crates specify its version as something like libc >= "1.0". That sounds like any version later than 1.0 can be used. So I assume it's possible for the build system to keep a copy of the latest build of each crate globally and use it to build different projects.

And each time a project is built, if the dependency specification doesn't cap the highest version and the global cache meets the version requirement, it should be able to use the global cache directly.
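To make the lookup I have in mind concrete, here is a rough sketch, assuming an open-ended ">= min" requirement and plain major.minor.patch versions. The names `parse_version` and `pick_cached` are made up for illustration. This is not how Cargo resolves versions, and it ignores pre-release tags and other requirement forms such as caret requirements.

```rust
/// A (major, minor, patch) triple; tuples compare field by field, left to right.
type Version = (u64, u64, u64);

/// Parse "1.2.3"; missing minor/patch default to 0. Returns None on non-numeric input.
fn parse_version(s: &str) -> Option<Version> {
    let mut parts = s.trim().split('.').map(|p| p.parse::<u64>());
    let major = parts.next()?.ok()?;
    let minor = match parts.next() {
        Some(p) => p.ok()?,
        None => 0,
    };
    let patch = match parts.next() {
        Some(p) => p.ok()?,
        None => 0,
    };
    Some((major, minor, patch))
}

/// Hypothetical lookup: the newest cached version that satisfies ">= min",
/// or None if nothing in the cache qualifies and a fresh build is needed.
fn pick_cached(min: &str, cached: &[&str]) -> Option<Version> {
    let min = parse_version(min)?;
    cached
        .iter()
        .filter_map(|v| parse_version(v))
        .filter(|v| *v >= min)
        .max()
}
```

Calling `pick_cached("1.0", &["0.9.4", "1.2.0", "1.10.1"])` picks 1.10.1, since versions are compared numerically and not as strings.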

It's a little bit annoying that even a tiny project takes a very long time to build its dependencies.

I'm very curious whether a similar feature already exists. That sounds useful to me.

Don't quote me on this, but I think the various feature flags have something to do with why this is challenging. For example, if two projects, a and b, depend on crate foo, the two projects cannot share foo's object files if a enables --features "bar" on foo but b does not.

Extrapolate to an endless combination of features, conditional compilation, linking options, std differences between stable/beta/nightly, etc. and here we are.

I don't think this is even a problem.

I just did a quick grep on crates.io packages, and it seems most crates use exactly the same version of their dependencies plus a similar configuration set.

For example, for the serde dependency, the counts look like this:

    309 ">= 0.3.0","features" [""]
    356 "^1.0","features" ["derive"]
    368 "^1.0.0","features" []
   1039 "^1.0.2","features" []
   1538 "^0.9","features" []
   1846 "^1","features" []
   1920 "^0.8","features" []
   8952 "^1.0","features" []

And for libc

    472 "*","features" []
    480 "^0.2.21","features" []
    506 "^0.1","features" [""]
    711 "^0.1","features" []
   1004 "*","features" [""]
   8460 "^0.2","features" []

I don't think the feature set and toolchain are a problem; they can be treated as part of the "version" in the cache system, as can the Rust edition, etc.

But one thing does break the original idea: it seems most crates use an exact version of a dependency. I do think this actually makes an LRU cache very efficient, though.

The assumption is that there is a commonly used version + build configuration for any given crate. When switching toolchain / compiler version, it would just use a different cache or invalidate the previous one.

So there won't be an exponential number of binaries in the cache, and we certainly don't need to build every possible configuration.

End applications can also set various settings that apply to dependencies. It’s not that simple.

The solution is to encode every single option into a hash, and use that. Nobody has implemented it, that I’m aware of.
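A minimal sketch of that idea, assuming a made-up `BuildKey` struct that lists the inputs discussed in this thread (crate, version, features, profile, target, compiler). The field names and the `sample_key` helper are illustrative and are not taken from Cargo. `DefaultHasher` is used only for brevity: its output is not guaranteed to be stable across Rust releases, so a real on-disk cache would want a stable hash.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeSet;
use std::hash::{Hash, Hasher};

/// Hypothetical cache key: every input that can change the compiled artifact.
#[derive(Hash, PartialEq, Eq, Debug, Clone)]
struct BuildKey {
    name: String,
    version: String,
    /// Sorted set, so the order in which features are listed does not matter.
    features: BTreeSet<String>,
    /// For example "dev" or "release".
    profile: String,
    target_triple: String,
    rustc_version: String,
}

/// Collapse the whole key into one number, usable as a cache directory name.
fn cache_key(key: &BuildKey) -> u64 {
    let mut hasher = DefaultHasher::new();
    key.hash(&mut hasher);
    hasher.finish()
}

/// Sample key for crate `foo` with the given features; all other fields are fixed placeholders.
fn sample_key(features: &[&str]) -> BuildKey {
    BuildKey {
        name: "foo".to_string(),
        version: "1.0.0".to_string(),
        features: features.iter().map(|f| f.to_string()).collect(),
        profile: "release".to_string(),
        target_triple: "x86_64-unknown-linux-gnu".to_string(),
        rustc_version: "1.70.0".to_string(),
    }
}
```

With this, `sample_key(&["bar"])` and `sample_key(&[])` are different keys, so projects a and b from the earlier example would get separate cache entries for foo, while two projects with identical settings would share one.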

That’s already the case. So, if you globally override CARGO_TARGET_DIR to point at a shared folder, sharing dependencies will just work (one compilation per features/profile flags combination), at the cost of dumping all “end results” into the same folder. I’ve looked at that piece of Cargo recently, and it would actually be pretty simple to add “compile crates.io deps to a CARGO_TARGET_SHARED folder”. As in, there’s a bunch of additional code to write, but no changes to how Cargo works are required.

I looked at the Cargo code real quick; it seems cargo::core::compiler::context::Unit is the thing to use for the hash.

    /// All information needed to define a Unit.
    ///
    /// A unit is an object that has enough information so that cargo knows how to build it.
    /// For example, if your package has dependencies, then every dependency will be built as a library
    /// unit. If your package is a library, then it will be built as a library unit as well, or if it
    /// is a binary with `main.rs`, then a binary will be output. There are also separate unit types
    /// for `test`ing and `check`ing, amongst others.
    ///
    /// The unit also holds information about all possible metadata about the package in `pkg`.
    ///
    /// A unit needs to know extra information in addition to the type and root source file. For
    /// example, it needs to know the target architecture (OS, chip arch etc.) and it needs to know
    /// whether you want a debug or release build. There is enough information in this struct to figure
    /// all that out.
    #[derive(Clone, Copy, Eq, PartialEq, Hash, Debug, PartialOrd, Ord)]
    pub struct Unit<'a> {
        pub pkg: &'a Package,
        /// Information about the specific target to build, out of the possible targets in `pkg`. Not
        /// to be confused with *target-triple* (or *target architecture* ...), the target arch for a
        /// build.
        pub target: &'a Target,
        /// The profile contains information about *how* the build should be run, including debug
        /// level, etc.
        pub profile: Profile,
        /// Whether this compilation unit is for the host or target architecture.
        ///
        /// For example, when
        /// cross compiling and using a custom build script, the build script needs to be compiled for
        /// the host architecture so the host rustc can use it (when compiling to the target
        /// architecture).
        pub kind: Kind,
        /// The "mode" this unit is being compiled for.  See `CompileMode` for
        /// more details.
        pub mode: CompileMode,
    }

And it seems doable to implement a cache.

I would love this feature, with one important caveat: there needs to be some way to garbage collect the globally shared installs when no project is using them anymore. So there needs to be some sort of reference system that can track whether the projects using a given build of a dependency are still present.

Another issue I thought about, though, is that some crates depend on environment variables, which may change between builds. How would this system factor in these various "system" states that probably aren't encapsulated by the current hashing system? There would need to be some way for crates to add additional bits of information to the hash.

I am trying to add this to Cargo right now.

You mentioned the garbage collection issue; I am going to handle it with an LRU cache. That means the shared binary cache has a limited size on disk, and when the cache is full, it drops the least recently used binary.
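Roughly what I mean, as a sketch only: an in-memory index of (cache key, size in bytes) ordered from least to most recently used, which evicts from the old end once a byte budget is exceeded. `LruIndex` and its methods are hypothetical names, nothing here exists in Cargo, and actually deleting the evicted files from disk is left out.

```rust
use std::collections::VecDeque;

/// Hypothetical size-bounded LRU index.
/// Front of the queue = least recently used, back = most recently used.
struct LruIndex {
    capacity_bytes: u64,
    used_bytes: u64,
    /// (cache key, artifact size in bytes)
    entries: VecDeque<(u64, u64)>,
}

impl LruIndex {
    fn new(capacity_bytes: u64) -> Self {
        LruIndex { capacity_bytes, used_bytes: 0, entries: VecDeque::new() }
    }

    /// Mark `key` as just used. Returns false if it is not in the cache.
    fn touch(&mut self, key: u64) -> bool {
        match self.entries.iter().position(|&(k, _)| k == key) {
            Some(i) => {
                let entry = self.entries.remove(i).expect("index came from position()");
                self.entries.push_back(entry);
                true
            }
            None => false,
        }
    }

    /// Record an artifact as most recently used and return the keys that
    /// had to be evicted to get back under the byte budget.
    fn insert(&mut self, key: u64, size: u64) -> Vec<u64> {
        // If the key is already present, drop the old entry first.
        if let Some(i) = self.entries.iter().position(|&(k, _)| k == key) {
            let (_, old_size) = self.entries.remove(i).expect("index came from position()");
            self.used_bytes -= old_size;
        }
        self.entries.push_back((key, size));
        self.used_bytes += size;

        let mut evicted = Vec::new();
        while self.used_bytes > self.capacity_bytes {
            match self.entries.pop_front() {
                Some((k, s)) => {
                    self.used_bytes -= s;
                    evicted.push(k);
                }
                None => break,
            }
        }
        evicted
    }
}
```

The linear scan in `touch` and `insert` keeps the sketch short; a cache with many entries would pair the queue with a map. Note that an artifact larger than the whole budget ends up evicting itself.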

Also, the environment variables seem to be handled by build.rs. I believe the Unit hash is able to reflect changes in build.rs output as well.
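For reference, a build script can tell Cargo which environment variables it reads by printing a `cargo:rerun-if-env-changed=NAME` line. The sketch below builds the lines such a script would print. The helper `env_directives` and the variable name `FOO_LIB_DIR` are made up for illustration, and whether a shared cache would fold this information into its key is an open assumption on my side.

```rust
/// Lines a build script would print for one environment variable.
/// `var` is the variable the crate reads; `value` is its current value, if set.
fn env_directives(var: &str, value: Option<&str>) -> Vec<String> {
    // Ask Cargo to re-run the build script whenever this variable changes.
    let mut lines = vec![format!("cargo:rerun-if-env-changed={}", var)];
    if let Some(dir) = value {
        // Illustrative use of the value: add it as a native library search path.
        lines.push(format!("cargo:rustc-link-search=native={}", dir));
    }
    lines
}

// In a real build.rs, `main` would do something like:
//     let value = std::env::var("FOO_LIB_DIR").ok();
//     for line in env_directives("FOO_LIB_DIR", value.as_deref()) {
//         println!("{}", line);
//     }
```

Because the script's output depends on the variable, a cache that keys on the build script's declared inputs or output would see a different entry when the variable changes.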
