Rust compile times and dependency graphs

I was just reading "Compile Times and Code Graphs" (via the Hacker News thread "Rust Compile Times and Code Graphs")

and in particular:

For each unit of business logic foo, separate crates for:

  • Types: for Plain Old Data, protobuf, traits that users of foo implement, etc.
  • Interface: for the public API without an implementation. 4sq called this FooService, mz calls it foo-client.
  • Implementation: for the implementation of the public API. 4sq called this FooConcrete, mz calls it foo.
  • Note that not every foo will have all three of these, and some will be more complicated, but I’ve found these three to be a reasonable default.

Now, I like this idea, but I have one problem: in Rust, isn't every impl block required to be in the same crate as either (1) the struct/enum it is written for or (2) the trait it implements?

If so, how is it possible to separate out the "implementation" crate from the "types" or "interface" crates?

Concretely, if:

crate foo_types: pub struct Foo

crate foo_traits: pub trait Bar_T

doesn't the Rust compiler require that:

  1. "impl Foo" be in "foo_types" and

  2. "impl Bar_T for Foo" be in either foo_types or foo_traits?

If so, in the article, how are they separating out the implementation crate (and what is in it?)
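To make the constraint concrete, here is a minimal sketch in which each module stands in for a separate crate. The names foo_types, foo_traits and Foo come from the question; BarT is Bar_T respelled, and foo_impl, FooExt and run are hypothetical names added for illustration. It only shows what the orphan rule would and would not allow in a third crate, not what the article's authors actually did.

```rust
// Each module stands in for a separate crate (an assumption made so the
// sketch fits in one file). `foo_traits` would depend on `foo_types`.

mod foo_types {
    pub struct Foo {
        pub count: u32,
    }

    // Inherent impls must live in the crate that defines the type.
    impl Foo {
        pub fn new(count: u32) -> Self {
            Foo { count }
        }
    }
}

mod foo_traits {
    use super::foo_types::Foo;

    pub trait BarT {
        fn bar(&self) -> String;
    }

    // Allowed by the orphan rule: the trait is local to this crate.
    impl BarT for Foo {
        fn bar(&self) -> String {
            format!("bar({})", self.count)
        }
    }
}

mod foo_impl {
    use super::foo_traits::BarT;
    use super::foo_types::Foo;

    // If this were really a third crate, both of these would be rejected:
    //   impl Foo { ... }            // error[E0116]: inherent impl outside the type's crate
    //   impl BarT for Foo { ... }   // error[E0117]: foreign trait for a foreign type

    // Still allowed: a trait defined here (hypothetical `FooExt`),
    // implemented for the foreign type.
    pub trait FooExt {
        fn doubled(&self) -> u32;
    }

    impl FooExt for Foo {
        fn doubled(&self) -> u32 {
            self.count * 2
        }
    }

    // Also allowed: free functions built on the public API.
    pub fn run(foo: &Foo) -> String {
        format!("{} x2={}", foo.bar(), foo.doubled())
    }
}
```

Calling foo_impl::run(&foo_types::Foo::new(3)) exercises all three layers. So an "implementation" crate can hold local traits, free functions and its own types, but not inherent or foreign-trait impls for Foo.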

Thanks!

It sounds like they've put the types and traits into "types". So in this case, "interface" would be joining many types and traits together into higher-level interfaces. And "implementation" would probably be instantiating those interfaces with other types, I/O, and procedural logic.

You can separate your types and traits with features, though. You start with a bunch of crates that are only traits. Then in crates with types, make them optional dependencies and put the impls behind the corresponding features. That way, things that use the types only depend on the trait crates that are in use. Public crates often do this with serde.
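A rough sketch of that pattern, assuming a trait-only crate called bar_traits (hypothetical, shown here as a module so the snippet compiles on its own) and a feature of the same name in the types crate. With the feature off, the impl is simply not compiled.

```rust
// `bar_traits` stands in for a separate trait-only crate. In a real setup
// it would be an optional dependency of the types crate, and the feature
// name "bar_traits" used below is an assumption for illustration.
mod bar_traits {
    pub trait Bar {
        fn bar(&self) -> String;
    }
}

pub struct Foo {
    pub value: u32,
}

impl Foo {
    pub fn new(value: u32) -> Self {
        Foo { value }
    }
}

// Compiled only when the feature is enabled, so users who never ask for
// it do not pay for the trait crate or this impl.
#[cfg(feature = "bar_traits")]
impl bar_traits::Bar for Foo {
    fn bar(&self) -> String {
        format!("bar({})", self.value)
    }
}

// Reports at runtime whether the gated impl was compiled in.
pub fn has_bar_impl() -> bool {
    cfg!(feature = "bar_traits")
}
```

This mirrors how many public crates gate their serde impls behind a "serde" feature.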

This is believable. Call this crate foo_types

I'm not sure why this is separate, but not my main objection.

I don't see how this works. How can you impl stuff defined in foo_types in a crate outside of foo_types, given that both the structs and the traits are defined in foo_types?

Is it defined behavior to include the SAME crates MULTIPLE TIMES, but with DIFFERENT FEATURE FLAGS ?

This seems like the type of thing that causes duplicate definitions/ linking errors -- but if I'm wrong, I'd love to learn more.

Cargo will just compile it once with the sum of the enabled features.

This top level crate would just do things like

let mut foo = foo_types::Foo::new();
foo.run(foo_types::Runner);
foo.save_to_file("file.txt");

Where are the compile-time savings, then?

Any crate that includes foo_types would be forced to include foo_types with the union of all enabled features and, in the model you described above, would therefore include the impl code.

Say you have a crate with two features:

[package]
name = "a"

[features]
x = []
y = []

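And there are two crates that depend on it:

[package]
name = "b"

[dependencies]
# a path (or version) is required; "../a" assumes a sibling directory
a = { path = "../a", features = ["x"] }

[package]
name = "c"

[dependencies]
a = { path = "../a", features = ["y"] }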

You can work on b without compiling the code in feature y, and you can work on c without compiling the code in feature x.
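The behaviour being described can be modelled in a few lines. This is a toy model, not Cargo's resolver: unified_features is a hypothetical helper that takes each dependent's requested features for a and returns the union over whichever dependents are part of the current build.

```rust
use std::collections::BTreeSet;

/// Toy model of feature unification (not Cargo's actual resolver):
/// the features `a` is built with are the union of the features requested
/// by the dependents that are part of this build.
pub fn unified_features(
    requests: &[(&str, Vec<&str>)],
    building: &[&str],
) -> BTreeSet<String> {
    let mut enabled = BTreeSet::new();
    for (dependent, features) in requests {
        // Only dependents that are actually being built contribute features.
        if building.contains(dependent) {
            for feature in features.iter() {
                enabled.insert(feature.to_string());
            }
        }
    }
    enabled
}
```

With b requesting x and c requesting y, building only b gives {x}, building only c gives {y}, and building something that pulls in both gives {x, y}, which is the case the rest of the thread argues about.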

I don't think we have the same model in mind. I am envisioning a workspace with a root crate. When b gets modified, the root crate needs to get rebuilt, which by your logic above, requires compiling a with the union of all the features.

If you're building something that depends on b and c, changing one or both of b and c wouldn't require a to rebuild. It's already built with both features. And a build system (not plain cargo) can cache the different permutations of features of a so that you don't wipe out one by building another.

One thing you're stuck with is that changing feature y causes b to recompile.

I was referring to the above. In the setup of a global root, by your own logic above, a had to be compiled with the union of features, which negates the "without compiling feature x/y" statement above.

I'm confused. There's no way to get around compiling both x and y if you need those features. Are you worried about b and c taking longer because they have a bigger dependency?

I think we are both stating technically true statements, but unfortunately trying to solve different problems.

Let me take a step back. This is my version of the story; there's also your version of the story, and somewhere in between is the truth. :slight_smile:

  1. The original goal of the post is: is it possible to separate a crate into foo_types, foo_interface, foo_impl, as stated in that blog post I read.

  2. This is something the blog post claims is possible, but I could not figure out how to do it (due to orphan rules and other constraints).

  3. My goal here is to reduce overall compile time by having crates not depend on each other as much.

  4. At some point, you suggested the idea of a single crate with multiple features. (I'm not sure if you were trying to solve the original problem or a separate one.)

  5. I am arguing that crate features cannot replace the split into foo_types, foo_interface, foo_impl as a way to reduce compile time, at least in the case of a giant workspace with a root package at the top.

The only benefit of the features thing is that check times are reduced, since you don't need to compile the root crate unless you change the API. It doesn't help with testing the root crate.

I went in and checked the commit linked in the article to see what they actually did.

The -types crates have types, and the inherent impl blocks as is required. They also make all the struct fields public, and implement some traits.

The -client crates define and implement many traits for those types. This takes advantage of the public fields. They also have tons more types. It looks like over time, many higher-level ("impl") crates started depending on these types without needing the traits, which triggered the refactor to pull some of them out.

The main change is that they moved many types from mz-storage-client to mz-storage-types. Same for mz-compute-client. This had the effect of moving mz-sql, as stated in the blog. It seems like they could've gone even farther and made mz-compute-client not depend on mz-storage-client, but maybe that wasn't possible.

They also moved one trait, TryIntoTimelyConfig, from the "implementation" layer to the -client layer. I don't think this is too important to the blog, just that it was in the wrong place.

So I think the takeaway is that you should have the bottom two layers of your dependency graphs be mainly reusable traits ("interface") and mainly reusable types, with the trait impls in the same crate as the traits. The "implementation" layer is just everything else and consists of many interdependent layers, with many types and traits. Maybe this layer is more important in other languages.

Separating one crate into these three layers won't do much on its own, but it allows other crates to decide which layer they actually depend on, which is what reduces compile times. Really, any separation will help when a crate is doing multiple things; this is just one way to ensure every crate has a logical way of being separated that achieves two things: it doesn't cause interdependence within each layer, and it puts the least frequently changed code at the bottom.
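As a rough illustration of that takeaway, the sketch below uses modules in place of crates. All names (foo_types, foo_client, planner, server, Config, Describe) are hypothetical and not taken from the commit; the point is only the shape of the dependency graph.

```rust
// Modules stand in for crates; every name here is hypothetical.

// Bottom layer: plain data with public fields. Changes least often.
mod foo_types {
    #[derive(Debug, PartialEq)]
    pub struct Config {
        pub workers: usize,
        pub name: String,
    }
}

// Interface layer: traits, with their impls kept next to the traits.
// The impls rely on the public fields of the bottom-layer types.
mod foo_client {
    use super::foo_types::Config;

    pub trait Describe {
        fn describe(&self) -> String;
    }

    impl Describe for Config {
        fn describe(&self) -> String {
            format!("{} ({} workers)", self.name, self.workers)
        }
    }
}

// A consumer that only needs the data depends on `foo_types` alone, so as
// a real crate it would not rebuild when `foo_client` changes.
mod planner {
    use super::foo_types::Config;

    pub fn default_config() -> Config {
        Config { workers: 4, name: String::from("default") }
    }
}

// A consumer that needs the behaviour depends on both layers.
mod server {
    use super::foo_client::Describe;
    use super::foo_types::Config;

    pub fn banner(config: &Config) -> String {
        format!("starting {}", config.describe())
    }
}
```

Here planner never names foo_client, which is the kind of edge removal the refactor was after: consumers choose the lowest layer they actually need.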
