Strategies for managing microcrate hierarchies


#1

So I have a project that has grown unruly, and I have been breaking it down into microcrates for a number of reasons (described below).

To be clear, I’ve got the Cargo configuration down to a T. That’s not my issue. (In fact, I have my own homebrewed system for autogenerating Cargo.toml files, just to get around the pain of maintaining all of the relative path deps and to ensure that all my crates agree on version numbers for dependencies in cases where public deps are unavoidable.)

The real trouble is that, well… I just really don’t know how I should break things up!

  • There are a small number of factors that very strongly motivate me to break something up:
    • Code that could potentially be useful outside this project shouldn’t be using types and traits that are strongly entrenched in the project (like an Error type with conversions from 10 external crates).
    • The desire to keep clean, pure code away from all the dirty, high-level code that needs to get stuff done.
    • The need to contain the spread of copyleft licenses, so that large portions of my code can remain permissively licensed.
    • I want RLS to remain reasonably responsive.
  • …but then, in turn, new factors arise which force my hand into breaking things up even further:
    • Crate deps must form a DAG, hence things must be broken up further for more code reuse.
      • Eventually I need crates even for very private things that no code outside my project should ever depend on. The distinction between pub and pub(crate) quite nearly becomes a joke.
    • Coherence problems (the orphan rules) arise from having separate crates.

Right now I have… (oh geez, is that number right?!)… uh, 15kloc spread out over 18 crates, and I only see that number increasing in the future.

I know asking for general advice probably won’t end too well, so here’s some specific problems I have:


Problem 1: Structure types and algorithms

Currently I have a crate with two very distinct types of code in it:

Core data structures:

// DRAMATIZATION
use std::rc::Rc;

/// A crystallographic structure.
///
/// At bare minimum, it pairs Coords with a Lattice so that you can do
/// operations like converting between frac/cart and reducing positions
/// into the unit cell.
///
/// It also possibly contains metadata, which *might* be stuff like Elemental
/// symbols, which are permuted and replicated alongside the positions.
pub struct Structure<M=()> {
    lattice: Lattice,
    coords: Coords,
    // NOTE TO SELF: Please stop trying to figure out how this
    //               can be generically turned into an SoA.
    metadata: Vec<M>,
}

/// A bravais lattice.
///
/// The matrix inverse is precomputed, for great justice.
pub struct Lattice {
    mat: Rc<[[f64; 3]; 3]>,
    inv: Rc<[[f64; 3]; 3]>,
}

/// A set of unique coordinates which may or may not be expressed
/// in units of some bravais lattice.
pub enum Coords {
    // (enum variant fields are always public; a `pub` qualifier here is a compile error)
    Carts(Vec<[f64; 3]>),
    Fracs(Vec<[f64; 3]>),
}

…and algorithms. Things like:

  • Constructing a supercell
  • Identifying layers or clusters of atoms.
  • Finding a more primitive representation of a structure.
  • Identifying its symmetry operations
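To make the algorithm side a bit more concrete, here is a minimal sketch of the first item, supercell construction. It assumes a hypothetical, simplified `Structure` that stores lattice vectors as rows, fractional coordinates, and one metadata entry per site (not the `Lattice`/`Coords` types above), and it only handles diagonal supercells for brevity; a general supercell would take an integer matrix instead.

```rust
/// Hypothetical, simplified stand-in for `Structure`: lattice vectors as rows,
/// fractional coordinates, and one metadata entry per site.
#[derive(Debug, Clone, PartialEq)]
pub struct Structure<M> {
    pub lattice: [[f64; 3]; 3],
    pub fracs: Vec<[f64; 3]>,
    pub metadata: Vec<M>,
}

/// Builds a diagonal supercell, repeating the cell `dims[k]` times along
/// lattice vector `k`. Metadata is replicated alongside every image.
pub fn diagonal_supercell<M: Clone>(s: &Structure<M>, dims: [usize; 3]) -> Structure<M> {
    // Scale each lattice vector (row) by its repeat count.
    let mut lattice = s.lattice;
    for (row, &d) in lattice.iter_mut().zip(dims.iter()) {
        for x in row.iter_mut() {
            *x *= d as f64;
        }
    }

    let n = s.fracs.len() * dims[0] * dims[1] * dims[2];
    let mut fracs = Vec::with_capacity(n);
    let mut metadata = Vec::with_capacity(n);
    let scale = [dims[0] as f64, dims[1] as f64, dims[2] as f64];

    for i in 0..dims[0] {
        for j in 0..dims[1] {
            for k in 0..dims[2] {
                let shift = [i as f64, j as f64, k as f64];
                for (f, m) in s.fracs.iter().zip(&s.metadata) {
                    // Shift into image (i, j, k), then rescale to the bigger cell.
                    fracs.push([
                        (f[0] + shift[0]) / scale[0],
                        (f[1] + shift[1]) / scale[1],
                        (f[2] + shift[2]) / scale[2],
                    ]);
                    metadata.push(m.clone());
                }
            }
        }
    }
    Structure { lattice, fracs, metadata }
}
```

Cartesian positions are preserved: each new fractional coordinate is divided by the same factor its lattice vector was multiplied by.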

I feel like these should be split apart, and in particular some of the tougher algorithms could be useful to other people. But:

  • some of these algorithms are such that I would like them to be available as methods on Structure; hence the types crate would want to depend on the algorithms crate.
  • some of these algorithms benefit greatly from the utility functions on Structure (which is the most basic data structure I have that contains both Coords and a Lattice). Hence the algorithms would like to at least privately depend on the structure crate, maybe with some features disabled.

End result: I know I want these things separated… but I don’t know what should depend on what!

Problem 2: Error types in helper traits

Yesterday I wrote a trait intended to be used in fairly high-level code that handles a lot of orchestration of tasks and needs to do a lot of filesystem manipulation. I put it directly in the crate where I intended to use it the most:

use std::sync::Arc;

pub trait Source<T> {
    // Note: Result is the crate's error_chain Result.

    // (each crate in my project has its own error_chain!{};
    //  if they shared the error types then every single crate would transitively
    //  depend on e.g. serde_yaml, which is just silly)
    fn get(&mut self) -> Result<Arc<T>>;
    fn get_cached(&self) -> Option<Arc<T>>;
    fn ensure_cached(&mut self) -> Result<()>;
}

// let tuples act as a product type combinator to help cut down on boilerplate
//  in impls for composite types
// (I generated this for n-ary tuples via a macro)
impl<A1, S1, A2, S2> Source<(Arc<A1>, Arc<A2>)> for (S1, S2)
where
    S1: Source<A1>,
    S2: Source<A2>,
{ ... }
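For reference, one possible body for that 2-tuple impl while every source still shares a single crate-local error. This is a sketch, not the original code: it assumes a plain `Error`/`Result` pair standing in for the error_chain-generated ones, and adds a hypothetical `Constant` source only for the usage example.

```rust
use std::sync::Arc;

/// Stand-in for the crate's error_chain `Error` and `Result` alias (assumption).
#[derive(Debug)]
pub struct Error(pub String);
pub type Result<T> = std::result::Result<T, Error>;

pub trait Source<T> {
    fn get(&mut self) -> Result<Arc<T>>;
    fn get_cached(&self) -> Option<Arc<T>>;
    fn ensure_cached(&mut self) -> Result<()>;
}

// Tuples act as a product-type combinator over their element sources.
impl<A1, S1, A2, S2> Source<(Arc<A1>, Arc<A2>)> for (S1, S2)
where
    S1: Source<A1>,
    S2: Source<A2>,
{
    fn get(&mut self) -> Result<Arc<(Arc<A1>, Arc<A2>)>> {
        // Both halves report the same crate-local Error, so `?` just works.
        Ok(Arc::new((self.0.get()?, self.1.get()?)))
    }

    fn get_cached(&self) -> Option<Arc<(Arc<A1>, Arc<A2>)>> {
        // Only "cached" if both halves are; builds a fresh outer Arc each call.
        Some(Arc::new((self.0.get_cached()?, self.1.get_cached()?)))
    }

    fn ensure_cached(&mut self) -> Result<()> {
        self.0.ensure_cached()?;
        self.1.ensure_cached()
    }
}

/// Hypothetical trivial source whose value is always ready.
pub struct Constant<T>(pub Arc<T>);

impl<T> Source<T> for Constant<T> {
    fn get(&mut self) -> Result<Arc<T>> {
        Ok(self.0.clone())
    }
    fn get_cached(&self) -> Option<Arc<T>> {
        Some(self.0.clone())
    }
    fn ensure_cached(&mut self) -> Result<()> {
        Ok(())
    }
}
```

The key point is that the impl never has to think about error conversion, because there is only one error type in play.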

But then I decided I wanted to be able to use this trait in another crate lower down the dep tree which itself has some amount of high-level code. So I broke it out. But hm, shoot. What error type do I use? Guess I need to add it as an associated type.

pub trait Source<T> {
    type Error;
    // Note: this crate has no error_chain, so Result here is std's Result
    fn get(&mut self) -> Result<Arc<T>, Self::Error>;
    fn get_cached(&self) -> Option<Arc<T>>;
    fn ensure_cached(&mut self) -> Result<(), Self::Error>;
}
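As an illustration of how the associated-type version works for a single implementor, here is a sketch of a hypothetical file-backed source. `FileText` and its caching behaviour are assumptions chosen to match the "filesystem manipulation" use case; the point is that each implementor picks its own `Error`, here just `std::io::Error`.

```rust
use std::path::PathBuf;
use std::sync::Arc;

pub trait Source<T> {
    type Error;
    fn get(&mut self) -> Result<Arc<T>, Self::Error>;
    fn get_cached(&self) -> Option<Arc<T>>;
    fn ensure_cached(&mut self) -> Result<(), Self::Error>;
}

/// Hypothetical source: lazily reads a text file and caches its contents.
pub struct FileText {
    path: PathBuf,
    cache: Option<Arc<String>>,
}

impl FileText {
    pub fn new(path: impl Into<PathBuf>) -> Self {
        FileText { path: path.into(), cache: None }
    }
}

impl Source<String> for FileText {
    // This implementor only needs std::io, so that is its error type.
    type Error = std::io::Error;

    fn get(&mut self) -> Result<Arc<String>, Self::Error> {
        self.ensure_cached()?;
        Ok(self.cache.clone().expect("populated by ensure_cached"))
    }

    fn get_cached(&self) -> Option<Arc<String>> {
        self.cache.clone()
    }

    fn ensure_cached(&mut self) -> Result<(), Self::Error> {
        if self.cache.is_none() {
            let text = std::fs::read_to_string(&self.path)?;
            self.cache = Some(Arc::new(text));
        }
        Ok(())
    }
}
```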

At which point I realize, oh no! What error type do I use in the tuple impl now!? There’s nothing that I can reasonably expect that these two crates’ error types will be able to convert into; least of all anything that is capable of being imported by this new utility crate (now that it sits underneath them in the dep tree)!

I considered porting my codebase over to failure (after which I could use failure::Error as the error type) but that’s a pretty big yak to shave for a weekend refactor, and I have no idea what other issues I would run into.


So… yeah. I’m not sure how much anybody even can help me, but… trying to do microcrates has been a frequent source of grief, and I just don’t have very good strategies for dealing with the problems that arise from it.


#2

It sounds like you need a 3rd crate (:crazy_face:) that would define the traits your algorithms crate needs (in lieu of calling helper/utility functions on types directly) and that your types crate can implement. Your types crate then depends on the algorithms crate and this 3rd crate, and the algorithms crate depends only on the 3rd crate. Not sure if that’s better, but it seems like the cleanest way to achieve this particular aspect.
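A rough sketch of that layout, using modules in one file to stand in for the three crates. The trait `Periodic`, the module names, and the `volume_per_site` algorithm are all hypothetical; what matters is the dependency direction: `algorithms` only sees the trait, while `structure` implements the trait and re-exposes the algorithm as an ordinary method.

```rust
/// Stand-in for the small 3rd crate: only the abstraction the algorithms need.
pub mod geom_traits {
    pub trait Periodic {
        /// Lattice vectors as rows.
        fn lattice(&self) -> [[f64; 3]; 3];
        /// Cartesian positions of the sites.
        fn carts(&self) -> &[[f64; 3]];
    }
}

/// Stand-in for the algorithms crate: depends on `geom_traits` only.
pub mod algorithms {
    use super::geom_traits::Periodic;

    /// Hypothetical example algorithm: cell volume divided by the number of sites.
    pub fn volume_per_site<P: Periodic>(p: &P) -> f64 {
        let m = p.lattice();
        // Scalar triple product a . (b x c) gives the signed cell volume.
        let bxc = [
            m[1][1] * m[2][2] - m[1][2] * m[2][1],
            m[1][2] * m[2][0] - m[1][0] * m[2][2],
            m[1][0] * m[2][1] - m[1][1] * m[2][0],
        ];
        let volume = (m[0][0] * bxc[0] + m[0][1] * bxc[1] + m[0][2] * bxc[2]).abs();
        volume / p.carts().len() as f64
    }
}

/// Stand-in for the types crate: depends on both, implements the trait,
/// and re-exposes the algorithm as an inherent method.
pub mod structure {
    use super::algorithms;
    use super::geom_traits::Periodic;

    pub struct Structure {
        pub lattice: [[f64; 3]; 3],
        pub carts: Vec<[f64; 3]>,
    }

    impl Periodic for Structure {
        fn lattice(&self) -> [[f64; 3]; 3] {
            self.lattice
        }
        fn carts(&self) -> &[[f64; 3]] {
            &self.carts
        }
    }

    impl Structure {
        pub fn volume_per_site(&self) -> f64 {
            algorithms::volume_per_site(self)
        }
    }
}
```

Because `Structure` owns the impl, callers get the method syntax they want, and `algorithms` stays usable by anyone who can implement a two-method trait.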


#3

At work I’ve got a similar issue (juggling a decent amount of code and making sure it stays cohesive) but I chose a different way of going about things. Instead of lots of little crates I’ve got two fairly large crates each containing about 50% of the Rust “module”. This then gets linked into a larger application (~200kloc) using FFI and DLLs.

I was lucky in that my problem domain naturally lends itself to a monolithic design with a main “business logic” crate and a side “serialization/file format” crate. I’ve found that this monolithic-style design reduces the issues you have when there are lots of different crates which all have their own error types, as well as other “fun” coherence/dependency problems.

I actually started off using a similar design to yours, where each logical chunk of code would be broken out into its own crate. That made a massive difference to compile times (by comparison, my monolithic design takes about 5 minutes to do a full debug build and 10+ for release), but keeping the crate hierarchy sane and working around coherence issues ended up being more trouble than it was worth. As the Go proverb goes, sometimes a little copying is better than a little dependency.


Hmm… just out of curiosity, do they need to be separated? I mean, I’m as big a fan of code sharing and DRY as everyone else, but if keeping things separate is going to consume a large amount of design time and effort, while the actual implementation is so specialised to your use case that others may not be able to reuse it in practice, then… it may not be worth it.

Alternatively, one solution I can think of is to use dependency inversion. You’d define a general trait and do all your algorithms in terms of that, then higher up the dependency tree someone would create a concrete type which satisfies that trait and can use the things already defined by the algorithms crate.

I know the standard library has encountered similar issues and used DI to get around them. One example which comes to mind is liballoc where they’ve defined a generic Alloc trait that all collections use, but the actual implementation is left to something like jemalloc or libc later on (technically they do this via linker magic, but it’s the same idea).
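On current stable Rust, the allocator version of this inversion is exposed as the `GlobalAlloc` trait plus the `#[global_allocator]` attribute: collections in `alloc` call whatever allocator the final binary registers, without ever naming it. A minimal sketch, where `CountingAlloc` is a hypothetical example type that counts allocations and forwards to the system allocator:

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// "Higher up the tree" implementation: counts allocations and
/// delegates the real work to the system allocator.
pub struct CountingAlloc {
    pub allocs: AtomicUsize,
}

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        self.allocs.fetch_add(1, Ordering::Relaxed);
        // SAFETY: the caller upholds GlobalAlloc's contract; we forward it unchanged.
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        // SAFETY: `ptr` came from `System.alloc` above with this same `layout`.
        unsafe { System.dealloc(ptr, layout) }
    }
}

// Registering it here reroutes every Vec/String/Box in the whole program,
// even though the collections in `alloc` never mention `CountingAlloc`.
#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc { allocs: AtomicUsize::new(0) };
```

Only one `#[global_allocator]` may exist per program, which is the "decided once, at the top of the dependency tree" property that makes the pattern work.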

The failure crate does sound like a pretty big hammer to use (and migrating isn’t exactly trivial), but it sounds like a good solution in the long term. failure::Error gives you one common type that everyone can pass around when they hit an error, while also allowing you to downcast to a specific Fail type if you want to handle certain errors or provide customised error messages.

The issue I have with std::error::Error is that it’s too lightweight (all I can do with it is print it to the user) and type erasure means you lose the ability to behave differently depending on the actual error encountered. Similarly, error-chain is awesome but I’m not a fan of how you’ll end up wrapping every other error in the dependency tree.
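For comparison, a std-only sketch of the "one common error type" idea applied to Problem 2's tuple impl, using `Box<dyn Error + Send + Sync>` in place of `failure::Error`. It assumes each source's error implements `std::error::Error + Send + Sync + 'static`, trims the trait to `get` for brevity, and uses hypothetical `Ready`/`Missing` sources. Note that `dyn Error` does support `downcast_ref`, so a caller can still branch on a concrete error type when it knows what to look for.

```rust
use std::error::Error;
use std::io;
use std::sync::Arc;

/// One type-erased error type that any crate can name without new dependencies.
pub type BoxError = Box<dyn Error + Send + Sync + 'static>;

/// `Source` from Problem 2, trimmed to `get` for brevity.
pub trait Source<T> {
    type Error;
    fn get(&mut self) -> Result<Arc<T>, Self::Error>;
}

// The tuple combinator no longer needs the two errors to know about each
// other: both are boxed into the common `BoxError`.
impl<A1, S1, A2, S2> Source<(Arc<A1>, Arc<A2>)> for (S1, S2)
where
    S1: Source<A1>,
    S2: Source<A2>,
    S1::Error: Error + Send + Sync + 'static,
    S2::Error: Error + Send + Sync + 'static,
{
    type Error = BoxError;

    fn get(&mut self) -> Result<Arc<(Arc<A1>, Arc<A2>)>, BoxError> {
        // `?` boxes each concrete error via std's `From<E> for Box<dyn Error + ...>`.
        Ok(Arc::new((self.0.get()?, self.1.get()?)))
    }
}

/// Hypothetical source that always succeeds.
pub struct Ready(pub u32);

impl Source<u32> for Ready {
    type Error = io::Error;
    fn get(&mut self) -> Result<Arc<u32>, io::Error> {
        Ok(Arc::new(self.0))
    }
}

/// Hypothetical source that always fails with an I/O error.
pub struct Missing;

impl Source<String> for Missing {
    type Error = io::Error;
    fn get(&mut self) -> Result<Arc<String>, io::Error> {
        Err(io::Error::new(io::ErrorKind::NotFound, "input file not found"))
    }
}
```

One limitation: `Box<dyn Error + Send + Sync>` does not itself implement `Error`, so a tuple nested inside another tuple would not satisfy these bounds without an extra wrapper type.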

Phew! That turned out to be a longer response than expected, but I guess it goes to show that the problem isn’t an easy one :disappointed: