So I have a project that has grown unruly, and have been breaking it down into microcrates for a number of reasons (described below).
To be clear, I've got the cargo configuration down to a tee; that's not my issue. (In fact, I have my own homebrewed system for autogenerating Cargo.tomls, just to get around the pain of maintaining all of the relative path deps and to ensure that all my crates agree on version numbers for dependencies in cases where public deps are unavoidable.)
The real trouble is that, well... I just really don't know how I should break things up!
- There are a small number of factors which very strongly motivate me to break something up:
  - Code that could potentially be useful outside this project shouldn't be using types and traits that are strongly entrenched in the project (like an Error type with conversions from 10 external crates).
  - I want to keep clean, pure code away from all the dirty, high-level code that needs to get stuff done.
  - I want to contain the spread of copyleft licenses so that large portions of my code may remain permissively licensed.
  - I want RLS to remain reasonably responsive.
- ...but then in turn, new factors arise which force my hand into breaking things up even further:
  - Crate deps must form a DAG, hence things must be broken up further for more code reuse.
  - Eventually I need crates even for very private things that no code outside my project should ever depend on. The distinction between `pub` and `pub(crate)` quite nearly becomes a joke.
  - Coherence problems arise from having separate crates.
Right now I have... (oh geeze, is that number right?!)... uh, 15kloc spread out over 18 crates, and I only see that number increasing in the future.
I know asking for general advice probably won't end too well, so here are some specific problems I have:
Problem 1: Structure types and algorithms
Currently I have a crate with two very distinct types of code in it:
Core data structures:
// DRAMATIZATION
use std::rc::Rc;

/// A crystallographic structure.
///
/// At bare minimum, it pairs Coords with a Lattice so that you can do
/// operations like converting between frac/cart and reducing positions
/// into the unit cell.
///
/// It also possibly contains metadata, which *might* be stuff like Elemental
/// symbols, which are permuted and replicated alongside the positions.
pub struct Structure<M = ()> {
    lattice: Lattice,
    coords: Coords,
    // NOTE TO SELF: Please stop trying to figure out how this
    // can be generically turned into an SoA.
    metadata: Vec<M>,
}

/// A Bravais lattice.
///
/// The matrix inverse is precomputed, for great justice.
pub struct Lattice {
    mat: Rc<[[f64; 3]; 3]>,
    inv: Rc<[[f64; 3]; 3]>,
}

/// A set of unique coordinates which may or may not be expressed
/// in units of some Bravais lattice.
pub enum Coords {
    // Fields of enum variants are always public, so no `pub` is needed here.
    Carts(Vec<[f64; 3]>),
    Fracs(Vec<[f64; 3]>),
}
...and algorithms. Things like:
- Constructing a supercell.
- Identifying layers or clusters of atoms.
- Finding a more primitive representation of a structure.
- Identifying its symmetry operations.
I feel like these should be split apart, and in particular some of the tougher algorithms could be useful to other people. But:
- some of these algorithms are such that I would like them to be available as methods on Structure; hence the types crate would want to depend on the algorithms crate.
- some of these algorithms benefit greatly from the utility functions on Structure (which is the most basic data structure I have that contains both Coords and a Lattice). Hence the algorithms would like to at least privately depend on the structure crate, maybe with some features disabled.
End result: I know I want these things separated... but I don't know what should depend on what!
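To make the dilemma concrete, here is a minimal sketch of just one of the two directions, with modules standing in for crates. Everything in it is hypothetical and far simpler than my real code: the module names `algorithms` and `types`, the `wrap_fracs` function, and the cut-down `Structure` with public fields are all made up for illustration. The algorithm takes only plain arrays, and the types side wraps it as a method.

```rust
// Modules stand in for crates here; all names are hypothetical.
mod algorithms {
    /// Reduce fractional coordinates into the unit cell.
    /// Takes plain arrays, so this "crate" needs no project types.
    pub fn wrap_fracs(fracs: &mut [[f64; 3]]) {
        for p in fracs.iter_mut() {
            for x in p.iter_mut() {
                *x -= x.floor();
            }
        }
    }
}

mod types {
    use super::algorithms;

    /// Cut-down stand-in for the real Structure (assumed layout).
    pub struct Structure {
        pub lattice: [[f64; 3]; 3],
        pub fracs: Vec<[f64; 3]>,
    }

    impl Structure {
        /// Method sugar: the types "crate" depends on the algorithms "crate".
        pub fn reduce_positions(&mut self) {
            algorithms::wrap_fracs(&mut self.fracs);
        }

        /// A utility the algorithms side cannot call without creating
        /// a dependency cycle. Rows of `lattice` are the lattice vectors.
        pub fn to_carts(&self) -> Vec<[f64; 3]> {
            self.fracs
                .iter()
                .map(|f| {
                    let mut c = [0.0; 3];
                    for i in 0..3 {
                        for j in 0..3 {
                            c[j] += f[i] * self.lattice[i][j];
                        }
                    }
                    c
                })
                .collect()
        }
    }
}
```

This direction gives me the methods I want on Structure, but the algorithm loses access to helpers like `to_carts`, which is exactly the second bullet above. Flipping the dependency fixes that and loses the methods instead.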
Problem 2: Error types in helper traits
Yesterday I wrote a trait intended to be used in fairly high-level code that handles a lot of orchestration of tasks and needs to do a lot of filesystem manipulation. I put it directly in the crate where I intended to use it the most:
pub trait Source<T> {
    // Note: Result is the crate's error_chain Result.
    // (each crate in my project has its own error_chain!{};
    // if they shared the error types then every single crate would transitively
    // depend on e.g. serde_yaml, which is just silly)
    fn get(&mut self) -> Result<Arc<T>>;
    fn get_cached(&self) -> Option<Arc<T>>;
    fn ensure_cached(&mut self) -> Result<()>;
}

// let tuples act as a product type combinator to help cut down on boilerplate
// in impls for composite types
// (I generated this for n-ary tuples via a macro)
impl<A1, S1, A2, S2> Source<(Arc<A1>, Arc<A2>)> for (S1, S2)
where
    S1: Source<A1>,
    S2: Source<A2>,
{ ... }
But then I decided I wanted to be able to use this trait in another crate lower down the dep tree which itself has some amount of high-level code. So I broke it out. But hm, shoot. What error type do I use? Guess I need to add it as an associated type.
pub trait Source<T> {
    type Error;

    // Note: Result here has to be std's two-parameter Result,
    // not the crate's one-parameter error_chain alias.
    fn get(&mut self) -> Result<Arc<T>, Self::Error>;
    fn get_cached(&self) -> Option<Arc<T>>;
    fn ensure_cached(&mut self) -> Result<(), Self::Error>;
}
At which point I realize, oh no! What error type do I use in the tuple impl now!? There's nothing that I can reasonably expect that these two crates' error types will be able to convert into; least of all anything that is capable of being imported by this new utility crate (now that it sits underneath them in the dep tree)!
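For concreteness, the closest thing I can sketch without a shared error type is a generic sum type that just records which half of the pair failed. This is only a sketch under assumptions: `PairError` is a made-up enum (not from any crate), and the trait is trimmed down to `get` to keep it short.

```rust
use std::sync::Arc;

/// Trimmed-down version of the trait with an associated error type.
pub trait Source<T> {
    type Error;
    fn get(&mut self) -> Result<Arc<T>, Self::Error>;
}

/// Hypothetical sum type: records which half of the pair failed.
#[derive(Debug, PartialEq)]
pub enum PairError<E1, E2> {
    First(E1),
    Second(E2),
}

impl<A1, S1, A2, S2> Source<(Arc<A1>, Arc<A2>)> for (S1, S2)
where
    S1: Source<A1>,
    S2: Source<A2>,
{
    // The tuple's error is built from the two component error types,
    // so this crate never has to name either of them.
    type Error = PairError<S1::Error, S2::Error>;

    fn get(&mut self) -> Result<Arc<(Arc<A1>, Arc<A2>)>, Self::Error> {
        let a = match self.0.get() {
            Ok(a) => a,
            Err(e) => return Err(PairError::First(e)),
        };
        let b = match self.1.get() {
            Ok(b) => b,
            Err(e) => return Err(PairError::Second(e)),
        };
        Ok(Arc::new((a, b)))
    }
}
```

It compiles, but every caller now has to unpack a nested error before converting it into its own crate's error type, and the n-ary tuple versions would each need their own enum, so it doesn't feel like a real answer.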
I considered porting my codebase over to `failure` (after which I could use `failure::Error` as the error type), but that's a pretty big yak to shave for a weekend refactor, and I have no idea what other issues I would run into.
So... yeah. I'm not sure how much anybody even can help me, but... trying to do microcrates has been a frequent source of grief, and I just don't have very good strategies for dealing with the problems that arise from it.