Small dependencies in Rust? Do or don't?


#1

Continuing the discussion from HTTP status codes crate:

Which other problems? I’m genuinely curious!

Once you fix

  • compilation speed (actually even faster with more modules/crates)
  • startup speed (not an issue, thanks to cross-module inlining)
  • dependency persistence (crates.io doesn’t allow “hard” yanking)

What is left to be a problem?

There is, in my opinion, much in favour of small crates, especially for “core” types like http or RGB. In fact, it would be more annoying if there weren’t crates for that and everyone had to write their own conversions.


#2

One remaining problem may be dependency compatibility. If I use tool, which exposes utility in its API, then I have to depend on not only tool but also utility in the Cargo.toml of my package, and it becomes my job to make sure I depend on the same version of utility as tool does.

Luckily, I think there is a good solution to that as well; tool should simply re-export utility (either just the structs / traits it actually uses or entirely, as tool::utility).
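
A minimal sketch of that re-export, assuming both crates are modelled as modules in one file so it compiles on its own. The names StatusCode and fetch_status are made up for illustration; in a real project tool and utility would be separate crates, and the re-export would be a `pub use` of the external crate in tool's lib.rs.

```rust
// Stand-in for the small shared crate (`utility` in the post above).
pub mod utility {
    // Hypothetical type that ends up in `tool`'s public API.
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    pub struct StatusCode(pub u16);
}

// Stand-in for the larger crate (`tool`) whose API exposes `utility` types.
pub mod tool {
    // Re-export the exact `utility` that `tool` itself uses,
    // so downstream code can reach it as `tool::utility`.
    pub use super::utility;

    // Hypothetical API function returning a `utility` type.
    pub fn fetch_status() -> utility::StatusCode {
        utility::StatusCode(200)
    }
}

fn main() {
    // Downstream code names the type through `tool`, never through a
    // separately declared `utility` dependency of its own.
    let status: tool::utility::StatusCode = tool::fetch_status();
    println!("{:?}", status);
}
```

With this shape, the downstream package only lists tool as a dependency, and the version of utility it sees is by construction the one tool uses.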


#3

I am not against small crates, but they should be “atomic enough” and “self sufficient”. By self sufficient, I don’t mean a crate shouldn’t have any other dependencies, but it should work standalone when installed (rather than depend on some functionality that may be in another crate).

In general, say I am looking for a crate that handles request inputs. I would expect it to handle all types of request inputs - including files. I wouldn’t expect it to do other stuff wrt the request, and that’s fine. But I don’t want to install 2 separate crates - one for normal inputs and one for file handling. No thanks :stuck_out_tongue:


#4

Isn’t that also something that would be (partially/largely) cargo’s job?
I mean to find the versions that satisfy all bounds.
I do agree it puts some social bounds on those “shared” packages. They should update infrequently, and put even more effort into proper semantic versioning, API stability and communication about updates.
I do have the feeling that in Rust, both the community and the software tooling provide good examples to follow of how to do it right. (e.g. futures)

To me that sounds less like a problem with small crates, and more like a problem with inexperienced or actively dumb packagers.
I.e. more of a community education/social problem than an IT or tooling problem.
Again, the Rust community seems to have good examples that others can follow, so I trust that this problem is solvable (although I can definitely see how it would take effort to guide newcomers into the “Rust way of doing things” as the community grows).


#5

One thing that I’ve found is that generally smaller crates are of lower quality. There’s a decent amount of overhead for a crate with CI, the readme, examples, tests, etc., and that overhead is reduced a little if you have the same amount of code in a single crate. Additionally, crate discoverability is still a problem. For these reasons I generally think that people should err slightly towards larger crates. Additionally, with smaller crates it’s harder to build a community around them, and their maintainer bus factor is quite low (generally 1).

I became a maintainer for nix this year and we recently had the discussion over what to do about ioctls, and we decided to only provide the framework for people to make their own (a suite of helper macros) and cut all the actual implementations out for other crates to handle. This is because it’s hard to test these in CI and leverage the community while the other crates using them will likely be able to appropriately test them.

So in the end I think it’s a nuanced problem, tho with how small the Rust community is and how important that can be to having crates survive, I’m a fan of erring towards larger crates that can garner a maintenance/developer community with effective testing and docs and are easier to discover, rather than a swath of smaller crates. That being said, Dr. Seuss had it correct when he said “a good crate is a good crate no matter its size”, and I think avoiding a crate just because of its size is an error unless compile times become problematic.


#6

My impression is that smaller crates are of higher quality — they’re easier to unit test, since there’s less to test and fewer layers of abstraction.

Note that it’s possible to have a monorepo for a larger project that is split into multiple crates. So you can organize the project/community around a larger body of code, but still publish it piece by piece so that others can use only the minimum they want.


#7

When would you suggest doing that over using features as part of a larger crate? If they’re so closely related that I’d want them in the same repository, I would think it would be easier for contributors to have a single larger crate separated by features. And also for users, as features are commonly used where many are on by default and you disable features rather than enable them.

But at the end of the day I don’t think there’s a correct answer here. As I said in my previous post, if a crate ticks all the boxes then use it!


#8

The “bus factor” is a real problem indeed, but I’m not sure if it’s related to the crate size. The url crate has 4 owners, but hyper has one. winapi has one, but log has over 3.

I think in general it’s very hard to go beyond bus factor 1, and that’s not specific to Rust. Projects like openssl and libjpeg had mostly one maintainer for years. Even in JS land it happens that major projects are mostly authored by one person: https://github.com/vuejs/vue/graphs/contributors

However, what you can more easily get in open source is drive-by contributors who just want to fix a thing. For that I’d expect smaller crates to have an advantage, since it’s easier to understand all of the code, and small crates aren’t overwhelming like major projects can be.


#9

One thing I didn’t mention is that as a user I’m much more likely to prefer a larger crate when I’m implementing something because a) I’m often not completely sure of the scope of my work and b) I’m not certain how well the small crates will interact. I’ve been bitten by this before where I started to implement new functionality, realized the previous crate didn’t implement what I needed and didn’t integrate well with other crates so I had to scrap it in favor of another crate. It’s frustrating to do that. So I prefer larger crates with the most functionality whenever possible.


#10

Thanks for all the input (so far) everyone!

One trend I am beginning to see is a split in supporters between “big, do-everything” crates and “small, composable” crates, with no middle ground.

I seem to find myself agreeing with both camps, depending on the use case, or perhaps the “scale”, of what we are talking about.

For functionality, I want bigger, “do-everything” crates, a single framework that provides a comprehensive feature set. Think hyper or diesel. I wouldn’t want to compose my entire web middleware out of a dozen pieces that only partially work together. A coherent framework makes sense; either hyper or Rocket, not a mix of both.

Then again, for core types, I want tiny, practically struct-only type crates that all frameworks use, such as http status codes, or RGB colours, or DMX/MIDI network packets, etc. (These are then re-used by the different “do-everything” frameworks.)

This ensures that if people write extensions or plugins on top of the frameworks, their code is interfaceable with other code, without writing any glue-code.
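
To make that concrete, here is a rough sketch of the split, with everything squeezed into one file so it runs as-is. All names (http_types, framework_a, framework_b, handle, log_line) are hypothetical stand-ins, not real crates or APIs.

```rust
// Hypothetical stand-in for a tiny, practically struct-only "core types" crate.
pub mod http_types {
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    pub struct StatusCode(pub u16);

    impl StatusCode {
        pub const OK: StatusCode = StatusCode(200);
        pub const NOT_FOUND: StatusCode = StatusCode(404);

        // True for the 2xx range.
        pub fn is_success(self) -> bool {
            (200..300).contains(&self.0)
        }
    }
}

// Hypothetical "do-everything" framework A, built on the shared type.
pub mod framework_a {
    use super::http_types::StatusCode;

    pub fn handle(path: &str) -> StatusCode {
        if path == "/" {
            StatusCode::OK
        } else {
            StatusCode::NOT_FOUND
        }
    }
}

// Hypothetical framework B (or a plugin for it), built on the same type.
pub mod framework_b {
    use super::http_types::StatusCode;

    pub fn log_line(status: StatusCode) -> String {
        format!("status={} success={}", status.0, status.is_success())
    }
}

fn main() {
    // A value produced by one framework flows into the other without glue code,
    // because both agree on the one shared StatusCode type.
    println!("{}", framework_b::log_line(framework_a::handle("/")));
}
```

If each framework defined its own status type instead, the call in main would need a conversion layer in between, which is exactly the glue code the shared crate avoids.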

Am I the only one desiring such a tiny-types-vs-big-functionality split?


#11

@juleskers interesting topic.

My 2¢: I don’t actually think of a “right-sized” crate as big or small, I think of a crate as being right-sized if it adheres to the single-responsibility principle.

With that said, the scope of that single responsibility will vary (depending on the situation) from large to small or anywhere in-between.

If the responsibility of the crate is say, ORM, then a crate with a large-scope single-responsibility like Diesel makes a lot of sense. But, if Diesel were instead just a random grab-bag of serialization and database utilities, neither designed for nor sufficient for accomplishing ORM, then I’d suggest it should be broken up, as it would be violating SRP.

I think there are plenty of mid-sized examples (something like the num crate comes to mind, as does itertools).

Just another thought to add to the mix. :slight_smile:


#12

True. That’s what I pointed out earlier as well. People need to factor in that multiple crates might not play well together, either now or at some point in the future.


#13

From a practical viewpoint, I would’ve recommended smaller crates for compile performance; however, thanks to incremental compilation (for compile times) and LTO (for runtime performance), there is no downside to using larger crates.

The most important part is that a crate should be big enough to provide the features it was designed for (after all, what was the point of building that crate?), while being small enough to be manageable by the maintainer (it is easier IMO to build a crate out of smaller crates than to create a big crate out of nothing).


#14

My $.02: My rule of thumb for this is: If the package/crate/library/whatever’s source code size approaches or is even smaller than the combined size of its auxiliary data, such as readme, build system config, etc., then it’s too small. (I.e. this is a kind of SNR metric.)
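
As a rough sketch, the rule of thumb above could be written down like this. The function names and the threshold parameter are assumptions for illustration; the post only says "approaches or is even smaller than", so where exactly "approaches" starts is left to the caller.

```rust
/// Ratio of source code size to auxiliary data size (readme, build config, etc.).
/// Both sizes are in bytes. If there is no auxiliary data at all, the result is
/// infinite (or NaN when both are zero).
fn signal_to_noise(source_bytes: u64, auxiliary_bytes: u64) -> f64 {
    source_bytes as f64 / auxiliary_bytes as f64
}

/// Flags a package as too small when its ratio is at or below `threshold`.
/// A threshold of 1.0 means "source is no larger than the auxiliary data";
/// a slightly larger value would also catch packages that merely approach it.
/// The threshold itself is an assumption, not something from the post.
fn looks_too_small(source_bytes: u64, auxiliary_bytes: u64, threshold: f64) -> bool {
    signal_to_noise(source_bytes, auxiliary_bytes) <= threshold
}

fn main() {
    // Made-up sizes: 1_200 bytes of source next to 3_000 bytes of readme/config.
    println!("tiny crate too small? {}", looks_too_small(1_200, 3_000, 1.0));
    // Made-up sizes: 50_000 bytes of source next to 4_000 bytes of readme/config.
    println!("larger crate too small? {}", looks_too_small(50_000, 4_000, 1.0));
}
```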

The trouble with small dependencies is that they make a project’s dependency list harder to verify and understand. Furthermore, when people are very liberal about dependencies, dependency trees get very large and thereby the number of points of possible confusion or failure increases exponentially.

edit: Another rule of thumb I think is useful: If the package’s source code is small enough that copypasting it might work well, it is probably too small or close to being too small.

The HTTP status code crate is still ok IMO, although it’s approaching the lower bound of what I’d consider acceptable. I hope people won’t make crates smaller than that :slight_smile:


#15

A reason why it is better to have as few dependencies as possible: https://github.com/npm/registry/issues/255

It is not that having fewer deps will solve the issue, but at least there is a lower chance it can affect your package.


#16

You can’t remove crates from crates.io.


#17

True. Thankfully.


#18

The way that crates work as the smallest unit of compilation is very concerning to me. My experience is colored by working on large software at large software companies.

A key question in my mind is - does the incremental compilation feature do anything for cold builds?

It’s worth noting that cold builds are very important. At $work I end up doing one or two cold builds per day, usually, even though I painstakingly try to avoid them. That’s not counting my CI system, which cold builds my branch every time it kicks off a regression test. A lot of my days end with me refreshing the CI gui, waiting for a regression test that I kicked off four hours prior to finally come home so that I can submit. Developer productivity tracks cold build time closely at my company, so much so that there are many teams that are primarily dedicated to reducing the build times of our worst-offending binaries.


Say I have this crate, myserver, with an extern crate lib_A.

lib_A  <+
        |
        |
     +----------------- myserver ---------------------+
     |  |                                             |
     |  |      +----------> mydep1                    |
     |  |      |                                      |
     |  |      |                                      |
     |  |      |                                      |
     |  +      +                                      |
     |   myserver                                     |
     |                                                |
     +------------------------------------------------+

And I want to add myclient which needs code in mydep1.

lib_A  <+
        |
        |
     +----------------- myserver ---------------------+
     |  |                                             |
     |  |      +----------> mydep1 <----------+       |
     |  |      |                              |       |
     |  |      |                              |       |
     |  |      |                              |       |
     |  +      +                              +       |
     |   myserver                           myclient  |
     |                                                |
     +------------------------------------------------+

Which I happen to know in advance should go into its own crate. (It’s not always obvious ahead of time).
My understanding of Rust’s compilation model, and admittedly I’ve only spent a few hours grazing at the surface, is that, even with incremental compilation turned on, if I add myclient to the existing crate, the first time that I try to compile myclient, it’s also going to compile myserver and lib_A, even though myclient doesn’t use them at all. (I mean, there’s no way to even tell cargo just to compile myclient, is there? With cargo I can only address crates). Please correct me if I’m wrong?

My goal is to spend 0 seconds compiling or thinking about compiling myserver and lib_A whenever I just need to compile myclient. The obvious thing to do is to cargo new myclient and pull in mydep1. But wait - how? mydep1 is stuck in this other crate, which is the smallest addressable unit that I can depend on. Now if I want to actually reap the benefits of smaller dependencies, I’ll have to make mydep1 its own crate.

But myserver is a crate that is already in production and subject to a release process. It’ll take me at least two weeks to patch that crate into two new crates. Meantime, I’ll have an obnoxious integration task to keep all of the changes that are being pushed into the 1-crate system synced with the new 2-crate file layout. Also, by the way, I don’t want to block my check-in for two extra weeks. So, what am I going to do? If the code I need from mydep1 is pub, then I’ll likely create a new crate, and extern crate myserver. If it isn’t, then, time permitting, I’ll modify it to be pub, and then create a new crate and depend on myserver. If time isn’t permitting, I’ll just add it to myserver. In either case, cold build times will be worse than necessary. This cycle will continue until things get so bad that someone is given resources to actually break up the giant clump. That lucky fellow might decide to factor mydep1 out of myserver and update the dependency on myserver to point to mydep1 instead. That’ll be easier said than done, though; lacking original context, it won’t be easy to spot that extern crate myserver is really just a dep on mydep1.

How could this have been addressed? The best way is pre-emptively. If we had a “pit of success” instead of a pit of failure, our build system would default us into declaring compilation units at the most granular level that is technically possible. mydep1 would have already been its own build target for me to depend on, and I would just have defaulted into making myclient its own target, without needing to think about it, exposing smaller targets to downstream developers and this time compounding a virtuous cycle.
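
To illustrate the difference between the two layouts, here is a toy model of "what has to be compiled to build a target". It is only a sketch: the graph representation and the build_set helper are made up for this post and say nothing about how cargo or rustc actually schedule work. It treats a crate as the smallest unit that can be built, as described above.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Maps each crate name to the crates it depends on directly.
type DepGraph = BTreeMap<&'static str, Vec<&'static str>>;

/// Returns every crate that must be compiled to build `target`:
/// the target itself plus all of its transitive dependencies.
fn build_set(graph: &DepGraph, target: &'static str) -> BTreeSet<&'static str> {
    let mut seen = BTreeSet::new();
    let mut stack = vec![target];
    while let Some(name) = stack.pop() {
        // `insert` returns false if this crate was already visited.
        if seen.insert(name) {
            if let Some(deps) = graph.get(name) {
                stack.extend(deps.iter().copied());
            }
        }
    }
    seen
}

/// One-crate layout: mydep1 and myclient are modules inside the myserver crate,
/// so the only addressable units are myserver and lib_A.
fn single_crate_layout() -> DepGraph {
    BTreeMap::from([("myserver", vec!["lib_A"]), ("lib_A", vec![])])
}

/// Split layout: mydep1 and myclient are crates of their own.
fn split_layout() -> DepGraph {
    BTreeMap::from([
        ("myserver", vec!["lib_A", "mydep1"]),
        ("myclient", vec!["mydep1"]),
        ("mydep1", vec![]),
        ("lib_A", vec![]),
    ])
}

fn main() {
    // In the one-crate layout, myclient can only be built as part of myserver,
    // which drags in lib_A as well.
    println!("{:?}", build_set(&single_crate_layout(), "myserver"));
    // In the split layout, building myclient touches only myclient and mydep1.
    println!("{:?}", build_set(&split_layout(), "myclient"));
}
```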

I’m pretty sure that if Rust were ever to gain currency at my company, we would have to have a strange policy like “one crate per source file” just to have a hope of feasible build times.

The key principle is that the idiomatic best practice should be to declare compilation units at the smallest granularity that is technically possible.

In this thread, I see two sides talking past each other. One side is talking about the appropriate size of what to ship to crates.io, and the other side is talking about compilation times. It is a sin of Cargo that these two separate concerns are conflated! The best thing to do would be to declare minimal compilation units and then, for shipping to crates.io, declare a few roll-up modules that you expose and document.


#19

This reminds me a lot of leftpad and the often-repeated Go proverb, “A little copying is better than a little dependency”. Which is a good idea, within reason.

That said, I think it’s more in line with the single responsibility idea than lines of code (i.e. “do one thing, and do it well”). “Small” crates like regex, http or base64 are incredibly important for the ecosystem because they give you a set of good quality building blocks which you can build things on top of, instead of having to re-invent the wheel every time.


#20

To expand on BurntSushi’s argument: the esteemed edunham has written a more thorough explanation:
edunham.net: could rust have a leftpad incident?
(the answer is, obviously, “no”)

I’m no expert on this, but I believe Rust’s decision stems from inlining/optimisation considerations. The compiler will only consider inlining:

  • within a crate
  • across crates, IFF the function is annotated with #[inline]

It is impossible to (completely) separate the two; build time depends, among other things, a lot on optimisations, and optimisations work better if they have a larger chunk of code to work with; they need an overview to do their best.
In that sense, crates definitely fit the “technically possible” criterion. Build parallelism vs. optimisation is always going to be a trade-off…
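
For reference, this is roughly what the #[inline] annotation from the list above looks like on a small public function. The module and function names are hypothetical, and the module only stands in for a separate library crate so that the snippet compiles as a single file; it shows the syntax, not a measurement of what the optimiser does.

```rust
// Stand-in for a small library crate (hypothetical names).
pub mod status {
    /// `#[inline]` is a hint to the compiler; it also makes the function body
    /// available to downstream crates so that they can consider inlining it.
    #[inline]
    pub fn is_success(code: u16) -> bool {
        (200..300).contains(&code)
    }
}

fn main() {
    // In a real project this call would sit in a different crate than `status`.
    println!("204 is success: {}", status::is_success(204));
    println!("404 is success: {}", status::is_success(404));
}
```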

Crates are also the level at which visibility boundaries are enforced (think public/private in java).
This means that it is usually recommended to split rust projects into multiple functional sub-crates, much like you would chop java projects into packages, and divide C-projects into multiple libraries/header files. Where to chop, as always, depends on the natural “seams” of each individual project.

Cargo supports multi-crate projects natively, using ‘workspaces’ (‘new’ since July 21st, 2016).