Small dependencies in rust? do or don't?


#21

Ya. But I was not referring to just left_pad, but to similar issues where, say, one small crate pushes a breaking change that breaks a lot of other crates that depend on it.


#22

Note that there are many cases where the inline attr is not needed. Generic code is compiled by the crate that monomorphizes it.
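
A minimal sketch of the distinction, with a module standing in for an upstream crate (`mathlib` and both function names are made up for illustration). The generic function is instantiated in whichever crate calls it with a concrete type, so it needs no attribute; the non-generic one is the case where `#[inline]` is what makes the body available to downstream crates when LTO is off.

```rust
// Stand-in for an upstream library crate. In a real project this would be a
// separate crate; `mathlib` is a hypothetical name.
mod mathlib {
    /// Generic: monomorphized in the crate that calls it with a concrete `T`,
    /// so the optimizer there can inline it without any attribute.
    pub fn largest<T: PartialOrd + Copy>(items: &[T]) -> Option<T> {
        let mut iter = items.iter().copied();
        let first = iter.next()?;
        Some(iter.fold(first, |acc, x| if x > acc { x } else { acc }))
    }

    /// Non-generic: compiled once in the defining crate. `#[inline]` is a hint
    /// that also makes the body available to downstream crates.
    #[inline]
    pub fn clamp_percent(value: i32) -> i32 {
        value.clamp(0, 100)
    }
}

fn main() {
    println!("{:?}", mathlib::largest(&[3, 9, 4]));
    println!("{}", mathlib::clamp_percent(140));
}
```

Whether the compiler actually inlines either call is still up to the optimizer; the attribute is only a hint.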


#23

If I am not mistaken, the compiler will also be able to inline across crates if lto=true is enabled, and enabling it for release builds is probably a good idea for an end application.


#24

Before we go off on a highly technical (and highly interesting) tangent about compile time (hint: “reply as new topic” hides under a post’s timestamp :wink: ), I’d like to refocus on the maintainability of smaller crates.

Since crates enforce visibility boundaries (pub), they can be used to encapsulate inner workings.
Also, having multiple crates helps in keeping an application more modular.
For example: the workspace docs page discusses splitting a CLI program into a reusable library crate with the functionality, and a wrapper crate that does the command-line parsing, input/output, etc. The result is a more modular application, with more reusable code for others.
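
A rough single-file sketch of that split, where a module stands in for the library crate and the rest stands in for the wrapper crate. All names here (`wordcount`, `count_words`, `run`, `wc-cli`) are hypothetical; in a real workspace these would be two crates.

```rust
// Stand-in for the reusable library crate (`wordcount` is a hypothetical name).
mod wordcount {
    /// Public API: the only item the wrapper can see.
    pub fn count_words(text: &str) -> usize {
        text.split_whitespace().filter(|w| is_word(w)).count()
    }

    // Private helper: an implementation detail hidden behind the `pub` boundary.
    fn is_word(token: &str) -> bool {
        token.chars().any(|c| c.is_alphanumeric())
    }
}

// Stand-in for the thin wrapper crate: argument handling and I/O only.
fn run(args: &[String]) -> Result<String, String> {
    match args {
        [text] => Ok(format!("{} words", wordcount::count_words(text))),
        _ => Err("usage: wc-cli <text>".to_string()),
    }
}

fn main() {
    let args: Vec<String> = std::env::args().skip(1).collect();
    match run(&args) {
        Ok(line) => println!("{line}"),
        Err(msg) => eprintln!("{msg}"),
    }
}
```

The wrapper cannot call `is_word` at all, which is the encapsulation point: the library is free to change it without breaking anyone.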

Of course there is a lower limit somewhere between that and leftpad, and I’d love it if people could provide some concrete examples of how they split up their programs into maintainable modules (also in non-Rust languages; I’m hypothesising that the “seams” are less language-dependent and more an architecture choice).

To start off: my most valuable Rust application so far uses a third-party crate (needletail) to iterate over a bioinformatics file. The processing is then done in our own crate.

Needletail is obviously very singular in its responsibility, but also quite tiny. It provides a single iterator for two (related) filetypes, which could probably be copy-pasted. Then again, once you see how much effort they put into zero-copy, it is obvious that it deserves a crate.

I’m guessing that there is far more potential for small crates extracted from larger codebases than many of us realise!


#25

Nobody has addressed the problem that I posted except to explain various reasons why that problem is insoluble and we’re going to have to trade off build times for various other things we want, such as runtime performance, encapsulation, and module distribution. However, I know for a fact that these tradeoffs are a false choice, because, every day, I use a build tool that’s solved this problem for C++, Python, Java, and other languages.

Crates work well as distribution mechanisms. They don’t currently work well as a dependency graph, because, as I explained above, they tend to generate unnecessarily large dependency graphs. If your development environment isn’t rate-limited by build times, then the issue doesn’t come up. If your heavily optimized C++ build is 1 hour, but the Rust version would be 3 hours, then it’ll be the dominating factor in your inevitable decision to stick with C++.

Now, don’t get me wrong. There might be work to get from here to there, such as needing to have better LTO. That’s different from “there is no way to achieve this goal”. Actually, these posts have made my opinion on Rust go from “hmm, I have concerns about build times” to “I’m skeptical that Rust will ever be useful at my company,” because the attitude that fast build times are just nice-to-have is a worse problem than actually having slow builds. Slow builds can be solved with some code, but attitudes are much harder to change.

A lot of people ask themselves why Haskell isn’t more widely used in industry. Build times are one of the key reasons. If serious attention isn’t paid to build times in Rust, then Rustaceans will be asking themselves the same question, too.


#26

Serious attention is being paid to it. That’s why there are a ton of people working toward it.


#27

For example: sccache, where Mozilla is actively pouring in money to make it support Firefox-scale builds for each check-in. (Both are recommended reading; Mozilla’s own Bazel plans are briefly mentioned!)

Also, cargo is being overhauled for better integration with build tools. Currently they’re working on making cargo emit a build plan for other build tools to consume/parallelize/speed-up.

Also, Bazel doesn’t “solve” the build time problem; it makes a granularity choice for chunks over the network. That is still a choice, just a different one than rustc is currently making (but as I’ve linked, distributed build support is being worked on!)


#28

What I’d like to throw into the discussion is the difference between a single crate and a single project. diesel, for example, consists of 8 crates, while it is a single project (in this case it’s even a monorepo), and I’d count it as a single dependency. The same goes for other core projects such as serde or tokio.

What I see as problematic are tiny but widely used projects (such as leftpad), while I’ve got nothing against many tiny packages (as in crates) in a bigger project. babel is a good example of that in the JavaScript world. Having one big project with different packages can help with both the compilation time problem and the truck number.


#29

“That lucky fellow might decide to factor mydep1 out of myserver and update the dependency on myserver to point to mydep1 instead. That’ll be easier said than done, though; lacking original context, it won’t be easy to spot that extern crate myserver is really just a dep on mydep1.”

If I understand correctly, that is the reason why tools like Sotograph exist in Javaland: they do deep dependency inspection to allow for better refactoring, or even to provoke it.

I.e., perhaps the dependency problem is not only one of build management, to be solved by build tools, but also one of doing micro-architecture work in parallel to that.


#30

I definitely agree with this. I’ve written and published a number of small crates (usually because I’m writing something else and part of it seems useful as a standalone library) and I wind up spending a lot of time on overhead like configuring CI, polishing the docs and README, etc. I think we could do better here with some tooling that provides better project templates and gives you a checklist of things to do to ensure you’re providing all the bells and whistles that accompany a well-written crate. (A tool that could automate many of the checks on the libz blitz checklist would be awesome!)

This is something I definitely think is important to the long-term health of the crates.io ecosystem. @killercup has a good blurb in their Rust 2018 blog post. Most crates are inevitably going to need some changes in the future, even if they’re small, single-purpose libraries. Relying on a single person to be around in the future to do that work (even if it’s just merging PRs and publishing a new release) isn’t a great plan.

FYI, sccache handles this use case very well. It’s exactly what it was designed for! We use it in Firefox CI where we do hundreds or thousands of builds per day and it saves us lots of machine time. It’s not perfect for local Rust development yet, but if your scenario is effectively cargo clean; cargo build it will make that much faster. For CI, where you’re building the same exact crates over and over in a controlled environment, it works great.

Also, I don’t think this is really a solved problem in the C/C++ world. Tools like bazel are getting to the point where it might be possible to have nice build caching if you don’t have Google or Facebook’s resources, but I don’t think it’s a turnkey solution yet. Developers still rely heavily on tools like ccache and distcc, and the fact that you can dynamically link against prebuilt system libraries helps build times (although it makes distribution a nightmare).

I think the key factor to me would be: is this struct or trait going to be part of the public interface, or is it an implementation detail? If it’s public then sharing types from a common crate means that composing different crates will be easier. A great example of this is fallible-iterator which is pretty simple (you could certainly copy-paste the core bits of it!) but provides extra value if you’re using multiple crates together and they’re all using the same trait.
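
To make the "shared public trait" point a bit more concrete, here is a small sketch with three modules standing in for three crates. Everything in it is hypothetical (`shared`, `Describe`, `producer`, `consumer`); it is not fallible-iterator's API, just the same shape of arrangement.

```rust
// Hypothetical shared crate: it only defines the common public vocabulary.
mod shared {
    pub trait Describe {
        fn describe(&self) -> String;
    }
}

// Hypothetical producer crate: exposes a type implementing the shared trait.
mod producer {
    use super::shared::Describe;

    pub struct Sensor {
        pub id: u32,
    }

    impl Describe for Sensor {
        fn describe(&self) -> String {
            format!("sensor #{}", self.id)
        }
    }
}

// Hypothetical consumer crate: depends on `shared` only, never on `producer`.
mod consumer {
    use super::shared::Describe;

    pub fn report<T: Describe>(items: &[T]) -> String {
        items
            .iter()
            .map(|item| item.describe())
            .collect::<Vec<_>>()
            .join(", ")
    }
}

fn main() {
    let sensors = [producer::Sensor { id: 1 }, producer::Sensor { id: 2 }];
    println!("{}", consumer::report(&sensors));
}
```

If `producer` and `consumer` each copy-pasted their own `Describe`, the call in `main` would not type-check, which is the extra value a tiny shared crate provides.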


#31

I believe this must be done with great care. A semver bump in a dependency like that can split an ecosystem. The Rust API guidelines also suggest that the stabilization of a crate must be blocked on the stabilization of all of its public dependencies first. (see C-STABLE)
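
A sketch of what such a split looks like from the user's side. Cargo can build two semver-incompatible versions of the same crate side by side, and their types are then unrelated as far as the compiler is concerned. Below, two modules model the two major versions; all names (`common_v1`, `common_v2`, `logger`, `clock`, `bridge`) are made up for illustration.

```rust
// Two semver-incompatible releases of a hypothetical shared crate,
// modelled as two modules. The types look identical but are distinct.
mod common_v1 {
    #[derive(Debug, PartialEq)]
    pub struct Timestamp(pub u64);
}

mod common_v2 {
    #[derive(Debug, PartialEq)]
    pub struct Timestamp(pub u64);
}

// Hypothetical crate that still depends on the 1.x release.
mod logger {
    use super::common_v1::Timestamp;

    pub fn render(ts: &Timestamp) -> String {
        format!("t={}", ts.0)
    }
}

// Hypothetical crate that has already moved to the 2.x release.
mod clock {
    use super::common_v2::Timestamp;

    pub fn now() -> Timestamp {
        Timestamp(42) // fixed value to keep the sketch deterministic
    }
}

// Glue code that every user of both crates now has to write.
fn bridge(ts: common_v2::Timestamp) -> common_v1::Timestamp {
    common_v1::Timestamp(ts.0)
}

fn main() {
    let ts = clock::now();
    // `logger::render(&ts)` would be rejected here: the function expects
    // `common_v1::Timestamp`, but `ts` is a `common_v2::Timestamp`.
    println!("{}", logger::render(&bridge(ts)));
}
```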


#32

To go back to the original question, many small crates vs fewer larger crates, consider the process of taking on a dependency, either directly, or as a transitive dependency of your project:

  • Is the license of the dependency compatible with your use case?
  • Is there a changelog? Can you easily find the relevant information in there, when upgrading?
  • Does the project support your platform and the Rust version you use? How do they test their releases? What is the project’s definition of “breaking change”?
  • What is the track record of fixing bugs? Is there a chance that you will need to maintain your own fork to work around a bug in the crates.io version?
  • Is there a stability promise? Can you expect to be able to upgrade to newer versions without too much effort?
  • Do you have the time to watch out for new releases and read announcements? Is there a place where you can find out about important announcements at all?

Evaluating a dependency takes time, and actually taking on the dependency will require investing more time, over time. Having more crates makes this process worse.

Apart from the maintenance burden, overhead is higher for smaller crates:

  • Downloading one large file is faster than downloading many small files. Even in countries where reliable high-speed internet is the norm this matters (try working from a train or bus).
  • You need to spin up a compiler process to compile. This helps to parallelise the workload, but once you run out of cores, splitting crates further is not beneficial.
  • There are a few widely used crates that spend more bytes on license text than on code.

Some microlibraries could be in the “obviously no bugs” rather than “no obvious bugs” category. Unless you have some exotic requirements, it probably doesn’t matter which of the 28 versions of num_cpus you use. Would you be better off just copy-pasting the code into your project, rather than copying the Cargo.toml line? Perhaps such code would be better distributed in a different form? (Stack Overflow answer, Gist, page with snippets, etc.)

Obviously there is a trade off here. I have listed some of the downsides of small crates, but they have advantages too. There is no one rule to decide the right crate size. That being said, I do think many people underestimate the cost of small crates, especially because much of the cost is not as immediately obvious. (Upgrading is not painful until you actually do it. You aren’t paranoid until somebody else breaks your build. Etc.)


#33

Thank you for enumerating the social costs of dependencies! That looks like a pretty exhaustive list!
(Especially the licensing one. I’m currently working my way through a 300+ item list of transitive dependencies for my job’s Java project. Damn you Grails framework and your diversely-licensed dependencies in older versions!)

I agree that for these social problems, no technical solution can exist.

I do believe that we, as a community, can mitigate the factors somewhat. For example, the general opinion is that semver==good practice, and things like trust make providing comprehensive CI easier. Both factors at least shift the “default confidence” about libraries more in the positive direction.

However, these are all examples of the library authors doing upfront work, to ease the inspection process for library consumers.

This upfront investment needs some kind of social reward/motivation for the authors to keep it up. Even something as simple as “appreciation”, but even better, pull requests and co-maintainers to share the burden.

At some point, I believe a core ecosystem can grow that is at such a quality level that people won’t even think about publishing a Rust crate without semver, CI, readme, docs, etc. “That’s just how things are done”. That would mean that dependencies can acquire something like “acceptable until proven guilty”, instead of the “normal” default of “verify ALL the Things!”. (Misty eyed idealism, I know :wink: )

The trick for us as a community is to get to that threshold. In that respect, I am wildly enthusiastic about the libz blitz!


#34

I’ve started doing this naturally for pretty much every project I start nowadays. Something as simple as having that green Travis badge on your readme and adding a #![deny(missing_docs)] to your crate doesn’t take that much effort and makes your project infinitely more accessible to others!
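
For anyone who hasn't used the lint, a small sketch of what it asks for. To keep the snippet self-contained it uses the outer form `#[deny(missing_docs)]` on a single module rather than the crate-level `#![deny(missing_docs)]` mentioned above; the module and function names are hypothetical.

```rust
/// Temperature conversion helpers (hypothetical example module).
#[deny(missing_docs)]
pub mod temperature {
    /// Converts degrees Celsius to degrees Fahrenheit.
    pub fn celsius_to_fahrenheit(celsius: f64) -> f64 {
        celsius * 9.0 / 5.0 + 32.0
    }

    /// Converts degrees Fahrenheit to degrees Celsius.
    pub fn fahrenheit_to_celsius(fahrenheit: f64) -> f64 {
        (fahrenheit - 32.0) * 5.0 / 9.0
    }
}

fn main() {
    println!("{}", temperature::celsius_to_fahrenheit(100.0));
    println!("{}", temperature::fahrenheit_to_celsius(32.0));
}
```

In a library crate with the crate-level attribute, leaving the `///` line off any public item turns into a compile error instead of a silent gap in the docs.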

I know in my personal experience if I’m looking for a library and come across something which is undocumented or untested I’m not going to even bother looking any deeper… Interestingly, I’ve found that the average Rust crate tends to be much higher quality than the average Python or JS library. I guess that’s what happens when your language has a strong type system and you’ve got tools like cargo and rustdoc built in.