"Weight" of a dependency

It’s common to say some packages are “heavy” or “light-weight”, but what does it actually mean?

I’d like to help people judge the weight of dependencies on crates.rs, but how should that information be quantified and displayed?

I’m considering things such as:

  • Measure its compiled size as a dynamic library.
  • Measure compile time.
  • Count number of lines of code,
    • probably normalized by rustfmt,
    • with extra weight for inlined and monomorphised functions,
    • maybe weighted by cyclomatic complexity?
  • Look at size of compressed crate file.
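To make the lines-of-code idea concrete, here is a deliberately naive sketch. The function name `count_code_lines` is made up for illustration, and the sketch assumes the source has already been normalized (for example by rustfmt). It only skips blank lines and `//` line comments, so block comments and multi-line string literals would be miscounted.

```rust
/// Count lines that contain code, skipping blank lines and lines that are
/// entirely a `//` comment. This is a naive heuristic (an assumption of this
/// sketch): block comments and string literals are not handled.
pub fn count_code_lines(source: &str) -> usize {
    source
        .lines()
        .map(str::trim)
        .filter(|line| !line.is_empty() && !line.starts_with("//"))
        .count()
}
```

Extra weight for generic or inlined functions, or for cyclomatic complexity, would need a real parser and is out of scope for a sketch like this.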

Are any of them good enough? (There will be exceptions in every case, so which one has the least bad exceptions?)

Could multiple values be used together? How to blend them and display that?
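One possible way to blend several values is a weighted sum of log-scaled measurements, each compared against a reference crate. The sketch below is only an illustration: the `Measurements` struct, the `blended_weight` function, and the 0.4/0.3/0.2/0.1 weights are all assumptions, not an existing crates.rs formula.

```rust
/// Hypothetical raw measurements for one crate. The field names are
/// illustrative and do not correspond to any existing data model.
#[derive(Debug, Clone, Copy)]
pub struct Measurements {
    pub compiled_bytes: f64,
    pub compile_secs: f64,
    pub lines_of_code: f64,
    pub compressed_bytes: f64,
}

/// Log-scale a value against a reference: 0.0 for a value of zero,
/// 1.0 when the value equals the reference, growing slowly beyond that.
fn scaled(value: f64, reference: f64) -> f64 {
    (1.0 + value / reference).log2()
}

/// Blend the four measurements into one score. The weights are arbitrary
/// placeholders (an assumption of this sketch) and sum to 1.0, so a crate
/// identical to the reference scores about 1.0.
pub fn blended_weight(m: &Measurements, reference: &Measurements) -> f64 {
    0.4 * scaled(m.compiled_bytes, reference.compiled_bytes)
        + 0.3 * scaled(m.compile_secs, reference.compile_secs)
        + 0.2 * scaled(m.lines_of_code, reference.lines_of_code)
        + 0.1 * scaled(m.compressed_bytes, reference.compressed_bytes)
}
```

The log scaling keeps one extreme measurement from dominating the score, but it also hides detail, so displaying the individual values next to the blended one would probably still be useful.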

Any other ways to measure the weight?


Another big one is what a crate itself depends on. For example, I’m developing on a target without a libc, and so a crate depending on it is out of the question. A more common example would be that a crate itself could be quite small, but pull in some massive dependencies.

On lines of code, assuming the average user isn’t that interested in ever maintaining that code (for an average dependency), to me it would matter a lot less than compiled size.


The term is often used in a relative sense, so it’s not great to measure quantitatively.

Adding to @IsaacWoods’s comment: anything foreign (not Rust) is considered to add weight, depending on how much extra manual configuration of the build system is needed.

Almost none of what you list is that important to me when evaluating whether to take on a dependency. Here is what I care about that could be construed as “weight”, in no particular order:

  1. How often are new releases with breaking changes made? More frequent releases mean I have to spend more time keeping up with the dependency. Apply this to all public dependencies as well.
  2. What is their policy on minimum Rust version? A missing policy (or at the very least, a missing CI entry for a fixed stable Rust version) means a lot more work for me in tracking down that information myself and potentially redoing that for each new release.
  3. How many other crates directly depend on the crate in question? If a lot of other people are using it, then that typically implies some level of maturity. It means that many people, not including myself, found this crate worthwhile to use. It also means that if there are problems with this crate, then there is a higher likelihood that they will have been found and debugged before I hit them.
  4. Who is responsible for maintaining the crate? What does their track record look like?
  5. What does the documentation look like? Is it comprehensive, and does it contain examples? Or am I going to need to read the source code to understand it? Missing documentation means I need to spend more time learning how to use the crate.
  6. Does the crate depend on non-Rust libraries? If so, I might expect to spend time dealing with that (or passing it on to others).
  7. For the most part, apply the above to all transitive dependencies.

That’s all I can think of at the moment. All of the above are purposely hand-wavy. Nothing is set in stone. Fundamentally, “weight” to me loosely translates to, “how much of a maintenance burden will I incur by using this crate?”, and the items in the list above are heuristics by which I judge that. Ultimately, that’s what I care about at the end of the day, because I want to be careful with how I use my time. Namely, is the incurred maintenance burden outweighed by what the library does for me?

The best crates are the ones that incur zero or almost zero additional maintenance burden. I’m quite fond of using those. 🙂

On other topics:

  • Binary size is something I basically don’t care about. I recognize that some people might, but it doesn’t matter for the things I work on.
  • I can’t say I’ve ever really cared about the number of lines of code in a dependency. It might come up during my evaluation process if it’s really gratuitous for the particular problem it’s solving, but more often than not, I find that I’ve underestimated something if the LoC is higher than I expected. Interestingly, LoC might cause me to do the exact opposite of what you might expect. Namely, if a crate has a very small number of lines of code, I might decide to just solve that problem myself. In other words, I tend to avoid “micro” dependencies because they don’t balance well with my particular set of preferences.
  • Compile times are definitely a factor. They don’t really impact maintenance time, but they certainly impact the time I need to wait until I can see my code run. I think if there were a way to characterize this signal, then it would probably be useful, but I must say that compile times thus far almost always lose. Namely, compile times are typically most severely impacted for large crates (including all transitive dependencies), and large crates tend to solve complex problems that I don’t want to solve myself. Thus, I wait.

Your first list is spot-on. These are useful things, and I’m already displaying some of them (frequency of breaking changes) and planning to display others (like a recursive dependency on -sys crates or libc). But these things are more of a maintenance burden.

By weight, I was thinking more of cases where you look at an “add_two_plus_two” crate and see that it depends on an “analytical_engine” crate that is 350MB in size and has 250 of its own dependencies. Even if all of them were pure Rust, cross-platform compatible, and well-maintained, for some that’s still a red flag.


If we are discussing a crate’s non-functional parameters, as a user I’d probably be more interested in knowing the impact on binary size and compilation time than in these.

How the size and number of dependencies translate to these is always an educated guess at best. If we could pull some statistics from Travis builds or crater runs, it would be awesome!


If it’s at all a useful metric for you, I tend to weigh dependencies based on the total depth and breadth of the entire dependency tree. It is common to see what looks like a small Node.js package (and I’ll pick on Node.js a bit because I’ve seen this first-hand in that ecosystem) with just a handful of dependencies itself. But when you npm install it, you’re suddenly waiting 2 minutes to download all the crap, and another 15 to compile a couple of C++ modules along the way. Even if the top-level dependency was a slim 1K SLoC, the total weight of the package can be monstrous.

For an in-depth look into why exactly this dependency sprawl is a problem, you only have to look at npm itself: https://github.com/npm/npm/issues/11283#issuecomment-175246823

Things are a little bit harder to swallow with Rust because a massive dependency tree means the first compile time can easily hit several minutes. And when you’re updating the nightly compiler every night, that means one full rebuild every night. It can be quite grueling.

So I guess a lightweight dependency is one with very few dependencies of its own, preferably exactly zero.
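As a rough sketch of measuring the depth and breadth of a dependency tree, the code below walks a hand-built graph. The `DepGraph` alias and both function names are hypothetical. A real tool would build the graph from Cargo's resolved dependency data rather than by hand, and the sketch assumes the graph has no cycles.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical in-memory dependency graph: crate name -> direct dependencies.
pub type DepGraph<'a> = HashMap<&'a str, Vec<&'a str>>;

/// Breadth: every crate reachable from `root`, excluding `root` itself.
pub fn transitive_deps<'a>(graph: &DepGraph<'a>, root: &'a str) -> HashSet<&'a str> {
    let mut seen = HashSet::new();
    let mut stack = vec![root];
    while let Some(name) = stack.pop() {
        if let Some(deps) = graph.get(name) {
            for &dep in deps {
                // `insert` returns true only the first time a crate is seen,
                // so shared dependencies are counted and visited once.
                if seen.insert(dep) {
                    stack.push(dep);
                }
            }
        }
    }
    seen.remove(root); // only relevant if a cycle leads back to the root
    seen
}

/// Depth: length of the longest dependency chain below `root` (0 for a leaf).
/// Assumes an acyclic graph. Not memoized, so it can be slow on large graphs
/// with many shared dependencies.
pub fn depth(graph: &DepGraph<'_>, root: &str) -> usize {
    graph
        .get(root)
        .into_iter()
        .flatten()
        .map(|dep| 1 + depth(graph, dep))
        .max()
        .unwrap_or(0)
}
```

With these two numbers, a crate with zero dependencies has a transitive set of size 0 and a depth of 0, which matches the "preferably exactly zero" ideal.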

I generally don’t like to measure weight by lines of code, as it could be misleading.

Some possible metrics could be:

  • Direct/primary dependencies
  • Features a crate implements (difficult to measure)
  • Types it publicly exposes (structs, traits, etc)
  • How often commits / releases are made
  • Crates using it as a dependency

Yes, but there are counter-arguments:

  • if it has few dependencies, that might be because it’s simple and usable in isolation (which counts as light-weight), or it might be because it’s duplicating utility functionality itself that could be taken from other crates (which counts as heavy-weight at least in terms of maintenance burden)
  • if it has several dependencies, but they’re common ones I probably already have, then adding it doesn’t add much weight to my project (or many others that integrate with common parts of the ecosystem). But if it pulls in (say) rayon where I’m using crossbeam, then it’s heavier for me.

So, really, I want to see that graph, and perhaps have cargo show me a ‘what-if’ delta of the cost of adding it. Which perhaps means that some weighting and searching and scoring is best done within a workspace.
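A minimal sketch of such a ‘what-if’ delta, assuming we already have the set of crates in the workspace and the full transitive set of the candidate: the function name `what_if_delta` is hypothetical (it is not a cargo command), and the sketch ignores versions and features, so two incompatible versions of the same crate would wrongly look like one.

```rust
use std::collections::BTreeSet;

/// Crates that adding a candidate would newly pull into the workspace:
/// everything in the candidate's transitive set that is not already present.
/// Names only; versions and features are ignored (an assumption of this sketch).
pub fn what_if_delta<'a>(
    already_in_workspace: &BTreeSet<&'a str>,
    candidate_closure: &BTreeSet<&'a str>,
) -> Vec<&'a str> {
    candidate_closure
        .difference(already_in_workspace)
        .copied()
        .collect()
}
```

Under this view the same candidate can be light for one project and heavy for another, because the delta depends on what the workspace already contains.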

Aside from all of that, I just want to point out that offering zero-cost abstractions is a pretty good criterion for being light-weight, if a somewhat binary one, and it hasn’t really been listed so far. Being able to specifically find or filter for crates that offer that as a feature would be great.