On library "completeness"

A thought I had: there's a common sentiment in general software development that a project is never really “complete;”[1] at a minimum you need to keep up with dependency and platform churn to avoid bitrot.

However, there's a significant difference here between source libraries and programs.[2]

A source library utilizes other libraries that can be updated separately; at a minimum the language toolchain can be updated. If a source library is written in a fully platform independent manner, then it remains useful even in the face of platform churn because the underlying per-platform functionality is being updated. For a program, however, this requires the program distribution to be updated to distribute a version rebuilt with all of the new fixes and improvements.

Big/popular software today typically gets somewhat frequent updates with little more description than “fixes and improvements,” contributing to the feeling that all software needs constant maintenance. But I'd be willing to bet that a significant[3] portion of such updates contain no actual changes by the team developing the program itself, and instead just ship the “fixes and improvements” gained by updating their external dependencies. Even if their entire transitive dependency tree has wonderful changelogs, those are developer-targeted, so there isn't really anything they can say that would be actually relevant to a user who doesn't know or care[4] how the program is built.

Discussion question: what makes a Rust library crate “complete” in your eyes? Authorial intent? A lack of significant[5] issues over time? Stability? Does usage level have any impact in the analysis?


  1. People mostly seem to agree that a project can be “finished,” though; the difference as I understand it is that “complete” is more than just “finished.” Some people would use the two terms with the opposite meanings, but the general consensus does seem to be that a project can be at least one of the two. ↩︎

  2. I'm including both precompiled binary library objects and executables as “programs” here for simplicity. Executables (as in something you would run an installer for) generally bundle any non-OS binary deps they use (is this Windows bias?), whereas binary libraries often haven't been linked with their binary transitive dependencies yet, so the property I'm looking at is more obvious and prevalent in packaged executables. But it applies just the same to precompiled libraries, on a somewhat smaller scale, assuming their downstream actually updates the transitive dependencies and doesn't just use the copy bundled with the library distribution that was added to make things work out of the box. ↩︎

  3. Conveniently omitting a definition for “significant.” If I understand statistical significance correctly, that's basically just a measurable effect, so even a single such occurrence could be considered “significant,” since these are discrete events and not samples of a continuous domain. :wink: ↩︎

  4. You can argue that users should care more about the chain of providers that they are implicitly trusting, but the fact of the matter is that the vast majority of users don't, or consider the transitive trust derived from trusting the primary vendor to be sufficient. (Whether the primary vendor actually audits their web of trust sufficiently is yet another independent question.) ↩︎

  5. This one is definitely a fuzzy continuous scale where people will draw the “line” differently. ↩︎

1 Like

Whether a library can be finished and complete kind of depends on its domain and scope. Something like an HTTP server, tokio, or similar will probably never be finished.

Something like castaway, once_cell, etc. could absolutely be done at some point. The key thing here is that the domain doesn't rapidly change and that the scope is limited. There is no analogue to "a new version of the HTTP standard".

2 Likes

I think it’s useful to distinguish different sorts of “complete” that are both relevant: there is “the work is completed; there are no further sub-tasks undone”, and there is “not missing parts”.

A library that, by its design choices or by happenstance, doesn't intrinsically need constant maintenance is able to become complete in the first sense. But there is also the question of: if you use this library, what are the chances that you will find a bug or lack of feature, such that you have to either hope the maintainers are up for publishing a new release, or fork or replace it? Libraries are more or less likely to have this problem depending on their design choices and the role they play in your program.

As to my own practices, I think that one of the best things to do when choosing libraries, and when writing libraries, is to minimize large or public dependencies. A public dependency, as defined by rustc and Cargo and hopefully someday lintable, is one whose types or traits appear in the public API of the dependent crate. These become points of “excessive coupling”; suppose you depend on foo v2.3.0 and bar v1.0.37, and foo v2.3.0 also publicly depends on bar v1.0.37. Then, if bar releases a v2.0.0, you may find that upgrading bar causes type mismatches because you need to pass bar's types to foo or vice versa, and foo is still using bar v1.0.37.
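To make the public-dependency hazard concrete, here's a minimal sketch (the module names `foo` and `bar` are stand-ins for the hypothetical crates above, not real crates): because `bar`'s type appears in `foo`'s public signature, a caller must construct that value from the exact `bar` version `foo` was compiled against, so upgrading only their own `bar` dependency produces type mismatches.

```rust
// Hypothetical sketch of a *public* dependency. In a real project,
// `foo` and `bar` would be separate crates; modules stand in here
// so the example is self-contained.
mod bar {
    pub struct Buffer(pub Vec<u8>);
}

mod foo {
    use super::bar;

    // `bar::Buffer` appearing in this public signature is what makes
    // `bar` a public dependency of `foo`: every user of `foo` must
    // build a `bar::Buffer` from the *same* `bar` version that `foo`
    // was compiled against. If the user upgrades to a hypothetical
    // `bar` v2 while `foo` still uses v1, its `Buffer` is a distinct
    // type and this call no longer type-checks.
    pub fn process(buf: &bar::Buffer) -> usize {
        buf.0.len()
    }
}

fn main() {
    let buf = bar::Buffer(vec![1, 2, 3]);
    assert_eq!(foo::process(&buf), 3);
    println!("ok");
}
```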

Even if there are no type mismatches (which may be because the dependency is not public, or because your use of bar is unrelated to your use of foo), this results in multiple versions in the dependency graph, which increases various costs and risks.


For a concrete example, when I published rendiff v0.1.0, it depended on image v0.24. After that, image came out with v0.25, which means that if I don't publish a new version of rendiff, all dependents have to bring in an old version of image and all of its relevant dependencies. To avoid that, I made rendiff v0.2 use the much lighter, though lesser-used, imgref library. imgref has fewer features, but is therefore much less likely to need a bug-fix or fail to build. It's an additional burden on rendiff's dependents to need to adapt their image type to imgref::Img, but that is (in some but not all ways) a smaller and more flexible burden than requiring that they depend on a specific version of image.

Similarly, I could have chosen to forego imgref entirely and simply defined my own (width, height, &[pixels]) type for input, and that’s what I would have done if imgref didn’t exist.
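Such a hand-rolled input type could be as small as the following sketch (the names here are hypothetical illustrations of the idea, not rendiff's actual API): just dimensions plus a borrowed pixel slice, which callers adapt whatever image representation they already have to.

```rust
// Hypothetical minimal "image view" type a library might define
// instead of taking a dependency on `imgref`: a (width, height,
// &[pixels]) triple with a row-major layout invariant.
pub struct ImageView<'a, P> {
    pub width: usize,
    pub height: usize,
    pub pixels: &'a [P], // row-major; len == width * height
}

impl<'a, P> ImageView<'a, P> {
    pub fn new(width: usize, height: usize, pixels: &'a [P]) -> Self {
        // Enforce the layout invariant up front.
        assert_eq!(pixels.len(), width * height);
        Self { width, height, pixels }
    }

    // Row-major indexing: pixel at column x, row y.
    pub fn get(&self, x: usize, y: usize) -> &P {
        &self.pixels[y * self.width + x]
    }
}

fn main() {
    // A 3×2 image stored as a flat slice.
    let data = [10u8, 20, 30, 40, 50, 60];
    let img = ImageView::new(3, 2, &data);
    assert_eq!(*img.get(2, 1), 60);
    println!("ok");
}
```

The cost is the same as with imgref: dependents must adapt their own image type at the boundary. The benefit is that the boundary type has no version of its own to churn.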

rendiff is definitely not “feature complete”; there are things it should do that it doesn’t. But, I hope that by the above design choice about my dependencies, even if I never publish another release, it will remain a reasonable choice of dependency far longer than otherwise.

It’s improbable that you can write software that is totally free of the “churn” needed to keep it usable and maintainable, except in narrow domains. But you can reduce the probability of churn events, for your software and your software’s dependents, through thoughtful choices. Dependency graphs with this property will tend to be shallower and wider.

5 Likes

The notion of a “complete” library is a mirage.

Imagine the perfect library: it has exactly one function, main(). You call it, reality rearranges itself, and your problem is solved. You lean back, satisfied… until you realize you’re not using a library anymore. You’ve accidentally written destiny.

Libraries are not whole. They are shards of code, fragments of a greater machine that exists only in the mind of whoever wires them together. On their own, they do nothing. In fact, a library in isolation is like a screwdriver in a vacuum: technically functional, but only if the vacuum contains screws, and also your hand.

Because libraries are meant to be shared, they cannot possibly match everyone’s needs—unless all developers secretly write the exact same software (which, given the state of package registries, is starting to seem disturbingly plausible). Inevitably, there will be pieces you don’t use, or pieces you wish existed but don’t.

So, forget completeness. Judge a library by its versatility. Can you twist it, stretch it, strip it bare, or graft new limbs onto it without it screaming too much? Can it mutate to fit your project’s oddly-shaped hole in reality? If yes, congratulations—you have a good library. If not, it’s probably time to sharpen your design scalpel.

After all, no library is complete… but the good ones are incomplete in the right way.

2 Likes

I think this sentiment often stems from two factors:

  1. Insufficient upfront and long-term planning.
  2. The false premise that users always know best what should be included in the product, which, in turn, is a result of #1.

This creates a never-ending loop where it's hard to tell when the user base is satisfied and whether the product is complete.

In practice, maintainers usually have a deeper understanding of the domain context than most users, and many would prefer a stable, complete release (e.g., >= v1.0) that shows how to address their needs out of the box, rather than a not-yet-finished project with frequent breaking changes and shifting goals.

The idea of publishing incomplete software perhaps comes from startup culture, where shipping MVPs as soon as possible is a way to attract an early audience. This makes sense from a business perspective, but it may not always align with sound engineering.

What this means in practice is that publishing before the first major version should usually be the exception rather than the rule.

Normally, software needs to be designed before development starts. It helps to articulate a clear vision of goals and non-goals, examine use cases, and outline a development plan, and only then start development. This is a common practice that helps bring a project to a complete state. During development, many things in the initial plan can change and be rethought, but you will have a clearer vision of where you are and how close the project is to the established goals.

The initial development does not necessarily need to be public. It's an internal process, and once it's finished[1], it can be published from version 1.0 as a final and complete solution.

At that point, the project does not require extensive maintenance: fixing bugs, if any, and adding minor features occasionally.

After a while, once you've gathered and reviewed sufficient feedback, you can begin the next major version in a new development cycle, and so on.

Regarding third-party dependencies, each one carries maintenance costs, so it's often wise to keep them to a minimum. Sometimes reimplementing a feature within the crate can reduce maintenance (and bus factor) and keep the codebase more consistent and specialized.

Overall, this approach is more akin to traditional C/C++ practices, but it has some advantages.


  1. which includes not only coding but also QA (or at least test coverage), good documentation, examples, and other elements that make up a "product" ↩︎

1 Like

You just described the waterfall model, pretty much. And even though there are niches where such front-heavy models work well, they’re just that: niches. Agile has its issues, most pressingly that nobody can agree on what agile is, but the idea that most software can be comprehensively designed and specified before writing code is not realistic. Coding is design work; it’s not possible to separate the two. Iterating by giving users something tangible to test is also design work.

You can do requirements specification for years and in the end it’s little more than a mirage, a castle in the clouds. It’s as if a civil engineer were to design a bridge without doing a geotechnical survey first.

2 Likes

I'd rather consider whether a library is complete at a version N.x (with N >= 1), since many libraries keep evolving: they're adapted to the possibly ever-changing technology they're addressing, their development team gets enough feedback and insight to improve the library to a more mature version, or they gain new features / API changes that deserve a version bump.[1]

I'm not sure there's any universal and objective set of criteria to tell a library is mature, although there are checklists and books that give good guidance. But relying on a few somewhat subjective terms, I'd say:

  • the feature goals initially set for that version are achieved, plus some goals that have been inevitably added or changed during the course of the development—beware of feature creep, though; maybe that's why so many crates are still 0.x.y
  • the library is validated enough and doesn't have significant pending issues (easily said for big crates with a huge number of users)
  • the API is mature enough for production

My experience in the industry regarding dependencies is that we tended not to change them too often. You're bound to meet bugs and limitations in almost any library (and tool), so once you've finally worked around them, it's always a risk to update. In my environment anyway, that's not something we did lightly, but maybe it wasn't typical. Of course, if an update was for security reasons or for a stability issue we hadn't discovered, it would warrant one, but otherwise we wouldn't update only because a dependency had. However, the pressure in the industry often isn't the same as in "hobby" open-source projects.


  1. The upside is that you can consider your library has been "complete" several times. :wink: ↩︎

1 Like

To clarify, I'm not advocating a fully comprehensive plan. It can be quite broad and not necessarily take years of requirements specification. Even spending 1-2 days to sketch the design and goals on paper notably helps over the following months of development. Also, I'm not suggesting rigidly sticking to that initial plan. In contrast to Waterfall, development can, and perhaps should, be very adaptive. But having a plan at hand shows you how far you are from the initial draft.

I'm not a big fan of excessive enterprise bureaucracy either.

I can't agree with that. At least in my experience, even lightweight manual QA before release can noticeably improve the user experience, especially when the tester is outside of the core development team.

Unfortunately, such a role is uncommon in open-source, but I think it deserves more attention.

I mean, yeah, there's useful libraries that don't really need any real work over the period of decades.

As an example, a direct implementation of a mathematical paper can only really improve the API or performance, and it's pretty plausible you can squeeze out everything worthwhile eventually.

But it's pretty rare in general: most interesting libraries are big enough, or solve an ill-defined enough problem, that new problems or better designs are discovered all the time, above and beyond the regular bitrot that's so difficult to avoid.

1 Like

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.