Cargo pulling in multiple versions (but inadvertently)

[I apologize for the length of this post; the problem is somewhat involved, and I don't know if I've even described it accurately.]

I'm new to Rust, and I'm working with an existing crate (it's part of a Python package): Qiskit/qiskit-terra on GitHub (Qiskit is an open-source SDK for working with quantum computers at the level of extended quantum circuits, operators, and algorithms).

  1. rustc version: rustc 1.61.0 (fe5b13d68 2022-05-18)

  2. cargo version: cargo 1.61.0 (a028ae4 2022-04-29)

  3. This crate uses a number of existing crates

  4. it ships with a Cargo.lock file, probably generated in the past.

  5. when building with that Cargo.lock file (cargo build), the package "hashbrown" is built with version v0.11.2

  6. but if I blow away that Cargo.lock file, cargo clean and then cargo build, it builds both versions v0.12.1 and v0.11.2. And the build fails with errors related to hashbrown.

Just to be clear: the way the error reproduces is: cargo clean, remove Cargo.lock, then cargo build fails with compile errors related to various types and traits (after building those two versions of hashbrown).

  7. So I can pin the version of hashbrown in the Cargo.toml file to the one I want:
[dependencies.hashbrown]
version = ">= 0.11.2, < 0.12"
features = ["rayon"]
  8. Another reason two versions are getting built is that indexmap (which in the original Cargo.lock file was 1.8.2) gets pulled in at 1.9.0, and indexmap 1.9.0 depends on hashbrown 0.12. But I can fix that by pinning indexmap to 1.8.2:
indexmap = ">= 1.8.2, < 1.9.0"
  9. Doing that, then cargo clean, remove Cargo.lock, and cargo build still fails, but this time indexmap is at version 1.8.2, as it was in the original Cargo.lock file. Instead, the problem is now the package pyo3, which has this stanza in the regenerated Cargo.lock:
[[package]]
name = "pyo3"
version = "0.16.5"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "1e6302e85060011447471887705bb7838f14aba43fcb06957d823739a496b3dc"
dependencies = [
 "cfg-if",
 "hashbrown 0.12.1",
 "indoc",
 "libc",
 "num-complex",
 "parking_lot",
 "pyo3-build-config",
 "pyo3-ffi",
 "pyo3-macros",
 "unindent",
]

But the package itself (at version 0.16.5) declares this dependency in its own Cargo.toml:

[dependencies.hashbrown]
version = ">= 0.9, < 0.13"
optional = true

[verified by digging out the crate from cargo's cache and unpacking it, just to be sure]

So to my question: why is cargo building two versions of this crate "hashbrown"?

As I said, I checked in Cargo.lock for all crates that depend on hashbrown, and for each of them that got installed at a newer version (and that pulled in the newer version of hashbrown), I pinned the dependency to an older version (e.g. indexmap). But pyo3 doesn't require a newer version of hashbrown, and in fact the original Cargo.lock says so:

[[package]]
name = "pyo3"
version = "0.16.5"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "1e6302e85060011447471887705bb7838f14aba43fcb06957d823739a496b3dc"
dependencies = [
 "cfg-if",
 "hashbrown",
 "indoc",
 "libc",
 "num-complex",
 "parking_lot",
 "pyo3-build-config",
 "pyo3-ffi",
 "pyo3-macros",
 "unindent",
]

I have no idea what's going on, but basically it seems I cannot delete that Cargo.lock without breaking the build. And that seems completely wrong.

Cargo has a tendency to prefer the latest version whenever possible. If some other crate can use a newer version, it will get it.

Crates with 0.x versions are considered separate and mutually incompatible, so Cargo will be happy to have 0.11 and 0.12 in the same project. It wouldn't allow 1.11 and 1.12 together, because Cargo deduplicates globally only within a semver-compatible range (minor and patch releases of the same stable major version). Different major versions and different 0.x versions are allowed to be duplicated.
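To make the compatibility rule concrete, here is a minimal sketch of how two versions can be sorted into "compatibility buckets". It is only an illustration, not Cargo's actual code: the names `Version`, `compat_key`, and `can_coexist` are made up for this example, and pre-release tags are ignored.

```rust
/// A (major, minor, patch) triple; pre-release tags are ignored in this sketch.
pub type Version = (u64, u64, u64);

/// Returns the part of a version that must match for two releases to be
/// treated as semver-compatible: the left-most non-zero component and
/// everything to its left. (Hypothetical helper, not a Cargo API.)
pub fn compat_key(v: Version) -> Version {
    match v {
        (0, 0, patch) => (0, 0, patch), // 0.0.z: every release is its own bucket
        (0, minor, _) => (0, minor, 0), // 0.y.z: the minor number acts like a major
        (major, _, _) => (major, 0, 0), // x.y.z with x >= 1: only the major matters
    }
}

/// Two versions may be built side by side only if they fall into
/// different compatibility buckets; otherwise they are unified into one.
pub fn can_coexist(a: Version, b: Version) -> bool {
    compat_key(a) != compat_key(b)
}
```

Under this model, `can_coexist((0, 11, 2), (0, 12, 1))` is true, which matches the two hashbrown builds in the question, while `can_coexist((1, 11, 0), (1, 12, 0))` is false.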

You can try cargo update --package hashbrown --precise 0.11.2

Another option is cargo +nightly -Z minimal-versions update, but this can pull in some very outdated deps.

Your suggestion to use cargo update seems to be to edit the Cargo.lock file. But from my understanding, the lock file isn't meant to be something long-term -- it should always be possible to regenerate the lock file from Cargo.toml, albeit with maybe different versions. Which might cause compilation to fail, sure.

But in this case, I'm specifying in Cargo.toml the version for hashbrown and yet cargo is choosing a different version, even though the constraint on pyo3 admits the specified version.

Isn't this a bug? I didn't open an issue for this, simply because I wanted to verify that it's a bug before doing so.

This is not a bug from Cargo's perspective. Cargo by design prefers picking latest versions, and does not care about minimizing semver-major duplicates.

You are choosing the hashbrown version for your crate only, not for the whole project. You don't have direct control over what your dependencies are choosing.

pyo3 said it can take 0.12, so it gets 0.12. Your crate says it can take only 0.11, so it gets 0.11. In Cargo multiple 0.x versions can coexist, so for Cargo this is perfect, because every crate gets the maximum version it supports.
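The per-crate choice described above can be sketched as follows. This is a deliberately simplified model, not Cargo's resolver (which also unifies versions inside a compatible range and backtracks): `Req`, `pick_latest`, and `demo_resolution` are hypothetical names, and the list of available versions contains only the two hashbrown releases mentioned in the question.

```rust
/// A (major, minor, patch) triple; tuples compare lexicographically.
pub type Version = (u64, u64, u64);

/// A half-open requirement `>= lo, < hi`, the shape used in this thread.
pub struct Req {
    pub lo: Version,
    pub hi: Version,
}

/// Picks the highest available version that satisfies the requirement.
pub fn pick_latest(available: &[Version], req: &Req) -> Option<Version> {
    available
        .iter()
        .copied()
        .filter(|v| *v >= req.lo && *v < req.hi)
        .max()
}

/// Resolves hashbrown separately for pyo3 and for the root crate.
pub fn demo_resolution() -> (Option<Version>, Option<Version>) {
    // Only the two releases named in the question are modelled here.
    let available = [(0, 11, 2), (0, 12, 1)];
    // pyo3 0.16.5: version = ">= 0.9, < 0.13"
    let pyo3_req = Req { lo: (0, 9, 0), hi: (0, 13, 0) };
    // The root crate after pinning: version = ">= 0.11.2, < 0.12"
    let root_req = Req { lo: (0, 11, 2), hi: (0, 12, 0) };
    (
        pick_latest(&available, &pyo3_req),
        pick_latest(&available, &root_req),
    )
}
```

In this model pyo3 ends up with 0.12.1 and the root crate with 0.11.2. Because the two versions are not semver-compatible with each other, nothing forces them to be merged into one.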

Cargo.lock is meant to be kept long-term (in source code repository). It is not possible to encode its full state in Cargo.toml.

In this case I think pyo3 is at fault for specifying an imprecise version range for hashbrown while exposing its own specific version in its public API. Cargo has no way of reliably matching their version via Cargo.toml. pyo3 should have specified hashbrown = "0.11". Or they could have re-exported hashbrown in their API so that others could use pyo3::hashbrown to refer to their specific version.
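A small sketch may help show why a duplicated dependency turns into compile errors once its types appear in a public API, and how a re-export avoids that. Everything here is a stand-in: the modules `hashbrown_0_11`, `hashbrown_0_12`, and `binding_lib` only simulate two releases of one crate and a library that exposes one of them. They are not the real hashbrown or pyo3 APIs.

```rust
// Stand-ins for two semver-incompatible releases of the same crate.
// To the compiler they are unrelated types, even though they look identical.
pub mod hashbrown_0_11 {
    #[derive(Default)]
    pub struct HashSet {
        pub items: Vec<u32>,
    }
}

pub mod hashbrown_0_12 {
    #[derive(Default)]
    pub struct HashSet {
        pub items: Vec<u32>,
    }
}

// Stand-in for a library whose public API uses its dependency's type.
pub mod binding_lib {
    // Re-export the exact version this library was built against, so that
    // callers can name it as `binding_lib::hashbrown`.
    pub use super::hashbrown_0_12 as hashbrown;

    pub fn count(set: &hashbrown::HashSet) -> usize {
        set.items.len()
    }
}

pub fn demo_types() -> usize {
    // The caller's own dependency resolved to the older release.
    let _old = hashbrown_0_11::HashSet::default();
    // binding_lib::count(&_old); // would not compile: error[E0308] mismatched types

    // Going through the re-export always names the matching version.
    let mut set = binding_lib::hashbrown::HashSet::default();
    set.items.push(7);
    binding_lib::count(&set)
}
```

The commented-out call is the shape of failure reported in the question: the same type name, coming from two different builds of the crate.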


Out of interest I tried to reproduce the error but I wasn't able to. Is there a specific branch or commit that contains this issue?

Yes, in the interim, qiskit-terra has fixed its code so it can use the newer indexmap (and hence, the newer hashbrown). The revision where the build failure appears is on main, at 35fd08b6b764775c600e798670e201597aad7a98


Let me try to convince you to change your mind.

I'll make three arguments:

Vendoring

Cargo by design supports vendoring. In a scenario
where crate A depends on crate B, that depends on crate C, but also
crate B' depends on crate C, and so does A, crates B and B' can each
have their own versions of crate C, which is in turn different from
crate A.

This all works, as long as the interface of crate C doesn't leak out
of crates B, B' into A. When it does, your proposal is that crates B,
B' should export the version of C that they use. Will this work
for all cases? And what if B, B' use different versions of C? Also,
what of macros? [Being new to Rust, I don't know how they work, but
it seems problematic.]

Also, the authors of crate B, B' did not envisage that some crate A
would use them both, right? Nor that there would be this
version-related build-error. It seems difficult to imagine those
authors making provision for exporting the version of their dependency
C, for consuming crate A to use. After all, that's what
package-managers are for: so that B can depend on C, and so can A.

Consistency in a build-system

It seems like the default bias for a build-system ought to be towards "less vendoring". All other things
being equal, the version-choice that results in less vendoring should be preferred, and the above is one of the reasons. But generally, vendoring can introduce interesting errors, so if it isn't needed, it
shouldn't happen.

And in this case, it absolutely isn't needed.

Again, I do not mean that cargo shouldn't choose the latest version
when feasible. Cargo should absolutely choose the latest version
whenever feasible. But that should never cause unnecessary vendoring.

And why? Because (I will claim) the default mindset of developers is
not that their crate C will be vendored by consuming crates B, B',
A. Vendoring is always a work-around for a problem. Hence, it should
never be the default.

Cargo.lock is not human-writable, and not (really) code-reviewable.

I'm on shakiest ground here, but it seems to me that just reading
Cargo.lock, it's clear that humans aren't meant to read it, write it,
or review it. It contains checksums, after all. Stuff that humans
really cannot and should not be checking. As such, it should never be
the case that it cannot be regenerated from actual sources. It
would be like checking-in yy.tab.c (output of yacc): no typical
programmer ever looks at it, and any situation where someone does, is
by definition problematic.

You described a process whereby a (sequence of?) commands could be
applied to a freshly-generated Cargo.lock (cargo update) to modify
the Cargo.lock file. If some version of these commands could be put
in a stanza of crate A's Cargo.toml, then they could get applied to
Cargo.lock once generated. That would satisfy the rule that all
sources should be human-readable/writable.

This isn't really enough though, because what if crate Z depends on crate A?

So a different proposal would be that crate A could tighten
dependency constraints on any dependency of a crate on which it
depends. Let me say that again:

crate A should be able to add dependency constraints to crate B's
dependency on crate C.

I'm not a TOML expert, but in YAML, JSON, and other config-languages
this shouldn't be complicated to describe/specify.

And it has the advantage that if a package Z depends on A, B, B', C,
and also maybe on D, which also depends on C, it can enforce
constraints on all these packages to restrict their vendoring. That
might result in "no version-constraint solution found", but this is
better than a solution that causes a build-failure. Of course, once a
solution is found, adding new crates/versions shouldn't invalidate
that solution (though solutions with newer versions might then become
possible).

pyo3 being at fault.

You're right, that pyo3 expressed a bad dependency constraint. But
this pattern (crate/package B expresses too-lax constraint on its
dependent C, allowing versions that don't even exist yet) is very,
very, very common. I don't think it's possible to prevent developers
from doing this. And even if we did, this would result in massive
churn: every time crate C released a new version, every developer of
crates A, B, B' would need to release new versions where they bumped
the upper-bound for their dependency on C, so that users could compile
with this new version C.

And this is why developers typically put too-lax bounds on their
dependencies. It's a fact of life.

I'm still learning, so I found this (Cargo.toml vs Cargo.lock, in The Cargo Book), which seems to confirm my suspicions (again, I could be wrong):

Cargo.toml and Cargo.lock serve two different purposes. Before we talk about them, here’s a summary:

Cargo.toml is about describing your dependencies in a broad sense, and is written by you.
Cargo.lock contains exact information about your dependencies. It is maintained by Cargo and should not be manually edited.
If you’re building a non-end product, such as a rust library that other rust packages will depend on, put Cargo.lock in your .gitignore. If you’re building an end product, which are executable like command-line tool or an application, or a system library with crate-type of staticlib or cdylib, check Cargo.lock into git.

The implication is that for crates that aren't meant to produce dynamic libraries, the Cargo.lock file isn't meant to be published. And indeed, for none of pyo3, indexmap, hashbrown, num-complex are there Cargo.lock files.

Many, many packages lack Cargo.lock files, and from this I would conclude that the assumption that crate authors operate under, is that when their crates are composed with other crates, cargo will figure out a version-set that makes them all at least compile together.

As I noted previously, the issue here isn't forcing pyo3 to accept a version of a dependency that it normally wouldn't accept: it does happily work with the backlevel dependency, after all. The issue is that pyo3 should not be built with the newer version when the price is vendoring that dependency for pyo3.

Note that this is a special case on two counts: these are 0.x versions, which work differently in Cargo, and the constraints are using >= and < rather than Cargo's default ^ range.

If these were hashbrown 1.11.0 and hashbrown 1.12.0, then Cargo would not allow them to be duplicated. Cargo allows duplicates only if crates are "incompatible", and considers 0.x incompatible, but 1.y compatible.

Cargo's default behavior is to allow version upgrades only within a compatible range (1.0.0 to 1.9999.9999, but not 2.0.0), which by semver rules is not supposed to cause any compatibility risk. Having a dependency with a wider version range that spans multiple incompatible versions is risky. Currently Cargo simply doesn't have features to fully deal with this, so crates shouldn't do this. Ideally crates should bump their own semver-major version when they bump the semver-major version of dependencies they expose in their public API. There was a plan to introduce a concept of a "public dependency" in Cargo, so that crates like pyo3 and indexmap can explicitly state that hashbrown is part of their public API:

https://rust-lang.github.io/rfcs/1977-public-private-dependencies.html

Hmm, it might be a good idea to document this somewhere. Currently, I have a couple private crates (that I've been meaning to publish) with >=, < version requirements across incompatible versions. This is under the philosophy that I should include the full range of versions that have the features I use; I shouldn't prohibit users from using older versions of dependencies just because I made the crate today instead of last year. Are there any other issues with versioning like this, or just the duplicated-dependency confusion?

A quick scan of my local cargo crate cache shows a large number of "0.x.y" crates. I would bet that most authors are not aware of the cited behaviour of cargo on such crates.

I disagree on the unreviewability here. You cannot check the checksums, but the rest of the lockfile is relatively readable for small changes, especially with the newer format that removes some of the redundant information.