I don't have much Rust experience, however I think I understand what traits are meant to do, but looking at some documentation for std::collections has me left a bit confused.
I'd expect things as HashSet/Map to have an K: Eq + Hash
bound, and BTreeSet/Map to have a K: Ord
bound, yet looking at the documentation, there seems to be no bound on the type and I can indeed instantiate them fine with a type that doesn't implement the bounds.
Once I insert elements however I get an error message about the unsatisfied bounds. Of course, this is expected, however I wonder where those bounds came from in the first place. BTreeSet::insert
has an Ord
bound, so that's clear. However HashSet::insert
seems to have no bounds, so how does the compiler know to raise an error? Are the bounds derived from somewhere else? And if so, why are they not showing up in the documentation? Also, since trait bounds are part of the public API, how can we even guarantee API stability if trait bounds don't have to be written explicitly?
Whenever possible, the bounds go only on the impl
section for the methods that rely on those bounds, and the doc does show those bounds on the impl
section as well:
https://doc.rust-lang.org/std/collections/struct.HashMap.html#impl-HashMap<K,+V,+S>-1
The idea is to defer constraints until when they're really needed. This allows more code reuse, and also avoids redundant declaration of the bounds on both the struct/enum and the impl
section. They always have to go on the impl
section.
However, there are times when you have to redundantly declare bounds on the struct or enum type. For example, when a struct contains a field whose type requires those bounds.
Putting trait bounds on structs is usually pointless, unless the bound participates in the definition of the struct itself (independent of any associated impl
s) in some way. There are exceptions, but they're rare.
The convention, instead, is to put bounds on the functions or impl
s to which they apply. That link goes to the start of an impl
on HashSet
that contains methods whose behaviours all depend on T
implementing both Eq
and Hash
, and, sure enough, the bound you expected is there.
Those methods will be unavailable on HashSet<T>
instantiations where T
doesn't implement both types. Other methods, like new()
, will be available, though admittedly they won't be all that useful.
This is something that always frustrates me about rustdoc with very large interfaces, too. The impl
block in question for HashMap
is halfway down the page. If you don't know that the language allows multiple impl blocks and rustdoc displays them separately (out of necessity), then these can be hard to find.
A simple trick that helps a little bit is clicking the "Summary " link in the upper-right corner of the page. It gives you the condensed view of the interfaces and the impl blocks stand out a bit more. You can also collapse all of the impl blocks to get an even better view. (Is there a key binding to toggle collapse/expand all?)
I would still prefer a Table of Contents which summarizes and links to impl blocks without any other manual intervention. Would be a nice rustdoc feature request... I think, actually, this might be what I want: rustdoc: provide summary views · Issue #14475 · rust-lang/rust
In Rust we typically try to define things as generically as possible, so trait
bounds often don't appear until they have to. It's important to differentiate between what a type is from what a type can do.
Admittedly types like BTreeMap
are effectively useless if the underlying key does not implement Ord
; but technically if you want the ability to construct an empty one, you can regardless of what trait
s the key implements or doesn't implement.
There is some inconsistency on where trait
bounds appear though in libraries including std
. I'd say typically trait
bounds appear within an impl
block; and all functions that require those, and only those, trait
bounds are defined in that block. However there are types where an impl
block will contain fewer trait
bounds and some of the functions within add additional bounds. For example there are two impl
blocks for BTreeMap
with A: Allocator + Clone
even though there could technically just be one.
I personally prefer to reduce how often I need to define trait
bounds, so I'm quite consistent in defining separate impl
blocks for each possible set of trait
bounds this way I don't have to add those trait
bounds separately for each function, but sometimes this can cause a lot of different impl
s.
Things become "trickier" when some trait
bounds have to be deferred to the function. For example, BTreeMap::get
is polymorphic based on the supplied Q
; thus K: Borrow<Q>
has to be added there; however K: Ord
could have been added to the impl
block instead. Like anything, there is going to be some level of inconsistency with things. Presumably the "logic" around not having an impl
block with K: Ord
is that if you already have to define a trait
bound on K
for the function, then might as well define them all there. Personally, I would have made a separate impl
block; but I can be frustratingly consistent sometimes often for no real benefit other than to appease my silly brain.
Wow! I never even realized non-trait impl blocks have their own headers in the docs, I always just navigated by clicking the function name in the sidebar, which jumps over the impl header. The Rust doc structure makes really little sense. Why not just add the impl block's trait bounds individually to each function, that way one can clearly see the interface. Otherwise, you have to scroll back pages and pages to discover the missing trait bounds. Seriously, Vec has 150+ methods, and lots of other stuff in std isn't much better, this is very frustrating.
As for why they're done like this, it's because it's convenient to have
fn show_it<T: Debug>(x: &BTreeSet<T>) {
println!("{:?}", x);
}
compile, without getting an error saying "you didn't write where T: Ord
", because you don't actually care about that when you're just trying to debug-print the tree.
If the bound was on the type, you'd have to write out that bound every time you mention the type generically, even in places where you weren't doing anything that cared.
I encourage you to add your support for the issue I linked. The conversation there has died down, but the problem remains. Remember to link back to this thread for additional context.
Any insight on when to prefer
impl<T: Bound> Foo<T> {
⋮
}
over
impl<T> Foo<T> {
fn bar<T: Bound>() {
⋮
}
}
where all the functions in the former actually require Bound
? It seems this is a case where it's as much an art as a science since adopting one over the other is terrible when taken to the extreme. For example the former is terrible if I have 100 functions each with separate bounds since now I have 100 impl
blocks containing a single function; similarly the latter is terrible when I have 100 functions with the same bound since now I have to specify a trait
bound 100 times.
Edit
I meant
impl<T> Foo<T> {
fn bar()
where
T: Bound
{
⋮
}
}
for the latter.
But if BTreeSet<T>
already had the bound on its type, that bound could also transfer to show_it
. If the set is in the show_it
's parameter list, it's already part of the public interface of the function anyways. Why would the language designers not allow that?
This seems like a rustdoc
thing and not a "language designer" thing. Also how do you propose to solve the problem where there is a separate impl
for a specific type? What if BTreeSet<u32>
had a dedicated impl
? Or are you proposing that certain type constructors are magically "blessed" that such possibilities don't exist (i.e., rustdoc
will have special treatment for "collections" in std
/alloc
)?
The second one should be
impl<T> Foo<T> {
fn bar() where T: Bound {
⋮
}
}
But you can use them interchangeably and together, so just pick the one that's most convenient.
The web solution to this is to make the impl heading sticky. Rustdoc doesn't have any similar UI so I'm not sure what they'd think of that idea.
Because as language designers we want the upstream library to be able to weaken bounds on stuff without breaking downstream code.
For example, if you have struct Foo<T: Copy>(T);
, we want it to be legal to change it to just struct Foo<T: Clone>(T);
instead, as a backwards-compatible change.
But that would potentially break show_it
if show_it
didn't have to also say the bound.
Ah, duh. My bad for being too quick to write it out.
Another example of why a collection without bounds is not useless: Imagine a generic enum
that had, say, a BTreeMap
in some variants but not others. It could still have a useful subset of functionality without imposing an Ord
requirement on the generic (using variants without the BTreeMap
). But if the requirement was on BTreeMap
, that requirement would have to be used on the enum
too.
Incidentally, that concept is called implied bounds. As was already pointed out, a main downside is that removing bounds becomes a breaking change. And another is that it's less flexible for consumers, as with my enum
sketch.
Do you happen to know if going from one to the other is a breaking change? I can see a situation where one approach makes sense, but then eventually the other does. For example, maybe having a single impl
block with no trait
bounds is used; but eventually you find that there are too many functions defined with the same bound so you'd like to move those functions into a separate impl
block with the bounds defined there. The other direction seems less likely, but I'm interested in knowing if it is a breaking change nonetheless as well. Perhaps it's not a breaking change to move some functions in one impl
block to another so long as the entire impl
block is not removed (i.e., once one impl
block exists, it must always exist)?
I'm 99% sure this is not breaking. The impl block itself doesn't show up at all in the API. Doing this to a trait block would be breaking, though.
I still don't get it. If we weaken the requirements from Foo<T: Copy>
to Foo<T: Clone>
, any show_it
function that (hypothetically) implicitly derived its bounds from a Foo<T>
parameter as T: Copy
would also weaken to T: Clone
, which isn't a breaking change for any consumer of show_it
.
For show_it
s internal implementation, the bound is only there to access the Foo<T>
parameter, so it's not a breaking change either. Only if the function relied on the Copy
bound it would break, but then it should have put an explicit bound anyways and thus really should break so we can fix it.
The current system means that in the (probably) majority of cases where this wouldn't be an issue, we now need to go thru all our affected function implementations (e.g. show_it
) and manually weaken all their bounds. Which might not even happen if we don't realize that the Foo
we depend on weakened its bounds, since, as you said, it's a non-breaking change.
This example makes a lot more sense. The enum case is really problematic. I'll look into link you posted it seems like an interesting read.
It is breaking for show_it
itself.
Copy
in particular can also cause borrow checker breakage, because Copy
implies there is no Drop
.