I just started taking a look at the experimental Cursor API for BTreeMap.
The function ( insert_after_unchecked ) is marked unsafe, but why?
It shouldn't create memory-unsafety if the key ordering and uniqueness invariants are not upheld, should it? It is possible to break those invariants anyway without unsafe by having a badly-behaved ordering, or mutating keys in cells etc. Shouldn't unsafe be reserved for memory-unsafety?
The rationale is that while code cannot rely on BTreeMap<GenericType> to be properly ordered, there is currently no way to make BTreeMap<TrustedType> (e.g. u32) unordered, so unsafe code might rely on that.
Indeed, this is a possible argument in favor of unsafe marking this API, though IMO unsafe code better shouldn't rely on this unless it's documented somewhere as a future-compatible guarantee that wind ever be broken. Which probably doesn't exist anywhere in the docs, does it?
If the existence of unstable raw entry APIs of HashMap are anything to draw conclusions from, current unsafe code better be assuming that such breakage can and possibly will come, as that API breaks the assumption of HashMap<TrustedKeyType, T, TrustedBuildHasherType> never being in a "broken" state.
Edit: For context, it seems like the unsafe marker here originates from the ACP discussion, from concerns raised initially by @scottmcm mirroring the argument @chrefr formulated above. So it definitely is the intended rationale behind this (though I'm not sure whether or not I personally like it, and I'm not sure if that's just preliminary precaution, or if the guarantee of non-broken BTreeMap for trusted key types is already properly documented and stable guaranteed in such a way that unsafe code today is allowed to rely on it).
I think that would be a strange design choice. Rust is built on memory-safe abstractions where errors that can lead to memory corruption are normally encapsulated in simple, small regions of code. I don't think a client of BTreeMap should need to use unsafe, indeed I generally have
#![forbid(unsafe_code)]
at the top of my crates, which makes checking the crate (for memory safety) easy. I think it would be very unfortunate for that to no longer be the case.
I think it should be up to the authors of unsafe code to make sure the code is memory-safe, rather than putting the burden elsewhere. Of course there are practical choices to be made, but I would say in this case the practical choice is not to declare insert_after_unchecked unsafe.
[ The extreme danger here is putting up unsafe warnings that are not practically needed with the result that unsafe becomes routine and when it really matters it is ignored as noise. This is how most safety systems end up failing. ]
How would that work? Would they have to write their own BTreeMap/HashMap? That doesn't sound ideal to me, especially with all the extra unsafe code needed.
Of course you may argue that this is a really niche usecase, but isn't creating a BTreeMap that's not sorted/with unique keys also really niche (and wrong)?
As long as you still consider it okay to be relying on the correctness of the BTreeMap and HashMap implementations themselves, it should be possible - e.g. in a 3rd party crate - to create a wrapper type around them that doesn't make any API available which can disrupt the internal structure of the map for key types with trusted properly implemented Eq and Ord or Hash implementations.
The use case in question is not about creating BTreeMaps that intentionally break logical correctness invariants. Instead this is about use-cases that mutate keys in ways that don't affect the relative order in the map; in the (common) case that breaking this invariant still has no potential to escalate into memory unsafety, the user would not want to be forced to write unsafe for this kind of code (or introduce interior mutability or other hacky workarounds without need).
Admitted, a small wrapper is probably also easy to do the other way. E.g. you could have a struct MutableKey<K>(pub K); which explicitly documents that its Ord implementation is never considered trustworthy for memory safety concerns, when used as BTreeMap key; and some safe wrappers around the specific APIs such as insert_after_unchecked only for the case of key types of the form MutableKey<K>. Then, only that wrapper would invoke and encapsulate the unsafe APIs of BTreeMap that are only unsafe due to concerns of breaking the tree structure for types with otherwise trusted-to-be-correct Ord implementations.
In Rust, it usually isn't too hard to encapsulate memory unsafety by doing simple things like checking indexes in the implementation of vector types, and making sure things only get dropped once etc. I think the question is perhaps somewhat hypothetical in the sense that pathological BTreeMaps probably do not exist, and neither does code relying on BTreeMaps being well-ordered for memory safety, but the danger of safety warnings being unheeded because they become routine is very real.
If the usage of the map is encapsulated, then the user can just not use the potentially order breaking API. The potential problem only comes when an API takes a map as input or exposes mut access to one and assumes it to be correct.
I concur that such reliance is ill advised when upstream doesn't explicitly document any such properties, only implicitly imply them[1].
Does std have/use a TrustedOrd specialization trait? If so, causing disordering issues for a TrustedOrd key type could actually cause soundness issues.
Saying that an incorrect Ord implementation can cause disordering does casually imply that a correct Ord prevents such, but this isn't a reversible implication. ↩︎
One of the principles I see mentioned in discussions about the design of the language and unsafe code semantics is that Rust should not just be possible to write unsafe code in, but kind to the unsafe code author — to choose rules which make it easier for the author to get things right, by not being surprising and twisty.
I think that it makes sense for BTreeMap (and HashMap) to guarantee that all maps will behave as expected, provided that the key type correctly implements Ord (or Hash) and no unsafe code has interfered, not because it is strictly necessary but because it is kind.
It also means that even safe code can be assured that its correctness will not be affected by corrupted maps. This aids debuggability — making it not possible for the problem to be “this map has unfindable keys” unless unsafe code did something bad or the key type is broken. Again, this is not a necessary property, but it is a kind one.
I seriously doubt BTreeMap invariants make it any easier to author unsafe code ( I could be wrong, if there is a counter-example that would be interesting ), but the proliferation of unsafe code is a serious and ever-present danger, in my view.
Counterpoint: the collections don't structurally require Ord (Hash), so it can equally make sense to provide "raw" APIs which allow using the collection without those bounds by providing the necessary functionality when it's needed. With such APIs available, the collection fundamentally can't guarantee consistency.
But to counter my own counterpoint, hashbrown now provides a HashTable type which is explicitly for the "dependency injecting" use case, which allows it to be a bit nicer to use than the [raw entry API])HashMap in hashbrown - Rust) on maps.
Generally I've been of the opinion that using a single collection type with optional potentially inconsistent low level access is preferable for Eq/Ord/Hash-dependent collections which have to be tolerant of said inconsistency already anyway. But that may be shifting towards preferring providing a "table" type which the map and set APIs are implemented on top of.
It does seem mildly unfortunate that the options for a collection cursor's insert to be using unsafe or eating an additional O(log n) consistency check, though. Absolutely do the check when debug assertions are used[1], but if it's only for correctness, it's exactly the kind of overhead you like to be able to disable in distribution builds (if the potential extent of misbehavior is "simple").
The power of unsafe is in marking actual soundness relevant reasoning more evident, and limiting the extent of unsafe reasoning. Using unsafe for "someone might want to rely on this" without concrete soundness implications dilutes that.
insert is already O(log n) so the consistency check isn't an overhead on asymptotic growth ↩︎
The logical consequence of deciding that a breach of ordinary invariants is to be marked as unsafe would be that any method call which could potentially cause invariants for a struct to be violated should be marked as unsafe, on the grounds that some unknown unsafe code, somewhere, might depend on it to be sound. I don't think that is the right way to go, and it will ultimately lead to unreliable programs with memory safety problems, as nobody will ever think twice about using unsafe. Rust should be about minimising the use of unsafe to keep software memory safe, not the reverse. Correctness is a different concern altogether.
Invariants that can be violated through public API would be an oxymoron. This would violate the definition of what an "invariant" is. The choice is between not providing such API or not specifying that something is an invariant.
What do you mean? The guarantee is all over the place in the docs:
BTreeMap by itself cannot enforce the invariant that elements are stored in sorted order by key, due to interior mutation of keys. It is a "garbage in, garbage out" situation where the client has to co-operate with BTreeMap to keep the map in sorted order. So BTreeMap is making a promise that it (by itself) cannot keep. Normally you would expect the client to co-operate and keep the map in sorted order though, or the map probably would not be useful.
The invariant of BTreeMap is a conditional statement: ifOrd<K> satisfies linear order properties then the keys are always kept ordered. It does always maintain that property. An unconditional insert_after_unchecked would make it violate that property.