Hopefully the reply from @quinedot convinces you that writing unsafe
code is almost always error prone and takes a lot of experience, testing with Miri, and thorough code review. I second the advice you got from @Ddystopia.
Thanks you for the detailed explanation! Previously I thought it's enough to ensure that the lifetime of the pointed data matches the lifetime of the resulting reference, (no matter which immutable reference was used to cast it), but it makes sense now.
Hm, interesting! This confused me at first, so i went looking for more info. The docs for std::ptr state that:
Would it be correct to say that the code from the last playground example using OccupiedEntry::get
would not cause undefined behavior under the current implementation of OccupiedEntry
(since we know the pointer references an entry in the HashMap
rather than some field in the OccupiedEntry
), but is unsound because the definition of OccupiedEntry::get
does not guarantee that the pointer remains valid after the OccupiedEntry
is dropped? That is, a completely backward-compatible change in the implementation of OccupiedEntry
(eg. where it now points to the entry itself instead of to the map, but the types don't change at all) could suddenly cause undefined behavior in the example code.
Or is there another reason why keeping a pointer derived from a reference would cause undefined behavior when used after the lifetime of the reference, but not the object it points to? If so, is there a more up-to-date description of the rules to abide by when working with raw pointers?
I was actually curious myself to see if the get
version could be justified. In some scenarios, "emulating reborrowing" is definitely UB -- in particular, when there's a destructor that obtains a &mut _
to memory that aliases with the reference you hold on to.
But in this particular case, OccupiedEntry
has no Drop
implementation, and it would be a breaking change to add one. Moreover, one can determine[1] that whatever drop glue exists doesn't observe the HashMap
,[2] and it would be a breaking change to alter that as well. If you're willing to lean on those breaking changes,[3] it's possible the get
version is sound.
Basically the argument goes: we obtained a reference to V
which is valid in the general sense (like the ptr
docs you cited), the only other thing we do with OccupiedEntry
is let it drop, the OccupiedEntry
can't observe V
when that happens as we just discussed, and the HashMap
itself remains borrowed for as long as the reference we obtained is valid once we return it (as per our method signature).
There's still a hole in the argument: "is it sound to violate the borrowing constraints of a function if all else holds"?[4] I think it probably has to be -- there's probably too many borrow-checker workarounds that rely on it being okay and the alternative requires some amount of borrow-checking be semantic -- but I can't back that up with a citation.[5] (If someone else has one I'd love to see it.)
There may also be some tricky things around provenance which I didn't think of.
In this particular case, the correct answer is clearly to just use into_mut
.
If that wasn't available, I'd probably try to avoid it;[6] if I ended up feeling I had to go with get
or some analogous situation, I would at least write some checks asserting the drop-related properties hold. Upstream making even just the drop glue different feels like too fragile of a foundation for me to want to rely on it for soundness in my own crates. I.e. upstream may not realize some implementation-detail-seeming change is actually breaking.
Incidentally I realized the difference between get and into_mut could be demonstrated more concretely.
OccupiedEntry
holds on to the key you fed entry
, which opens an easy way to show when the OccupiedEntry
is dropped or destructured. See here and change between into_mut
and get
to see the output change.
into_mut |
get |
---|---|
occupied: Noisy(4) |
occupied: Noisy(4) |
The recursive function has created the reference at cached
, and it is used after being returned at join = xyz
. The key drops at dropping 4
.
With into_mut
, the key drops in into_mut
,[7] before the recursive function holds the reference.
With get
, the key drops when the OccupiedEntry
drops -- between the reference being created and being used.
assuming
OccupiedEntry
is sound ↩︎whatever pointer it holds to
V
has a trivial destructor ↩︎"my code was [arguably] sound but now it's not due to a change in
HashMap
, but that change was breaking so it's their fault not mine" ↩︎note that "all else" is a very use-case specific qualifier ↩︎
and, after-all, UB is defined at the language level ↩︎
especially since I know from
std::HashMap
that a better API is possible ↩︎where the
OccupiedEntry
also drops/is destructured ↩︎
As a follow-up...
I tried to find a citation to back that up too. It's easy to demonstrate, and Drop
is magical, so I didn't think it would be too surprising.[1] But instead I found expectations to the opposite and accidental breakage in prominent crates. Trying to figure out exactly what conclusions are safe to make from tests you can write in your own code, especially with an eye to future language developments, leads down a rabbit hole of drop elaboration, const Drop
, and in-flux may_dangle
issues.
So now I feel even stronger that it's a fragile base to build your own unsafe
on top of. In the particular case under discussion I'd probably want at least a check as strong as needs_drop
...
const _: () = {
assert!(
! needs_drop::<OccupiedEntry<'static, i32, OnceCell<String>>>()
)
};
Since you can rely on need_drop
returning false
to mean "dropping is a no-op", that makes the previously stated argument solid up to the "borrowing API" point.
But really I'd just want to not use unsafe
and avoid the whole scenario. And one final reminder: even in this hypothetical world where we have get
but not into_mut
, there is a safe workaround -- just look the value back up.
match self.0.entry(tree.key) {
Entry::Vacant(e) => e.insert(OnceCell::new()),
Entry::Occupied(e) => {
match e.get().get() {
Some(_) => return self.0[&tree.key].get().unwrap(),
None => panic!("Loop detected!"),
}
}
};
That even works without Polonius.
Going from trivial drop to any non-trivial drop glue can also be demonstrated by plain old drop-check, no
const
required. ↩︎
This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.