Recursive iteration while caching into HashMap

Hopefully the reply from @quinedot convinces you that writing unsafe code is almost always error prone and takes a lot of experience, testing with Miri, and thorough code review. I second the advice you got from @Ddystopia.

Thanks you for the detailed explanation! Previously I thought it's enough to ensure that the lifetime of the pointed data matches the lifetime of the resulting reference, (no matter which immutable reference was used to cast it), but it makes sense now.

Hm, interesting! This confused me at first, so i went looking for more info. The docs for std::ptr state that:

Would it be correct to say that the code from the last playground example using OccupiedEntry::get would not cause undefined behavior under the current implementation of OccupiedEntry (since we know the pointer references an entry in the HashMap rather than some field in the OccupiedEntry), but is unsound because the definition of OccupiedEntry::get does not guarantee that the pointer remains valid after the OccupiedEntry is dropped? That is, a completely backward-compatible change in the implementation of OccupiedEntry (eg. where it now points to the entry itself instead of to the map, but the types don't change at all) could suddenly cause undefined behavior in the example code.

Or is there another reason why keeping a pointer derived from a reference would cause undefined behavior when used after the lifetime of the reference, but not the object it points to? If so, is there a more up-to-date description of the rules to abide by when working with raw pointers?

I was actually curious myself to see if the get version could be justified. In some scenarios, "emulating reborrowing" is definitely UB -- in particular, when there's a destructor that obtains a &mut _ to memory that aliases with the reference you hold on to.

But in this particular case, OccupiedEntry has no Drop implementation, and it would be a breaking change to add one. Moreover, one can determine[1] that whatever drop glue exists doesn't observe the HashMap,[2] and it would be a breaking change to alter that as well. If you're willing to lean on those breaking changes,[3] it's possible the get version is sound.

Basically the argument goes: we obtained a reference to V which is valid in the general sense (like the ptr docs you cited), the only other thing we do with OccupiedEntry is let it drop, the OccupiedEntry can't observe V when that happens as we just discussed, and the HashMap itself remains borrowed for as long as the reference we obtained is valid once we return it (as per our method signature).

There's still a hole in the argument: "is it sound to violate the borrowing constraints of a function if all else holds"?[4] I think it probably has to be -- there's probably too many borrow-checker workarounds that rely on it being okay and the alternative requires some amount of borrow-checking be semantic -- but I can't back that up with a citation.[5] (If someone else has one I'd love to see it.)

There may also be some tricky things around provenance which I didn't think of.


In this particular case, the correct answer is clearly to just use into_mut.

If that wasn't available, I'd probably try to avoid it;[6] if I ended up feeling I had to go with get or some analogous situation, I would at least write some checks asserting the drop-related properties hold. Upstream making even just the drop glue different feels like too fragile of a foundation for me to want to rely on it for soundness in my own crates. I.e. upstream may not realize some implementation-detail-seeming change is actually breaking.


Incidentally I realized the difference between get and into_mut could be demonstrated more concretely.

OccupiedEntry holds on to the key you fed entry, which opens an easy way to show when the OccupiedEntry is dropped or destructured. See here and change between into_mut and get to see the output change.

into_mut get
occupied: Noisy(4)
dropping 4
cached
join = xyz
occupied: Noisy(4)
cached
dropping 4
join = xyz

The recursive function has created the reference at cached, and it is used after being returned at join = xyz. The key drops at dropping 4.

With into_mut, the key drops in into_mut,[7] before the recursive function holds the reference.

With get, the key drops when the OccupiedEntry drops -- between the reference being created and being used.


  1. assuming OccupiedEntry is sound ↩︎

  2. whatever pointer it holds to V has a trivial destructor ↩︎

  3. "my code was [arguably] sound but now it's not due to a change in HashMap, but that change was breaking so it's their fault not mine" ↩︎

  4. note that "all else" is a very use-case specific qualifier ↩︎

  5. and, after-all, UB is defined at the language level ↩︎

  6. especially since I know from std::HashMap that a better API is possible ↩︎

  7. where the OccupiedEntry also drops/is destructured ↩︎

1 Like

As a follow-up...

I tried to find a citation to back that up too. It's easy to demonstrate, and Drop is magical, so I didn't think it would be too surprising.[1] But instead I found expectations to the opposite and accidental breakage in prominent crates. Trying to figure out exactly what conclusions are safe to make from tests you can write in your own code, especially with an eye to future language developments, leads down a rabbit hole of drop elaboration, const Drop, and in-flux may_dangle issues.

So now I feel even stronger that it's a fragile base to build your own unsafe on top of. In the particular case under discussion I'd probably want at least a check as strong as needs_drop...

        const _: () = {
            assert!(
                ! needs_drop::<OccupiedEntry<'static, i32, OnceCell<String>>>()
            )
        };

Since you can rely on need_drop returning false to mean "dropping is a no-op", that makes the previously stated argument solid up to the "borrowing API" point.


But really I'd just want to not use unsafe and avoid the whole scenario. And one final reminder: even in this hypothetical world where we have get but not into_mut, there is a safe workaround -- just look the value back up.

        match self.0.entry(tree.key) {
            Entry::Vacant(e) => e.insert(OnceCell::new()),
            Entry::Occupied(e) => {
                match e.get().get() {
                    Some(_) => return self.0[&tree.key].get().unwrap(),
                    None => panic!("Loop detected!"),
                }
            }
        };

That even works without Polonius.


  1. Going from trivial drop to any non-trivial drop glue can also be demonstrated by plain old drop-check, no const required. ↩︎

1 Like

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.