Lifetimes for recursive futures stored in a HashMap

adrianheine · August 24, 2020, 6:57am

I try to write a struct that memoizes a recursive async function in a HashMap. I have a minimal example that works without recursion. Adding recursion, borrowck basically complains that my struct instance doesn't live longer than main(). I don't have any experience with futures yet and complex borrow checking issues have regularly been tough to understand for me, so I'm wondering if what I'm trying to do is even sound in general and how much magic of any kind needs to be involved to convince the compiler.

Yandros · August 24, 2020, 1:04pm

Basically, if we simplify a bit your code, you are constructing a self-referential struct:

struct Cache<'fut> {
    cached: HashMap<Key, BoxedFut<'fut, ...>>,
}

impl<'fut> Cache<'fut> {
    /// Requires a `&'fut self` borrow because it calls `self.work()`
    fn work_impl (self: &'fut Self, ...) -> BoxedFut<'fut, ...>
    { ... }

    
    fn work (self: &'fut Self, ...) -> Ret
    {
//                       a reference to self gets stored inside self
//                       +----------+
//                       |          |
//                       |          | (requires at least a 'fut borrow of self)
//                       v          |
        self.cached.insert(..., self.work_impl(...))
    }
}

That is, a Cache is holding, within a HashMap, a bunch of futures that borrow from self (the outer work_impl future may hold (when recursing) an inner future that will poll / query the Hashmap).

So, while not forbidden, you end up with a very restrictive pattern: the Cache cannot be moved as soon as it borrowed even if just once, since from that borrow may be created a future that may live as long as the Cache itself.

as long as the Cache itself

This is the final hit: sometimes you can get away with a self-referential struct, provided the self-referential struct does not use that self-reference when being dropped (otherwise the self-reference may observe things in an inconsistent state).
And in this case, this is not the case (at least for Rust): dropping the Cache will drop the BoxedFutures stored in its HasMap, which will drop the async fn locals captured inside that future, which could dereference some of these references (I feel like that cannot happen in this very case, but Rust's typesystem is not expressive enough to provide this compile-time unchecked property).

Hence the error:

... and the borrow might be used here, when that temporary is dropped and runs the destructor for type Cached<'_>

Solutions / workarounds

1 - Cache the result of the future, not the future itself:

async fn work(&'_ self, key: Key)
  -> Res
{
    if let Some(ret) = self.cache.borrow_mut().remove(&key) {
        return ret;
    }
    let ret = self.work_impl(key.clone()).await; // await _before_ caching
    self.cache.borrow_mut().insert(key, ret.clone());
    ret
}

Playground

This means that if concurrent (here the futures are not thread-safe, so it would have to be re-entrant concurrency), the future associated with a key may be computed multiple times. But other than that, this solves all the issues.

2 - Use runtime-checked references

If you really know the references / borrows will not be dereferences in the problematic dropped case / or when the cache has been moved out, you can use rc::Weak references as back references to the cache:

/// No more <'lifetime> params!
type CacheEntry = Shared<
    LocalBoxFuture<'static, Res>,
>;

/// No more <'lifetime> params!
struct CachedInner { // Our actual Cache now *needs* an `Rc`, so we bundle that in a newtype
    cache: RefCell<HashMap<Key, CacheEntry>>,
}

/// This is the newtype that bundles the `Rc` inside.
struct Cached(Rc<CachedInner>);

/// We can provide transparent access to the `Inner` fields.
impl ::core::ops::Deref for Cached {
    type Target = CachedInner;
    
    fn deref (self: &'_ Self)
      -> &'_ CachedInner
    {
        &*self.0
    }
}

/// Where we were using `self: &'fut Cache`,
/// we are now gonna be using an `rc::Weak<CachedInner>`.
///
/// For ergonomics, we newtype this one "back-reference" too.
struct CachedRef(rc::Weak<CachedInner>);

impl Cached {
    fn downgrade (self: &'_ Self)
      -> CachedRef
    {
        return CachedRef(Rc::downgrade(&self.0));
        // where:
        impl CachedRef {
            fn upgrade (self: &'_ Self)
              -> Option<Cached>
            {
                self.0.upgrade().map(Cached)
            }
        }
    }
    
    // we can't use `async fn`, otherwise `self` would be captured.
    // we manually hand-roll the `async { ... }` block captures, so as to achieve
    // the `'static` lifetime in return type
    fn work_impl (self: &'_ Self, key: Key)
      -> impl 'static + Future<Output = Res>
    {
        let this = self.downgrade(); // `this` gets captured instead of `self`
        async move {
            println!("work {:?} 1", key);
            let Key(mut res) = key;
            
            // RECURSION 
            if res == "b" {
                let this = this.upgrade().expect("Dangling!");
                res += &this.work(Key("c".into())).await.0;
            }
            
            println!("work {:?} 2", res);
            Res(res)
        }
    }

Playground

adrianheine · August 25, 2020, 9:53am

Thanks for your reply! I guess I'm gonna have to read it a few times in order to fully understand it. Your first suggestion is something I definitely considered, but (my reduced example is a bit misleading) this is really about a) not running work_impl more than once per Key, because work_impl has side-effects and b) knowing when work_impl for a Key has finished (because side-effects). I thought about using an Option<Res> with None marking that there is a future running for that Key, but that would possibly mean excessive polling.

adrianheine · August 27, 2020, 6:28pm

I successfully implemented your second suggestion and it works fine, although it's definitely more complex than I would like. I wonder if my general approach to the problem at hand is not ideal to implement in Rust.

What I'm trying to do is having a diverse set of Resources that can be run by one Setup with the help of a Locator and an Implementor. Everything's loosely coupled.

trait Resource {
  type Location
}

trait Locate<R: Resource> {
  fn locate(&resource: R) -> R::Location;
}

trait Implement<R: Resource> {
  /// This is supposed to run only once per resource
  async fn implement(resource: R) -> Result<(), _>;
}

struct Setup<Locator, Implementor> {}
impl<L, I> Setup<L, I> {
  async fn run<R: Resource<(resource: R) -> Result<R::Location, _>
    where
    L: Locate<R>,
    I: Implement<R>
  {
    // ...
  }
}

In order to guarantee that every Resource is only run once I have an enum wrapping all Resources that's the key to a HashMap in Setup with the futures being the value. Maybe this loose coupling is just something that doesn't fit rust too well. For the time being your solution works, though, so thanks

system · November 25, 2020, 6:28pm

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.

Topic		Replies	Views
How to get around lifetime issues with checking for entry in HashMap then inserting if not present help	3	114	March 28, 2024
Need help with lifetimes help	8	320	December 23, 2022
Help with lifetimes and references	3	374	May 21, 2021
Async: storing dynamic Future objects help	8	898	May 3, 2021
Lifetime compile issues help	7	584	July 27, 2023

Lifetimes for recursive futures stored in a HashMap

Solutions / workarounds

1 - Cache the result of the future, not the future itself:

2 - Use runtime-checked references

Related Topics