Sharing a cache of non-static references across threads?


#1

Suppose there is a type Gen which creates values of type Item:

pub struct Item<'a> { pub link: &'a u8 }

// A generator of `Item`s.
pub struct Gen { root: u8 }

impl Gen {
    pub fn gen(&mut self) -> Item {
        Item { link: &self.root }
    }
}

As can be seen, Items created by Gen retain a reference to Gen::root.
Assume that Gen::gen is an expensive operation and that we would like to cache Items instead of creating them anew each time:

use std::collections::HashMap;
use std::collections::hash_map::Entry;

// A cache has a generator and caches `Item`s.
pub struct Cache<'a> {
    gen:   Gen,
    cache: HashMap<String, Item<'a>>
}

impl<'a> Cache<'a> {
    pub fn new() -> Cache<'a> {
        Cache {
            gen:   Gen { root: 42 },
            cache: HashMap::new()
        }
    }

    pub fn gen(&'a mut self) -> &'a Item {
        match self.cache.entry(String::from("some-lookup-key")) {
            Entry::Occupied(x) => x.into_mut(),
            Entry::Vacant(x)   => {
                let i = self.gen.gen();
                x.insert(i)
            }
        }
    }
}

This works as expected:

#[test]
fn test_single() {
    let mut c = Cache::new();
    assert_eq!(42, *c.gen().link);
}

However, it seems that Cache cannot be used from different threads:

#[test]
fn test_multi() {
    use std::sync::{Arc, Mutex};
    use std::thread;

    let a = Arc::new(Mutex::new(Cache::new()));

    let t = thread::spawn(move || {
        let mut c = a.lock().unwrap();
        assert_eq!(42, *c.gen().link)
    });

    t.join().unwrap()
}

Rustc complains:

error: `a` does not live long enough
  --> src/lib.rs:53:21
   |
53 |         let mut c = a.lock().unwrap();
   |                     ^ does not live long enough
54 |         assert_eq!(42, *c.gen().link)
55 |     });
   |     - borrowed value only lives until here
   |
   = note: borrowed value must be valid for the static lifetime...

error: `c` does not live long enough
  --> src/lib.rs:54:25
   |
54 |         assert_eq!(42, *c.gen().link)
   |                         ^ does not live long enough
55 |     });
   |     - borrowed value only lives until here
   |
   = note: borrowed value must be valid for the static lifetime...

error: aborting due to 2 previous errors

Apparently Cache values need to have a 'static lifetime because of std::thread::spawn's bounds.

While I understand the reasoning behind spawn's type signature, I have been unable to write a cache which holds values of non-static lifetimes. This is especially puzzling to me because I can share Gen itself across threads:

#[test]
fn test_gen() {
    use std::sync::{Arc, Mutex};
    use std::thread;

    let a = Arc::new(Mutex::new(Gen { root: 42 }));

    let t = thread::spawn(move || {
        let mut c = a.lock().unwrap();
        assert_eq!(42, *c.gen().link)
    });

    t.join().unwrap()
}

After all this I wonder: how does one write Cache correctly?

Thank you for bearing with me. I look forward to your replies!


#2

The problem is that the threading support in the standard library does not allow reasoning about a thread’s lifetime, even though your program is actually correct, since you t.join() before destroying the cache.

A solution to this is “scoped threads”, a feature which was once in the standard library but turned out to be implemented unsafely (ages ago). Not all hope is lost, however: since “the incident”, a few safe implementations have appeared on crates.io, for example the one in crossbeam - crossbeam::scope. If my understanding of scoped threads isn’t completely bogus, you should be able to spawn your thread with this and move the cache into the closure you pass to crossbeam::scope.


#3

Thank you for your reply. Unfortunately I think that crossbeam's scoped threading does not help me here. If I read the documentation correctly, scoped threads are “guaranteed to terminate before the current stack frame goes away” and I need threads which are independent of the stack.

What I would really like to understand is this: what is the semantic difference between sharing Gen across threads and sharing Cache?


#4

Your data structure will not work even in a single thread.

It’s using references in a cyclic way, and Rust’s borrow system does not handle this pattern.

Just a tip: this pattern here is never correct:

pub fn gen(&'a mut self) -> &'a Item

Note that the lifetime parameter 'a is not a parameter of the method itself but of the impl block. So we know that 'a is already being used somewhere else, and using a lifetime parameter in multiple places creates a constraint that the compiler must fulfill: it has to find a single “value” for the lifetime parameter that fits all of those usage sites simultaneously.

So it ends up being a constraint that the borrow of the value &'a mut self required to call the gen method must have the same lifetime as every Item in the Cache. This necessarily means that gen can only be called once in the life of a Cache.


#5

I see. Thank you!

Is the conclusion then that I simply cannot implement a cache which holds items which contain non-static references? Or how else would such a thing be written?


#6

Yes, it can be done. You may need some unsafe code to tackle the parts that the borrow checker cannot see are safe, provided you are careful and make sure they really are safe.

For example, in your code the HashMap owns each Item, so the items do not have stable addresses and may move in memory when the map grows. A way to mitigate that would be to have the map own Box<Item> instead; then you know the items themselves stay at the same memory location, as long as they are not removed from the map.


#7

But Item itself still has a non-static reference inside it, so the cache would be of type HashMap<String, Box<Item<'a>>>. Would this not exhibit the same problem as before?


#8

The cyclic reference itself is simply not possible, and another way must be found for that part.


#9

Yes, but this does not seem to be possible without changing the definition of Item. That would be easy to achieve in this example, but if Gen and Item come from an external crate and cannot be modified, there does not seem to be a way to implement a cache of Items.