How to share state


#1

I have a Tokio-based server where the incoming requests need shared access to some mutable state. A HashMap with a String key and a std::time::Duration value seems like an appropriate representation.

So far I’ve tried creating the HashMap in main() and then passing a reference to it through the chain of futures at the core of the request processing. This needed explicit lifetimes, and I couldn’t figure out how to specify them, so that didn’t compile. I’ve seen references to “scoped” threads and the various reference counted wrapper types, but I’m unsure how to proceed with those. Another option is a global variable, but that sounds like the wrong approach.

Is there any advice on how to share state in a Tokio (or generally, threaded) project? I can show code once I get a better idea of the way forward.


#2

Rc or Arc are the normal ways to share state if you can’t have everything borrow it. If you want to mutate the state, you’ll probably want to use Rc<RefCell<T>> or Arc<Mutex<T>>.
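A minimal sketch of the two combinations mentioned above, using only the standard library. The function names and the "client-..." keys are made up for illustration; the map type follows the String/Duration description from the first post.

```rust
use std::cell::RefCell;
use std::collections::HashMap;
use std::rc::Rc;
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::Duration;

type State = HashMap<String, Duration>;

// Single-threaded sharing: Rc gives shared ownership, RefCell gives mutation.
fn single_threaded() -> usize {
    let state: Rc<RefCell<State>> = Rc::new(RefCell::new(HashMap::new()));
    let handle = Rc::clone(&state); // second owner of the same map
    handle
        .borrow_mut()
        .insert("client-a".to_string(), Duration::from_secs(5));
    let len = state.borrow().len(); // the first owner sees the insert
    len
}

// Multi-threaded sharing: Arc and Mutex are the thread-safe counterparts.
fn multi_threaded() -> usize {
    let state: Arc<Mutex<State>> = Arc::new(Mutex::new(HashMap::new()));
    let workers: Vec<_> = (0..4u64)
        .map(|i| {
            let state = Arc::clone(&state);
            thread::spawn(move || {
                let mut map = state.lock().unwrap();
                map.insert(format!("client-{}", i), Duration::from_secs(i));
            })
        })
        .collect();
    for w in workers {
        w.join().unwrap();
    }
    let len = state.lock().unwrap().len();
    len
}
```

Calling both functions from main() is enough to try it. The Rc version will not compile if the handle is moved into thread::spawn, which is the compiler's way of steering you to Arc.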


#3

Tokio’s Core::run shouldn’t require a 'static lifetime on any reference it borrows from its environment because run doesn’t return until the event loop exits. The event loop is single threaded, so you shouldn’t need Mutex (or the like) if the data is only accessed within the loop. If you’re willing to move the mutable borrow in and out of the futures in the chain, you might not even need a RefCell.

It’s possible some of the above is incorrect, so someone more familiar with Tokio can correct it. But what @sfackler said is the general advice regarding sharing mutable state across threads.
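To illustrate what "moving the mutable borrow in and out" could look like, here is a rough sketch with plain functions standing in for the futures in the chain. The names record and handle_request are hypothetical, and whether the same shape works with real Tokio combinators is not something this sketch establishes.

```rust
use std::collections::HashMap;

type State = HashMap<String, u64>;

// Each step receives the mutable borrow, uses it, and hands it back
// together with its result so the next step can use it too.
fn record<'a>(state: &'a mut State, key: &str) -> (&'a mut State, u64) {
    let count = state.entry(key.to_string()).or_insert(0);
    *count += 1;
    let seen = *count;
    (state, seen)
}

// A "chain" of two steps; the borrow is passed from one to the next.
fn handle_request(state: &mut State, key: &str) -> u64 {
    let (state, _first) = record(state, key);
    let (_state, second) = record(state, key);
    second
}
```

The map itself would be a plain local in main(), passed as &mut to handle_request, with no Rc or RefCell involved.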


#4

Maybe I first need to clarify the nature of the sharing. Apologies in advance for this long post.

In this particular program, there is a single tokio::reactor::Core and a single call to its run() method. The future passed to run() is a for_each() on a UDP stream. The for_each() closure has the chain of futures, which is indeed the only site where the data is accessed. That chain’s final future is spawned onto the event loop for every request, but I think that means “put the work on the existing thread for this event loop”, and not “create a new thread for this work”.

Given all this, maybe this isn’t sharing between threads, because there’s only one. So maybe the locking construct isn’t needed. I was looking at RwLock but perhaps that’s irrelevant to this problem. What I need to do is:

  1. Create a HashMap whose lifetime encompasses running the event loop.

  2. Make a mutable reference to the HashMap available inside the for_each() closure.

  3. Somehow avoid inadvertent capturing by any of the closures so that the value can continue to be moved.

The way I currently pass data through the closures is by adding it to the data that is passed to the for_each() closure. This is done by implementing the UdpCodec trait:

impl UdpCodec for MyCodec {
    type In = (SocketAddr, MyTask);
    type Out = (SocketAddr, Vec<u8>);
.....

MyTask is a bag of information needed for the request. Just to keep changes simple, I added a HashMap reference to the In associated type (this defines the argument passed to the for_each() closure):

impl UdpCodec for MyCodec {
    type In = (SocketAddr, MyTask, &HashMap<String, u64>);
    type Out = (SocketAddr, Vec<u8>);

No dice:

    type In = (SocketAddr, KnockdTask, &HashMap<String, u64>);
                                       ^ expected lifetime parameter

Then I tried to add lifetimes:

impl<'a> UdpCodec for MyCodec {
    type In = (SocketAddr, MyTask, &'a HashMap<String, u64>);
    type Out = (SocketAddr, Vec<u8>);

Also no dice:

error[E0207]: the lifetime parameter `'a` is not constrained by the impl trait, self type, or predicates
  --> src/main.rs:41:6
 impl<'a> UdpCodec for KnockCodec {
        ^^ unconstrained lifetime parameter

This doesn’t seem right. Maybe types in there cannot be references. So I tried an Rc:

impl UdpCodec for MyCodec {
    type In = (SocketAddr, MyTask, Rc<HashMap<String, u64>>);
    type Out = (SocketAddr, Vec<u8>);

and also, changed the final expression in the function that returns this data:

    Ok((*addr, task, self.addrs.clone()))

This compiles and the Rc is available in the closure! But, actually trying to use it:

        .and_then(move |args| {
            let (o, t, mut state) = args;
            state.borrow_mut().insert(String::from("sample-key"), 5);

results in:

             state.borrow_mut().insert(String::from("hi"), 5);
             ^^^^^^^^^^^^^^^^^^ cannot borrow as mutable

I think this is closer to the right way to do it. But I’m not sure how to get the mutability back. Any ideas?


#5

You’d need Rc<RefCell<HashMap<...>>> but your snippet only has Rc<HashMap<...>>.


#6

Hm. OK. I’ll have to read up on these. Those wrapper types I haven’t quite grasped, but I think I know to look for articles on “interior mutability” to get that nailed down. For now I’ll just try to get it to compile. I made the following changes:

The structure that holds the map now has this field:

addrs: Rc<RefCell<HashMap<String,u64>>>,

The trait implementation is:

impl UdpCodec for MyCodec {
    type In = (SocketAddr, MyTask, Rc<RefCell<HashMap<String, u64>>>);
    type Out = (SocketAddr, Vec<u8>);

The “return expression” is the following. Notice that I dropped clone; I’m not sure it’s needed.

Ok((*addr, task, self.addrs))

And when the MyCodec struct instance is created, the addrs field is this:

addrs: Rc::new(RefCell::new(HashMap::new()))

Given that, the code in the closure is:

.and_then(move |args| {
                let (o, t, state) = args;
                state.borrow_mut().insert(String::from("sample-key"), 5);

This gives the error:

error: no method named `insert` found for type `&mut std::rc::Rc<std::cell::RefCell<std::collections::HashMap<std::string::String, u64>>>` in the current scope
   --> src/main.rs:111:36
    |
111 |                 state.borrow_mut().insert(String::from("sample-key"), 5);

I’m not sure how to dig into the nested structure here and get at that mutable HashMap reference.


#7

Yeah, you need the clone. Rc is a smart pointer, but it’s meant to be shared by value (which is what clone will give you). As such, there’s an impl<T> Deref for Rc<T> with Target = T (note that it’s not implemented for &'some_lifetime Rc<T>).
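A small sketch of why the clone is needed, assuming a decode-like method that only borrows the codec. MyCodec here is a hypothetical stand-in with a single field, not an implementation of tokio's UdpCodec.

```rust
use std::cell::RefCell;
use std::collections::HashMap;
use std::rc::Rc;

type Shared = Rc<RefCell<HashMap<String, u64>>>;

// Hypothetical stand-in for the codec in the thread.
struct MyCodec {
    addrs: Shared,
}

impl MyCodec {
    // `self` is only borrowed here, so `self.addrs` cannot be moved out.
    // Cloning the Rc copies the pointer and bumps the reference count;
    // the HashMap itself is not copied.
    fn decode(&mut self) -> Shared {
        Rc::clone(&self.addrs)
    }
}
```

Every handle returned by decode() points at the same map, so an insert through one handle is visible through the codec's own field.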


#8

Yeah, they’re pretty straightforward once you learn about them. Feel free to ask for clarification though.

In a nutshell, Rc and Arc are about shared immutable ownership. They allow having multiple owners of a value, but don’t really provide mutable access to the value (there are some methods to get a mutable borrow, but they come with restrictions). The only difference between these two is the former is single threaded only whereas the latter allows ownership across threads.

Given they don’t facilitate mutable access, RefCell comes into play. Note that RefCell is for single-threaded cases, so Rc<RefCell<...>> but not Arc<RefCell<...>>. The “trick” with RefCell is that it gives you mutable access (a borrow) without needing to own it or have a mutable reference to the cell itself. To be sound, it enforces rustc’s compile-time borrow rules at runtime, dynamically. If you violate the rules, you’ll get a panic (unless you use one of the try_* methods, try_borrow or try_borrow_mut).

Hope that helps.
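A short sketch of the runtime check described above, standard library only. The function name is made up; it uses try_borrow_mut so the conflict shows up as an Err instead of a panic.

```rust
use std::cell::RefCell;
use std::collections::HashMap;

// Returns (second mutable borrow was refused while the first was alive,
//          a mutable borrow succeeded after the first was released).
fn borrow_rules() -> (bool, bool) {
    let cell = RefCell::new(HashMap::<String, u64>::new());

    let first = cell.borrow_mut(); // exclusive borrow, tracked at runtime
    // A second `borrow_mut` here would panic; `try_borrow_mut` reports
    // the same conflict as an Err instead.
    let conflict = cell.try_borrow_mut().is_err();
    drop(first); // release the exclusive borrow

    let ok_after_release = cell.try_borrow_mut().is_ok();
    (conflict, ok_after_release)
}
```

Note that cell is not declared mut anywhere: the mutation goes through a shared reference to the cell, which is the whole point of interior mutability.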


#9

It does help. I’ll study up on the specifics in a bit, but here’s the syntax that works:

(*hm).borrow_mut().insert(String::from("sample-key"), 5);

hm is the Rc<RefCell<HashMap<String,u64>>>. To “deconstruct”,

  1. hm gets dereferenced (the parens make sure it’s hm we’re dereferencing, not the result of the whole expression). This yields the RefCell.
  2. borrow_mut gets a mutable reference to the HashMap.
  3. From there, a value is inserted with a key.


#10

I’m not very familiar with Tokio, so sorry in advance if I get anything wrong.

That being said, it’s true that Core::run() doesn’t require the future to be static:

This function will begin executing the event loop and will finish once the provided future is resolved. Note that the future argument here crucially does not require the 'static nor Send bounds. As a result the future will be “pinned” to not only this thread but also this stack frame.

Meaning that you don’t have to use Rc or Arc (although you can if you want to, but it adds additional overhead). However, due to the complicated execution flow, you’ll still need RefCell, the single-threaded version of Mutex / RwLock.

About your unconstrained lifetime error, I believe you have to use the lifetime inside the Codec too (see RFC 447):

struct MyCodec<'a> {
    // stuff
    p: PhantomData<&'a u8>
}

impl<'a> UdpCodec for MyCodec<'a> {
    type In = (SocketAddr, MyTask, &'a RefCell<HashMap<String, u64>>);  // works!
    type Out = (SocketAddr, Vec<u8>);
}

and then

let (o, t, state) = args;
state.borrow_mut().insert(String::from("sample-key"), 5);

Docs about what borrow_mut() does.


#11

You don’t need the parens and the explicit deref operator (i.e. “*”). As mentioned, Rc implements Deref and the dot operator will perform an autoderef to find the appropriate target method. So hm.borrow_mut().insert(String::from("sample-key"), 5); will work.
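A small sketch of the point above, standard library only (the function name is made up). One guess, since the thread doesn't show the imports: the `&mut Rc<...>` type in the earlier error is what method lookup produces when std::borrow::BorrowMut is in scope, because its borrow_mut is found on the Rc itself before autoderef reaches the RefCell. The fully qualified form at the end sidesteps that.

```rust
use std::cell::RefCell;
use std::collections::HashMap;
use std::rc::Rc;

fn insert_three_ways() -> usize {
    let hm: Rc<RefCell<HashMap<String, u64>>> = Rc::new(RefCell::new(HashMap::new()));

    // Explicit deref of the Rc, as in the earlier post.
    (*hm).borrow_mut().insert(String::from("explicit"), 1);
    // Autoderef: the dot operator looks through the Rc to RefCell::borrow_mut.
    hm.borrow_mut().insert(String::from("auto"), 2);
    // Fully qualified call; this stays unambiguous even if another
    // `borrow_mut` method (e.g. from std::borrow::BorrowMut) is in scope.
    RefCell::borrow_mut(&hm).insert(String::from("qualified"), 3);

    let len = hm.borrow().len();
    len
}
```

All three lines end up calling the same RefCell method; they differ only in how the receiver is spelled.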


#12

I’ve not played around with this myself yet, but a mutable borrow that’s moved in and out of the future chain should work as well, at least conceptually. It’s possible that it doesn’t in practice; I’d need to try it.


#13

sccache runs a server using tokio that has this same issue, and we wound up just using Rc<RefCell<T>> for most things (and Arc<T> for a couple of things that we also wanted to be able to pass around to worker threads). I don’t love it, but @alexcrichton wrote that code and I don’t presume to understand tokio better than he does. :slight_smile:

It would be nice if we could figure out if there’s a better pattern to use here that lets you have your shared state just be references to locals in your main function.


#14

I removed the explicit dereference, which cleans things up (and works).

I then tried removing the Rc wrapper for the RefCell. The strategy here was to store a RefCell in the MyCodec structure and then return a reference where I’d previously returned a clone of the Rc. So:

Struct:

pub struct MyCodec<'a> {
     //stuff
     addrs: RefCell<HashMap<String,u64>>,
     p: std::marker::PhantomData<&'a u8>
}

Trait:

impl<'a> UdpCodec for MyCodec<'a> {
    type In = (SocketAddr, MyTask, &'a RefCell<HashMap<String, u64>>);
    type Out = (SocketAddr, Vec<u8>);

Expression that lends a reference (this is the stuff that the closures are passed):

    Ok((*addr, task, &self.addrs))

Instantiation of MyCodec:

let mc = MyCodec {
    // stuff
    addrs: RefCell::new(HashMap::new()),
    p: std::marker::PhantomData,
};

Compile results:

error[E0495]: cannot infer an appropriate lifetime for borrow expression due to conflicting requirements
  --> src/main.rs:65:26
   |
65 |         Ok((*addr, task, &self.addrs))
   |                          ^^^^^^^^^^^
   |
note: first, the lifetime cannot outlive the anonymous lifetime #1 defined on the body at 47:81...
  --> src/main.rs:47:82
   |
47 |       fn decode(& mut self, addr: &SocketAddr, buf: &[u8]) -> io::Result<Self::In> {
   |  __________________________________________________________________________________^
48 | |         let task =
49 | |         if let Ok(recv_req)  = MyReq::from_msg(buf) {
50 | |             if recv_req.addr.is_unspecified() {
...  |
65 | |         Ok((*addr, task, &self.addrs))
66 | |     }
   | |_____^
note: ...so that reference does not outlive borrowed content
  --> src/main.rs:65:26
   |
65 |         Ok((*addr, task, &self.addrs))
   |                          ^^^^^^^^^^^
note: but, the lifetime must be valid for the lifetime 'a as defined on the body at 47:81...
  --> src/main.rs:47:82
   |
47 |       fn decode(& mut self, addr: &SocketAddr, buf: &[u8]) -> io::Result<Self::In> {
   |  __________________________________________________________________________________^
48 | |         let task =
49 | |         if let Ok(recv_req)  = MyReq::from_msg(buf) {
50 | |             if recv_req.addr.is_unspecified() {
...  |
65 | |         Ok((*addr, task, &self.addrs))
66 | |     }
   | |_____^
note: ...so that types are compatible (expected tokio_core::net::UdpCodec, found tokio_core::net::UdpCodec)
  --> src/main.rs:47:82
   |
47 |       fn decode(& mut self, addr: &SocketAddr, buf: &[u8]) -> io::Result<Self::In> {
   |  __________________________________________________________________________________^
48 | |         let task =
49 | |         if let Ok(recv_req)  = MyReq::from_msg(buf) {
50 | |             if recv_req.addr.is_unspecified() {
...  |
65 | |         Ok((*addr, task, &self.addrs))
66 | |     }
   | |_____^

Lifetime woes. It seems that the passed in HashMap only lives to the end of the decode function, which isn’t long enough.


#15

That expresses something different: it makes the codec own the RefCell<HashMap<...>>, and of course you can’t lend it out for a lifetime longer than the codec.

I thought you wanted the cell and the map to live on main()'s stack? In that case, I’d imagine your codec to be:

struct Codec<'a> {
   addrs: &'a RefCell<HashMap<...>>
}

You’d create the codec with a borrow of the cell that’s on main()'s stack before you start the event loop.

That may not work for other reasons, haven’t tried it, but that would be the gist.
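A rough sketch of that gist without Tokio: a hypothetical Codec that borrows a cell living in the caller's stack frame, with a plain loop standing in for the event loop. It says nothing about whether the real pipeline accepts a borrowed codec; if any part of it requires 'static (futures spawned onto the loop are a likely candidate), a borrow like this will not satisfy it.

```rust
use std::cell::RefCell;
use std::collections::HashMap;

type Addrs = RefCell<HashMap<String, u64>>;

// Hypothetical codec that borrows the cell instead of owning it.
struct Codec<'a> {
    addrs: &'a Addrs,
}

impl<'a> Codec<'a> {
    // The returned reference carries 'a (the cell's lifetime), not the
    // shorter lifetime of the `&mut self` borrow.
    fn decode(&mut self, key: &str) -> (String, &'a Addrs) {
        (key.to_string(), self.addrs)
    }
}

fn run() -> u64 {
    // Declared first, so it outlives the codec that borrows it.
    let addrs: Addrs = RefCell::new(HashMap::new());
    let mut codec = Codec { addrs: &addrs };

    // Stand-in for the event loop: each "request" gets the shared cell.
    for key in ["a", "b", "a"].iter() {
        let (k, state) = codec.decode(key);
        *state.borrow_mut().entry(k).or_insert(0) += 1;
    }
    let count = addrs.borrow().get("a").copied().unwrap_or(0);
    count
}
```

The difference from the owning version is in decode: the reference it returns is a copy of the field, so it is not tied to how long the codec itself is borrowed.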


#16

I took the version above where I tried to use a RefCell (not wrapped in an Rc) and changed it to look like your suggestion:

pub struct Codec<'a> {
    // stuff
    addrs: &'a RefCell<HashMap<String,u64>>,
}

The codec is created with the addrs field initialized like this:

addrs: &RefCell::new(HashMap::new())

Got:

addrs: &RefCell::new(HashMap::new())};
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ does not live long enough

I’m trying to borrow something in main() that was created before the event loop is started. Seems ok, but apparently not. One thing I can’t quite reason about is the effect of the statement that actually uses the codec. This is it (it’s in main()):

let (_, incoming) = sock.framed(codec).split();

codec is moved into the scope of the framed function. I don’t know where codec goes from there (the function does return immediately, obviously, because split() is called on its result). What happens to the borrow of the RefCell that ended up in the addrs field?

An observation: the version where the Codec owned an Rc<RefCell<HashMap<..>>> did work: no lifetime complaints, correct operation at runtime. My understanding of that is that using an Rc means the borrow-checker rules happen at runtime, and the reference count allows the system to determine that there are still live references. Is my understanding correct there?


#17

This is trying to take a reference to a temporary and fortunately the compiler prevents you from doing that :slight_smile:.

You want essentially something like this:

fn main() {
    let addrs = RefCell<HashMap<...>>;
    let codec = Codec {addrs: &addrs};
    Core::run(...);
}

[quote=“cmusser, post:16, topic:11463”]
My understanding of that is that using an Rc means the borrow-checker rules happen at runtime, and the reference count allows the system to determine that there are still live references. Is my understanding correct there?
[/quote]

There are no runtime borrow checks with Rc; the runtime borrow checks are with RefCell. Rc just keeps the value alive using an internal refcount (which clone() and drop() inc and dec, respectively).
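A quick sketch of that division of labor, standard library only (the function name is made up): Rc::strong_count shows the count moving with clone() and drop(), while the only borrow check in the snippet is the one RefCell performs.

```rust
use std::cell::RefCell;
use std::collections::HashMap;
use std::rc::Rc;

// Returns the strong count observed after creation, after a clone,
// and after dropping the clone.
fn counts() -> (usize, usize, usize) {
    let state = Rc::new(RefCell::new(HashMap::<String, u64>::new()));
    let at_start = Rc::strong_count(&state);

    let other = Rc::clone(&state); // increments the count, no borrow check
    let after_clone = Rc::strong_count(&state);

    // The borrow check belongs to RefCell and happens here, at runtime.
    other.borrow_mut().insert(String::from("sample-key"), 5);

    drop(other); // decrements the count; the map stays alive via `state`
    let after_drop = Rc::strong_count(&state);
    (at_start, after_clone, after_drop)
}
```

The value is freed only when the last Rc handle goes away, which is why the Rc version never ran into "does not live long enough".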


#18

Yeah, earlier I’d tried something similar:

let addrs = RefCell::new(HashMap::new());

Result (same as when I used the temporary):

            addrs: &addrs};
                    ^^^^^ does not live long enough

So I tried variations on yours

Like:

let addrs = RefCell<HashMap<String, u64>>;
let addrs = RefCell::<HashMap<String, u64>>;
let addrs = RefCell::<HashMap::<String, u64>>;

These give syntax errors. I even tried what you said literally (with the … for the HashMap type parameters). That didn’t work either. I’m missing something about the syntax.
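For reference, a sketch of two spellings that should parse, standard library only (the function name is made up). The variations above fail because RefCell<HashMap<String, u64>> names a type, not a value, so it cannot stand alone on the right-hand side of a let; a value still has to be built with new().

```rust
use std::cell::RefCell;
use std::collections::HashMap;

fn make_cells() -> (usize, usize) {
    // 1. Type annotation on the binding, value built with `new`.
    let a: RefCell<HashMap<String, u64>> = RefCell::new(HashMap::new());
    // 2. Turbofish on the inner constructor; the outer type is inferred.
    let b = RefCell::new(HashMap::<String, u64>::new());

    a.borrow_mut().insert(String::from("sample-key"), 5);
    let lens = (a.borrow().len(), b.borrow().len());
    lens
}
```

The unannotated RefCell::new(HashMap::new()) from earlier is also fine whenever later uses let the compiler infer the key and value types.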


#19

I haven’t read this in depth, but when I’m being really lazy, don’t feel like reasoning too hard about lifetimes, and just want to test fast with a hashmap, I reach for chashmap :laughing:

You don’t need a mutable reference to mutate, the API is almost the same (so you can refactor back to hashmap later if need be) and because it’s a regular reference it can be passed in pretty much any friendly scope, like crossbeam or rayon or regular closures.


#20

Yeah, that’s the proper syntax - my snippet was pseudocode-ish.

What’s the full error message?