Motivating example for `RefCell<T>`


#1

Ok, this request has stumped even @aturon, so let’s see how yinz do 🙂

@steveklabnik and I are trying to come up with an example for the book that motivates RefCell<T> by itself (that is, WITHOUT using Rc<T>). What I’d like most is code where a reader can reason out that the borrowing is fine, but that the compiler rejects when it is written with regular references, and that compiles and runs without panicking once you use RefCell<T>.

I’ve looked through a lot of the questions in this forum about RefCell, and most of them are either better implemented without RefCell<T> (so there’s not a great reason to use it) or they also need Rc<T>.

We definitely do talk about how Rc<T> and RefCell<T> work well together when you want to have shared ownership and also be able to mutate the value, but when introducing RefCell<T>, it’d be nice to have an example that only needs RefCell<T>.


#2

So, I started writing a response, but then I realized – when you say without using Rc, you mean: you do not have a Rc<RefCell<T>>, or that you do not have a RefCell<Rc<_>>? (Because the latter feels very easy.) So we need something where there is aliasing, but it’s not arising because of lots of ref-counted handles to the same value?

If so, that is tricky – I find that the time I really want cell and ref-cell is when I want to have some “context” data structure that is widely referenced, but which still has some mutable state I need to keep updated.

One thought I had might be to use closures – am I allowed to do that?

e.g.

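use std::cell::RefCell;
use std::collections::HashMap;

fn callback<F, G>(mut event_one: F, mut event_two: G)
where
    F: FnMut(),
    G: FnMut(),
{
    // Stand-in body (elided in the original sketch): fire each event once.
    event_one();
    event_two();
}

fn foo() {
    // Both closures capture `&map`; RefCell checks the borrows at run time.
    let map = RefCell::new(HashMap::new());
    callback(
        || *map.borrow_mut().entry("a").or_insert(0) += 1,
        || *map.borrow_mut().entry("b").or_insert(0) += 1,
    );
}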

As I wrote it, this is pretty artificial, but the idea is that the two closures both need to reference the map (i.e., they share a &RefCell). Imagine that I am trying to count how many callbacks of each kind I get, or something like that. It would probably be better to have the callbacks pass some data, so that I couldn’t just use two distinct counters.

Anyway, this example probably isn’t it – one challenge is that, without Rc, if you’re going to have the RefCell referenced by many different data structures, they will need to be lifetime parameterized, so that they can share a &'foo RefCell. Closures let us do that without writing explicit types.
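
For comparison, here is a rough sketch of the non-closure version of that idea, where the sharing structs have to carry the lifetime parameter explicitly. The `Handler` type, its fields, and `run_handlers` are made up for illustration and are not from any real codebase.

```rust
use std::cell::RefCell;
use std::collections::HashMap;

// Hypothetical handler type: holds a shared reference to the counters.
struct Handler<'a> {
    kind: &'static str,
    counts: &'a RefCell<HashMap<&'static str, u32>>,
}

impl<'a> Handler<'a> {
    // Takes `&self`, yet updates the shared map through the RefCell.
    fn fire(&self) {
        *self.counts.borrow_mut().entry(self.kind).or_insert(0) += 1;
    }
}

// Fires two handlers that alias the same map and returns the final counts.
fn run_handlers() -> HashMap<&'static str, u32> {
    let counts = RefCell::new(HashMap::new());
    {
        let click = Handler { kind: "click", counts: &counts };
        let key = Handler { kind: "key", counts: &counts };
        click.fire();
        key.fire();
        click.fire();
    }
    // No handler is alive any more, so the cell can be unwrapped.
    counts.into_inner()
}

fn main() {
    println!("{:?}", run_handlers());
}
```

Each `borrow_mut()` only lasts for one statement, so the two handlers never hold a mutable borrow at the same time and nothing panics.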


#3

I have occasionally used it to work around signatures of traits I need to implement. For example, say I have an Iterator that I want to serialize with serde as a sequence. Since Serialize::serialize takes &self, I need to use a RefCell to be able to mutate my Iterator:

struct IteratorSerializer<I>(RefCell<I>);

impl<I> Serialize for IteratorSerializer<I>
where
    I: Iterator,
    I::Item: Serialize,
{
    fn serialize<S>(&self, ser: S) -> Result<S::Ok, S::Error>
    where
        S: Serializer,
    {
        ser.collect_seq(&mut *self.0.borrow_mut())
    }
}
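
The same shape can be tried without serde, since `std::fmt::Display::fmt` also only receives `&self`. This is just a sketch under that substitution: `IterDisplay` is a hypothetical wrapper, not part of serde or std.

```rust
use std::cell::RefCell;
use std::fmt;

// Hypothetical wrapper: formats the items of an iterator, comma-separated.
struct IterDisplay<I>(RefCell<I>);

impl<I> fmt::Display for IterDisplay<I>
where
    I: Iterator,
    I::Item: fmt::Display,
{
    // `fmt` only receives `&self`, but advancing an iterator needs `&mut`.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let mut iter = self.0.borrow_mut();
        if let Some(first) = iter.next() {
            write!(f, "{}", first)?;
        }
        for item in &mut *iter {
            write!(f, ", {}", item)?;
        }
        Ok(())
    }
}

fn main() {
    let squares = IterDisplay(RefCell::new((1..=4).map(|n| n * n)));
    println!("{}", squares); // prints "1, 4, 9, 16"
}
```

One thing the sketch makes visible: the iterator is consumed by the first call, so formatting the same wrapper a second time produces an empty string.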

#4

So I’ve got an example here, but it may be a bit too convoluted for the book. I figure though it may be good to detail regardless and see if y’all can make use of it!

Fundamentally RefCell is intended to take &T -> &mut T in a controlled fashion, so you’ll have to have some sort of sharing to start off. Without Rc<T> the next source of sharing is typically &T itself. An example of this is the Config struct in Cargo. Cargo never actually uses Rc<Config> (although it arguably should) and instead it just uses &'a Config everywhere. This means that you’ve got tons of shared references to a Config.

Once we’ve got all that sharing, the desire for mutability comes up in a few places. For example, there’s a RefCell<Shell> where the Shell is written in “typical” idiomatic Rust that uses &self and &mut self. There’s also a bunch of instances of LazyCell, which is basically “this takes a while to compute and can fail, fill it in and only fail if we hit this code path”.

So I guess basically:

  • Without Rc<T> the desire for lots of sharing comes up with shared references
  • What actually motivates RefCell over a bunch of Cells is “I’m taking idiomatic mutable Rust from elsewhere and using it in a controlled fashion here”
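
A stripped-down sketch of that pattern might look like the following. The `Config` and `Shell` here are toy stand-ins with invented fields and methods, not Cargo's real types.

```rust
use std::cell::RefCell;

// "Ordinary" mutable Rust: the shell mutates itself through `&mut self`.
#[derive(Default)]
struct Shell {
    lines: Vec<String>,
}

impl Shell {
    fn status(&mut self, msg: &str) {
        self.lines.push(format!("status: {}", msg));
    }
}

// Hypothetical stand-in for a widely shared configuration object.
#[derive(Default)]
struct Config {
    jobs: u32,
    shell: RefCell<Shell>,
}

// Both stages receive only `&Config`, yet both can write to the shell.
fn resolve(config: &Config) {
    config.shell.borrow_mut().status("resolving");
}

fn compile(config: &Config) {
    let msg = format!("compiling with {} jobs", config.jobs);
    config.shell.borrow_mut().status(&msg);
}

// Runs both stages against one shared Config and returns what was logged.
fn run() -> Vec<String> {
    let config = Config { jobs: 4, ..Config::default() };
    resolve(&config);
    compile(&config);
    config.shell.into_inner().lines
}

fn main() {
    for line in run() {
        println!("{}", line);
    }
}
```

`Shell::status` keeps its natural `&mut self` signature; only the place where it is stored changes.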

Maybe that helps!

(I think this is very similar to @nikomatsakis’s example)


#5

Think OO - situations where you have a self with a lot of fields, and you want to call various methods without necessarily being able to guarantee unique access to self. One case is with closures:

self.some_method_taking_callback(|item| self.items.push(item));

That wouldn’t work as is, but it could work if items were changed from Vec<Item> to RefCell<Vec<Item>>.
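
Roughly, the RefCell version could look like this. `Inbox`, `for_each_pending`, and `collect_pending` are invented names, and the sketch assumes the callback-taking method itself only needs `&self`.

```rust
use std::cell::RefCell;

struct Inbox {
    pending: Vec<u32>,
    items: RefCell<Vec<u32>>,
}

impl Inbox {
    // Hypothetical method: feeds every pending value to the callback.
    fn for_each_pending<F: FnMut(u32)>(&self, mut callback: F) {
        for &item in &self.pending {
            callback(item);
        }
    }

    fn collect_pending(&self) {
        // The method call and the closure both borrow `self` immutably;
        // the mutation of `items` goes through the RefCell.
        self.for_each_pending(|item| self.items.borrow_mut().push(item));
    }
}

// Builds an inbox, runs the callback-based method, returns the collected items.
fn run() -> Vec<u32> {
    let inbox = Inbox {
        pending: vec![1, 2, 3],
        items: RefCell::new(Vec::new()),
    };
    inbox.collect_pending();
    inbox.items.into_inner()
}

fn main() {
    println!("{:?}", run()); // prints "[1, 2, 3]"
}
```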

Another case is if you want to create a temporary object with a reference to self, and have uses of it interspersed with direct uses of self.

Edit: Third case: helper methods. Like

fn get_foo_named<'a>(&'a mut self, name: &str) -> &'a mut Foo {
    for foo in &mut self.foos {
        if foo.name == name { return foo; }
    }
    panic!()
}

If you call this from another method, as long as you hold onto the borrowed Foo, you can’t access any other fields of self, even to read them! Arguably the most idiomatic solution would be to separate different kinds of data into separate objects based on access pattern, but that isn’t always possible or ergonomic. A simpler approach is again to use RefCell - change foos from Vec<Foo> to Vec<RefCell<Foo>> and make the signature fn get_foo_named<'a>(&'a self, name: &str) -> &'a RefCell<Foo>.
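
As a sketch of that signature change (the `Registry` struct, its `step` field, and `bump` are made up to have something to call it from):

```rust
use std::cell::RefCell;

struct Foo {
    name: String,
    hits: u32,
}

// Hypothetical owner type with one extra field besides `foos`.
struct Registry {
    step: u32,
    foos: Vec<RefCell<Foo>>,
}

impl Registry {
    fn get_foo_named<'a>(&'a self, name: &str) -> &'a RefCell<Foo> {
        for foo in &self.foos {
            if foo.borrow().name == name {
                return foo;
            }
        }
        panic!("no Foo named {}", name)
    }

    fn bump(&self, name: &str) {
        let mut foo = self.get_foo_named(name).borrow_mut();
        // `self.step` is still readable while `foo` is mutably borrowed.
        foo.hits += self.step;
    }
}

// Bumps one entry twice and reports its final hit count.
fn run() -> u32 {
    let reg = Registry {
        step: 5,
        foos: vec![
            RefCell::new(Foo { name: "a".to_string(), hits: 0 }),
            RefCell::new(Foo { name: "b".to_string(), hits: 0 }),
        ],
    };
    reg.bump("b");
    reg.bump("b");
    let hits = reg.get_foo_named("b").borrow().hits;
    hits
}

fn main() {
    println!("{}", run()); // prints "10"
}
```

The cost is that the lookup scan now takes a short-lived `borrow()` on each element, so calling it while one of the elements it scans is mutably borrowed would panic at run time.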


#6

I always thought the “canonical” example is along the LazyCell lines that @alexcrichton mentioned. That is, a &self method on some type that is read-mostly but occasionally (or once) refreshes the field value.


#7

Not to take away from your example, but wouldn’t it be more straightforward to implement IntoIterator for IteratorSerializer (where it owns the iterator without RefCell) and then use collect_seq on it? That wouldn’t need the RefCell. Just wondering and maybe I’m missing something.


#8

I’m not sure I understand what that would look like. I need some Serialize type that can be passed to e.g. https://docs.rs/serde/1.0.11/serde/ser/trait.SerializeStruct.html#tymethod.serialize_field.


#9

I thought you could call Serializer::collect_seq directly on that iterator wrapper - that requires only that the Item: Serialize, and you wouldn’t need the wrapper itself to be Serialize. But admittedly I’ve not tried it - just seems odd to need RefCell there.


#10

But what is the T: Serialize I pass to serialize_field?


#11

Ah, you’re inside SerializeStruct - sorry. I was talking about Serializer itself - too bad.


#12

The second example on the docs for std::rc uses RefCell<T>.

Edit: Sorry, I didn’t read the entire post.

WITHOUT using Rc


#13

I’ve used it in a pattern like this:

use std::cell::RefCell;

/// To be used for expensive operations
pub trait Cacheable {
    type Result;
    fn compute(&self) -> Self::Result;
}

pub struct Cache<T: Cacheable> {
    data: T,
    result: RefCell<Option<T::Result>>,
}

impl<T: Cacheable> Cache<T> {
    pub fn get_result(&self) -> &T::Result {
        if self.result.borrow().is_none() {
            *self.result.borrow_mut() = Some(self.data.compute());
        }

        // at this point: invariant upheld: self.result has no mutable borrows and is Some.

        // If you don't want to use unsafe you can have this function return a Ref<T::Result>
        unsafe { &*(&*self.result.borrow().as_ref().expect("cached") as *const T::Result) }
    }
}

As others have pointed out, you can share this Cache without using Rc.

Edit: I suppose this is the same as the LazyCell mentioned before.
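
For what it's worth, the unsafe-free variant mentioned in the code comment (returning a `Ref`) could be sketched like this. The `new` constructor and the `SumTo` type with its call counter are additions for the sake of a runnable example.

```rust
use std::cell::{Cell, Ref, RefCell};

pub trait Cacheable {
    type Result;
    fn compute(&self) -> Self::Result;
}

pub struct Cache<T: Cacheable> {
    data: T,
    result: RefCell<Option<T::Result>>,
}

impl<T: Cacheable> Cache<T> {
    // Assumed constructor; the original snippet does not show one.
    pub fn new(data: T) -> Self {
        Cache { data, result: RefCell::new(None) }
    }

    pub fn get_result(&self) -> Ref<'_, T::Result> {
        if self.result.borrow().is_none() {
            *self.result.borrow_mut() = Some(self.data.compute());
        }
        // Narrow the `Ref<Option<R>>` down to a `Ref<R>` without unsafe.
        Ref::map(self.result.borrow(), |slot| slot.as_ref().expect("cached"))
    }
}

// Hypothetical expensive computation that counts how often it runs.
struct SumTo {
    n: u64,
    calls: Cell<u32>,
}

impl Cacheable for SumTo {
    type Result = u64;
    fn compute(&self) -> u64 {
        self.calls.set(self.calls.get() + 1);
        (1..=self.n).sum()
    }
}

fn main() {
    let cache = Cache::new(SumTo { n: 100, calls: Cell::new(0) });
    let first = *cache.get_result();
    let second = *cache.get_result();
    println!("{} {} (computed {} time(s))", first, second, cache.data.calls.get());
}
```

The trade-off is that callers get a `Ref<'_, T::Result>` guard instead of a plain reference, which leaks the RefCell into the method's signature.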


#14

The time I used a RefCell was with a struct that holds onto a piece of data, where the client can request a compressed variant of the data. I don’t want to compress the data unless it is actually needed, and I don’t want to run the compression multiple times.

However, the struct is very much conceptually read-only, and it would be awkward to have to drag &mut references around just to access the cached compressed value.

It does make the return types a bit complicated for my methods that borrow the compressed data, since they have to deal with the Ref... that is returned when you borrow the RefCell.

I think the RefCell makes sense in any situation where performance requires a computation to be delayed (such as the Lazy mentioned above), but not computed multiple times.


#15

One added constraint: ideally, you want an example that isn’t trivially fixed by either non-lexical lifetimes or more sophisticated closure capturing, so that it still applies even when we have those. That also has the advantage of making RefCell feel motivated by “OK, I really don’t see how the compiler would have any hope of doing that”, rather than “that just seems like it makes up for a deficiency in the language”.


#16

This seems like a very good example to me. Something where calling the function should be “constant”, except for mutation of an internal cache.


#17

A suggestion via twitter was using a RefCell to record the state of a mock, I kind of like that too.
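
A minimal sketch of what the mock idea might look like, assuming a trait whose method takes `&self`. The `Messenger` trait, `warn_if_over`, and `MockMessenger` are all invented for illustration.

```rust
use std::cell::RefCell;

// Hypothetical interface the code under test talks to.
trait Messenger {
    fn send(&self, msg: &str);
}

// Code under test: it only ever sees `&dyn Messenger`.
fn warn_if_over(limit: u32, value: u32, messenger: &dyn Messenger) {
    if value > limit {
        messenger.send(&format!("{} exceeds {}", value, limit));
    }
}

// The mock records what was sent, even though `send` takes `&self`.
struct MockMessenger {
    sent: RefCell<Vec<String>>,
}

impl Messenger for MockMessenger {
    fn send(&self, msg: &str) {
        self.sent.borrow_mut().push(msg.to_string());
    }
}

// Drives the code under test and returns everything the mock recorded.
fn run() -> Vec<String> {
    let mock = MockMessenger { sent: RefCell::new(Vec::new()) };
    warn_if_over(10, 5, &mock);
    warn_if_over(10, 11, &mock);
    mock.sent.into_inner()
}

fn main() {
    println!("{:?}", run());
}
```

The trait signature stays `&self` because real implementations have no need to mutate; only the test double does.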

Thank you for all the suggestions, everyone!


#18

This seems entirely obvious to me: collections.

Say you want to mutate more than one part of a collection at once. Rust isn’t smart, and it can’t tell that a mutref on a collection’s element should only lock one element and also prevent reshaping the collection. Instead it just locks down the entire collection. This is sad, and sometimes iterators fix this, but sometimes they don’t. So instead of Vec<T> make Vec<RefCell<T>>, and now your collection is free and clear to be edited in more than one spot at once.


#19

Cool. Vec isn’t entirely a good example because you can convert into a mutable slice, but other containers don’t have that property and do require RefCell. Good example!


#20

The ability to get a mutable slice makes no difference.

Say there’s some function, fn doFight(attacker: &mut Creature, defender: &mut Creature) -> CombatLog {. If your creatures are in a Vec you can’t mut reference two creatures at the same time to pass them both into that function. If your creatures are in a mutable slice you still can’t mut reference to two creatures at once. get_mut for slice has every single problem that get_mut for vec has (which makes sense since they’re nearly the same type anyway). You can try some sort of split_at_mut shenanigans but that gets really messy really fast if you need to do more than two targets pulled out of the collection at once.

So, you can do something rash like tell the entire lifetime system to go sit in the corner for a while, or you can use RefCell. Which is still just telling the lifetime system that it’s wrong when you think about it, but using very well tested parts is better than ad hoc parts.
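
To make the creature case concrete, here is a small sketch. The `Creature` fields and the damage rule are invented, and the function is spelled `do_fight` and returns nothing, to keep it short.

```rust
use std::cell::RefCell;

struct Creature {
    hp: i32,
    attack: i32,
}

// Wants two distinct creatures mutably at the same time.
fn do_fight(attacker: &mut Creature, defender: &mut Creature) {
    defender.hp -= attacker.attack;
    attacker.hp -= defender.attack / 2;
}

// Lets creature 0 attack creature 2 and returns everyone's remaining hp.
fn run() -> Vec<i32> {
    let creatures = vec![
        RefCell::new(Creature { hp: 10, attack: 4 }),
        RefCell::new(Creature { hp: 8, attack: 2 }),
        RefCell::new(Creature { hp: 6, attack: 6 }),
    ];
    // Two elements of the same Vec, both borrowed mutably at once.
    do_fight(
        &mut creatures[0].borrow_mut(),
        &mut creatures[2].borrow_mut(),
    );
    creatures.iter().map(|c| c.borrow().hp).collect()
}

fn main() {
    println!("{:?}", run()); // prints "[7, 8, 2]"
}
```

The check has only moved to run time: passing the same index for both attacker and defender would make the second `borrow_mut()` panic.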