How to avoid interior mutability

In related threads 1 and 2 we had discussed what exactly we're paying the runtime cost of a Rc<RefCell<T>> interior mutability pattern for. (Specifically, why we're paying for dynamic tracking of the RefCell in the presence of Rc.) It was suggested that it is best to avoid the interior mutability pattern as this often leads to code patterns that are basically a 1:1 translation of the patterns used in other procedural or object-based languages. I'm all for that, but would like to understand how to do it.

I've put my application in a playground as a small example. Following the example given by @Hyeonu I used Person as my example. Persons are stored in 2 tables using 2 indices. Sometimes, depending on the user input, the program must maintain a "current person" while simultaneously be able to react to events that could concern the current person or other persons. Events change a person's state, which I simulated using a "mood" field.

This playground shows the program. I have used a random number generator whenever my program must react to external events. In the code, I'm using the Rc<RefCell<T>> interior mutability pattern throughout - what I'm looking for advice on is how to avoid it so that I can directly benefit from Rust's guarantees.

I would probably start something like this:

#[derive(Debug)]
enum Mood {
    HAPPY, NEUTRAL, SAD
}

struct Person {
    name: String,
    mood: Mood,
}

#[derive(Copy, Clone, Debug, PartialEq, Eq, Hash)]
struct PersonId(isize);
#[derive(Copy, Clone, Debug, PartialEq, Eq, Hash)]
struct SecondaryId(isize);

struct People {
    by_id: HashMap<PersonId, Person>,
    by_secondary: HashMap<SecondaryId, PersonId>;
}
impl People {
    fn new() -> Self {
        Self {
            by_id: HashMap::new(),
            by_secondary: HashMap::new(),
        }
    }
    fn insert(&mut self, id: PersonId, p: Person) {
        self.by_id.insert(id, p);
    }
    fn add_secondary(&mut self, id: PersonId, secondary: SecondaryId) {
        assert!(self.by_id.contains_key(&id));
        self.by_secondary.insert(secondary, id);
    }
    fn get(&self, id: PersonId) -> &Person {
        self.by_id.get(&id)
    }
    fn get_mut(&mut self, id: PersonId) -> &mut Person {
        self.by_id.get_mut(&id)
    }
}

fn main() {
    let mut rng = rand::thread_rng();
    let mut rng2 = rand::thread_rng();
    let rndperson = Uniform::from(1..5);
    let rndpersonsid = Uniform::from(1..7);
    let rndprgevent = Uniform::from(1..3);
    let rnduserinput = Uniform::from(0..2);
    let rndmood = Uniform::from(0..3);
    
    let mut people = People::new();
    // Part 1: setup
    let p1 = Person{name: "Alice".into(), mood: Mood::HAPPY};
    let p2 = Person{name: "Bob".into(), mood: Mood::HAPPY};
    let p3 = Person{name: "Carol".into(), mood: Mood::HAPPY};
    let p4 = Person{name: "Dan".into(), mood: Mood::HAPPY};
    people.insert(PersonId(1), p1);
    people.insert(PersonId(2), p2);
    people.insert(PersonId(3), p3);
    people.insert(PersonId(4), p4);
    people.add_secondary(PersonId(1), SecondaryId(1));
    people.add_secondary(PersonId(1), SecondaryId(2));
    people.add_secondary(PersonId(2), SecondaryId(3));
    people.add_secondary(PersonId(3), SecondaryId(4));
    people.add_secondary(PersonId(3), SecondaryId(5));
    people.add_secondary(PersonId(4), SecondaryId(6));

    ...
}

The closures should take a &mut People and ids as argument. To check whether two people are the same, just compare the ids instead of that "is it already borrowed?" hack.

2 Likes

I think your sketch echoes what Hyeonu suggested - pay the price of an additional indirection, mapping SecondaryId to PersonId, which would eliminate the need for an Rc to store two references in 2 tables. Depending on how frequently the secondary lookup occurs, this can be quite a significant cost. (Recall beginners in competitive programming whose programs time out because that hashmap is the only data structure they know and they use it even where an array would suffice.)

You don't yet address the issue of how to keep track of the current person.

I believe that I can solve the problem (trivially) by not using a direct reference for the current person. Instead, I could keep track of the current person using its (integer) id. When I need to access the current person, I would have to look it up using its id every time, which again is a significant cost (in my example, I've omitted the program events that refer to the current person, but they do exist in the actual application).

However, under this programming style, if you think about it more generally, we couldn't use/hold direct references to objects at all, except for ephemeral regions, if those objects need to be stored in a container. This is a style we typically afford in high-level scripting languages but - I would submit - not a style I would expect to be dominant in a low-level language like Rust. It's certainly at odds with chasing the performance edge Rust seems to pursue otherwise, as with avoiding aliasing and the like.

Sure, just keep track of the id. I would not worry about looking up every time. Hashmaps are very fast. Additionally, my version avoids making a new allocation for every single person. (okay there's the name)

For what it's worth, I'm a quite experienced competitive programmer and I have never worried about hashmaps.

If you want something more efficient, you could try to go for a Vec or slab which have even cheaper lookup costs.

I was referring to beginner competitive programmers, not to experienced ones.
These programmers sometimes make the mistake of using HashMap even when a Vec would suffice. Doing so is expensive if the accesses to the map/vector occur in an inner loop. For instance, implementing a typical adjacency list using hash maps instead of vectors about doubles the runtime of the program I had shared here. You can find the code here in this playground to see exactly what I am referring to. (FWIW, as far as out-of-the box hashmaps go, this still seems to be pretty decent, so I'm not criticizing Rust's hashmap implementation here.)

Going back to the topic at hand. It is trivially possible to avoid shared mutable references by using indirection through identifiers, in other words, by not having references at all except during accesses. However, again, imposing such a programming style would - in my view - move Rust out of the family of systems programming languages, it would certainly make me doubt the claim that Rust has found a way to safely manipulate references to objects.

In my view, the strategy of having references everywhere is the world of garbage collected languages. Preventing you from having long-lived references into the map is a very sound decision — hash maps will move things around when you mutate them.

Rc's are safe to move, but I agree that they come from the world of garbage collected languages. How hash maps work is an implementation detail, though, which ideally shouldn't affect the design of the rest of the program.

So is the only alternative to the interior mutability pattern to use consequent indirection in data structures, in other words, avoid references? Let's think through the consequences of this. It would require that all entities by identifiable by a unique id that is efficiently hashable or otherwise has a dense domain. While this applies to Person in my example, it probably doesn't apply in general.

I don't think you are applying interior mutability correctly in this case.

Unless you have requirements you haven't specified, you don't need Rc or RefCell. Cell will suffice and has no runtime cost:
Playground

Thank you.

In the actual application the Person objects are dynamically allocated on the heap. I've stack allocated them just for the example. Persons are created, destroyed etc. Is this compatible with your approach of storing references to Persons as values in the table?

Sure, heap or stack doesn't matter, as long as the objects live as long / longer than the entries in the hashmaps.

Playground

Just keep in mind Cell is !Sync, so this will not work across multiple threads.

That's that the case for my application. So let's see if I understand the approach. Instead of storing a Rc<T>, you're storing an immutable reference &T in the hash table as a value. Because the reference is immutable, all fields of the struct must be immutable as well. You accomplish this via a Cell.

So under this approach, I would forgo the ability to say: p.mood = ; instead, all (logically mutable) fields would need to be of type Cell<Whatever> and accesses would need to use .set(), .take(), .get() etc.? So I would be giving up the syntactic and ergonomic convenience of using assignments, but that presumably wouldn't matter since Cell.set() is a zero-cost abstraction?

ps: at least if the field's type is a cheaply movable value since "assignments" will move in and out.

Cell is zero-cost. You could also collect all mutable fields into a single sub-struct and put that in a cell, or put the entire struct behind a cell.

One thing about this: I mentioned heap/stack doesn't matter, but adding/removing entries requires rebuilding your hashtables because the persons vec is immutably borrowed by them. You might be able to go further with using cell on the persons container but I haven't explored that...

Correct. See playground.

The statement: let current_person = byid.get(&who).unwrap(); immutably borrows the hash table (as it should), so any modifications won't work. However, the actual application adds and removes persons to the table based on user input. So this would be a good solution, in general, only if the set of indexed values is known a priori and static and won't change.

I'd like to follow up on this thread. Thanks to the advice by @m51 I chose to rewrite my original Rc<RefCell<Person>> code as a Rc<Person> and use Cells for all logically mutable fields. This approach lets me:

  • store the smart pointer in multiple hash tables, thus avoiding the awkward intermediate mapping step
  • simultaneously keep a 'currentPerson' as an Rc<Person> reference without having to worry about whether it is currently borrowed
  • has the additional benefit that it adds a touch of "const correctness" for the non-cellified fields
  • presumably has zero-cost for Cell (plus the refcounting cost for Rc)

On the downside,

  • the cellification requires the addition of setter/getter methods and impacts the ergonomics a bit (const fields can be accessed directly, cellified fields only via their getter and setter) - not a major issue IMO
  • it works only for fields that meet the trait bound ?Sized which includes Copy if I read this correctly. Worked fine in my case, though it required deriving Clone,Copy for embedded enums. Learned how fragile matches! is also
  • it works only for fields that don't require mutable borrowing (fine in the Person case, but rules this technique out for another struct in my application that contains hashtables but which unfortunately must be shared.)

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.