Best solution for structuring code where multiple objects need to refer to a "context"

This came out a bit longer than I expected, feel free to ignore it if you don't want to bother lol


I'm somewhat inexperienced with Rust and trying my first bigger project. I'm trying to figure out how to structure a pattern where many parts of the code need to refer to a "context" to perform their operations. Let me try to illustrate my problem with a MWE. Say there's a type UserSet that maps usernames to ids.

struct UserSet;

impl UserSet {
  pub fn add(&mut self, user: String) {}
  pub fn get(&self, user: &str) -> u32 { todo!() } // body elided in this MWE
}

Several other structs need a UserSet as a "context" with respect to which to call get to obtain an id to work with. This is not a singleton so different values use different UserSets. So I would have something like

struct Foo<'ctx> {
  user_set: &'ctx UserSet,
  // ...
}

impl Foo<'_> {
  pub fn do_something(&mut self, user: &str) {
    let user_id = self.user_set.get(user);
    // do stuff with user_id
  }
}

i.e. when creating these I need to pass a reference to a UserSet; then the struct's methods can call self.user_set.get(...) whenever they need.

But this doesn't work: now these types lock in a shared reference, so I can no longer call add for any UserSet which is "acquired" by a Foo or a Bar (since add requires a mutable reference to self). So code like

let mut user_set = UserSet;
user_set.add("alice".to_owned());
let mut foo = Foo { user_set: &user_set, /* ... */ };
user_set.add("bob".to_owned());
foo.do_something("alice");

doesn't compile: the second add is rejected because foo still holds a shared borrow of user_set, which is used again on the last line. However, there should technically be no problem (at least in a single-threaded context), because the shared ref to user_set is only actually being used while inside the do_something call (if I'm making myself understood, sorry if my explanation is a bit fuzzy).

Solution 1: Pass &UserSet explicitly

Just express what I said above in code, and pass the shared ref to the functions that need it. Then the compiler can see that the shared borrow only lasts for the duration of each call. So Foo would be

struct Foo {
  // ...
}

impl Foo {
  pub fn do_something(&mut self, user: &str, user_set: &UserSet) {
    let user_id = user_set.get(user);
    // do stuff with user_id
  }
}

and the code

let mut user_set = UserSet;
user_set.add("alice".to_owned());
let mut foo = Foo { /* ... */ };
user_set.add("bob".to_owned());
foo.do_something("alice", &user_set);

would happily compile.

Problem: It's less ergonomic I guess, but the worst part is that I have to explicitly pick and pass the correct UserSet for each Foo every time I call do_something. But part of the correctness of Foo is that it is working wrt a UserSet, the same one every time. It becomes very error prone to require the caller to never make a mistake and manually pass the correct UserSet every time, when it's much more natural to pass it only once when Foo is created (and have the reference lifetime ensure it stays alive at least as long as the Foo does).

Solution 2: Use RefCell

Like I said let's forget about Sync and say we're working in a single-threaded context. We can just wrap user_set in a RefCell to get mutable access through a shared reference.

struct Foo<'users> {
  user_set: &'users RefCell<UserSet>,
  // ...
}

impl Foo<'_> {
  pub fn do_something(&mut self, user: &str) {
    let user_id = self.user_set.borrow().get(user);
    // do stuff with user_id
  }
}

and

let user_set = RefCell::new(UserSet);
user_set.borrow_mut().add("alice".to_owned());
let mut foo = Foo { user_set: &user_set, /* ... */ };
user_set.borrow_mut().add("bob".to_owned());
foo.do_something("alice");

Problem: Works I guess, as long as there is only one thread having references to this (enforced by the Foos not being Send), but the performance hit is bad for something that should be statically known to be okay.

Solution 3: Use UnsafeCell

This leaves me UnsafeCell. Basically the interior mutability of Solution 2 but no runtime checks. As long as I promise that the usage of mutable references doesn't overlap with the usage of shared refs (by ensuring those shared refs are only used in the scope of e.g. do_something, and other functions of Foo), I should be fine?

struct Foo<'users> {
  user_set: &'users UnsafeCell<UserSet>,
  // ...
}

impl Foo<'_> {
  pub fn do_something(&mut self, user: &str) {
    let user_id = unsafe{&*self.user_set.get()}.get(user);
    // do stuff with user_id
  }
}

and

let user_set = UnsafeCell::new(UserSet);
unsafe{&mut *user_set.get()}.add("alice".to_owned());
let mut foo = Foo { user_set: &user_set, /* ... */ };
unsafe{&mut *user_set.get()}.add("bob".to_owned());
foo.do_something("alice");

Problem: It is ugly, and unsafe is scary so I'm not really certain this is even correct.

Question

The question is just how best to structure this sort of pattern in Rust. Any of the 3 solutions? A better one? To be clear: the Foos only need & (shared) access, and only 1 other &mut (exclusive) reference exists, and they don't overlap.

Thanks for reading!

Note that, entirely independent of your questions about accessing and mutating the UserSet, using a borrow here also strongly constrains how your Foos can be used. It requires code using Foo to operate in the pattern of having a user_set to borrow from,

fn example() {
    let user_set = UserSet::new();    
    let foos = vec![Foo::new(&user_set)];
}

and notably you now cannot put UserSet and the Foos in the same struct — it's forced to operate out of a particular stack frame which owns the UserSet (or owns something that owns the UserSet). This is usually not what you want in your application’s main data structures in non-trivial cases.

As a general design principle, only temporary structures should contain borrows.

Instead, if you have to have some shared common resource, you can use Rc or Arc:

struct Foo {
  user_set: Arc<UserSet>,
  // ...
}

However, this offers almost exactly the same immutability as &, so it doesn't solve the rest of your problem.
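To make that concrete, here is a minimal sketch of what happens if you try to get exclusive access through an Arc that a Foo also holds. The UserSet body (a Vec of names) and the helper add_if_unique are placeholders I made up for illustration; Arc::get_mut is the real standard library call, and it only hands out &mut when no other handle exists.

```rust
use std::sync::Arc;

// Stand-in for the UserSet in this thread; the Vec-based body is an assumption.
#[derive(Default)]
struct UserSet {
    names: Vec<String>,
}

impl UserSet {
    fn add(&mut self, user: String) {
        self.names.push(user);
    }
}

// Hypothetical helper: adds the user only if `shared` is the sole handle.
fn add_if_unique(shared: &mut Arc<UserSet>, user: &str) -> bool {
    match Arc::get_mut(shared) {
        Some(set) => {
            set.add(user.to_owned());
            true
        }
        None => false,
    }
}

fn try_add_while_shared() -> (bool, bool, usize) {
    let mut shared = Arc::new(UserSet::default());

    // Only one handle exists, so exclusive access is available.
    let before = add_if_unique(&mut shared, "alice");

    // A Foo storing `Arc<UserSet>` would hold a clone like this one.
    let held_by_foo = Arc::clone(&shared);

    // With a second handle alive, `Arc::get_mut` returns `None`.
    let after = add_if_unique(&mut shared, "bob");

    (before, after, held_by_foo.names.len())
}

fn main() {
    let (before, after, len) = try_add_while_shared();
    println!("add before sharing: {before}, add while shared: {after}, users: {len}");
}
```

So the Arc fixes the ownership and lifetime problem, but add is still unreachable for as long as any Foo keeps its handle.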

The lack of other threads is not a sufficient condition for there to be no problem, because a function can call another function while borrowing from user_set and thereby cause a conflict. See The Problem With Single-threaded Shared Mutability for a more general discussion of this.
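A small sketch of that kind of single-threaded conflict, using RefCell so the overlap is detected instead of being undefined behaviour. The UserSet body and the register helper are made up for illustration; try_borrow_mut is the real RefCell method and returns Err while any other borrow is alive.

```rust
use std::cell::RefCell;

// Stand-in UserSet; a name's id is its position in the Vec (an assumption).
#[derive(Default)]
struct UserSet {
    names: Vec<String>,
}

impl UserSet {
    fn add(&mut self, user: String) {
        self.names.push(user);
    }

    fn get(&self, user: &str) -> Option<u32> {
        self.names.iter().position(|n| n == user).map(|i| i as u32)
    }
}

// Hypothetical callee that wants to mutate the set.
// Returns false if the set is currently borrowed elsewhere.
fn register(users: &RefCell<UserSet>, user: &str) -> bool {
    match users.try_borrow_mut() {
        Ok(mut set) => {
            set.add(user.to_owned());
            true
        }
        Err(_) => false,
    }
}

fn nested_call_conflicts() -> bool {
    let users = RefCell::new(UserSet::default());

    // No borrow is outstanding here, so this succeeds.
    assert!(register(&users, "alice"));

    // Shared borrow, like the one taken inside do_something.
    let guard = users.borrow();
    let _id = guard.get("alice");

    // While the caller is still reading, a nested call tries to mutate.
    let ok = register(&users, "bob");
    drop(guard);

    !ok
}

fn main() {
    println!("nested mutation rejected: {}", nested_call_conflicts());
}
```

Everything runs on one thread, yet the nested call still overlaps with the outer shared borrow. With UnsafeCell nothing would catch this, and the overlapping &mut would be undefined behaviour.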

Something to note about Rust mutability is that any time you have an &mut T reference, you have the ability to swap two Ts. Therefore, having &mut Foo, as is necessary to call Foo::do_something(), also permits swapping the Foo for another Foo that points to a different UserSet. Now the question is: are you okay with that? If not, then you need to avoid offering &mut Foo as part of your module's public API, such as by:

  • Using RefCell for Foo's state, and Rc<RefCell<UserSet>> for the UserSet.
  • Having some struct that collects the UserSet and all the Foos; that struct can be given an API that hands out “reference to Foo and UserSet” structs that are not themselves Foos, which allow mutation without breaking the pairing.
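The swap hazard mentioned above can be sketched like this. The label and total fields are placeholders I added so the effect is observable; std::mem::swap is the real function.

```rust
// Stand-in UserSet with a label so we can tell the two sets apart.
struct UserSet {
    label: &'static str,
}

struct Foo<'ctx> {
    user_set: &'ctx UserSet,
    total: u32, // placeholder for Foo's own state
}

fn swap_demo() -> (&'static str, u32) {
    let set_a = UserSet { label: "A" };
    let set_b = UserSet { label: "B" };

    let mut foo_a = Foo { user_set: &set_a, total: 1 };
    let mut foo_b = Foo { user_set: &set_b, total: 2 };

    // Anyone holding `&mut Foo` for both values can do this in safe code.
    std::mem::swap(&mut foo_a, &mut foo_b);

    // The variable that was created against set A now refers to set B.
    (foo_a.user_set.label, foo_a.total)
}

fn main() {
    let (label, total) = swap_demo();
    println!("foo_a now uses set {label} with total {total}");
}
```

Whether this matters depends on whether callers are supposed to rely on "this slot always belongs to that UserSet".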

I could say more about what implementations are possible, but I’d need to know more about what operations are available on the Foos and UserSet collectively to design something suitable.


Would you be open to call patterns like

let users: UserSet = todo!();
// […]
users.foo() // constructs a transient `Foo` for `users`
  .do_something("alice");

? I've used this pattern successfully for database interactions in the past, and I've found it makes it fairly straightforward to follow which store is used for which operations while minimizing the risk of the "wrong" store being used. The constructor foo() would be as brief as

impl UserSet {
  pub fn foo(&self) -> Foo<'_> {
    Foo { user_set: self }
  }
}

adjusting the mutability of self as needed.

This also dovetails nicely with @kpreid's comment that only temporary structures should contain borrows.

The resulting Foo is short-lived, and is dropped at the end of the statement. The side effects of do_something are persisted into the underlying UserSet (or, in my systems, the underlying Transaction and thence into a database), so creating a new Foo later on will still recover those side effects.


Edit: a worked example, from my actual code, on the themes you're exploring:

        let mut tx = self.db.begin().await?;
        let created = tx.sequence().next(created_at).await?;
        let channel = tx
            .channels()
            .create(name, &created)
            .await
            .duplicate(|| CreateError::DuplicateName(name.clone()))?;
        tx.commit().await?;

Thank you for the replies! You've given me some ideas

Aha, got it. Seems obvious in hindsight x)

Hmm interesting. Yeah I would prefer an API where this misuse is not possible and where the management of the Foo ↔ UserSet relation is "hidden" and doesn't need manual management after the creation of a Foo.

Yeah but Foo is not transient, it also holds state, so I don't think this works for my purposes.

I could decouple everything by having something like

struct InUserSet<'users, T> (&'users UserSet, T);
struct Foo {
  // ...
}

impl<'a> InUserSet<'a, &'a mut Foo> {  // I guess just a roundabout way of taking an explicit UserSet arg
  pub fn do_something(self, user: &str) {
    let user_id = self.0.get(user);
    // do stuff with user_id
  }
}

i.e. decouple the rest of the state in Foo from the reference to UserSet, and just require the caller to provide a reference to the correct UserSet. Maximally flexible I guess, but then I'm just back to having to pass the correct UserSet that matches this Foo, even though this correspondence doesn't change during the lifetime of Foo.

To elaborate a bit more, Foo and the other types I have in mind do lots of linear algebra calculations, and the "UserSet" keeps a map of all "users" onto the contiguous range of integers from 0 to n-1 (which are then the indices in the vectors and matrices that Foo calculates and stores).

The space on which this algebra is done is a direct sum of smaller disjoint subspaces, hence having several Foos working on different UserSets rather than just 1 global one.

Is there, or can there be, a type which is a container of “all Foos and their UserSet” in your application? If so, you can give that type a method that returns InUserSets.


Indeed, this is how I'm thinking of structuring it! At the top level there would be a container that owns the UserSets and Foos (there is a static partition of "users" into UserSets, so everything is known at compile time). Then I can get pairs of references (shared or mut) from that container (with some helper functions to ensure the correct pairing between Foo and UserSet) and pass them around as necessary.
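A rough sketch of what I mean, in case it helps anyone else. All the names here (World, Block, FooIn, users_mut, the hits counter) are placeholders rather than my real code, and the UserSet body is a guess at a minimal implementation.

```rust
use std::collections::HashMap;

#[derive(Default)]
struct UserSet {
    ids: HashMap<String, u32>,
}

impl UserSet {
    // New users get the next free index; existing users keep their id.
    fn add(&mut self, user: String) {
        let next = self.ids.len() as u32;
        self.ids.entry(user).or_insert(next);
    }

    fn get(&self, user: &str) -> Option<u32> {
        self.ids.get(user).copied()
    }
}

// Foo owns only its own state (here: a counter per user id).
#[derive(Default)]
struct Foo {
    hits: Vec<u32>,
}

// One UserSet together with the Foo that works on it.
#[derive(Default)]
struct Block {
    users: UserSet,
    foo: Foo,
}

// Short-lived pairing handed out by the container; callers cannot mix it up.
struct FooIn<'a> {
    users: &'a UserSet,
    foo: &'a mut Foo,
}

impl FooIn<'_> {
    // Bumps the counter for `user` and returns the new count, if the user exists.
    fn do_something(&mut self, user: &str) -> Option<u32> {
        let id = self.users.get(user)? as usize;
        if self.foo.hits.len() <= id {
            self.foo.hits.resize(id + 1, 0);
        }
        self.foo.hits[id] += 1;
        Some(self.foo.hits[id])
    }
}

struct World {
    blocks: Vec<Block>,
}

impl World {
    fn users_mut(&mut self, block: usize) -> &mut UserSet {
        &mut self.blocks[block].users
    }

    // Borrows two disjoint fields of the same Block, so the pairing is always correct.
    fn foo(&mut self, block: usize) -> FooIn<'_> {
        let b = &mut self.blocks[block];
        FooIn { users: &b.users, foo: &mut b.foo }
    }
}

fn demo() -> (Option<u32>, Option<u32>, Option<u32>) {
    let mut world = World { blocks: vec![Block::default(), Block::default()] };
    world.users_mut(0).add("alice".to_owned());
    let first = world.foo(0).do_something("alice");
    world.users_mut(0).add("bob".to_owned()); // fine: no FooIn is alive here
    let second = world.foo(0).do_something("alice");
    let other = world.foo(1).do_something("alice"); // block 1 has no users
    (first, second, other)
}

fn main() {
    println!("{:?}", demo());
}
```

The FooIn only lives for the duration of one statement, so adding users in between is never blocked, and Foo's state persists in the container.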

Example: my workload is naturally suited to batching, meaning alternating steps where (a) UserSet can change and where (b) the Foos do work on UserSets which are frozen. (I think) with the above approach I can manage this such that in "step b" I can e.g. split work across multiple cores by passing shared references, and only during "step a" do I need a mutable reference to UserSet. All of this with no synchronisation (which wouldn't be possible if Foo had to store a reference to UserSet from the time it was constructed).
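Roughly what I have in mind for the batching, as an untested-in-production sketch. The UserSet and Foo bodies and the batch contents are placeholders; std::thread::scope is the real standard library API (Rust 1.63+), which lets the spawned threads borrow from the enclosing stack frame.

```rust
use std::collections::HashMap;
use std::thread;

#[derive(Default)]
struct UserSet {
    ids: HashMap<String, u32>,
}

impl UserSet {
    // New users get the next free index; existing users keep their id.
    fn add(&mut self, user: String) {
        let next = self.ids.len() as u32;
        self.ids.entry(user).or_insert(next);
    }

    fn get(&self, user: &str) -> Option<u32> {
        self.ids.get(user).copied()
    }
}

// Placeholder state: the ids this Foo has processed so far.
#[derive(Default)]
struct Foo {
    seen: Vec<u32>,
}

impl Foo {
    fn do_something(&mut self, user: &str, users: &UserSet) {
        if let Some(id) = users.get(user) {
            self.seen.push(id);
        }
    }
}

fn run_batches() -> Vec<Vec<u32>> {
    let mut users = UserSet::default();
    let mut foos = vec![Foo::default(), Foo::default()];

    for batch in [vec!["alice"], vec!["bob", "alice"]] {
        // Step a: exclusive access, the UserSet may change.
        for name in &batch {
            users.add((*name).to_owned());
        }

        // Step b: the UserSet is frozen and shared; each thread owns one &mut Foo.
        let frozen = &users;
        thread::scope(|s| {
            for foo in foos.iter_mut() {
                let batch = &batch;
                s.spawn(move || {
                    for name in batch {
                        foo.do_something(name, frozen);
                    }
                });
            }
        });
        // All threads have joined here, so the next step a can mutate again.
    }

    foos.into_iter().map(|f| f.seen).collect()
}

fn main() {
    println!("{:?}", run_batches());
}
```

No locks or atomics are involved: the borrow checker sees that the shared borrow of users ends when the scope returns.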

Thanks for the replies!