Mutable methods seem pretty limiting. Is there a way to pass a reference that aliases the `self` when safe to do so?

It often happens to me that I need to write a method that modifies some struct but the method needs to be pointed to a particular value contained inside of it with a mutable reference. The borrow checker doesn't let me do that.

I do this all the time in Go, Python, etc. Now, I know Rust is a different language and I shouldn't necessarily try to translate everything over. I was just curious to know if an idiomatic solution exists.

Yes, it is possible to pass the method some information that lets it get a reference to the target value by itself, such as an index or a key. But that sounds like a workaround, honestly.

Here's an example (the simplest I could come up with):

#![allow(unused)]

struct DataStructure {
    vec_a: Vec<i32>,
    vec_b: Vec<i32>,
    // How many times `[DataStructure::clear]` has been called
    n_clear: u32,
}

impl DataStructure {
    fn clear(&mut self, vec: &mut Vec<i32>) {
        vec.clear();
        self.n_clear += 1;
    }
}

fn main() {
    let mut ds = DataStructure {
        vec_a: vec![32, 11],
        vec_b: vec![63, 255, 512],
        
        n_clear: 0,
    };
    
    ds.clear(&mut ds.vec_a);
}

Playground link:
https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=cc574a791acdcf39623934f4f1c38c1b

I could maybe pass in an enum to tell the method which of the two vectors I want cleared but I have to define a whole new enum just for that!
Also, performance-wise this solution doesn't seem ideal because of the branching that I would have to do inside the method.
And also: this example is fairly simple but imagine the method was defined on a tree and needed to be pointed to a particular node. Yes it would be possible to pass in some kind of information that says: "from the root, go down the first child, then the second child of that, etc..." but isn't this incredibly boilerplatey assuming I already have a reference to the node? Maybe I don't even know the "indications" to reach that node and I would have to compute them just to be able to call the method.

Now yes, this is not allowed because I have aliasing references, I understand that. But is it really unsafe if I want to use that second reference just to point the method to a target value?

Is there a way to "transmute" a &mut T or a &T into one of the same type but that falls under the umbrella of borrowing that &mut self has?

In other words: is there a way to take in a &mut self and then craft a &mut T from it, by using some sort of &T given to the method just to point it to a target value?

Here's what I mean (pardon the bad hypothetical syntax):

fn clear(&mut self, vec: &Vec<i32>) {
    let mut vec_crafted = self.[vec];
    vec_crafted.clear();
    self.n_clear += 1;
}

where vec_crafted is borrowed from self and then dropped before adding to n_clear just like:

fn clear(&mut self) {
    let mut vec_a = &mut self.vec_a;
    vec_a.clear();
    self.n_clear += 1;
}

None of the options you mentioned are possible besides the enum. Did you consider defining two clear functions, one for vec_a and one for vec_b, or perhaps doing something entirely different?

I think there are a few different approaches.

  • Using indices is common and often idiomatic, in particular when indexing is possible efficiently (something that might not be the case with a deep tree-structure). So your “enum” solution is like an index, basically.
  • Using Rc and RefCell everywhere in the right places can give you enough power to do aliasing references with mutation ability.
  • If you’re implementing a very basic primitive, and performance is super important, sometimes working with raw pointers can be worth it. E.g. HashMap uses raw pointers in its Entry API (here the Bucket type contains a raw pointer to the right place in the map). Even performance-sensitive low-level stuff can often be implemented without unsafe Rust though, using indices; especially for a flat structure (like HashMap) that would’ve probably been an option.
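For the tree case from the question, here is a minimal sketch of what the index approach could look like. The `Tree`, `Node`, `add` and `reset` names are made up for illustration; the assumption is that all nodes live in one `Vec` (an "arena") so that a node handle is just a `usize` and no path from the root is needed.

```rust
// Hypothetical arena-backed tree: nodes live in one Vec and refer to each
// other by index, so a "pointer" to a node is just a usize.
struct Node {
    value: i32,
    children: Vec<usize>,
}

struct Tree {
    nodes: Vec<Node>,
    // How many times `Tree::reset` has been called
    n_reset: u32,
}

impl Tree {
    // Adds a node and returns its index; `parent` is None for a root.
    fn add(&mut self, value: i32, parent: Option<usize>) -> usize {
        let id = self.nodes.len();
        self.nodes.push(Node { value, children: Vec::new() });
        if let Some(p) = parent {
            self.nodes[p].children.push(id);
        }
        id
    }

    // The index names the target; the &mut borrow is derived from `self`,
    // so there is no second reference that could alias it.
    fn reset(&mut self, id: usize) {
        self.nodes[id].value = 0;
        self.n_reset += 1;
    }
}

fn main() {
    let mut tree = Tree { nodes: Vec::new(), n_reset: 0 };
    let root = tree.add(1, None);
    let child = tree.add(2, Some(root));

    tree.reset(child);

    assert_eq!(tree.nodes[root].children, vec![child]);
    assert_eq!(tree.nodes[child].value, 0);
    assert_eq!(tree.nodes[root].value, 1);
    assert_eq!(tree.n_reset, 1);
}
```

The trade-off is that an index is only checked at run time (a stale or out-of-range index panics on `self.nodes[id]`), whereas a reference is checked at compile time.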

The indexing approach you’ve already sufficiently described. Regarding RefCell + Rc approach, something like this works:

#![allow(unused)]
use std::cell::RefCell;
use std::rc::Rc;

struct DataStructure {
    vec_a: Rc<RefCell<Vec<i32>>>,
    vec_b: Rc<RefCell<Vec<i32>>>,
    // How many times `[DataStructure::clear]` has been called
    n_clear: u32,
}

impl DataStructure {
    fn clear(&mut self, vec: &Rc<RefCell<Vec<i32>>>) {
        vec.borrow_mut().clear();
        self.n_clear += 1;
    }
}

fn main() {
    let mut ds = DataStructure {
        vec_a: Rc::new(RefCell::new(vec![32, 11])),
        vec_b: Rc::new(RefCell::new(vec![63, 255, 512])),

        n_clear: 0,
    };

    ds.clear(&Rc::clone(&ds.vec_a));
}

Note that above, the Rc::clone(&ds.vec_a) happens before the call to clear; it doesn’t start borrowing DataStructure mutably until after extracting a clone of the Rc finished.

Yes, there’s lots of syntactic overhead from Rc+RefCell; and when RefCell is misused, it’s easy to run into panics at run-time (beyond that, nothing bad can happen though); but Rc<RefCell<T>> is conceptually quite similar to the ordinary garbage-collected reference in languages like Python; a shared, garbage-collected reference that still offers mutable access.
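To make the run-time panic concrete, here is a small sketch of the failure mode. The `can_borrow_mut` helper is made up for this example; it uses the standard `RefCell::try_borrow_mut`, which returns an `Err` in the situation where `borrow_mut` would panic.

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Hypothetical helper: true if a mutable borrow could be taken right now.
fn can_borrow_mut(cell: &Rc<RefCell<Vec<i32>>>) -> bool {
    cell.try_borrow_mut().is_ok()
}

fn main() {
    let shared = Rc::new(RefCell::new(vec![1, 2, 3]));
    let alias = Rc::clone(&shared);

    {
        let _guard = shared.borrow_mut(); // first mutable borrow is live
        // A second `alias.borrow_mut()` here would panic at run time;
        // `try_borrow_mut` reports the conflict as an Err instead.
        assert!(!can_borrow_mut(&alias));
    } // guard dropped, borrow released

    assert!(can_borrow_mut(&alias));
    alias.borrow_mut().clear();
    assert!(shared.borrow().is_empty());
}
```

So the aliasing rule is still enforced, it has just moved from compile time to run time.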

The general solution is "just don't alias references," simply enough. Specifically, while you can't use &mut self and &mut self.field at the same time, you can use &mut self.field1 and &mut self.field2 at the same time.

There is at this time no way to express "&mut self but only for some subset of fields."

For this example, you could write

impl DataStructure {
    fn clear(vec: &mut Vec<i32>, n_clear: &mut u32) {
        vec.clear();
        *n_clear += 1;
    }
}

fn main() {
    let mut ds = DataStructure {
        vec_a: vec![32, 11],
        vec_b: vec![63, 255, 512],
        
        n_clear: 0,
    };
    
    DataStructure::clear(&mut ds.vec_a, &mut ds.n_clear);
}

In general, you should try to structure your data in such a way that you can disjoint borrows in this manner. It won't always be possible, but it's possible more often than you'd think.

Don't forget GhostCell, which separates permissions from data and can be used to design things like this with no runtime overhead - although it's not what I would call obvious.

If you are not familiar with this paper, you should read it: it is very good and might help to clarify exactly why it's so difficult to create an abstraction that is both safe and performant. At least, that's what I found.

View structs of a sort is another approach. Playground.
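The linked playground is not reproduced here, so the following is only my guess at what a view struct for this example might look like. `ClearView`, `view_a` and `view_b` are names invented for the sketch: the view borrows one vector and the counter as two disjoint fields, and the method lives on the view instead of on `DataStructure`.

```rust
struct DataStructure {
    vec_a: Vec<i32>,
    vec_b: Vec<i32>,
    // How many times `ClearView::clear` has been called
    n_clear: u32,
}

// Hypothetical "view": holds disjoint mutable borrows of two fields.
struct ClearView<'a> {
    vec: &'a mut Vec<i32>,
    n_clear: &'a mut u32,
}

impl DataStructure {
    fn view_a(&mut self) -> ClearView<'_> {
        ClearView { vec: &mut self.vec_a, n_clear: &mut self.n_clear }
    }

    fn view_b(&mut self) -> ClearView<'_> {
        ClearView { vec: &mut self.vec_b, n_clear: &mut self.n_clear }
    }
}

impl ClearView<'_> {
    fn clear(&mut self) {
        self.vec.clear();
        *self.n_clear += 1;
    }
}

fn main() {
    let mut ds = DataStructure {
        vec_a: vec![32, 11],
        vec_b: vec![63, 255, 512],
        n_clear: 0,
    };

    ds.view_a().clear();
    assert!(ds.vec_a.is_empty());
    assert_eq!(ds.vec_b.len(), 3);
    assert_eq!(ds.n_clear, 1);

    ds.view_b().clear();
    assert!(ds.vec_b.is_empty());
    assert_eq!(ds.n_clear, 2);
}
```

Compared to passing `&mut ds.n_clear` by hand at every call site, the view bundles the borrows once, and the counter can only be paired with a vector that really belongs to the same struct.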

I don't quite like having to manually pass &mut ds.n_clear to every call to clear but passing indices isn't pretty either. I think this can be a good solution in some cases!

I was super excited when I saw GhostCell on /r/rust and watched the introductory video but couldn't quite understand it. The concept seemed very interesting but I thought I did not have a use case for it.

Now that you say that GhostCell can solve these kinds of problems, I will surely take a deeper look at it! Thanks for mentioning it.

Oh wow that article seems to be tackling exactly this very problem with plenty of different solutions. Thanks a lot for linking it!

Here's the most straightforward port of your original code to GhostCell that I could manage. Blurred in case you want to try to work it out yourself first...

use ghost_cell::{GhostCell, GhostToken};

type GhostVec<'a, T> = GhostCell<'a, Vec<T>>;

struct DataStructure<'id> {
    vec_a: GhostVec<'id, i32>,
    vec_b: GhostVec<'id, i32>,
    // How many times `[DataStructure::clear]` has been called
    n_clear: GhostCell<'id, u32>,
}

impl<'id> DataStructure<'id> {
    fn clear(&self, token: &mut GhostToken<'id>, vec: &GhostVec<'id, i32>) {
        vec.borrow_mut(token).clear();
        *self.n_clear.borrow_mut(token) += 1;
    }
}

fn main() {
    GhostToken::new(|mut token| {
        let ds = DataStructure {
            vec_a: GhostCell::new(vec![32, 11]),
            vec_b: GhostCell::new(vec![63, 255, 512]),
            
            n_clear: GhostCell::new(0),
        };
        
        ds.clear(&mut token, &ds.vec_a);
    });
}

It may seem kind of excessive to have all three fields of the struct be under GhostCell, but you need to prove that both the vec and n_clear are safe to mutate, and they come from different arguments to clear, so I think that's necessary.

It may be noteworthy that the function signature (in any of the code examples in this thread so far) does not guarantee vec is actually a reference to *self, so n_clear is not a reliable count of the total number of times clear has been called on either vec_a or vec_b. When you recognize this, you might come to the conclusion that there's no particular advantage to having n_clear be part of the same data structure as the vecs, and splitting DataStructure is another way to solve the problem. The simplest way to ensure that clear always clears one of either vec_a or vec_b is simply to use an enum.
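For completeness, a minimal sketch of the enum version, assuming the original `DataStructure` layout. The `Which` enum is a name invented for this example. Because the mutable borrow of the vector is derived from `self` inside the method, `clear` can only ever touch `vec_a` or `vec_b`, and the counter stays accurate.

```rust
// Hypothetical selector naming which vector to clear.
#[derive(Clone, Copy)]
enum Which {
    A,
    B,
}

struct DataStructure {
    vec_a: Vec<i32>,
    vec_b: Vec<i32>,
    // How many times `DataStructure::clear` has been called
    n_clear: u32,
}

impl DataStructure {
    fn clear(&mut self, which: Which) {
        // The borrow of the chosen field ends after `vec.clear()`,
        // so `self.n_clear` can be updated afterwards.
        let vec = match which {
            Which::A => &mut self.vec_a,
            Which::B => &mut self.vec_b,
        };
        vec.clear();
        self.n_clear += 1;
    }
}

fn main() {
    let mut ds = DataStructure {
        vec_a: vec![32, 11],
        vec_b: vec![63, 255, 512],
        n_clear: 0,
    };

    ds.clear(Which::A);
    assert!(ds.vec_a.is_empty());
    assert_eq!(ds.vec_b.len(), 3);
    assert_eq!(ds.n_clear, 1);

    ds.clear(Which::B);
    assert!(ds.vec_b.is_empty());
    assert_eq!(ds.n_clear, 2);
}
```

The `match` is a single branch on a two-variant enum, so I would not expect it to matter for performance in most code, though that is worth measuring if this sits in a hot loop.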

An option might be something like this. It still requires diligence from the programmer: if they use clear_op directly, they have to make sure the closure actually clears something, otherwise the count will increment without a clear operation. But that seems unavoidable in some form based on the requirements anyway.


struct Foo {
    a: Vec<()>,
    b: Vec<()>,
    clear: usize
}

impl Foo {
    fn clear_op(&mut self, f: impl FnOnce(&mut Self)) {
        self.clear += 1;
        f(self)
    }
    
    fn clear_a(&mut self) {
        self.clear_op(|foo| foo.a.clear())
    }
    
    fn clear_b(&mut self) {
        self.clear_op(|foo| foo.b.clear())
    }
}

Thank you for blurring the code. I read the paper this afternoon and gave GhostCell a spin:

#![allow(unused)]

use ghost_cell::{GhostCell, GhostToken};

struct DataStructure<'id> {
    vec_a: GhostCell<'id, Vec<i32>>,
    vec_b: GhostCell<'id, Vec<i32>>,
    // How many times `[DataStructure::clear]` has been called
    n_clear: GhostCell<'id, u32>,
}

impl<'id> DataStructure<'id> {
    fn clear(&self, vec: &GhostCell<'id, Vec<i32>>, tok: &mut GhostToken<'id>) {
        vec.borrow_mut(tok).clear();
        *self.n_clear.borrow_mut(tok) += 1;
    }
}

fn main() {
    GhostToken::new(|mut token| {
        let mut ds = DataStructure {
            vec_a: GhostCell::new(vec![32, 11]),
            vec_b: GhostCell::new(vec![63, 255, 512]),

            n_clear: GhostCell::new(0),
        };

        ds.clear(&ds.vec_a, &mut token);

        assert_eq!(ds.vec_a.borrow(&token).len(), 0);
        assert_eq!(ds.vec_b.borrow(&token).len(), 3);
        assert_eq!(*ds.n_clear.borrow(&token), 1);
    });
}

If I'm not mistaken, the code is identical to yours.


Actually, I hadn't thought of that! That's a great point and yes, now that I know a handful of different solutions to the problem, the enum is probably what I would've gone with.

It always amazes me how the borrow checker points out underlying conceptual problems. Had that multiple times myself.
