Collecting a vector of references through recursive descent

Just because the compiler seems to haven't broken your code containing undefined behavior yet doesn't mean it can't get new optimizations in the future that would be able to make your code misbehave terribly.

Once you have an immutable reference to something you cannot go back and derive a mutable reference from that. I like this quote from the nomicon

  • Transmuting an & to &mut is UB.
    • Transmuting an & to &mut is always UB.
    • No you can't do it.
    • No you're not special.

(even though it talks about transmuting, any other more indirect &T to &mut T conversion through pointers or even usize follows the same principle)

As mentioned, the only way to do interior mutability in Rust (including unsafe Rust) is with UnsafeCell, or any abstraction built on top of it (which includes things like Cell, RefCell, Mutex, RwLock from the standard library).

4 Likes

Thanks for the expansion. I also like the quote. :wink: ( Lol-- Last thing I want to do is make the mistake of thinking I am I'm special! )

Today I'll try to make use of those abstractions ( stay tuned! )

I really appreciate the help!

I'm trying to wrap my head around how UnsafeCell can come the rescue for my case. Do I need to implement the UnsafeCell Trait for Person? Is that the basic idea?

Following examples based on simple types I was overly optimistic that something like this would work.

    let x1: UnsafeCell<Person> = UnsafeCell::new(*swap_target1);
    let x2: UnsafeCell<Person> = UnsafeCell::new(*swap_target2);
    let p1: &UnsafeCell<Person> =&x1;
    let p2: &UnsafeCell<Person> =&x2;

    unsafe {
        let p1_mut: &mut Person = &mut *p1.get();
        let p2_mut: &mut Person = &mut *p2.get();
        mem::swap(p1_mut, p2_mut);
    }

This fails with the following error:

error[E0507]: cannot move out of `*swap_target1` which is behind a shared reference
   --> src/main.rs:131:50
    |
131 |     let x1: UnsafeCell<Person> = UnsafeCell::new(*swap_target1);
    |                                                  ^^^^^^^^^^^^^ move occurs because `*                 swap_target1` has type `Person<'_>`, which does not implement the `Copy` trait

error[E0507]: cannot move out of `*swap_target2` which is behind a shared reference
   --> src/main.rs:132:50
    |
132 |     let x2: UnsafeCell<Person> = UnsafeCell::new(*swap_target2);
    |                                                  ^^^^^^^^^^^^^ move occurs because `*                 swap_target2` has type `Person<'_>`, which does not implement the `Copy` trait

error: aborting due to 2 previous errors



UnsafeCell is a type, not a trait. You can't implement it for a type.

You got those errors because… well, the compiler tells you exactly why. You can't move out of the referent of a reference. What would the reference refer to, then, if you moved its referent? That's the very kind of memory management error Rust is trying to prevent.

You have to own a value in order to be able to wrap it into an UnsafeCell.

I would recommend getting more familiar with unsafe Rust (say, by reading the Rustonomicon) before trying to do stuff with UnsafeCell directly.

As a personal anecdote / word of caution, I wrote an interior mutability (UnsafeCell) abstraction for a specific use case in one of my crates over a year ago. I thought I had a good grasp of the semantics, and I wrote several tests and checked everything with MIRI; I thought my abstraction was safe. And then, just a couple days ago, I discovered a soundness hole that allows aliased mutability.

I rewriting it now, and I'm coming away even more strongly recommending just using the safe abstractions provided by the standard library, especially when it comes to interior mutability.

3 Likes

I see what you're saying. I would like to wrap the referent and have the reference itself be dropped and have the value passed into x1 and x2.

Since I'm no longer using swap_target1 and swap_target2 they should be out of scope following the move.

Because you’re working with a reference, there’s necessarily an owner somewhere responsible for cleaning up / dropping the referent in the future. When you move the referent, you’ve stolen it from its legitimate owner which will then invoke undefined behavior when it tries to free the vacant value.

1 Like

I see. In this case each owner is a Vector unless it's the root level Person "Node" variable.

Is there a way to use UnsafeCell to wrap a reference instead of the value itself and still allow mutation of the referent?

If you have a mutable reference, then there are methods like Cell::from_mut, but there is no way to do it given only an immutable reference.

1 Like

Yeah, I get it. I really would like to see an elegant solution solving this particular problem using std abstractions. Did you see @steffahn 's work from above? I admire it very much, but it leaves me wondering if there may be improvements to std that are called for.

I spent a lot of time distilling this problem into a simple pattern from my real code, in search of a good solution. This is a pattern I think that has very general usage in case you want to give it a crack!

I'm not understanding your first point:

My Person struct has a Vec<Person> just like you are showing. (There is no need for multiple parents.)

The most recent code is what I'm trying to solve for where the two vectors named fifties1 and fifties2 hold a set of candidate references that I want to use for selecting and swapping between two trees. That's where the lifetime issues involving references enter the problem.

The Person struct is as follows and so each parent owns their children.

struct Person<'a> {
    name: &'a str,
    age: i32,
    children: Vec<Person<'a>>,
}

Oh dear, I don't know why I thought your original code had Vec<&'a Person<'a>>. Oof. I lost track of the starting point by the time I got the two playground examples working and wrote up the text.

1 Like

Let me try that again.

How about storing the children in a Vec<Arc<RwLock<Person>>> and replacing all the &Person references with liberal sprinkles of Arc::clone?

struct Person {
    name: Cow<'static, str>,
    age: i32,
    children: Vec<Arc<RwLock<Person>>>,
}

Now the swap targets don't have to be &mut refs. The borrows only happen at the moment mem::swap is called:

let swap_target1 =
    &fifties1[rng.gen_range(0..fifties1.len())];
let swap_target2 =
    &fifties2[rng.gen_range(0..fifties2.len())];

print!("swap target 1 is: "); swap_target1.show();
print!("swap target 2 is: "); swap_target2.show();

mem::swap(&mut *swap_target1.write(), &mut *swap_target2.write());

No unsafe code.

See it running on the playground.

1 Like

In a situation like this, I prefer to use a copy-on-write approach so that I'm sure I'm not accidentally changing the interior of a datastructure that I don't mean to:

#[derive(Clone,Debug)]
struct Person<'a> {
    name: &'a str,
    age: i32,
    children: Vec<Arc<Person<'a>>>,
}

impl<'a> Person<'a> {
    fn new(name: &'a str, age: i32) -> Person<'a> {
        Person {
            name: name,
            age,
            children: Vec::new(),
        }
    }
    fn add(&mut self, name: &'a str, age: i32) -> &mut Person<'a> {
        self.children.push(
            Arc::new(Self::new(name, age))
        );
        self
    }
    
    fn get(&self, idx:&[usize])->&Person<'a> {
        if idx.len() == 0 { self }
        else { self.children[idx[0]].get(&idx[1..]) }
    }

    fn get_arc(self: &Arc<Self>, idx:&[usize])->Arc<Self> {
        if idx.len() == 0 { self.clone() }
        else { self.children[idx[0]].get_arc(&idx[1..]) }
    }
    
    fn update(&mut self, idx:&[usize])->&mut Self {
        if idx.len() == 0 { self }
        else { Arc::make_mut(&mut self.children[idx[0]]).update(&idx[1..]) }
    }

    fn update_arc<'s>(self: &'s mut Arc<Self>, idx:&[usize])->&'s mut Arc<Self> {
        if idx.len() == 0 { self }
        else {
            (&mut Arc::make_mut(self).children[idx[0]]).update_arc(&idx[1..])
        }
    }
    
    fn show(&self) {
        self.show_w_tab(0);
    }

    fn show_w_tab(&self, tab: usize) {
        println!("{:>1$}, {age} yrs old",
            self.name, tab + self.name.len(), age=self.age);
    }

    fn show_family_tree(&self) {
        self.walk(|i,c| c.show_w_tab(i.len()*4));
    }
    
    fn walk<'s>(&'s self, mut f: impl FnMut(&[usize], &'s Self)) {
        let mut idx = vec![0];
        let mut stack = vec![self];
        
        f(&[], self);
        while stack.len() > 0 {
            let &i = idx.last().unwrap();
            let &c = stack.last().unwrap();
            if c.children.len() > i {
                f(&idx, &c.children[i]);
                idx.push(0);
                stack.push(&c.children[i]);
            } else {
                idx.pop();
                idx.last_mut().map(|i| *i+= 1);
                stack.pop();
            }
        }
    }
    
    fn collect_val(&self, filter: impl Fn(&Self)->bool) -> Vec<&Self> {
        let mut result = vec![];
        self.walk(|_,c| {
            if filter(c) { result.push(c); }
        });
        result
    }
    
    fn collect_idx(&self, filter: impl Fn(&Self)->bool) -> Vec<Vec<usize>> {
        let mut result = vec![];
        self.walk(|i,c| {
            if filter(c) { result.push(Vec::from(i)); }
        });
        result
    }
}

fn main() {
    let mut tree1 = Person::new("Ruth", 120);
    tree1
        .add("Pat", 91)
        .add("John", 89);
        
    tree1.update(&[0])
        .add("Jim", 65)
        .add("Chuck", 65);
        
    tree1.update(&[1])
        .add("Stan", 57)
        .add("Anne", 55);
        
    tree1.update(&[1,0])
        .add("Mary", 20);

    tree1.update(&[1,1])
        .add("Helena", 21)
        .add("Peter", 19);
        
    let mut tree1 = Arc::new(tree1);

    let mut tree2 = Person::new("Murial", 91);
    tree2
        .add("Maya", 55)
        .add("Matt", 59);
        
    tree2.update(&[0])
        .add("Julia", 26)
        .add("Andria", 28);

    tree2.update(&[0,0])
        .add("Tom", 2);
        
    let mut tree2 = Arc::new(tree2);

    let (fifties1, fifties2) =
        (tree1.collect_idx(|p| p.age >= 50 && p.age < 60),
         tree2.collect_idx(|p| p.age >= 50 && p.age < 60));

    println!("tree1 people in their fifties...");
    for p in fifties1.iter() {
        tree1.get(&p).show();
    }

    println!("tree2 people in their fifties...");
    for p in fifties2.iter() {
        tree2.get(&p).show();
    }

    println!("Before Swap");

    tree1.show_family_tree();
    tree2.show_family_tree();

    let mut rng = rand::thread_rng();
    let swap_idx1 = fifties1.choose(&mut rng).unwrap();
    let swap_idx2 = fifties2.choose(&mut rng).unwrap();
    
    let swap_val1 = tree2.get_arc(&swap_idx2);
    let swap_val2 = tree1.get_arc(&swap_idx1);

    print!("swap target 1 is: "); swap_val1.show();
    print!("swap target 2 is: "); swap_val2.show();
    
    *tree1.update_arc(&swap_idx1) = swap_val1;
    *tree2.update_arc(&swap_idx2) = swap_val2;

    println!("After Swap");
    tree1.show_family_tree();
    tree2.show_family_tree();
}

https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=1a393c045e49700e7a71ebe8ca35e6c6

1 Like

On second thought, the Arcs aren't really a key part of my solution. Instead, it's using &[usize] to index into the trees, so that you don't ever need to keep deep references. This lets the trees move their internal parts around in memory without the borrow checker complaining.

Updated playground that doesn't use Arc:

https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=193cb4a74f5e43feddb5197769003cd5

1 Like

I am floored. This is extremely helpful. Thank you very much!

FYI, the “Arc<RwLock<…>> plus lots of Arc::clone” approach is basically equivalent to the Rc<RefCell<…>> approach, I’ve mentioned; though I’ve only presented it in the context of a linked-list example above, and I’ve used an iterative non-recursive approach to traverse the list/tree. (Which can help prevent stack overflows if deeply nested structures are possible.)

The choice between Arc+RwLock vs Rc+RefCell is the choice between supporting or not supporting concurrent access to a Person from multiple threads.

2 Likes

Good to know. Currently-- at least--- this particular part of the program, one thread should be sufficient, but it's nice to know multiple threads are an option.

I need to work to understand your solution. Does your update_arc function do a sequential walk to lookup the reference?

My guess is that each index from the collected set of candidates will require a potential walk through the entire tree. If that's correct, then that is a performance hit that I was trying to avoid by using a reference. So perhaps your previous solution is what I would prefer.