Swapping data in single array using references

I have a simple set of sample code that works great in the Playground for swapping elements between two vector members of two different structures. My problem is that I'm finding it very difficult to modify the code to allow it to swap elements between two elements in the same structure ( i.e. same vector. )

In general, I'm still having a hard time grasping how lifetimes must be used in functions involving structures holding vectors, so if any of you rust experts can help, I would really appreciate some help. ( I'm doing everything I can to resist using unsafe (raw pointers) but this example has me stumped.)

First the sample code that works great to swap between two different structures with vectors.

use std::mem;

#[derive(Debug)]
struct Bar {
    d: u32
}

#[derive(Debug)]
struct Foo {
    data: Vec<Bar>,
}
impl Foo {
    fn get_swap_target(&mut self, idx: usize) -> &mut Bar {
        &mut self.data[idx]
    }
}

fn main() {
    let mut foo1 = Foo{ data: vec![Bar{ d: 10 }, Bar{ d:  20 }], };
    let mut foo2 = Foo{ data: vec![Bar{ d:100 }, Bar{ d: 200} ], };
    
    println!("before: foo1 is {:?}", foo1);
    println!("before: foo2 is {:?}", foo2);

    let swap1 = foo1.get_swap_target(0);
    let swap2 = foo2.get_swap_target(0);

    mem::swap(swap1, swap2);
    
    println!("after: foo1 is {:?}", foo1);
    println!("after: foo2 is {:?}", foo2);
}

Here's it's output:

before: foo1 is Foo { data: [Bar { d: 10 }, Bar { d: 20 }] }
before: foo2 is Foo { data: [Bar { d: 100 }, Bar { d: 200 }] }
after: foo1 is Foo { data: [Bar { d: 100 }, Bar { d: 20 }] }
after: foo2 is Foo { data: [Bar { d: 10 }, Bar { d: 200 }] }

Now, when I modify the code to operate on a single structure and vector, I run into all sorts of problems. Here is the most simple attempt:

use std::mem;

#[derive(Debug)]
struct Bar {
    d: u32
}

#[derive(Debug)]
struct Foo {
    data: Vec<Bar>,
}
impl Foo {
    fn get_swap_target(&mut self, idx: usize) -> &mut Bar {
        &mut self.data[idx]
    }
}

fn main() {
    let mut foo1 = Foo{ data: vec![Bar{ d: 10 }, Bar{ d:  20 }], };

    println!("before: foo1 is {:?}", foo1);

    let swap1 = foo1.get_swap_target(0);
    let swap2 = foo1.get_swap_target(1);

    mem::swap(swap1, swap2);
    
    println!("after: foo1 is {:?}", foo1);
}

It fails to compile with the following error:

error[E0499]: cannot borrow `foo1` as mutable more than once at a time
  --> src/main.rs:24:17
   |
23 |     let swap1 = foo1.get_swap_target(0);
   |                 ---- first mutable borrow occurs here
24 |     let swap2 = foo1.get_swap_target(1);
   |                 ^^^^ second mutable borrow occurs here
25 | 
26 |     mem::swap(swap1, swap2);
   |               ----- first borrow later used here

Seeing this error I used the following code to return the swap targets as a pair using a single function.

use std::mem;

#[derive(Debug)]
struct Bar {
    d: u32
}

#[derive(Debug)]
struct Foo {
    data: Vec<Bar>,
}
impl Foo {
    fn get_swap_target_pair(&mut self, idx1: usize, idx2: usize) -> (&mut Bar, &mut Bar) {
        (&mut self.data[idx1], &mut self.data[idx2])
    }
}

fn main() {
    let mut foo1 = Foo{ data: vec![Bar{ d: 10 }, Bar{ d:  20 }], };

    println!("before: foo1 is {:?}", foo1);

    let (swap1, swap2)  = foo1.get_swap_target_pair(0, 1);

    mem::swap(swap1, swap2);
    
    println!("after: foo1 is {:?}", foo1);
}

It also fails to compile but provides some possible clues about lifetime concerns.

13 |     fn get_swap_target_pair(&mut self, idx1: usize, idx2: usize) -> (&mut Bar, &mut Bar) {
   |                             - let's call the lifetime of this reference `'1`
14 |         (&mut self.data[idx1], &mut self.data[idx2])
   |         ----------------------------^^^^^^^^^-------
   |         |     |                     |
   |         |     |                     second mutable borrow occurs here
   |         |     first mutable borrow occurs here
   |         returning this value requires that `self.data` is borrowed for `'1`

I thought maybe if I changed get_swap_target_pair() to use a single lifetime might help. Here's the new version of that function (rest of code untouched.)

    fn get_swap_target_pair<'a>(&'a mut self, idx1: usize, idx2: usize) -> (&'a mut Bar, &'a mut Bar) {
        (&mut self.data[idx1], &mut self.data[idx2])
    }

Here's the compiler error I get using this change:

   |
13 |     fn get_swap_target_pair<'a>(&'a mut self, idx1: usize, idx2: usize) -> (&'a mut Bar, &'a mut Bar) {
   |                             -- lifetime `'a` defined here
14 |         (&mut self.data[idx1], &mut self.data[idx2])
   |         ----------------------------^^^^^^^^^-------
   |         |     |                     |
   |         |     |                     second mutable borrow occurs here
   |         |     first mutable borrow occurs here
   |         returning this value requires that `self.data` is borrowed for `'a`

My question boils down to what is the simplest (smallest) change to have my code at the top work with a single structure and vector and two different elements?

There's the swap method on Vec.

foo1.data.swap(0, 1);

It's implemented with unsafe pointers to avoid the double mutable borrow I imagine.

2 Likes
impl Foo {
    fn get_swap_target_pair(&mut self, idx1: usize, idx2: usize) -> (&mut Bar, &mut Bar) {
        use std::cmp::Ordering::*;
        match idx1.cmp(&idx2) {
            Less => {
                let (left_part, right_part) = self.data.split_at_mut(idx2);
                (&mut left_part[idx1], &mut right_part[0])
            }
            Equal => panic!("Indices must not be the same!"),
            Greater => {
                let (left_part, right_part) = self.data.split_at_mut(idx1);
                (&mut right_part[0], &mut left_part[idx2])
            }
        }
    }
}
1 Like

Splitting Borrows - The Rustonomicon

2 Likes

Thanks for the reference. So I need to become more savvy in the "world of unsafe rust" to solve this sort of thing. That's sort of what I was thinking but I wanted to make sure.

There are ways to write it in safe code by leveraging other safe methods (split_at_mut, for example) or even pattern matching.

The biggest advantage of using pointers internally is that they can alias, so it doesn't need to do a bunch of checks (like for the i==j case, since it uses memmove internally, not memcpy).

While the Rustonomicon does contain lots of information about unsafe Rust, it also contains information about safe Rust including e.g. more in-depth explanations of ownership and borrowing. E.g. the code example I gave above doesn’t contain any unsafe code (directly), but calls an existing (safe) abstraction, slice::split_at_mut.

A more in-depth understanding of Rust than what e.g. the rust book provides, is indeed necessary to write correct/sound unsafe Rust code; that’s why it’s included there, I guess.

I wouldn’t necessarily call split_at_mut part of the “world of unsafe rust”. But on the path to this “world of unsafe rust”, at some point you would have to learn that, no matter how smart the borrow checker is or ever becomes, the code

impl Foo {
    fn get_swap_target_pair(&mut self, idx1: usize, idx2: usize) -> (&mut Bar, &mut Bar) {
        (&mut self.data[idx1], &mut self.data[idx2])
    }
}

can never compile because it would allow you to get two mutable references to the same thing (if called with the same index twice).

Similarly, even if slice::split_at_mut didn’t exist already or you wouldn’t know about its existence, with enough understanding of unsafe Rust, you’d determine for yourself that it could be implemented soundly.

2 Likes

You absolutely shouldn't be thinking about touching unsafe if you don't yet understand why this problem occurs or how you can solve it safely for yourself. The compiler emits errors for a good reason – allowing what you are trying to do naïvely is not sound. You should instead familiarize yourself with the facilities the standard library provides for encapsulating the underlying unsafety.

1 Like

This advice is a bit unspecific in terms of what “touching” unsafe means. I think it’s a great thing to learn about unsafe Rust since, as I mentioned, studying material such as the Rustonomicon is teaching you more than just “this is how you can disable that annoying borrow checker”. In fact, properly learning about unsafe Rust will properly will make you understand why you should avoid the unsafe keyword whenever possible (and it’s almost always possible to avoid it; and even when it isn’t avoidable, it can be contained; wrapped up nicely in a small module with a safe API using unsafe code internally, and you might want to have it properly reviewed by other people, too…).

Similarly, if “touching” unsafe means testing out small toy-examples using unsafe code to get a more practical feel for what unsafe code can do, and for how little the compiler will help you in the presence of transmutes or how easy it is to get miri yelling at you for something you were almost 100% certain should be okay, because there was something you didn’t understand yet, I think that’s a totally valid learning experience, too. Of course, if you want the full experience of feeling of no safeguards at all, paired with with a bunch of additional footguns in the language and standard library, sub-par error messages from the compiler, and lots of ill-advising documentation and “beginner tutorials” online, try switching to a different language like C or C++.

Learning what UB means and what “unsound” APIs are is a fascinating topic; you can try to turn your toy-examples with unsafe code into a supposed-to-be-sound safe API, and when you’re confident have others point out probably multiple ways in which it can be broken in a code review in this forum. Or you can try to look at other people’s unsafe code and try to find problems in it. There’s also interesting discussions out there about corner cases of what is and isn’t allowed in unsafe Rust that sometimes hit on the limits of what Rust’s “official” semantics are supposed to be.

If “touching” unsafe means feeling over-confident and using it in production immediately, then sure you shouldn’t do that.

2 Likes

Interesting point and good advice, however I'm wondering if there's a limit to the premise. I do not believe that it's true that I need to understand every little corner of the standard library before I would be able to use unsafe rust safely. I do however know that I need to understand better why this particular compiler problem occurs. I still do not understand how mutability affects lifetimes and references for example.

I've been writing C code since 1985, and know that I can do that safely as long as I spend enough time and analyze it well. I believe I could also have solved this problem using "unsafe" rust raw pointers, without ever knowing the existence of the slice::split_at_mut method.

Great advice. Thank you! I agree completely. I need to really dig into this Rustonomicon. (It's the first I've even heard of it!) lol.

Note that while when you’d be using exclusively raw pointers you could think in a C-like manner, as soon as Rust’s reference types are involved, too, there’s a lot of additional ways to run into UB compared to C. E.g. you must never mutate data through (a pointer obtained from) a shared reference (&T), or create aliasing mutable references (two &mut Ts [or a &T and a &mut T] pointing to the same thing); and it’s also easy to invalidate raw pointers if you’re accessing the same object in some other way, too.

But, in any case, the power of Rust is that you can do so much (if not basically everything) without needing to risk compromise on memory safety by using unsafe. The standard library (and additionally also some popular crates) offer safe APIs for so many interesting patterns of working with ownership and memory, and learning about those is even more interesting and probably more relevant than gaining an in-depth knowledge of unsafe Rust. While I highly recommend the Rustonomicon as learning material (the whole thing is not super long either), IMO it’s mostly valuable for a better personal understanding of the language, and shouldn’t (and probably won’t) encourage you to start using unsafe everywhere.

It’s also listed on the Rust homepage under “Learn”

2 Likes

By "touching" I meant "using". More specifically, using it in cases when it's unnecessary. OP was asserting that they would need unsafe for swapping, which is what I was trying to counter here. Unnecessarily resorting to unsafe when there are safe solutions is what I am advising against.

With all due respect, careful with C! I have also been writing lots of C and C++ before switching to Rust. In my opinion, C and C++ are vastly different from both the safe and the unsafe subsets of Rust in terms of upholding contracts. In particular, Rust has a surprising amount of things you absolutely aren't allowed to do even in unsafe code – and if you are used to the C semantics of things, you will be bitten by them unless you know the Rust-specific pitfalls. I see many people coming from C and expecting that unsafe Rust be just C, and being surprised by the fact that it is not.

For example, in Rust, the mere co-existence of two mutable references to the same place is instant UB. There's no such restriction in C, and swapping is exactly the kind of situation where you might run into this issue.

4 Likes

Thanks for expanding. I'm convinced I need to study and master material in that Rustonomicon before doing unsafe rust.

You're probably correct that I'm trying to use too much of my experience with C as a guide for the sorts of things I must keep in mind in unsafe rust. My point is there are parallels and I am fully aware that there are important contracts that must be upheld beyond what a C programmer worries about. My main point that I was trying to make is that there's some point at which doing something in unsafe rust is the better option even when the problem can be solved using other std: library constructs.

The example I have in mind, is building a temporary non-owning vector of mutable references to a complex tree structure held together by it's own set of vector held references for the purpose of making mutations and then tossing that temporary vector. I do not which to use Arc because it brings in ownership overhead that I do not want around.

If an elegant solution exist in std library for that, then great!

Make sure to always benchmark when using unsafe. If there isn’t a measurably significant performance benefit, you should always consider using the safe alternative.

2 Likes

Thanks for the advise. I couldn't agree more. btw/ It so happens that for problem I'm trying to solve I'm not sure that Arc is necessary and I have not yet exhausted my attempts to use safe rust. I've been trying to solve it for quite awhile and I expect to have a toy program to post here before considering an unsafe implementation. I really appreciate this forum and the kind assistance I have received from you and others!

(i.e. Stay tuned! )

1 Like

This sample code is closer to my "real" problem. I'm not really concerned about swapping in the same array. That is a problem, but the bigger one is how do I maintain a set references in a vector that refer to members of another vector that owns the structures I wish to swap.

So below I have the sample code that attempts to accomplish my goal. This program is very similar to the one I showed at the top of this topic. The only difference is that I have created a new Vector of &Bar in the Foo structure named mask. The purpose of mask is to store a list of "swap candidates". This is to be a subset of Bar references into the data vector.

So whereas the program at the top used variables local to main() -- swap1 and swap2 -- to use in the swap call, I wish to have these stored and used later from the Foo::mask vector in which I intend to store multiple "swap" candidates for later use.

Here's the new code:

// ARRAY ELEMENT SWAP EXAMPLE
use std::mem;

#[derive(Debug)]
struct Bar {
    d: u32
}

#[derive(Debug)]
struct Foo<'a> {
    data: Vec<Bar>,
    mask: Vec<&'a Bar>,
}
impl Foo<'_> {
    fn get_swap_target(&mut self, idx: usize) -> &mut Bar {
        &mut self.data[idx]
    }
}

fn main() {
    let mut foo1 = Foo{ data: vec![Bar{ d: 10 }, Bar{ d:  20 }, Bar{ d: 31}, Bar{ d: 47}, Bar{ d:80} ], mask: Vec::new() };
    let mut foo2 = Foo{ data: vec![Bar{ d:100 }, Bar{ d: 200}, Bar{ d: 302}, Bar{ d: 401}, Bar{ d: 500 } ], mask: Vec::new() };

    println!("before: foo1 is {:?}", foo1);
    println!("before: foo2 is {:?}", foo2);

    let swap1 = foo1.get_swap_target(0);
    let swap2 = foo2.get_swap_target(0);
    
    // store swap candidates into mask
    foo1.mask.push(swap1);
    foo2.mask.push(swap2);

    mem::swap(&mut foo1.mask[0], &mut foo2.mask[0]);

    println!("after: foo1 is {:?}", foo1);
    println!("after: foo2 is {:?}", foo2);
}

And here are the compiler errors:

27 |     let swap1 = foo1.get_swap_target(0);
   |                 ---- first mutable borrow occurs here
...
30 |     foo1.mask.push(swap1);
   |     ^^^^^^^^^      ----- first borrow later used here
   |     |
   |     second mutable borrow occurs here

error[E0499]: cannot borrow `foo2.mask` as mutable more than once at a time
  --> src/main.rs:31:5
   |
28 |     let swap2 = foo2.get_swap_target(0);
   |                 ---- first mutable borrow occurs here
...
31 |     foo2.mask.push(swap2);
   |     ^^^^^^^^^      ----- first borrow later used here
   |     |
   |     second mutable borrow occurs here

error[E0499]: cannot borrow `foo1.mask` as mutable more than once at a time
  --> src/main.rs:33:20
   |
27 |     let swap1 = foo1.get_swap_target(0);
   |                 ---- first mutable borrow occurs here
...
33 |     mem::swap(&mut foo1.mask[0], &mut foo2.mask[0]);
   |                    ^^^^^^^^^
   |                    |
   |                    second mutable borrow occurs here
   |                    first borrow later used here

error[E0499]: cannot borrow `foo2.mask` as mutable more than once at a time
  --> src/main.rs:33:39
   |
28 |     let swap2 = foo2.get_swap_target(0);
   |                 ---- first mutable borrow occurs here
...
33 |     mem::swap(&mut foo1.mask[0], &mut foo2.mask[0]);
   |                                       ^^^^^^^^^
   |                                       |
   |                                       second mutable borrow occurs here
   |                                       first borrow later used here

error[E0502]: cannot borrow `foo1` as immutable because it is also borrowed as mutable
  --> src/main.rs:35:37
   |
27 |     let swap1 = foo1.get_swap_target(0);
   |                 ---- mutable borrow occurs here
...
35 |     println!("after: foo1 is {:?}", foo1);
   |                                     ^^^^
   |                                     |
   |                                     immutable borrow occurs here
   |                                     mutable borrow later used here

error[E0502]: cannot borrow `foo2` as immutable because it is also borrowed as mutable
  --> src/main.rs:36:37
   |
28 |     let swap2 = foo2.get_swap_target(0);
   |                 ---- mutable borrow occurs here
...
36 |     println!("after: foo2 is {:?}", foo2);
   |                                     ^^^^
   |                                     |
   |                                     immutable borrow occurs here
   |                                     mutable borrow later used here

A lifetime parameter on a struct like

struct Foo<'a> {
    data: Vec<Bar>,
    mask: Vec<&'a Bar>,
}

in Rust always means that the references of type &'a Bar inside of Foo<'a> references something that’s outside of the Foo struct itself. Having references in one field of a struct pointing to things in another field is known as a “self-referential” struct, which is not supported by the rust compiler.

For this concrete code example, the easiest approach is probably to turn mask into a vector if indices.

struct Foo {
    data: Vec<Bar>,
    mask: Vec<usize>,
}

(playground)

Other options for solving the problem of self-referential structs include

  • splitting up the struct into two parts

    struct Foo1 {
        data: Vec<Bar>,
    }
    
    struct Foo2<'a> {
        mask: Vec<&'a mut Bar>,
    }
    
  • using crates that offer a safe abstraction for self-referential structs such as ouroboros


In this case, the other solutions still don’t solve the potential difficulties of splitting up a mutable borrow &mut Foo into multiple borrows of the relevant vec items, i.e. multiple &mut Bars. A function such as get_swap_target cannot do this because it turn an exclusive/mutable borrow self: &mut Foo into just a single &mut Bar. Judging by the fact that mask is an array, I assume you’re planning on putting multiple references in there (just your example main function doesn't actually do that).


In such a toy example it’s impossible to say what the best approach is. When working with Vecs, using indices to avoid problems with the borrow checker is definitely quite a generally applicable approach. Furthermore, indices are super efficient, there’s almost no overhead compared to using references directly (just a pointer addition and a bounds check).

I should not have implied I had any need for a self referential data structure. In fact my desire is not at all that. I thought I could simplify the problem that way. Your advice explaining that self referencing is a problem helps however, and I will keep that in mind for other work I need to do. (My C background thinking is clearly a problem here in that storing a set of pointers is never a problem as long as you do not use them incorrectly! )

The problem I am trying to solve involves trees whose nodes contain vectors of their child elements. I wish to have a "scratch pad" (temporary) vector of references of those nodes. The temporary vector's lifetime is less than that of the trees, and that vector will be used to hold swap candidates for an arbitrary set of nodes between an arbitrary pair of any two trees.

So I'm essentially talking about N vectors of objects and an external "temporary" vector containing an arbitrary collection of references into objects stored among those N vectors. The temporary vector will be used to mutate (swaping objects freely among the N vectors.) Since I'm dealing with an arbitrary number vectors, indices are not sufficient for locating them, that's why I want to use references.

It may help to understand I'm doing Gentetic Progragramming and the trees represent S expressions and genetic operators must swap tree branches among separate trees. During the genetic operations a subset of swap candidates must be generated and stored so that only those nodes will be randomly chosen for swapping. That is why I have the need to collect a temporary and external set of references to swap.

I have a family tree that I've constructed as a toy to represent this problem more directly.

You'll want to generate your actual borrows at the time you're going to do the swapping, as it will lock up your entire tree(s). Here's a quick example of the general idea.

1 Like