Borrow checker help. Intersect `Vec`

Hello. Thank you for everything the Rust community has done! I've learned a lot and I'm hoping to learn a lot more too!

I'm writing code to intersect sorted Vec of ints. My application is a bit more complicated but I've simplified the example as much as possible and removed any custom structs. I don't need any help with the algorithmic part but I'm trying to wrap my head around how to do this. Ideally without cloning.

If there's an existing function to merge vectors that contain a type that implements Eq and Ord or other algorithmic approaches that's cool but I'd like to learn how to deal with this specific case. If this was at my day job I would ask for a code review but it isn't :(.

Apologies if this isn't the right forum to ask this.

fn main() {}

fn pairwise_intersect(x: &Vec<u32>, y: &Vec<u32>) -> Vec<u32> {
    let mut result = Vec::new();
    
    let mut i = 0;
    let mut j = 0;

    while i < x.len() && j < y.len() {
        let a = x[i];
        let b = y[j];

        if a == b {
            result.push(a);
            
            i += 1;
            j += 1;
        } else if a < b {
            i += 1;
        } else {
            j += 1;
        }
    }

    return result;
}

pub fn intersect(arrs: &Vec<Vec<u32>>) -> Vec<u32> {
    if arrs.len() == 0 {
        return Vec::new();
    }

    if arrs.len() == 1 {
        return arrs[0];
    }

    let mut x: Vec<u32> = arrs[0];

    for i in 1..arrs.len() {
        x = pairwise_intersect(&x, &arrs[i])
    }

    return x;
}

Here's the rust playground link.

With this code I get the two errors

error[E0507]: cannot move out of index of `Vec<Vec<u32>>`
  --> src/main.rs:34:16
   |
34 |         return arrs[0];
   |                ^^^^^^^ move occurs because value has type `Vec<u32>`, which does not implement the `Copy` trait
   |
help: consider cloning the value if the performance cost is acceptable
   |
34 |         return arrs[0].clone();
   |                       ++++++++

error[E0507]: cannot move out of index of `Vec<Vec<u32>>`
  --> src/main.rs:37:27
   |
37 |     let mut x: Vec<u32> = arrs[0];
   |                           ^^^^^^^ move occurs because value has type `Vec<u32>`, which does not implement the `Copy` trait
   |
help: consider borrowing here
   |
37 |     let mut x: Vec<u32> = &arrs[0];
   |                           +
help: consider cloning the value if the performance cost is acceptable
   |
37 |     let mut x: Vec<u32> = arrs[0].clone();
   |                                  ++++++++

Ok I guess that makes sense, the compiler doesn't know when to free the memory in arrs if I return it. The Book doesn't cover this particular error but from the other errors in the ownership section I'm guessing this is the reason for it.

It suggests borrowing, so I change it to a reference. Playground link.

pub fn intersect(arrs: &Vec<Vec<u32>>) -> &Vec<u32> {
    if arrs.len() == 0 {
        return &Vec::new();
    }

    if arrs.len() == 1 {
        return &arrs[0];
    }

    let mut x: &Vec<u32> = &arrs[0];

    for i in 1..arrs.len() {
        x = &pairwise_intersect(&x, &arrs[i])
    }

    return x;
}

Then I get the error

error[E0515]: cannot return reference to temporary value
  --> src/main.rs:30:16
   |
30 |         return &Vec::new();
   |                ^----------
   |                ||
   |                |temporary value created here
   |                returns a reference to data owned by the current function

error[E0515]: cannot return value referencing temporary value
  --> src/main.rs:43:12
   |
40 |         x = &pairwise_intersect(&x, &arrs[i])
   |              -------------------------------- temporary value created here
...
43 |     return x;
   |            ^ returns a value referencing data owned by the current function

For more information about this error, try `rustc --explain E0515`.

Ok I guess it makes sense, I can't return a reference to something on the stack of the current function that is about to exit. So should I put it on the heap here? And use Box ?

I'm a bit stuck. Thank you for any help!

Edit: upon reading the sticky I'm going to move this from Code Review to Help.

First of all, are you sure you want to take &Vec<...>? It's almost always better to use either the owned value or a &[T] instead of it.

Second, this part of code:

...can be rewritten like this:

for item in arrs.into_iter().skip(1) {
    x = pairwise_intersect(&x, item)
}

...which allows you to reborrow the original borrow and not create a fresh one (or to just use the owned value directly).

Third, this code:

...should probably be just using arrs[0].clone(), if you have to pass the borrow, or arrs.pop(), if you're OK with owned value.

1 Like

Thank you for your reply.

First of all, are you sure you want to take &Vec<...> ? It's almost always better to use either the owned value or a &[T] instead of it.

I don't have a strong opinion here. Can you explain why "it's almost always better to use... the owned value" ?

and what is &[T] ?

In short, it's a slice, or, to be more precise, a reference to a slice.

A slice itself is a bunch of values of the same type places next to each other, so that we can, given the pointer to the first element and the length of slice, get any element from it. It can't be used directly, since we don't know its length beforehand, so usually it is behind reference - hence in many cases one can say "slice", meaning the reference to it[1].

Well, the main point is not "the owned value" (that's just a question of whether you need to have a reference in the first place - if you're not sure, you likely don't). The main point is that almost everything[2] that's possible to do with &Vec<T> is also possible with &[T], so there's no need to force this Vec to ever exist.


  1. Other way to use it can be e.g. a "boxed slice", that is, Box<[T]> - essentially a Vec<T> with length unchangeable at runtime ↩︎

  2. The only exception I'm aware of is getting the Vec's capacity ↩︎

If you want to intersect a collection of values, why do you use a Vec at all and not a HashSet or BTreeSet?

Ah thank you, that is elucidating. That type syntax almost looks like Go slices, I should have guessed that.

The main point is that almost everything that's possible to do with &Vec<T> is also possible with &[T] , so there's no need to force this Vec to ever exist.

So your suggestion is to change all of the Vecs in my code to slices? Maybe could you explain a bit more why using a slice would fix the anti-pattern that I'm using that is causing these compiler errors?

that's just a question of whether you need to have a reference in the first place - if you're not sure, you likely don't

In Rust, when do you need a reference?

If you want to intersect a collection of values, why do you use a Vec at all and not a HashSet or BTreeSet ?

Hi Schard thank you for your reply. I'm not interested in different algorithmic approaches and rather I'd like to learn more about the Rust programming model. In my original post I wrote "I'd like to learn how to deal with this specific case" I'll edit it to make it more clear. I'm sure there will be other future cases where there aren't other data structures I can use to avoid borrow checker errors.

In this particular case, my data arrives to me in a sorted Vec already, it's too big to fit all in memory and I can't change that.

I took algorithms and data structures a long time ago. I'm here to learn about Rust today :slight_smile:

This is a good resource exploring slices, and why in general &[T] is better as an argument than &Vec<T>.

But you may want to accept Vec<T> to move the values, if they're not copy (unlike u32 in your snippet, which is copy).


Also, pairwise_intersect in your example could use iterators. Something like this using peek to examine the next reference, and next to advance the iterator consuming the value.

1 Like

This function has to return an owned Vec<u32> instead of a reference, because the latter can only be a borrow of one of the arrs elements, whole, while you're computing the new vector of intersection.
Now, that means that if there is only one vector passed you need to produce an owned version of it. That is done with let mut x: Vec<u32> = arrs[0].clone();

Then, you should consider if you want pairwise_intersect to return a new value each time, or to mutate the existing vector like this:

/// Given a sorted vector `x` and a sorted sequence `y`, keeps only those `x`
/// items which also occur in `y`.
fn retain_intersection(x: &mut Vec<u32>, y: ***) {
    x.retain(todo!());
}

Sequence y only requires the ability of iterating in order so it can be any of &Vec<u32>, &[u32] or impl Iterator<Item = u32>.

Thanks all. I read everything you sent. I ended up using the fold function. It sort of seems like the best way to avoid borrow checker issues in Rust is to use the built-in functions.

pub fn intersect(arrs: Vec<Vec<u32>>) -> Vec<u32> {
    match arrs.len() {
        0 => Vec::new(),
        _ => arrs.iter().fold(TakeAllEnum, |x, y| pairwise_intersect(&x, y)),
    }
}

I added a TakeAllEnum that is the equivalent mathematically of the entire set of possible Vec so basically intersecting any set X with it will return X.

For union you can make the initial value an empty vector since union empty set with X returns X.

In Rust, when do you need a reference?

Also my intuition for using a reference as an argument is when you would like to avoid a move because you would like to permit the caller to re-use the argument without the function returning it back to the caller. Basically you would like to disallow it being dropped once the function exits.

I feel like there was a lot of noise in this conversation, so I'm posting to return to the original question.

The minimal change to make this code work is to clone the vector, just as Rust suggests.

    if arrs.len() == 1 {
        return arrs[0].clone(); // <-- first clone
    }

    let mut x: Vec<u32> = arrs[0].clone(); // <-- second clone

    for i in 1..arrs.len() {
        x = pairwise_intersect(&x, &arrs[i])
    }

The second clone can be avoided by changing the function slightly:

    let mut x: Vec<u32> = pairwise_intersect(&arrs[0], &arrs[1]);

    for i in 2..arrs.len() {
        x = pairwise_intersect(&x, &arrs[i])
    }

If you want to avoid the first clone, you can do that too, but only by changing the function signature. The trick is to use the standard library type Cow (docs) which contains either a value or a reference.

use std::borrow::Cow;

pub fn intersect(arrs: &Vec<Vec<u32>>) -> Cow<'_, [u32]> {
    if arrs.len() == 0 {
        return Cow::Borrowed(&[]);
    }

    if arrs.len() == 1 {
        return Cow::Borrowed(&arrs[0]);
    }

    let mut x: Vec<u32> = arrs[0].clone();

    for i in 1..arrs.len() {
        x = pairwise_intersect(&x, &arrs[i])
    }

    Cow::Owned(x)
}

If you want more background on why to use borrowed types for function arguments rather borrowing the owned type, see the doc Use borrowed types for arguments. It explains the relevant design patterns and idioms and should answer your question

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.