Moving all items out of a Vec without linear iteration


#1

Hi,
I’m trying to consume a Vec in a non-linear fashion. I want to move the items out of the Vec and pass them into a new data structure in an order given by indices in a second Vec.
What I would like to achieve is something like this:

let data = vec!["a".to_string(), "b".to_string(), "c".to_string()];
let indices = vec![2, 1, 0];
let mut reordered = Vec::with_capacity(data.len());
for idx in indices.into_iter() {
    // this will fail because of moving out of indexed content
    reordered.push(data[idx]);
}

edit: Playground Link

I’m aware that why this is not possible, the data vector will be left in an invalid condition during the iteration and there is no way for the compiler to verify that the vector is actually entirely consumed.

I want to avoid sorting the data vector or using .remove() since both would end up shuffling and moving a lot more than theoretically necessary. Is there any (cheap) way to get what I wanted or is sorting really the best choice here?

Note that reordering (and just using) the initial vector would only work for this example as both data structures are vectors. In the actual code reordered would be a different type.

Thanks in advance!


#2

One option that springs to mind is using std::mem::replace, e.g.:

let mut data = vec!["a".to_string(), "b".to_string(), "c".to_string()];
let indices = vec![2, 1, 0];
let mut reordered = Vec::with_capacity(data.len());
for idx in indices {
     reordered.push(std::mem::replace(&mut data[idx as usize], String::new()));
}

If the type doesn’t have a cheap “default” value to swap in, perhaps you could use Option<MyType> inside data, and then use Option::take() to take the value out (it’s essentially the same thing as std::mem::replace).

If you take this approach, you’ll want to be careful to ensure indices doesn’t have duplicate values (maybe use a HashSet?) or else you’ll end up taking a dummy value with std::mem::replace; this is also where an Option wrapper would help since you’d presumably do Option::take().unwrap(), and that will panic if you make this type of mistake.


#3

I like your approach but unfortunately I don’t have control over the type contained by the data vector and there is no public constructor for the type, either. So I don’t think I can use std::mem::replace() here


#4

Have you tried swap_remove?

https://doc.rust-lang.org/std/vec/struct.Vec.html#method.swap_remove

Hmm, though it will be difficult as the indices change while you do this.


A confession: in one of my own projects, I use unsafe for this:

The PermVec type in the arguments provides the invariant that every index appears once, which is required for safety.

You say you need to use another type for output; I recommend you still go to Vec first if you try this approach. A very important property of the above snippet is that it is panic-safe (it cannot possibly panic in the presence of uninitialized data with destructors), which will be a maintenance hazard if you want to change it to write directly into another output type.


#5

I also just noticed that that particular function fails to ensure that capacity is sufficient, yet I marked it “safe”.

unsafe is hard.


Edit: I mistakenly withdrew this post when I noticed the assert_eq on len, however that is checking inv, not out, so the code is still incorrect.

The simple fix is, of course, to add a reserve after the clear.


#6

As you withdrew the post, I assume you’ve found a solution?
I would be thankful if shared it :slight_smile:


#7

This could be an interesting way of performing the reordering inplace:

fn reorder<T>(data: &mut [T], mut indices: Vec<usize>) {
    let data_len = data.len();
    assert_eq!(data_len, indices.len());

    for index in 0..data_len {
        let mut cur_index = index;
        loop {
            let mut new_index = &mut indices[cur_index];
            if index == *new_index { break }
            
            data.swap(cur_index, *new_index);
            cur_index = std::mem::replace(new_index, cur_index);
        }
        
        indices[cur_index] = cur_index;
    }
}

I ported it from stackoverflow, swapping instead of copying. It should work, even if, as you can imagine, it performs quite a lot of swaps.

If you don’t want to do the operation inplace, a solution could be performing a clone() and than using the function. Not the best approach, but not so bad.

EDIT: warning: the function is not checking for unique indices!


#8

Do you have control over the type of data itself? i.e. can you make it a Vec<Option<ForeignType>>? I wasn’t sure if you also ruled out the Option::take approach.

Also, what do you/we know about the type contained inside data? Does it implement Drop? If it doesn’t, you can maybe just use std::ptr::read to take the value out of data, and leave the original intact. Throw a assert!(!std::mem::needs_drop::<ForeignType>()) in there (or a unit test) to make sure that aspect doesn’t change without your awareness.


#9

I did find a workaround based on std::mem::replace. The type I was dealing with was a petgraph::Graph where the node indices should be in a specific ordering. petgraph offers into_nodes_edges() to destructure a Graph into a vector of Nodes and a vector of Edges. These types don’t offer any public constructors.

But petgraph provides mutable references to the items stored in the Nodes and over those items I have control. This way I can get access to the data without moving and shuffling anything. The only cost is constructing a dummy-replacement.

    for idx in indices.into_iter() {
        new_graph.add_node(mem::replace(&mut graph[old_idx], Data::Dummy));
    }

Thanks everyone for the suggestions!