Copy Rc<Vec<T>> into a VecDeque<T>

We have:

pub fn fast_copy<T: Clone>(data: Rc<Vec<T>>) -> VecDeque<T> {
    let mut ans = VecDeque::new();
    for x in data.as_ref().iter() {
        ans.push_back(x.clone());
    }
    ans
}

Is there a faster way to do this instead of doing a VecDeque::new() followed by lots of VecDeque::push_back ?
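At a minimum, pre-sizing the deque avoids the repeated reallocation inside the loop. A sketch (the function name `fast_copy_reserved` is mine, not from the thread):

```rust
use std::collections::VecDeque;
use std::rc::Rc;

// Same push_back loop as in the question, but with the final
// capacity reserved up front so no reallocation happens mid-loop.
pub fn fast_copy_reserved<T: Clone>(data: Rc<Vec<T>>) -> VecDeque<T> {
    let mut ans = VecDeque::with_capacity(data.len());
    for x in data.as_ref().iter() {
        ans.push_back(x.clone());
    }
    ans
}

fn main() {
    let d = fast_copy_reserved(Rc::new(vec![10, 20, 30]));
    println!("{:?}", d); // prints [10, 20, 30]
}
```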

Probably by cloning the vector and then using the From implementation:

    let ans: VecDeque<T> = (*data).clone().into();

You can use

    pub fn fast_copy<T: Clone>(data: Rc<Vec<T>>) -> VecDeque<T> {
        data.iter().cloned().collect()
    }

but you’d have to test the performance.
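Whichever option you pick, a quick sanity check that the two approaches produce the same contents (the helper names `copy_collect` and `copy_from` are mine, not from the thread):

```rust
use std::collections::VecDeque;
use std::rc::Rc;

// Collect each cloned element through the iterator pipeline.
fn copy_collect<T: Clone>(data: &Rc<Vec<T>>) -> VecDeque<T> {
    data.iter().cloned().collect()
}

// Clone the whole Vec, then convert via From<Vec<T>>.
fn copy_from<T: Clone>(data: &Rc<Vec<T>>) -> VecDeque<T> {
    (**data).clone().into()
}

fn main() {
    let data = Rc::new(vec![1, 2, 3]);
    assert_eq!(copy_collect(&data), copy_from(&data));
    println!("both produce {:?}", copy_collect(&data));
}
```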


My curiosity got the better of me. Here are the benchmark results.

running 3 tests
test fast_copy1 ... bench:       2,566 ns/iter (+/- 180)
test fast_copy2 ... bench:       2,356 ns/iter (+/- 150)
test fast_copy3 ... bench:         120 ns/iter (+/- 10)

where T is i32. The third version comes down to a tight copy loop, while the other solutions break up that tight loop with their iteration logic. So, as always, tight loops win on x86, or for that matter on any stored-program architecture.



<VecDeque<T> as From<Vec<T>>>::from currently reuses the cloned Vec's (RawVec) allocation (it only reallocates so that the capacity reaches a power of 2, so it potentially triggers just a single memcpy of [T; len] bytes). And given that Vec::clone is quite cheap when T is Copy and small (such as with integer types), it is quite plausible that fast_copy3 is the fastest.
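You can observe the buffer reuse directly by comparing pointers before and after the conversion. A sketch (whether the pointer comparison comes out true depends on the std version, as described above, so it is printed rather than asserted):

```rust
use std::collections::VecDeque;

// Converts a Vec into a VecDeque and reports whether the original
// heap buffer was reused (pointer-equal) rather than reallocated.
fn convert_and_check(v: Vec<i32>) -> (VecDeque<i32>, bool) {
    let buf = v.as_ptr();
    let deque: VecDeque<i32> = v.into();
    // The front slice of a deque built from a Vec covers all elements.
    let reused = deque.as_slices().0.as_ptr() == buf;
    (deque, reused)
}

fn main() {
    let (deque, reused) = convert_and_check((0..1000).collect());
    // On recent std the Vec's buffer is moved into the deque wholesale;
    // older versions could still reallocate to reach a suitable capacity.
    println!("len = {}, buffer reused = {}", deque.len(), reused);
}
```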


The magnitude of the difference here makes me suspicious of a benchmarking quirk.

Compiling the example, I see

warning: unused variable: `out`
  --> src/
36 |         let out: VecDeque<i32> = (*data).clone().into();
   |             ^^^ help: consider prefixing with an underscore: `_out`

That means the compiler might be smart enough to remove the code entirely, since it’s not black-boxed. That bit of code should be

    b.iter(|| {
        let out: VecDeque<i32> = (*data).clone().into();
        out
    });

so that #[bench] ensures that the result isn’t optimized out. (The other ones need similar changes.)
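Outside the `#[bench]` harness, the same effect can be had with `std::hint::black_box` (stable since Rust 1.66; the nightly `test::black_box` works the same way). A minimal sketch, where `convert` is a stand-in name for the benchmarked body:

```rust
use std::collections::VecDeque;
use std::hint::black_box;
use std::rc::Rc;

// The conversion under test: clone the Vec, convert via From<Vec<T>>.
fn convert(data: &Rc<Vec<i32>>) -> VecDeque<i32> {
    (**data).clone().into()
}

fn main() {
    let data = Rc::new((0..1000).collect::<Vec<i32>>());
    // black_box makes the result opaque to the optimizer, so the
    // conversion cannot be proven dead and deleted.
    let out = black_box(convert(&data));
    println!("{}", out.len()); // prints 1000
}
```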

As another minor option to try, since the data is Copy:

#[bench]
fn fast_copy4(b: &mut Bencher) {
    let mut arr = Vec::new();
    arr.resize(1000, 0);
    let data = Rc::new(arr);
    b.iter(|| {
        // T = i32 is Copy, so copy the elements instead of cloning them.
        let out: VecDeque<i32> = data.iter().copied().collect();
        out
    });
}

I did not think about the compiler optimizing out unused code. Good point.
I have amended the code with the above recommendation, along with the fast_copy4 version.

running 4 tests
test fast_copy1 ... bench:       2,519 ns/iter (+/- 254)
test fast_copy2 ... bench:       2,415 ns/iter (+/- 228)
test fast_copy3 ... bench:         119 ns/iter (+/- 13)
test fast_copy4 ... bench:       2,501 ns/iter (+/- 311)

I too feel fast_copy3() is insanely fast compared to the other versions.

If I reduce the resize() from 1000 to 100:

test fast_copy1 ... bench:         392 ns/iter (+/- 10)
test fast_copy2 ... bench:         253 ns/iter (+/- 19)
test fast_copy3 ... bench:          81 ns/iter (+/- 2)
test fast_copy4 ... bench:         257 ns/iter (+/- 21)

Maybe it is the way modern x86 is built, with all its cache-line fetches, cache prefetching, cold/warm/hot paths, etc.?

Thanks for trying that out.

It looks like it’s all down to specialization inside the library in one of the scenarios but not the others.

The others ought to be better, since they have perfect length information and the copies can’t fail, but apparently that doesn’t happen.
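The "perfect length information" approach could be made explicit by pre-sizing and extending; whether it beats clone + From still depends on which internal specializations fire. A sketch (the name `copy_with_capacity` is mine, not from the thread):

```rust
use std::collections::VecDeque;
use std::rc::Rc;

// Pre-size the buffer from the known length so extend never reallocates;
// the per-element copy may still lose to From<Vec<T>>'s bulk buffer reuse.
fn copy_with_capacity<T: Clone>(data: &Rc<Vec<T>>) -> VecDeque<T> {
    let mut out = VecDeque::with_capacity(data.len());
    out.extend(data.iter().cloned());
    out
}

fn main() {
    let data = Rc::new(vec![1, 2, 3]);
    let out = copy_with_capacity(&data);
    println!("{:?}", out); // prints [1, 2, 3]
}
```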
