Is there a safe way to efficiently remove first N elements in Vec?

I want to remove first N (0 <<< N < length) elements in Vec.

I've already found a solution here. The solution is to use Vec::drain.

However, removing many elements using Vec::drain is quite slow (190x slower than ptr::copy).

https://gist.github.com/Kogia-sima/167ec68c68829fe377be96f56af077a8

How can I "efficiently" remove first N elements in Vec without unsafe?

When I run your code (under Linux on x86_64 with a recent nightly Rust), both benchmarks have the same performance:

test ptr_copy  ... bench:          14 ns/iter (+/- 0)
test vec_drain ... bench:          14 ns/iter (+/- 0)

I'm not sure why my results are different from yours.

Another possible version to try is:

v.rotate_right(COPY_SIZE);
v.truncate(COPY_SIZE);

though on my system, this was slower than the ptr_copy and vec_drain versions.

@mbrubeck Thank you for the quick reply, but that seems strange hmm...

I'm using rustc 1.49.0-nightly (dd7fc54eb 2020-10-15).

Also I tried your suggestion, but It is still slow...

running 3 tests
test ptr_copy     ... bench:          37 ns/iter (+/- 0)
test rotate_right ... bench:         655 ns/iter (+/- 3)
test vec_drain    ... bench:       5,714 ns/iter (+/- 99)

Are you running the exact code shown in the gist? How are you running it? (What rustc options or Cargo profile options are you using?)

The Vec::drain and ptr::copy versions seem to compile to almost identical machine code, when optimizations are enabled:

Isn't this benchmarking code UB, by the way? You're essentially creating a Vec full of uninitialized values and then try to drain elements from it.

I've finally figure out the cause. I was disabling the LTO in my cargo configuration. When I remove lto=false in Cargo.toml, performance has improved.

running 3 tests
test ptr_copy     ... bench:          28 ns/iter (+/- 0)
test rotate_right ... bench:         932 ns/iter (+/- 49)
test vec_drain    ... bench:          28 ns/iter (+/- 0)

It definitely is. I wondered if maybe LVMM exploited that in one of the cases. I see from the reply just now that doesn't seem to be the case, but you couuld just use vec![42; TOTAL_SIZE], since it's out of your iter anyway.

@quinedot @Cerberuser Thank you for your additional notes. It's true the benchmark contains undefined behaviour. I've written the benchmark code only for this question. Actually I didn't care much about UB.

Well, in fact, for benchmarks it is as important as for any other code. Probably this is the simplest illustration: if some function invokes UB, then it can be compiled to literally anything, including no-op - and there's little reason to benchmark possible no-ops.

2 Likes

@Cerberuser Ok, I got it. I'll be careful from the next time.