I want to remove first N (0 <<< N < length) elements in Vec.
I've already found a solution here . The solution is to use Vec::drain
.
However, removing many elements using Vec::drain
is quite slow (190x slower than ptr::copy).
https://gist.github.com/Kogia-sima/167ec68c68829fe377be96f56af077a8
How can I "efficiently" remove first N elements in Vec without unsafe?
When I run your code (under Linux on x86_64 with a recent nightly Rust), both benchmarks have the same performance:
test ptr_copy ... bench: 14 ns/iter (+/- 0)
test vec_drain ... bench: 14 ns/iter (+/- 0)
I'm not sure why my results are different from yours.
Another possible version to try is:
v.rotate_right(COPY_SIZE);
v.truncate(COPY_SIZE);
though on my system, this was slower than the ptr_copy
and vec_drain
versions.
@mbrubeck Thank you for the quick reply, but that seems strange hmm...
I'm using rustc 1.49.0-nightly (dd7fc54eb 2020-10-15).
Also I tried your suggestion, but It is still slow...
running 3 tests
test ptr_copy ... bench: 37 ns/iter (+/- 0)
test rotate_right ... bench: 655 ns/iter (+/- 3)
test vec_drain ... bench: 5,714 ns/iter (+/- 99)
Are you running the exact code shown in the gist? How are you running it? (What rustc options or Cargo profile options are you using?)
The Vec::drain
and ptr::copy
versions seem to compile to almost identical machine code, when optimizations are enabled:
type Item = u8;
pub fn remove_drain(v: &mut Vec, n: usize) {
v.drain(..n);
}
pub fn remove_copy(v: &mut Vec, n: usize) {
unsafe {
std::ptr::drop_in_place(&mut v[n..]);
let src = v.as_ptr().add(n);
let dst...
Isn't this benchmarking code UB, by the way? You're essentially creating a Vec
full of uninitialized values and then try to drain elements from it.
I've finally figure out the cause. I was disabling the LTO in my cargo configuration. When I remove lto=false
in Cargo.toml, performance has improved.
running 3 tests
test ptr_copy ... bench: 28 ns/iter (+/- 0)
test rotate_right ... bench: 932 ns/iter (+/- 49)
test vec_drain ... bench: 28 ns/iter (+/- 0)
It definitely is. I wondered if maybe LVMM exploited that in one of the cases. I see from the reply just now that doesn't seem to be the case, but you couuld just use vec![42; TOTAL_SIZE]
, since it's out of your iter
anyway.
@quinedot @Cerber-Ursi Thank you for your additional notes. It's true the benchmark contains undefined behaviour. I've written the benchmark code only for this question. Actually I didn't care much about UB.
Well, in fact, for benchmarks it is as important as for any other code. Probably this is the simplest illustration: if some function invokes UB, then it can be compiled to literally anything, including no-op - and there's little reason to benchmark possible no-ops.
2 Likes
@Cerber-Ursi Ok, I got it. I'll be careful from the next time.
system
Closed
January 30, 2021, 6:10am
11
This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.