I’m researching how data gets copied in memory from one location to another, and trying to figure out what “zero-cost abstractions” mean in this context and when they break down. Compare these three ways of copying a vector:
fn copy_iter(src: &[u8], dst: &mut Vec<u8>) {
    for &byte in src {
        dst.push(byte);
    }
}

fn copy_index(src: &[u8], dst: &mut Vec<u8>) {
    let mut index = 0;
    unsafe { dst.set_len(SIZE); }
    while index < src.len() {
        dst[index] = src[index];
        index += 1;
    }
}

fn copy_push_all(src: &[u8], dst: &mut Vec<u8>) {
    dst.push_all(src);
}
On my machine, on a 10M-element vector, they run in approximately:
- iter: 0.010
- index: 0.007
- push_all: 0.002
I’m trying to figure out the reason for such a big difference. My ideas so far:
- push_all is the fastest because it forgoes any bounds checking during the loop, using unsafe access to pointers
- index does a bounds check on both src[index] and dst[index]
- iter, on every Vec::push, also does an extra shift of the base pointer to the current length
Questions:
- Where am I wrong?
- As far as I understand the concept of zero-cost abstractions, it should ideally be possible to make the iterator solution as fast as the index-based one. Or not?
- Why does Vec::push_all call set_len() on every iteration instead of setting it once at the end?
- What’s the theoretically fastest way of copying things in memory? My current understanding is that a vectorized loop is as fast as it gets, but I admittedly don’t know very much about it.
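For the last two questions, this is roughly the shape meant by "set the length once at the end". It is only a sketch under stated assumptions, not how the standard library implements anything: `copy_bulk` is a hypothetical name, and the copy is delegated to `std::ptr::copy_nonoverlapping`, which the standard library documents as semantically equivalent to C's `memcpy` with the argument order swapped.

```rust
use std::ptr;

/// Appends `src` to `dst` with one bulk copy and a single length update.
/// (`copy_bulk` is an illustrative name, not a standard library function.)
fn copy_bulk(src: &[u8], dst: &mut Vec<u8>) {
    let old_len = dst.len();
    // Make sure the allocation can hold the extra bytes before writing.
    dst.reserve(src.len());
    unsafe {
        // SAFETY: `reserve` guarantees capacity for at least
        // `old_len + src.len()` elements, and `src` cannot overlap the
        // buffer of `dst` because `dst` is mutably borrowed.
        ptr::copy_nonoverlapping(src.as_ptr(), dst.as_mut_ptr().add(old_len), src.len());
        // The length is written exactly once, after the bytes are in place.
        dst.set_len(old_len + src.len());
    }
}
```

Calling `copy_bulk(&[1, 2, 3], &mut v)` on `v = vec![9]` leaves `v` holding the original element followed by the three copied bytes. In ordinary code the safe `extend_from_slice` or `copy_from_slice` would be the natural choice; the unsafe version is here only to make the single `set_len` call visible.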
Maybe there’s a blog post about this already. If not, I’d like to write one!