Why is setting the value of something to the cloned value of something else faster than doing mem::replace?

Hello,
In a little experiment I ran, found here, I was surprised to realize that setting an array element this way:

the_array[indx] = value_to_set_indx_to.clone();

is faster than

mem::replace(&mut the_array[indx], value_to_set_indx_to);

Doesn't the second way avoid the extra clone, or is something being optimized out in my toy example?
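To make the difference between the two statements concrete, here is a minimal sketch (not the code from the linked experiment). The helper names set_by_clone and set_by_replace are made up for illustration. Cloning leaves the caller with the original value and stores a copy; mem::replace moves the value into the slot and returns the old element. If the old element isn't needed, a plain assignment without .clone() moves the value just as well.

```rust
use std::mem;

// Hypothetical helper: store a copy of `value`; the caller keeps the original.
// The element previously in the slot is dropped by the assignment.
fn set_by_clone<T: Clone>(slots: &mut [T], index: usize, value: &T) {
    slots[index] = value.clone();
}

// Hypothetical helper: move `value` into the slot and return the old element.
fn set_by_replace<T>(slots: &mut [T], index: usize, value: T) -> T {
    mem::replace(&mut slots[index], value)
}

fn main() {
    let mut slots = vec![vec![0u8; 4], vec![1u8; 4]];
    let value = vec![9u8; 4];

    set_by_clone(&mut slots, 0, &value);
    // `value` is still usable here, because only a copy was stored.

    let old = set_by_replace(&mut slots, 1, value);
    // `value` has been moved now; `old` holds what the slot contained before.
    println!("slots = {:?}, old = {:?}", slots, old);
}
```

Which one is cheaper in practice depends on how expensive the clone is and on whether the moved-in value had to be built from scratch anyway.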

OK, update. First of all, the new link is here, because the old one took too long to execute. Second, the second approach is now faster, by quite a lot. With sequential access, however, the second approach was slower; with random access and modification, it is faster. Any ideas?

Even on your initial attempt, I got these results:

Mem::replace based approach took 2657210
Clone based approach took 2983794

I'm pretty sure that 2983794 > 2657210. This also makes sense: the clone approach has to copy the buffer each time.


On a second attempt I got:

Mem::replace based approach took 3652959
Clone based approach took 3020418

Hmm, mixed results. Time for a bit of number crunching!
Here is the test I ran. I ran it on my PC, as it would've taken ages on the playground, and even then it took a lot of my CPU:


I guess I should've optimized beforehand! Anyway, the results to that are:

On average clone took: 9968
The median clone took: 9910
On average mmcpy took: 7280
The median mmcpy took: 7276

So essentially clone is slower.
(Note that the above results are in milliseconds.)

Edit: Whoops, I misread my results at first; they are actually not the same.


I think it's tough to extrapolate from any result you're getting here, since a sufficiently-smart compiler could notice that you're just overwriting the_vec[1] every time (and not ever using the value), and remove most of the code. (The only stuff that really needs to stay is the random number generation.)

Consider making an example that actually uses the output, if you want meaningful measurements.
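One way to do that, as a rough sketch rather than a rigorous benchmark, is to pass the values through std::hint::black_box so the optimizer has to assume they are used. The sizes, constants, and function names below are assumptions picked so the example runs quickly; they are not taken from the linked code. Note that black_box is documented as best-effort only.

```rust
use std::hint::black_box;
use std::mem;
use std::time::{Duration, Instant};

// Hypothetical sizes, chosen only to keep the sketch fast.
const SLOTS: usize = 64;
const BUF_LEN: usize = 1_024;
const ITERATIONS: usize = 2_000;

// Times overwriting slots with clones of one shared value.
fn time_clone(slots: &mut [Vec<u8>], value: &Vec<u8>) -> Duration {
    let start = Instant::now();
    for i in 0..ITERATIONS {
        let index = i % slots.len();
        slots[index] = black_box(value).clone();
        // Treat the stored element as "used" so the write is not discarded.
        black_box(&slots[index]);
    }
    start.elapsed()
}

// Times moving pre-built values into slots; building them is not timed.
fn time_replace(slots: &mut [Vec<u8>], values: Vec<Vec<u8>>) -> Duration {
    let start = Instant::now();
    for (i, value) in values.into_iter().enumerate() {
        let index = i % slots.len();
        let old = mem::replace(&mut slots[index], value);
        // Treat the old element as "used" before it is dropped.
        black_box(old);
    }
    start.elapsed()
}

fn main() {
    let mut slots = vec![vec![0u8; BUF_LEN]; SLOTS];
    let value = vec![7u8; BUF_LEN];
    let values = vec![vec![3u8; BUF_LEN]; ITERATIONS];

    println!("clone:   {:?}", time_clone(&mut slots, &value));
    println!("replace: {:?}", time_replace(&mut slots, values));
}
```

Building the values for the mem::replace case outside the timed region is a deliberate choice here: it measures only the move, not the construction of each new value.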

Wow, I just saw the code, and there is so much other stuff you are doing on top of your write operation: you are not doing x.clone() but X::new().clone();

  • (the cloning may be optimized away, although unlikely),

  • but more importantly: X::new() pushes into an empty vec 500000 times. This means repeated reallocations as the vec grows. I am pretty sure this is what is bottlenecking the whole operation, which leads to erratic timings because of the heavy heap usage.

You should definitely clean that before benchmarking!

This means creating the Vec using Vec::with_capacity(500000) instead of Vec::new(). Now, this is not DRY, since the same magic number then appears twice. You should therefore put it in a constant.
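A minimal sketch of that suggestion, assuming a constant named ELEMENT_COUNT (the name is made up) and using the loop index as a stand-in for rng.gen() so the example needs no external crate:

```rust
// Hypothetical constant name; 500000 is the element count from the post above.
const ELEMENT_COUNT: usize = 500_000;

fn build_preallocated() -> Vec<usize> {
    // Reserve room for every element up front.
    let mut v = Vec::with_capacity(ELEMENT_COUNT);
    let initial_capacity = v.capacity();

    for i in 0..ELEMENT_COUNT {
        v.push(i); // stand-in for rng.gen()
    }

    // The buffer never had to grow, so no reallocation took place.
    assert_eq!(v.capacity(), initial_capacity);
    v
}

fn main() {
    let v = build_preallocated();
    println!("len = {}, capacity = {}", v.len(), v.capacity());
}
```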

More generally, though, collecting should be preferred to mutation (pushing) when building a Vec (or any kind of collection) since it will do the right preallocations for you:

For instance, instead of doing

let mut v: Vec<usize> = Vec::new();
for _ in 0 .. 500000 {
    v.push(rng.gen());
}

you can do

let v: Vec<usize> =
    (0 .. 500000)
        .map(|_| rng.gen())
        .collect();
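
For anyone who wants to run the collect version without pulling in the rand crate, here is a self-contained sketch. XorShift64 is a tiny made-up generator that only stands in for rng.gen(), and ELEMENT_COUNT is an assumed constant name. Because a mapped range reports its exact length through size_hint, collect should be able to reserve the space in one go.

```rust
// Hypothetical constant name for the element count used above.
const ELEMENT_COUNT: usize = 500_000;

// Made-up stand-in for a real random number generator (xorshift64).
// The seed must be non-zero, otherwise every output is zero.
struct XorShift64(u64);

impl XorShift64 {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

fn build_by_collect(seed: u64) -> Vec<u64> {
    let mut rng = XorShift64(seed);
    // No explicit capacity: collect sizes the Vec from the iterator.
    (0..ELEMENT_COUNT).map(|_| rng.next_u64()).collect()
}

fn main() {
    let v = build_by_collect(1);
    println!("len = {}, capacity = {}", v.len(), v.capacity());
}
```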