Pearl: Extending a `Vec` via `append` or `extend`?

When combining Vecs, I find myself flipping between Vec::append and Extend::extend without any clear motivated preference for one or the other.

If we look at Vec's implementation of Extend, we see:

fn extend<I: IntoIterator<Item = T>>(&mut self, iter: I) {
    <Self as SpecExtend<T, I::IntoIter>>::spec_extend(self, iter.into_iter())
}

where this leads is a bit tricky to follow, but it's somewhere in this file in std. Either way, because this operates over an Iterator, is it adding the new elements one-by-one, memory layout be damned?

append on the other hand is straight-forward:

pub fn append(&mut self, other: &mut Self) {
    unsafe {
        self.append_elements(other.as_slice() as _);
        other.set_len(0);
    }
}

/// Appends elements to `Self` from other buffer.
unsafe fn append_elements(&mut self, other: *const [T]) {
    let count = unsafe { (*other).len() };
    self.reserve(count);
    let len = self.len();
    unsafe { ptr::copy_nonoverlapping(other as *const T, self.as_mut_ptr().add(len), count) };
    self.len += count;
}

This looks to me like it's physically moving the memory from the other to the self. And if the self had been preallocated with extra room via with_capacity, then this would be more efficient, no?

Thanks for any wisdom you can offer.

We can ask godbolt about it!

pub fn append_via_extend(dest: &mut Vec<String>, src: Vec<String>) {
    // To clean up the output a bit, assume we already reserved enough space
    assert!(dest.capacity() - dest.len() >= src.len());

    dest.extend(src.into_iter());
}

https://rust.godbolt.org/z/EhnK5z1q7

example::append_via_extend:
        push    r15
        push    r14
        push    r13
        push    r12
        push    rbx
        mov     rax, qword ptr [rdi + 8]
        mov     r12, qword ptr [rdi + 16]
        mov     rbx, qword ptr [rsi + 16]
        mov     r15, rsi
        sub     rax, r12
        cmp     rax, rbx
        jb      .LBB1_1
        mov     r13, rdi
        lea     rdi, [r12 + 2*r12]
        mov     r14, qword ptr [r15]
        lea     rax, [8*rbx]
        mov     r15, qword ptr [r15 + 8]
        shl     rdi, 3
        add     rdi, qword ptr [r13]
        lea     rdx, [rax + 2*rax]
        mov     rsi, r14
        call    qword ptr [rip + memcpy@GOTPCREL]  // <-- Oh look!
        add     rbx, r12
        mov     qword ptr [r13 + 16], rbx
        test    r15, r15
        je      .LBB1_5
        mov     ecx, 24
        mov     rax, r15
        mul     rcx
        test    rax, rax
        je      .LBB1_5
        mov     edx, 8
        mov     rdi, r14
        mov     rsi, rax
        pop     rbx
        pop     r12
        pop     r13
        pop     r14
        pop     r15
        jmp     qword ptr [rip + __rust_dealloc@GOTPCREL]
.LBB1_5:
        pop     rbx
        pop     r12
        pop     r13
        pop     r14
        pop     r15
        ret
.LBB1_1:
        lea     rdi, [rip + .L__unnamed_1]
        lea     rdx, [rip + .L__unnamed_2]
        mov     esi, 59
        call    qword ptr [rip + core::panicking::panic@GOTPCREL]
        ud2
        mov     rbx, rax
        mov     rdi, r15
        call    core::ptr::drop_in_place<alloc::vec::Vec<alloc::string::String>>
        mov     rdi, rbx
        call    _Unwind_Resume@PLT
        ud2
        call    qword ptr [rip + core::panicking::panic_no_unwind@GOTPCREL]
        ud2

You'll notice those jumps are forward -- so there's no loop. But there is a call to memcpy!

So between the specializations and optimizations, it noticed that it's getting data from contiguous memory, and is copying it the best way it knows how.

1 Like

I didn't know about godbolt, thanks for that. Does that mean that append is still better?

They should be equivalent; if you look at the very bottom of spec_extend.rs, you'll see that extending from a slice::Iter just calls append_elements.

3 Likes

What you're seeing in the extend impl is "specialization" (an unstable feature) the standard library is using to make extend essentially an alias for append; the performance should be the same, thus the difference you should analyze is which is easier to understand in your codebase? Note that append works explicitly with Vecs, whereas extend works with any Iterator.

2 Likes

I prefer append just because it's more obvious that it will be simple, without requiring any heavy lifting by the compiler. It's like I'm doing rustc a favor, even though it's plenty smart enough anyway.

2 Likes

Thanks folks.

I think the most important part is to use the one where the object ownership does what you needed.

If append works and you want to re-use the original Vec's buffer, then use it -- I think append is better than .extend(vec.drain()).

But if you just want to give away ownership of the whole thing and don't care, use .extend(vec) because that communicates it well. (extend takes IntoIterator, so that's the same as .extend(vec.into_iter()).) And trust that std/llvm will probably do something reasonable -- which we saw here it does -- and if not it'll come up in your perf tests if it really matters.

3 Likes

Btw that's not the one used when extending a Vec with another Vec. The right one is a couple of lines before, though that also uses append_elements.

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.