Map Rust vector iteration to assembly

EventHelix · September 21, 2024, 3:46pm

Let us look under the hood to understand the assembly code generated for a vector iteration. We will see that the length of a vector is an important factor in vectorizing the iterations. With vectorization, the processor performs multiple operations per instruction. Finally, we will experiment with the Compiler Explorer to see how the compiler unrolls loops and uses vector instructions to improve performance.

We will look at the generated assembly for the following code:

pub fn increment_by(num: i64, list: &mut Vec<i64>) {
    for item in list {
        *item += num;
    }
}

pub fn increment_example(inc_by: i64, array: [i64; 4]) -> Vec<i64> {
    let mut list = array.to_vec();
    increment_by(inc_by, &mut list);
    list
}
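To make "multiple operations per instruction" concrete, here is a minimal hand-written sketch of the shape an unrolled loop takes: a main loop that handles a fixed-size group of elements per iteration, followed by a scalar tail for the leftovers. The function name increment_by_chunked and the group size of 4 are assumptions chosen for illustration. This is a scalar model, not compiler output, and the width the compiler actually picks depends on the target CPU features.

```rust
/// Hypothetical scalar model of an unrolled loop. Each group of four
/// additions stands in for what one SIMD add could do on a target with
/// 256-bit vectors (an assumption, not a guarantee).
pub fn increment_by_chunked(num: i64, list: &mut [i64]) {
    // Split the slice into groups of exactly four elements.
    let mut chunks = list.chunks_exact_mut(4);

    // Main loop: four independent additions per iteration.
    for chunk in &mut chunks {
        chunk[0] += num;
        chunk[1] += num;
        chunk[2] += num;
        chunk[3] += num;
    }

    // Tail loop: the 0 to 3 elements that did not fill a whole group.
    for item in chunks.into_remainder() {
        *item += num;
    }
}
```

Calling it on a six-element slice runs the main loop once and the tail loop twice, which is why the element count influences the code the compiler chooses to emit.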

scottmcm · September 21, 2024, 8:21pm

This is always better as &mut [i64]. Don't put &mut Vec in parameters like this when you're not changing the length of the Vec.

This problem is why you ended up with the .to_vec() in

pub fn increment_example(inc_by: i64, array: [i64; 4]) -> Vec<i64> {
    let mut list = array.to_vec();
    increment_by(inc_by, &mut list);
    list
}

rather than being able to -> [i64; 4].
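As a minimal sketch of why the slice parameter is more flexible, the same function can be called with a Vec, a fixed-size array, or part of an array, with no copy into a new Vec. The names increment_by_slice and call_sites are made up for this illustration.

```rust
/// Slice-taking variant (hypothetical name): works for any contiguous
/// run of i64 values, whoever owns them.
pub fn increment_by_slice(num: i64, list: &mut [i64]) {
    for item in list.iter_mut() {
        *item += num;
    }
}

/// Hypothetical demo of three different call sites.
pub fn call_sites() -> (Vec<i64>, [i64; 4], [i64; 4]) {
    // A &mut Vec<i64> dereferences to &mut [i64] automatically.
    let mut v = vec![1, 2, 3];
    increment_by_slice(10, &mut v);

    // A &mut [i64; 4] coerces to &mut [i64], so no .to_vec() is needed.
    let mut a = [1, 2, 3, 4];
    increment_by_slice(10, &mut a);

    // Only the middle two elements are touched here.
    let mut part = [1, 2, 3, 4];
    increment_by_slice(10, &mut part[1..3]);

    (v, a, part)
}
```

With a &mut Vec<i64> parameter, only the first call site would compile as written.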

Also, this statement from the article is wrong:

"If the memory allocation fails, the function throws an exception using the ud2 instruction."

That's coming from -Z trap-unreachable, and isn't actually hit.
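One way to read this: the allocation-failure path ends in a call to a function that never returns, so any instruction placed after that call is a guard for code that should be unreachable, not the error handling itself. The sketch below writes that path out by hand. The function sum_via_heap is hypothetical, and the code Vec generates internally is more involved than this.

```rust
use std::alloc::{alloc, dealloc, handle_alloc_error, Layout};

/// Hypothetical example: copy four values to the heap, read them back,
/// and return their sum.
pub fn sum_via_heap(values: [i64; 4]) -> i64 {
    let layout = Layout::new::<[i64; 4]>();
    // SAFETY: the layout has a non-zero size, the pointer is checked for
    // null before use, and it is freed with the same layout.
    unsafe {
        let ptr = alloc(layout) as *mut [i64; 4];
        if ptr.is_null() {
            // handle_alloc_error has return type `!`: it never returns,
            // so nothing after this call on this path can execute.
            handle_alloc_error(layout);
        }
        ptr.write(values);
        let stored: [i64; 4] = ptr.read();
        dealloc(ptr as *mut u8, layout);
        stored.iter().sum()
    }
}
```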

EventHelix · September 22, 2024, 4:46am

The article used a contrived example to illustrate the code generation for &mut Vec<i64>.

You are right that in practice one should use &mut [i64] when the length is not being modified.

I was curious about the difference in the code generation in the two cases. I generated the assembly for the following code:

pub fn increment_by(num: i64, list: &mut Vec<i64>) {
    for item in list {
        *item += num;
    }
}

pub fn increment_example(inc_by: i64, array: [i64; 4]) -> Vec<i64> {
    let mut list = array.to_vec();
    increment_by(inc_by, &mut list);
    list
}

pub fn increment_by_array_slice(num: i64, list: &mut [i64]) {
    for item in list {
        *item += num;
    }
}

pub fn increment_example_array_slice(inc_by: i64, mut array: [i64; 4]) -> [i64; 4] {
    increment_by_array_slice(inc_by, &mut array);
    array
}

Please refer to the Compiler Explorer link.

I changed the rustc version from 1.66.0, which the article used, to 1.81.0.

Here are a few observations:

The compiler is now using SIMD regardless of the length of the array.
No ud2 instruction is generated for error handling.
The code generated for increment_example_array_slice is a lot simpler than the code generated for increment_example.
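To try other array lengths in the Compiler Explorer, a sketch along these lines may help. The const-generic helper increment_array and the wrappers increment_len_3 and increment_len_8 are hypothetical names. The non-generic wrappers are there because a generic function only shows up in the assembly output once it is instantiated for a concrete length.

```rust
pub fn increment_by_array_slice(num: i64, list: &mut [i64]) {
    for item in list {
        *item += num;
    }
}

/// Hypothetical helper: the same logic for any array length N.
pub fn increment_array<const N: usize>(inc_by: i64, mut array: [i64; N]) -> [i64; N] {
    increment_by_array_slice(inc_by, &mut array);
    array
}

/// Concrete instantiations, one per length to inspect.
pub fn increment_len_3(inc_by: i64, array: [i64; 3]) -> [i64; 3] {
    increment_array(inc_by, array)
}

pub fn increment_len_8(inc_by: i64, array: [i64; 8]) -> [i64; 8] {
    increment_array(inc_by, array)
}
```

Comparing the assembly for the two wrappers at the same optimization level shows how the chosen instructions vary with the length.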

In the assembly generated by rustc 1.66.0, the call to the error handler for memory allocation failure is followed by the ud2 instruction. I found the following discussion on the subject:

system · December 21, 2024, 4:46am

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.
