Unnecessary stack usage?

newpavlov · September 13, 2021, 4:50am

I have the following simple SIMD-powered function, which tests whether points lie inside any of a set of bounding boxes:

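use std::arch::x86_64::*;

// `N` is a compile-time constant defined elsewhere (its value is not shown here).
// Each `bbox` holds [x_min, x_max, y_min, y_max, z_min, z_max]; both bounds are exclusive.
pub unsafe fn foo(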
    x: &[__m256i; N],
    y: &[__m256i; N],
    z: &[__m256i; N],
    bboxes: &[[__m256i; 6]],
) -> [__m256i; N] {
    let mut res = [_mm256_setzero_si256(); N];
    for bbox in bboxes {
        for i in 0..N {
            let tx = _mm256_and_si256(
                _mm256_cmpgt_epi32(x[i], bbox[0]),
                _mm256_cmpgt_epi32(bbox[1], x[i]),
            );
            let ty = _mm256_and_si256(
                _mm256_cmpgt_epi32(y[i], bbox[2]),
                _mm256_cmpgt_epi32(bbox[3], y[i]),
            );
            let t = _mm256_and_si256(tx, ty);
            let tz = _mm256_and_si256(
                _mm256_cmpgt_epi32(z[i], bbox[4]),
                _mm256_cmpgt_epi32(bbox[5], z[i]),
            );
            let t = _mm256_and_si256(t, tz);
            res[i] = _mm256_or_si256(res[i], t);
        }
    }
    res
}
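
For readers less familiar with the intrinsics, here is a minimal scalar sketch of what a single 32-bit lane of `foo` computes. The names `BBox` and `point_in_any_bbox` are hypothetical and only for illustration; they are not part of the original code. It assumes signed 32-bit coordinates, which matches `_mm256_cmpgt_epi32`.

```rust
/// One box as [x_min, x_max, y_min, y_max, z_min, z_max] with exclusive bounds,
/// mirroring the layout of one `bbox` in `foo` (hypothetical alias).
type BBox = [i32; 6];

/// Scalar model of one 32-bit lane: returns -1 (all bits set, like a SIMD
/// comparison mask) if the point lies strictly inside any box, else 0.
pub fn point_in_any_bbox(x: i32, y: i32, z: i32, bboxes: &[BBox]) -> i32 {
    let mut res = 0i32;
    for b in bboxes {
        let inside = x > b[0] && b[1] > x
            && y > b[2] && b[3] > y
            && z > b[4] && b[5] > z;
        // OR-accumulate the mask, as `_mm256_or_si256` does in the SIMD version.
        res |= if inside { -1 } else { 0 };
    }
    res
}
```

The SIMD version does the same thing for eight lanes at once per `__m256i`, and for `N` such vectors per bounding box.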

Inspecting the generated assembly shows that, for some reason, the compiler copies the coordinates to the stack and reloads them from there on each iteration instead of reading them through the input pointers. The same behavior can be observed for a function which processes coordinate slices. This copying looks redundant to me, especially considering that noalias is enabled (i.e. the compiler should know that the memory in which the coordinates are stored cannot change during function execution). Is there a reason for this behavior that I don't see, or is it simply an LLVM quirk that produces sub-optimal results?

newpavlov · September 14, 2021, 11:28am

For those who are interested, I've created an issue in the Rust repo: "Unnecessary stack usage" (Issue #88930, rust-lang/rust on GitHub).

system · December 13, 2021, 11:29am

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.
