ArrayVec Optimizations

I'm having some trouble with what seem to be flaky optimizations when using the arrayvec crate. I recognize this forum may not be the best place to ask about a specific crate, but I'm more interested in what's going on with the optimizer.

Consider this example:

ArrayVec::from_iter(std::iter::repeat(x).take(32))
ArrayVec::from_iter(std::iter::repeat_n(x, 32))
ArrayVec::from_iter((0..32).map(|_i| x))

When x is of type (&i32, &i32, &i32), all three of these approaches generate different assembly. The first two versions don't take advantage of SSE. The second version also appears to check whether the first reference is null (?) and, if so, sets the length to 0 (??). The third version is bizarre, to say the least.

3rd version description:

If the null check passes, it writes copies of the first two pointers into a stack array, then copies that array to the output. If the null check fails, it sets the output length to 0 and copies uninitialized stack data into the output. In either case, the 3rd pointer is copied correctly.

This is not a problem when creating an ArrayVec of a simple type like i32. It only shows up with structs or tuples of a few elements, which are perhaps too large to be passed in registers under the calling convention.

Godbolt link

What is causing this to be poorly optimized? How can I work around this?

The same behaviour appears when collecting into Box<[_]>.
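For anyone who wants to reproduce this without the arrayvec dependency, here is a minimal std-only sketch. The alias Triple and the function names are made up for illustration. std::iter::repeat_n is left out because it needs Rust 1.82 or newer, but it can be added in the same way. Whether the generated assembly matches what is described above will depend on the compiler version and optimization flags.

```rust
// Hypothetical alias for the element type from the original question.
type Triple<'a> = (&'a i32, &'a i32, &'a i32);

// Variant 1: repeat + take, collected into a boxed slice instead of an ArrayVec.
pub fn boxed_repeat_take(x: Triple<'_>) -> Box<[Triple<'_>]> {
    std::iter::repeat(x).take(32).collect()
}

// Variant 3: map over a range, collected into a boxed slice.
pub fn boxed_range_map(x: Triple<'_>) -> Box<[Triple<'_>]> {
    (0..32).map(|_i| x).collect()
}
```

Both functions should return the same 32 elements, so any difference is only in the generated code. Pasting them into Godbolt with -C opt-level=3 makes the two outputs easy to compare side by side.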

I suspect the reason is the additional annotations that pointer-like values get, combined with being in a struct/tuple. A tuple of usize will optimize nicely, while collecting tuples of pointers/references does not vectorize. However, an array [*const (); 3] does vectorize.
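The three element types compared above can be put side by side like this. It is only a sketch: the function names are invented for illustration, and which variants actually vectorize is something to check in the assembly for your own compiler version and flags rather than take from this snippet.

```rust
// Tuple of plain integers: reported above to optimize nicely.
pub fn fill_usize(x: (usize, usize, usize)) -> Box<[(usize, usize, usize)]> {
    std::iter::repeat(x).take(32).collect()
}

// Tuple of references: reported above not to vectorize.
pub fn fill_refs<'a>(x: (&'a i32, &'a i32, &'a i32)) -> Box<[(&'a i32, &'a i32, &'a i32)]> {
    std::iter::repeat(x).take(32).collect()
}

// Array of raw pointers: reported above to vectorize.
pub fn fill_ptr_array(x: [*const (); 3]) -> Box<[[*const (); 3]]> {
    std::iter::repeat(x).take(32).collect()
}
```

All three elements are three pointer-sized words, so the data being copied has the same size in each case. Only the types, and whatever attributes the compiler attaches to them, differ.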