SIMD: indexing an shifting


#1

Hello everyone,

I’m using simd, and I’m missing two things from there. I’m not terribly familiar with vectorization, so the question is, can x86 CPUs do this at all:

  • Can I use a combined value for index access into a slice (&[u32]), something like let values: u32x4 = array[indexes]? Indexes are random (and, even worse, they are u32, not usize).
  • Can I shift lanes of a combined value by lanes of another similarly sized combined value, rather than by the same quantity? In other words, I’m looking for impl Shl<u8x4> for u32x4.

For the sake of completeness, here’s the problem I’m trying to solve (simplified a little):

  • There’s an array of integer indexes (&[u32]) and a bitset (also technically &[u32]).
  • I need to copy the values from indexes into another array which have a corresponding bit set to 1 in the bitset.

So for every index it’s:

let (block, bit) = (index / 32, index % 32);
if bitset[block] >> bit & 1 == 1 {
    result[count] = index;
    count += 1;
}

I was able to vectorize the first line, but it didn’t buy me much, as it’s looking up into bitset that takes a lot of time. Any ideas on how it might be made more efficiently are welcome! (I’m actually trying to win an informal competition for good measure :slight_smile: )