SIMD: indexing an shifting

Hello everyone,

I'm using simd, and I'm missing two things from there. I'm not terribly familiar with vectorization, so the question is, can x86 CPUs do this at all:

  • Can I use a combined value for index access into a slice (&[u32]), something like let values: u32x4 = array[indexes]? Indexes are random (and, even worse, they are u32, not usize).
  • Can I shift lanes of a combined value by lanes of another similarly sized combined value, rather than by the same quantity? In other words, I'm looking for impl Shl<u8x4> for u32x4.

For the sake of completeness, here's the problem I'm trying to solve (simplified a little):

  • There's an array of integer indexes (&[u32]) and a bitset (also technically &[u32]).
  • I need to copy the values from indexes into another array which have a corresponding bit set to 1 in the bitset.

So for every index it's:

let (block, bit) = (index / 32, index % 32);
if bitset[block] >> bit & 1 == 1 {
    result[count] = index;
    count += 1;

I was able to vectorize the first line, but it didn't buy me much, as it's looking up into bitset that takes a lot of time. Any ideas on how it might be made more efficiently are welcome! (I'm actually trying to win an informal competition for good measure :slight_smile: )