Run OR command on parts of a BitSet

What would be the best way to use OR command on part of a BitVec?
For example, assume the BitVec size is 8 * 128 = 1024 and for simplicity let just mark is as [0,1,2,3,4,5,6,7] where each slice is 128 bits.
The new BitVec after the operation will be half size and will look like [0 | 1, 2 | 3, 4 | 5, 6 | 7]. with length of 512.
I would like this to be as efficient as possible.

Which BitVec implementation do you want to use? I believe the bitvec crate would make this fairly easy, while the bit_vec crate requires a bit of unsafe code to do this most efficiently.

Here is an implementation using the bit_vec crate:

use bit_vec::BitVec;

pub fn or_parts(v: &BitVec<u32>) -> BitVec<u32> {
    assert_eq!(v.len(), 1024);
    let mut result = BitVec::from_elem(512, false);

    // SAFETY: We do not change the length or capacity of `new_storage`.    
    let new_storage = unsafe { result.storage_mut() };
    let old_storage = v.storage();
    
    for (new_chunk, old_chunk) in new_storage.chunks_mut(4).zip(old_storage.chunks(8)) {
        // These are u32 values, so four of them makes 128 bits. 
        new_chunk[0] = old_chunk[0] | old_chunk[4];
        new_chunk[1] = old_chunk[1] | old_chunk[5];
        new_chunk[2] = old_chunk[2] | old_chunk[6];
        new_chunk[3] = old_chunk[3] | old_chunk[7];
    }
    result
}

For the bitvec crate you would use as_raw_slice and as_raw_mut_slice in place of storage and storage_mut.

Using just the standard library and doing it one u64 at a time (so that LLVM doesn't have to emulate u128) results in vectorized code: Godbolt

pub fn bit_or_1024_interleaved(input: [u64; 16]) -> [u64; 8] {
    [
        input[ 0] | input[ 2],
        input[ 1] | input[ 3],
        input[ 4] | input[ 6],
        input[ 5] | input[ 7],
        input[ 8] | input[10],
        input[ 9] | input[11],
        input[12] | input[14],
        input[13] | input[15],
    ]
}

translates to the following assembly:

example::bit_or_1024_interleaved::h51d608ebb3573aa2:
  mov rax, rdi
  vmovups xmm0, xmmword ptr [rsi + 16]
  vorps xmm0, xmm0, xmmword ptr [rsi]
  vmovups xmmword ptr [rdi], xmm0
  vmovups xmm0, xmmword ptr [rsi + 48]
  vorps xmm0, xmm0, xmmword ptr [rsi + 32]
  vmovups xmmword ptr [rdi + 16], xmm0
  vmovups xmm0, xmmword ptr [rsi + 80]
  vorps xmm0, xmm0, xmmword ptr [rsi + 64]
  vmovups xmmword ptr [rdi + 32], xmm0
  vmovups xmm0, xmmword ptr [rsi + 112]
  vorps xmm0, xmm0, xmmword ptr [rsi + 96]
  vmovups xmmword ptr [rdi + 48], xmm0
  ret
1 Like

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.