What would be the best way to use OR command on part of a BitVec?
For example, assume the BitVec size is 8 * 128 = 1024 and for simplicity let just mark is as [0,1,2,3,4,5,6,7] where each slice is 128 bits.
The new BitVec after the operation will be half size and will look like [0 | 1, 2 | 3, 4 | 5, 6 | 7]. with length of 512.
I would like this to be as efficient as possible.
Which BitVec implementation do you want to use? I believe the bitvec
crate would make this fairly easy, while the bit_vec
crate requires a bit of unsafe code to do this most efficiently.
Here is an implementation using the bit_vec
crate:
use bit_vec::BitVec;
pub fn or_parts(v: &BitVec<u32>) -> BitVec<u32> {
assert_eq!(v.len(), 1024);
let mut result = BitVec::from_elem(512, false);
// SAFETY: We do not change the length or capacity of `new_storage`.
let new_storage = unsafe { result.storage_mut() };
let old_storage = v.storage();
for (new_chunk, old_chunk) in new_storage.chunks_mut(4).zip(old_storage.chunks(8)) {
// These are u32 values, so four of them makes 128 bits.
new_chunk[0] = old_chunk[0] | old_chunk[4];
new_chunk[1] = old_chunk[1] | old_chunk[5];
new_chunk[2] = old_chunk[2] | old_chunk[6];
new_chunk[3] = old_chunk[3] | old_chunk[7];
}
result
}
For the bitvec
crate you would use as_raw_slice
and as_raw_mut_slice
in place of storage
and storage_mut
.
Using just the standard library and doing it one u64
at a time (so that LLVM doesn't have to emulate u128
) results in vectorized code: Godbolt
pub fn bit_or_1024_interleaved(input: [u64; 16]) -> [u64; 8] {
[
input[ 0] | input[ 2],
input[ 1] | input[ 3],
input[ 4] | input[ 6],
input[ 5] | input[ 7],
input[ 8] | input[10],
input[ 9] | input[11],
input[12] | input[14],
input[13] | input[15],
]
}
translates to the following assembly:
example::bit_or_1024_interleaved::h51d608ebb3573aa2:
mov rax, rdi
vmovups xmm0, xmmword ptr [rsi + 16]
vorps xmm0, xmm0, xmmword ptr [rsi]
vmovups xmmword ptr [rdi], xmm0
vmovups xmm0, xmmword ptr [rsi + 48]
vorps xmm0, xmm0, xmmword ptr [rsi + 32]
vmovups xmmword ptr [rdi + 16], xmm0
vmovups xmm0, xmmword ptr [rsi + 80]
vorps xmm0, xmm0, xmmword ptr [rsi + 64]
vmovups xmmword ptr [rdi + 32], xmm0
vmovups xmm0, xmmword ptr [rsi + 112]
vorps xmm0, xmm0, xmmword ptr [rsi + 96]
vmovups xmmword ptr [rdi + 48], xmm0
ret
1 Like
This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.