[show] bitwise crate 0.1.0 🎉!


I’ve just published the first version of the bitwise crate :tada: !

In a nutshell, it relies on the bitintr crate to implement higher-level bitwise manipulation algorithms. Currently it only implement algorithms for single words (words slices will come).

There are many algorithms already. If I had to mention just one, it would be the Morton Index Encoding/Decoding algorithm, since it is the fastest implementation of it that I’ve found in native Rust. It uses BMI2.0 instructions (pdep/pext) when available on the target (Intel Haswell / AVX2 and up) and a cache friendly “magicbits” method otherwise. Other algorithms are also available with different trade-offs, e.g.,. the LUT algorithm uses more cache to load a look-up table but can be faster in some platforms, and I’ll probably add sLUT (skewed LUT) at some point with uses even more cache and can be even faster. They are all still slower than using BMI 2.0 though.

Do you like manipulating bits? Do you think code that does is magical? Or maybe you like solving puzzles, playing games (sudoku, chess, go…), or squeezing out the maximum performance out of your CPU?

Well I have good news for you!

The issue tracker has a Roadmap to 1.0 which mentions a lot of fun to implement algorithms, most of which have a link to a description or other implementations! The acknowledgements section in the readme also mentions lots of sources (Hackers Delight :heart_eyes:) with many more algorithms that are really fun to implement!

And there are also the main things missing (API wise) that I want to get done before 1.0:

  • support stable Rust (haven’t figured out a way to avoid specialization here),
  • implement SWAR (Simd Within A Register) algorithms for the common operations [*],
  • implement algorithms on word slices by combining the simd and the bitintr crates,
  • support 128 bit integers.

Pull-requests are really appreciated!

[*] Think of SWAR as storing e.g. 4xu16 inside a single u64, and wanting to do “element-wise” arithmetic/bitwise operations (addition/substraction/…bitwise xor/… with wrapping/saturating/… arithmetic…) and reductions.



Nice job ! I’m wondering about the use of u32 at many places in the API. Why not usize or even u8 when it can fit (e.g. for bit shifting) ?


Where is u32 used?

EDIT: it seems that travis-cargo stopped uploading new versions of the docs (but still showing green builds) a long long time ago…

You can read the latest docs in docs.rs.

EDIT2: restarting the build does not seem to help, and I am using 1:1 the same file as in my bitintr crate which just works there… Ah… travis-ci issues, FUN!


Here: https://gnzlbg.github.io/bitwise/bitwise/trait.Word.html

    fn shift_logical_left(self, n: u32) -> Self;
    fn shift_logical_right(self, n: u32) -> Self;
    fn shift_arithmetic_left(self, n: u32) -> Self;
    fn shift_arithmetic_right(self, n: u32) -> Self;
    fn rotate_left(self, n: u32) -> Self;
    fn rotate_right(self, n: u32) -> Self;


Yes that website points to outdated docs (~7 months old). The up-to-date docs are in docs.rs: https://docs.rs/bitwise/

I am still trying to figure out why travis-cargo fails to update the gh-pages, it seems related to the github token, but in my other crates the same token works just fine :confused:


Oh nice ! Sorry I didn’t read your post fully :slight_smile:.



So here’s a use case I have:

I want to alter the jump target of an x86 instruction, e.g.:


The target is in bytes 2-5 (indexing from 0), 0x200c22

Currently I represent this as a fixed array and use my scroll crate to write into the bits but I was thinking of switching to simple numeric manipulation and then write the number as bits (with whatever endianness I needed).

So: would it be easy for me to insert a little endian number in the byte range using your crate or is it not something currently supported?


You can use the parallel bits deposit algorithm (which does an efficient
parallel bits left deposit on haswell in just 4 cycles) to overwrite those
bytes with whatever you want.

I want to implement a bit range replace algorithm analogous to bit range
extract at some point, but that’s low priority because that’s just a small
ergonomic improvement over parallel bits deposit if you are on a modern CPU.

For something older than haswell parallel bits deposit can be very slow. A
bit range replace algorithm might be much faster, but… custom code might
be even faster anyways… If you need this let me know (with a more
concrete example with exactly what do you want to replace with what) and I
can research your options a bit more :slight_smile:


Morton coding is cool. Will you consider adding other space-filing curves, like Hilbert or Peano? They are supposed to have even better properties for spatial indexing (see links).



If someone is willing to send a PR I will gladly accept it and help get it into good shape :slight_smile:

I have some nice bit manipulation algorithms for hilbert curves somewhere, maybe I can clean those up, but I don’t use Hilbert curves anymore in practice.