Generic method parameter, resticted to numeric types with << support

After my last thread BitSet vs BitVector - #13 by StefanSalewski, I found out that the generic solution using crate BitVec and methods like d.set(h, true); results in very slow performance. Perhaps the BitVec crate is too generic, or I was using it wrong. So I decided to try for learning purposes to implement a custom 64-bit set type. Actually I let GPT-4 do it for me, resulting in code like

#[derive(Clone, Debug)]
struct BitSet(u64);

impl BitSet {
    // Creates a new BitSet with all bits set to 0
    // This is now redundant with the Default implementation, but kept for clarity
    fn new() -> Self {
        BitSet(0)
    }

    // Sets the bit at the specified index to 1
    fn set(&mut self, index: usize) {
        self.0 |= 1 << index;
    }

    fn insert(&mut self, index: usize) {
        self.0 |= 1 << index;
    }

    fn inserti8(&mut self, index: i8) {
        self.0 |= 1 << index;
    }

    // Clears the bit at the specified index (sets to 0)
    fn clear(&mut self, index: usize) {
        self.0 &= !(1 << index);
    }

    fn remove(&mut self, index: usize) {
        self.0 &= !(1 << index);
    }

    fn removei8(&mut self, index: i8) {
        self.0 &= !(1 << index);
    }

    // Checks if the bit at the specified index is set
    fn is_set(&self, index: usize) -> bool {
        (self.0 & (1 << index)) != 0
    }

    fn contains(&self, index: usize) -> bool {
        (self.0 & (1 << index)) != 0
    }

    fn containsi8(&self, index: i8) -> bool {
        (self.0 & (1 << index)) != 0
    }

    // Returns the union of two BitSets (bits that are set in either)
    fn union(&self, other: &BitSet) -> BitSet {
        BitSet(self.0 | other.0)
    }

    // Returns the difference of two BitSets (bits set in self but not in other)
    fn difference(&self, other: &BitSet) -> BitSet {
        BitSet(self.0 & !other.0)
    }

    // Checks if two BitSets are equal
    fn equals(&self, other: &BitSet) -> bool {
        self.0 == other.0
    }

    // Checks if two BitSets are disjoint (no bits in common)
    fn is_disjoint(&self, other: &BitSet) -> bool {
        (self.0 & other.0) == 0
    }
}

impl Default for BitSet {
    fn default() -> Self {
        BitSet(0)
    }
}

GPT-4 is really very smart, and the code is mostly fine, and easy to understand. One issue is, that Rust's shift operator << allows various numeric types including usize, u8, i8, for the places to shift. So one might want to make methods like set() generic, like

    fn insert(&mut self, index: usize) {
        self.0 |= 1 << index;
    }

    fn inserti8(&mut self, index: i8) {
        self.0 |= 1 << index;
    }

That works fine. But as Rust does not support method overloading, we have to use different method names. I tried to use a generic method with the index parameter restricted to numeric parameters that support the << operator. But that is difficult. GPT-4 suggested something like

struct BitSet(u64);

impl BitSet {
    // Allows index to be of any type that can be converted into usize
    fn set<T: Into<usize>>(&mut self, index: T) {
        let index_usize = index.into(); // Convert index to usize
        self.0 |= 1 << index_usize;
    }

    // Other methods...
}

which basically works. But I am unsure if that has an impact on performance, and with that form of generics, it seems to work then for unsigned index values only. As << works with i8 as well, it would be nice to have a generic solution that support i8 as well.

You can use a generic function like this, bounded by the std::ops::Shl trait:

    fn insert<T>(&mut self, index: T)
        where u64: std::ops::Shl<T, Output = u64>
    {
        self.0 |= 1 << index;
    }
2 Likes

Thank you very much for that solution. Actually, I did a lot of Google search already, and I think somewhere I found a warning that the << Shl trait might be also implemented for non-numeric data types like Vec, to support shifting of the vector elements. Might your solution lead to a situation where a user might use a Vec as parameter type instead of an integer type?

You can see all current implementors of Shl in its doc: Shl in std::ops - Rust

Vec implementing it would be very surprising for many users, and I don't think it was considered. But it would be interesting to know more if you can find your sources.

My code only allows types for which u64 implements the Shl trait (that is, types that can appear on the right-hand side of << when a u64 is on the left). It requires that the result is also a u64.

So while other libraries might allow things like let new_vector = old_vector << x;, that won’t affect this code at all.

2 Likes

Note that this doesn’t mean no custom impls could possibly apply. Orphan rules do very well allow impl Shl<LocalType> for ConcreteForeignType, as demonstrated with:

struct OverwriteByShiftOp(u64);

impl core::ops::Shl<OverwriteByShiftOp> for u64 {
    type Output = u64;
    fn shl(self, value: OverwriteByShiftOp) -> u64 {
        value.0
    }
}

fn main() {
    let mut s = BitSet::new();

    s.insert(42_u8);
    println!("{s:#018x?}");
    s.insert(OverwriteByShiftOp(u64::MAX)); // sets ALL the bits at once
    println!("{s:#018x?}");
}

(playground)

BitSet(
    0x0000040000000000,
)
BitSet(
    0xffffffffffffffff,
)
1 Like

Thank you all for the explanations. For the question about Vec shift, I think one of the pages I found yesterday with google was Shl in std::ops - Rust with

An implementation of Shl that spins a vector leftward by a given amount.

But with the explanation of mbrubeck it is clear that that is no problem for my use case.

Two unrelated points:

You didn't ask, but this is much better than BitVec for your use case. :+1:

Making the method generic is fine, but it's also an entirely typical choice in Rust to pick a single index type and use it everywhere. (This sometimes results in a bunch of annoying casts, but often not.)

1 Like

much better than BitVec for your use case.

BitSet vs BitVector - #9 by jdahlstrom pointed me to the bitvec crate, so I tried that. Actually, I just got a private mail from someone asking about the performance of bitvec for my use case, so I have to create a standalone test in the next weeks. Maybe I have used it wrong or not in the best way?

Actually I had the hope that generic bitvec would give nearly optimal performance. The Nim language has a built-in SET/BITSET data type, which indeed gives optimal performance.

Actually I asked for the generic solution mostly, because I an still learning Rust and was not able to find a solution myself.

In your initial post you said that bitvec was slow. Did you use a benchmarking crate like criterion? And did you run with --release? If you didn't do these things, your performance test may have been invalid. It was definitely invalid if you didn't use --release. What type of test did you do?

Oh, now I see from the other topic that you used debug mode for performance testing, which is not meaningful.

Disregarding performance testing, bitvec does not have a 64 bit limit so it has to index into an array before it does the bit shift and get/set operation. That's what makes the implementation discussed here seem more appropriate for your use case.

But speaking of that:

[quote="StefanSalewski, post:1, topic:107448"]
So I decided to try for learning purposes to implement a custom 64-bit set type.
[/quote]

Does this mean you actually don't need more than 64 bits? I assumed that you did not when reading your post. But now I wonder if that's true.

Edit: I see in the other topic that you do only need 64 bits.

Sorry, I did not made a real performance test for my 64-bit fixed size use case yet. Initially, I was glad that I got it working with BitVec crate at all, as that crate is a bit complicated. My observation was, that in debug mode the time after the chess engine did the fist move went from 1.5 sec with BitSet to 2.0 sec with BitVec, and is only 1.3 sec for my custom 64-bit implementation. Of course I know that release mode is much faster generally, and that one should always test in release mode before complaining. I will create a small test program and test BitVec in release mode soon.