Set optimised for small integers

My program has a requirement for sets of integers that are usually small ( they are column numbers, and it would be unusual to have a table with more than 63 columns ). I was thinking a specialised set might avoid some heap allocations, and be faster for the usual case. As well as the elements being small integers, typically the set will only contain one or two elements, the main aim is for minimal overhead when creating the set ( they are used when selecting a database index ).

So I just made this SmallSet struct ( code below ). I was just wondering whether to make the overflow an Option<BTreeSet>, so the BTreeSet doesn't have to be initialised if there is no overflow ( the expected common case ). Anyway, is my thinking correct here, and could SmallSet be expected to outperform a plain BTreeSet in this situation? How might the struct be improved?

/// Set of usize, optimised for elements < 64.
pub struct SmallSet {
    bitset: u64,
    overflow: BTreeSet<usize>,
}

impl SmallSet {
    pub fn new() -> Self {
        Self {
            bitset: 0,
            overflow: BTreeSet::new(),
        }
    }
    pub fn insert(&mut self, x: usize) {
        if x < 64 {
            self.bitset |= 1 << x;
        } else {
            self.overflow.insert(x);
        }
    }
    pub fn contains(&self, x: usize) -> bool {
        if x < 64 {
            self.bitset & (1 << x) != 0
        } else {
            self.overflow.contains(&x)
        }
    }

    pub fn remove(&mut self, x: usize) -> bool {
        if x < 64 {
            let bit: u64 = 1 << x;
            let result = self.bitset & bit != 0;
            self.bitset &= u64::MAX - bit;
            result
        } else {
            self.overflow.remove(&x)
        }
    }
}

impl Default for SmallSet {
    fn default() -> Self {
        Self::new()
    }
}

It won't allocate until data is inserted into it

2 Likes

Right. I just checked the implementation of the underlying BTreeMap

    pub const fn new() -> BTreeMap<K, V> {
        BTreeMap { root: None, length: 0 }
    }

So all I would be saving is initialising the length to zero, probably not worth it.

You might consider using an enum instead:

(untested)

pub enum SmallSet<const N:usize> {
    Small([u64;N]),
    Large(BTreeSet<usize>)
}

impl<const N:usize> SmallSet<N> {
    pub fn new() -> Self {
        SmallSet::Small(Default::default())
    }

    pub fn insert(&mut self, x:usize) {
        use Self::*;
        match self {
            Small(_) if (x >> 6) > N => {
                self.make_large().insert(x);
            }
            Small(bits) => {
                bits[x>>6] |= 1 << (x & 0x3F);
            }
            Large(tree) => {
                tree.insert(x);
            }
        }
    }

    fn make_large(&mut self)->&mut BTreeSet<usize> {
        use Self::*;
        match self {
            Small(bits) => todo!(),
            Large(tree) => tree
        }
    }

    pub fn contains(&self, x:usize)->bool {
        use Self::*;
        match self {
            Small(_) if (x >> 6) > N => false,
            Small(bits) => (bits[x>>6] & (1 << (x & 0x3F))) != 0,
            Large(tree) => tree.contains(&x)
        }
    }

    pub fn remove(&mut self, x:usize)->bool {
        use Self::*;
        match self {
            Small(_) if (x >> 6) > N => false,
            Small(bits) => {
                let result = self.contains(x);
                bits[x>>6] &= ~(1 << (x & 0x3F);
                result
            }
            Large(tree) => tree.remove(&x)
        }
    }
}
2 Likes

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.