What's the most memory efficient way to store a sparse Vec (while preserving the addresses!) without sacrificing a significant amount access speed?

Since you mentioned the main reason to avoid the stdlib HashMap is performance (if I understod correctly), have you tried the crate HashBrown?

https://github.com/Amanieu/hashbrown

It's as easy as a drop-in replacement for the std HashMap and HashSet implementation but i uses a faster hashing algorithm. The difference can be quite substantial.

@cfsamson The problem here is that @ndrewxie doesn't need to use a hashing algorithm. The index should just be a passthrough.

A hashtable is fundamentally just "how do we fit bigger indexes into a smaller vec", though. So a hashtable -- of some sort -- is absolutely the correct thing for this situation.

1 Like

Of course, that's what I've been suggesting since the start, it's just that there's no need to actually hash the index because it is just a number, not a string which needs to be converted to a number, not a struct or enum or whatever, so it can be just passed through. The hasher in a hashtable will procure a u64 and therefore std::mem::size_of::<usize>() <= std::mem::size_of::<u64>()

So, something like (playground with full implementation)

struct IdentityHasher(usize);

impl Hasher for IdentityHasher {
    fn finish(&self) -> u64 {
        // TODO: static_assert!(sizeof(usize) <= sizeof(u64)), or ignore the truncation
        self.0 as u64
    }

    fn write(&mut self, _bytes: &[u8]) {
        unimplemented!("IdentityHasher only supports usize keys")
    }

    fn write_usize(&mut self, i: usize) {
        self.0 = i;
    }
}
2 Likes

Yep!

This looks like an opportunity to promote my crate RleVec. It provides a Vec-like interface but uses run length encoding to store the data. There is an O(log n) lookup price though.

https://crates.io/crates/rle_vec

You can use my sivec crate. You probably shouldn't: it's pretty untested, full of unsafe code, and likely pretty slow. But you could.

This crate provides a SIVec "self-initializing vector" type that retains indices and has O(1) initialization and O(1) access. On the other hand it's gross to use (because IndexMut is just not a good fit for this and other indexing problems) and uses address space proportional to the maximum index.

Still, can't hurt to try it out I guess.