Advice regarding vector memory efficiency

Hi all,

I am writing a domain-specific database module that is bottlenecked by main memory usage, and which uses vectors extensively as a cache layer. My problem reduces to the following:

I have a vector of bytes that could potentially occupy all of the program's address space if I let it. Bytes are inserted into the vector following non-random, yet differing, distributions (potentially sparse, very concentrated, or in intervals of concentration) depending on the particular caller of the module.

I want to minimize memory usage at all times. For example, if I used the standard vector, it would be very bad if the caller decided to insert their first element at a very large index, as it would presumably cause the vector to allocate a lot of unnecessary capacity.

On the other hand, if I use a sparse vector implementation (like keeping the data and the indices in two separate, parallel vectors), eventually there will be enough data that it would be more efficient to just store it in a regular vector (especially considering each stored index would take at least a few bytes).
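
Roughly, the sparse representation I have in mind is this (just a sketch, names made up):

struct SparseBytes {
    indices: Vec<usize>, // sorted logical indices
    data: Vec<u8>,       // data[i] lives at logical index indices[i]
}

impl SparseBytes {
    fn get(&self, index: usize) -> Option<u8> {
        // Sorted indices keep lookups at O(log n).
        self.indices.binary_search(&index).ok().map(|pos| self.data[pos])
    }

    fn insert(&mut self, index: usize, value: u8) {
        match self.indices.binary_search(&index) {
            Ok(pos) => self.data[pos] = value,
            // O(n) shifting on new inserts; acceptable since time matters
            // less than space here.
            Err(pos) => {
                self.indices.insert(pos, index);
                self.data.insert(pos, value);
            }
        }
    }
}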

I also thought of the interval case, and of having a vector of vectors with "islands" (high-density intervals) of data, plus another vector indicating the starting indices of those "islands." However, I would not know how to measure exactly when this technique would start and stop being efficient in terms of my data's index distribution.
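
As a sketch, the lookup side of that would be something like (again, names made up):

struct Islands {
    starts: Vec<usize>,    // starting logical index of each island, sorted
    islands: Vec<Vec<u8>>, // dense byte runs, parallel to `starts`
}

impl Islands {
    fn get(&self, index: usize) -> Option<u8> {
        // Find the last island starting at or before `index`.
        let pos = match self.starts.binary_search(&index) {
            Ok(pos) => pos,
            Err(0) => return None,
            Err(pos) => pos - 1,
        };
        self.islands[pos].get(index - self.starts[pos]).copied()
    }
}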

Does anyone have any pointers on this kind of problem, or maybe even a crate that solves it altogether in an almighty vector implementation that dynamically switches between these representations when it's most efficient to (implemented by someone much smarter than me)?

Space efficiency is paramount, but time efficiency not so much, IF the latter can be mitigated with multithreading.

Wouldn't a high-fanout, shallow tree ordered by "index" be better? Those usually allow O(log n) lookup, insertion, and deletion while maintaining cache locality. (It's no accident that most databases eventually end up using trees in various parts of their implementation.)
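
For instance, std's BTreeMap is already such a tree (its wide nodes keep it shallow), so a first approximation could be as simple as:

use std::collections::BTreeMap;

fn main() {
    // One byte per entry; the B-tree's high fanout keeps it shallow.
    let mut bytes: BTreeMap<usize, u8> = BTreeMap::new();
    bytes.insert(1_000_000_000, 42); // a sparse write far out in "index" space
    assert_eq!(bytes.get(&1_000_000_000).copied(), Some(42));
}

(Per-entry overhead is significant with one byte per key, of course; grouping bytes into blocks under a single key reduces it.)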

Thanks for the suggestion, but since I am storing bytes, the memory required to store the addresses of child tree nodes would negate the tree's performance/density benefits. So I think the overhead would be prohibitively high.

Maybe this isn't a big sacrifice in big databases, because they only use trees to look up pages in indices, not specific table keys, for example. But please correct me if I'm wrong -- I would love for something like this to work out as a solution.

It's worth noting that the "individually address each byte" requirement isn't that easy to let go of, because the bytes themselves are already bit-packed.

Is it possible to use "double indexing" -- the tree itself stores data in indexed blocks (something like kilobytes in size), and each block is then indexed down to individual bytes?

I like the idea, but I think the minimum of the "cost function" (namely, the data structure's average usage-to-capacity ratio) over block sizes would still be a bit high.

E.g., if we use 2^16-byte blocks (which already start to sacrifice the benefits of small size), then we would need to store 2^16 (potentially) 16-byte memory addresses for each one, assuming a 32-bit address space for a vector of equivalent max capacity -- so the tree solution would use as much space as a vector over a 34-bit address space (2^16 * 2^16 * 4 = 2^34), or in other words, 4 times as much memory.

In the domain this would be used in, that is prohibitive. I think the solution I need most likely has the shape of some kind of hybrid data structure that switches modes at certain boundaries -- sacrificing time instead of space. (I forgot to mention that this runs across quite a few threads, so parallelism is something this particular project takes great advantage of.)

I am also thinking about high-fanout tree structures, something like the page tables used by CPUs: access is constant-time and sparse data can be stored relatively efficiently, but cache locality is not good. Each level is accessed by shifting and masking the logical index.

Sorry I am not able to explain it better; if you know the data structure used by the vector type in Clojure, you know what I am talking about. Clojure's vec type is immutable, but the data structure itself isn't restricted to that.
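
A simplified two-level sketch of what I mean (the constants are arbitrary, and Clojure's trie has more levels):

const LEAF_BITS: usize = 12; // 4 KiB leaves
const LEAF_SIZE: usize = 1 << LEAF_BITS;
const LEAF_MASK: usize = LEAF_SIZE - 1;

// Page-table style: the high bits pick a leaf, the low bits index into it.
struct Radix {
    top: Vec<Option<Box<[u8; LEAF_SIZE]>>>,
}

impl Radix {
    fn new(max_index: usize) -> Self {
        // One pointer-sized slot per possible leaf; leaves themselves
        // are only allocated when written to.
        Radix { top: vec![None; (max_index >> LEAF_BITS) + 1] }
    }

    fn get(&self, index: usize) -> u8 {
        match &self.top[index >> LEAF_BITS] {
            Some(leaf) => leaf[index & LEAF_MASK],
            None => 0, // unallocated regions read as zero
        }
    }

    fn set(&mut self, index: usize, value: u8) {
        self.top[index >> LEAF_BITS]
            .get_or_insert_with(|| Box::new([0; LEAF_SIZE]))[index & LEAF_MASK] = value;
    }
}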

I don't really follow this. The tree solution only needs to store block_size + pointer_size * 3 bytes per block, no? Assuming block_size = 1024 and pointer_size = 8, its efficiency is block_size / (block_size + pointer_size * 3) = 1024 / 1048 ≈ 97.7%.

For illustration, you could base your implementation on something like this:

use std::collections::BTreeMap;
use std::ops::{Index, IndexMut};

/// Low 12 bits address a byte within a block; the high bits select the block.
const BLOCK_MASK: usize = 0xfff;

/// A sparse, addressable sequence of bytes that initializes to all-zeros
#[derive(Default)]
pub struct ByteSpace(BTreeMap<usize, [u8; BLOCK_MASK + 1]>);

impl Index<usize> for ByteSpace {
    type Output = u8;
    fn index(&self, idx: usize) -> &u8 {
        let hi = idx & !BLOCK_MASK; // starting index of the containing block
        let lo = idx & BLOCK_MASK;  // offset within the block
        // Reading an unallocated block yields zero without allocating it.
        let Some(block) = self.0.get(&hi) else { return &0 };
        &block[lo]
    }
}

impl IndexMut<usize> for ByteSpace {
    fn index_mut(&mut self, idx: usize) -> &mut u8 {
        let hi = idx & !BLOCK_MASK;
        let lo = idx & BLOCK_MASK;
        // Writing allocates the containing block on first touch.
        &mut self.0.entry(hi).or_insert([0; BLOCK_MASK + 1])[lo]
    }
}
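
Used like this, only the blocks you actually touch get allocated:

fn main() {
    let mut bytes = ByteSpace::default();
    bytes[5_000_000_000] = 7; // allocates a single 4 KiB block
    assert_eq!(bytes[5_000_000_000], 7);
    assert_eq!(bytes[0], 0); // untouched addresses read as zero
}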

Can't you simply mmap memory for your vector without pre-populating it and let the MMU do its job? It would be very fast, and physical memory would be allocated only on demand. It could be quite inefficient for very sparse loads (one vector entry could occupy a whole page), but otherwise it should be one of the fastest and simplest solutions.

It's possible to achieve the same effect by carefully using Vec, so it would use allocate_zeroed and grow_zeroed under the hood, but it's probably better to be explicit with such code.
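
An untested sketch with the memmap2 crate, whose MmapMut::map_anon gives exactly such an anonymous, zero-initialized mapping:

use memmap2::MmapMut;

fn main() -> std::io::Result<()> {
    // Reserve 1 GiB of address space; the OS only allocates physical
    // pages when they are first written to.
    let mut bytes = MmapMut::map_anon(1 << 30)?;
    bytes[123_456_789] = 42; // faults in a single page (typically 4 KiB)
    assert_eq!(bytes[123_456_789], 42);
    Ok(())
}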

Sorry, I was sleepy and wasn't thinking straight -- this sounds about right, which means that this is very viable. Thanks so much!

This is very nice, thank you!

Thank you @2e71828, @zirconium-n, and @Cerber-Ursi. This is what I ended up doing, in case it is useful to someone:

use std::collections::BTreeMap;

const BLOCK_MASK: usize = 0xfff; // same block size as in the example above

struct Page(BTreeMap<usize, [u8; BLOCK_MASK + 1]>);

impl Page {
    fn new() -> Self {
        Page(BTreeMap::new())
    }

    fn get(&self, index: usize) -> Option<u8> {
        let upper = index & !BLOCK_MASK;
        let lower = index & BLOCK_MASK;
        self.0.get(&upper).map(|block| block[lower])
    }

    fn get_mut(&mut self, index: usize) -> Option<&mut u8> {
        let upper = index & !BLOCK_MASK;
        let lower = index & BLOCK_MASK;
        self.0.get_mut(&upper).map(|block| &mut block[lower])
    }

    fn insert(&mut self, index: usize, value: u8) {
        let upper = index & !BLOCK_MASK;
        let lower = index & BLOCK_MASK;
        match self.0.get_mut(&upper) {
            Some(block) => {
                block[lower] = value;
            }
            None => {
                // First write to this block: allocate it zero-filled.
                let mut new_block = [0; BLOCK_MASK + 1];
                new_block[lower] = value;
                self.0.insert(upper, new_block);
            }
        }
    }
}
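
For example:

fn main() {
    let mut page = Page::new();
    page.insert(70_000, 5); // allocates only the containing 4 KiB block
    assert_eq!(page.get(70_000), Some(5));
    assert_eq!(page.get(0), None); // nothing stored in that block yet
}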