Reading Vec<u32> values slower than writing

Hello!

I am trying to write a fast lookup table for 7-card poker hand evaluation.

The principle is very simple: given a hand, represented as a u64, compute a lookup index (perfect hash function), and fetch the pre-computed hand value stored in a Vec. There are about 133M possible hand combinations.
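(That count is C(52, 7) = 133,784,560 ≈ 133M, i.e. the number of ways to choose 7 cards from a 52-card deck.)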

My problem is that reading values from the Vec is much slower than writing them. I'd like to know whether that's to be expected, and/or if there's anything I can do to speed it up, given that it's actually the performance-critical part of the code.

Here are the timings I get:

  • Computing all 133M indices: 180M hands / sec
  • Computing all 133M indices + writing lookup table values: 150M hands / sec
  • Computing all 133M indices + reading lookup table values: 34M hands / sec

and here is the relevant part of the code (PerfectHash, generate_hands, and LOOKUP_SIZE are defined elsewhere):

use std::time::Instant;

struct LookupTable {
    lookup: Vec<u32>,
    hash: PerfectHash,
}

impl LookupTable {
    fn new() -> Self {
        let hash = PerfectHash::new();
        let lookup = vec![0; LOOKUP_SIZE];
        Self { lookup, hash }
    }

    // iterate over all possible hands and compute their index (perfect hashing)
    fn compute_all_indices(&mut self) -> usize {
        let mut k = 0;
        for hand in generate_hands() {
            k += self.hash.index(hand);
        }
        k
    }

    // fill the lookup table with arbitrary value
    fn write_values(&mut self) {
        for hand in generate_hands() {
            let index = self.hash.index(hand);
            self.lookup[index] = 1;
        }
    }

    // lookup values for all possible hands
    fn read_values(&self) -> u32 {
        let mut z = 0;
        for hand in generate_hands() {
            let index = self.hash.index(hand);
            z += self.lookup[index];
        }
        z
    }
}

fn main() {
    let mut table = LookupTable::new();

    let start = Instant::now();
    let result = table.compute_all_indices();
    let duration = start.elapsed().as_micros() as usize;
    println!(
        "Perfect hash computing: {:?} M/s ({})",
        LOOKUP_SIZE / duration,
        result
    );

    let start = Instant::now();
    table.write_values();
    let duration = start.elapsed().as_micros() as usize;
    println!("Filling lookup table: {:?} M/s", LOOKUP_SIZE / duration);

    let start = Instant::now();
    let value = table.read_values();
    let duration = start.elapsed().as_micros() as usize;
    println!(
        "Fetching lookup table: {:?} M/s ({})",
        LOOKUP_SIZE / duration,
        value
    );
}

Thanks! 🙂

TL;DR: your “problem” is deeply fundamental, not Rust-related, and cannot be “fixed” at the Rust language level… but there are mitigation techniques.

Translation from English to English: there is a lot of data and it doesn't fit in the cache.

Which is not even remotely surprising when there is this much data.
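To put rough numbers on it: 133M u32 entries are about 535 MB of table, while even a large last-level cache holds only a few tens of MB. And 34M lookups/sec works out to roughly 29 ns per lookup, which is in the ballpark of a single DRAM access.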

Yes. It is expected.

There are a lot of things you can do. But given that you are dealing not with something Rust-specific but with something fundamental to computer design… my recommendation would be to read a book that covers memory, caches, prefetching, and other such things.

Drepper wrote an excellent piece on the subject ("What Every Programmer Should Know About Memory"), and there are lots and lots of other books on the subject.

I don't think it would be fruitful to discuss possible solutions until you understand the basics of how contemporary computers work.


Thank you for your answer and the recommendation; I will definitely give it a read!

Just keep in mind that if you don't like Drepper, you can probably find some other article or book that describes these things and outlines mitigation techniques. Keywords: memory, cache, prefetch.

I haven't seen anything where the examples were in Rust, but, as I have already said, this isn't Rust-related at all: C or Fortran, Pascal or C#, Java or even JavaScript; you may hit this problem in any language, and the mitigation techniques are, similarly, not language-specific.
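For what it's worth, here is what one classic mitigation looks like in Rust: batch the index computations and issue software prefetches before the dependent loads, so several cache misses are serviced in parallel instead of one at a time. This is only a minimal sketch of the idea, not the OP's code: it assumes an x86_64 target, and sum_batched, BATCH, and the toy main are invented for illustration.

use std::arch::x86_64::{_MM_HINT_T0, _mm_prefetch};

const BATCH: usize = 16;

// Sum table entries for a slice of precomputed indices, prefetching ahead.
fn sum_batched(lookup: &[u32], indices: &[usize]) -> u32 {
    let mut z = 0u32;
    for chunk in indices.chunks(BATCH) {
        // First pass: request the cache lines we are about to need.
        for &i in chunk {
            unsafe { _mm_prefetch::<_MM_HINT_T0>(lookup.as_ptr().add(i) as *const i8) };
        }
        // Second pass: the actual loads, which now overlap the memory latency.
        for &i in chunk {
            z = z.wrapping_add(lookup[i]);
        }
    }
    z
}

fn main() {
    const N: usize = 1 << 20;
    let lookup = vec![1u32; N];
    // Scattered (hash-like) indices, just to exercise the random-access path.
    let indices: Vec<usize> = (0..N).map(|k| k.wrapping_mul(2654435761) & (N - 1)).collect();
    println!("{}", sum_batched(&lookup, &indices));
}

Whether this helps, and what the best BATCH value is, depends heavily on the CPU, so it has to be measured.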


Another set of keywords to look up literature on the subject:

Sequential write versus random read

This series has a bunch of good graphs showing how reading a random element out of N possibilities is actually more like O(√N) than the O(1) you learn in CS 101: The Myth of RAM, part I
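If you want to see the effect on your own machine, here is a small self-contained sketch (my illustration, not from the linked series) that sums the same buffer once sequentially and once in pseudo-random order. Compiled with --release, the random pass is typically several times slower:

use std::time::Instant;

fn main() {
    const N: usize = 1 << 27; // 134M u32 values ≈ 512 MB, far bigger than any cache

    let data = vec![1u32; N];

    // Sequential: the hardware prefetcher streams the data in ahead of use.
    let start = Instant::now();
    let mut sum = 0u32;
    for &v in &data {
        sum = sum.wrapping_add(v);
    }
    println!("sequential: {:?} (sum = {})", start.elapsed(), sum);

    // Pseudo-random (LCG over a power-of-two range): almost every access
    // misses the cache, so each load pays the full DRAM latency.
    let start = Instant::now();
    let mut sum = 0u32;
    let mut i = 0usize;
    for _ in 0..N {
        i = i.wrapping_mul(6364136223846793005).wrapping_add(1) & (N - 1);
        sum = sum.wrapping_add(data[i]);
    }
    println!("random:     {:?} (sum = {})", start.elapsed(), sum);
}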
