I'm using a `HashMap` to take inventory of unique occurrences of `[u8]` and count them; for all practical purposes, a word frequency count.
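```rust
// counter: HashMap<Vec<u8>, usize> (count type assumed), bytes: &[u8] (a subslice of the Mmap)
match counter.get_mut(bytes) {
    Some(count) => {
        *count += 1;
    }
    None => {
        counter.insert(bytes.to_owned(), 1);
    }
}
```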
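The lookup above already avoids copying on a hit: `HashMap<Vec<u8>, usize>::get_mut` accepts a `&[u8]` because `Vec<u8>` implements `Borrow<[u8]>`, so a subslice can be passed in directly without building a buffer. Below is a self-contained sketch of the same pattern. The `record` helper is hypothetical, and a `Vec<u8>` split on spaces stands in for the `Mmap` and its index (a memory map such as `memmap2::Mmap` derefs to `[u8]`, so the same slicing should apply).

```rust
use std::collections::HashMap;

/// Hypothetical helper wrapping the lookup above. `bytes` borrows
/// directly from the input buffer; it is copied (via `to_owned`) only
/// the first time that key is seen.
fn record(counter: &mut HashMap<Vec<u8>, usize>, bytes: &[u8]) {
    // Compiles because Vec<u8>: Borrow<[u8]>, so get_mut accepts &[u8].
    match counter.get_mut(bytes) {
        Some(count) => *count += 1,
        None => {
            counter.insert(bytes.to_owned(), 1);
        }
    }
}

fn main() {
    // Stand-in for the memory-mapped file.
    let data: Vec<u8> = b"the cat the hat the".to_vec();
    let mut counter: HashMap<Vec<u8>, usize> = HashMap::new();

    // Each `word` is a &[u8] view into `data`; nothing is buffered.
    for word in data.split(|&b| b == b' ') {
        record(&mut counter, word);
    }

    println!("{:?}", counter.get(&b"the"[..])); // Some(3)
}
```

The hit path costs one hash of the slice plus one key comparison; only the miss path allocates.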
To help scream through the data, I have an index of offsets into a `Mmap`: the values at `index` and `index + 1` are the two offsets that bound one subslice of the `Mmap`. I skip ahead by some constant number of entries in the index, look up the values at `index` and `index + 1` to get the next subslice, and so on. Only when the value of the slice is not already in the counter do I copy the bytes.
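The index walk described above might look roughly like this. It is a sketch under assumptions, not the actual code: `offsets` is a hypothetical array of byte offsets where entries `i` and `i + 1` bound one word, `STRIDE` is a hypothetical name for the constant skip (the real index layout may differ), and a byte string stands in for the `Mmap`.

```rust
use std::collections::HashMap;

// Hypothetical constant skip through the offset index.
const STRIDE: usize = 2;

fn count_words(data: &[u8], offsets: &[usize]) -> HashMap<Vec<u8>, usize> {
    let mut counter: HashMap<Vec<u8>, usize> = HashMap::new();
    let mut i = 0;
    // Each step reads offsets[i] and offsets[i + 1] to form one subslice.
    while i + 1 < offsets.len() {
        let bytes = &data[offsets[i]..offsets[i + 1]]; // borrowed view, no copy
        match counter.get_mut(bytes) {
            Some(count) => *count += 1,
            None => {
                counter.insert(bytes.to_owned(), 1); // copy only for new keys
            }
        }
        i += STRIDE;
    }
    counter
}

fn main() {
    // Stand-in for the mapped file; the words are "ab", "cd", "ab".
    let data = b"abXcdXab";
    // (start, end) pairs: (0, 2), (3, 5), (6, 8).
    let offsets = [0, 2, 3, 5, 6, 8];
    let counter = count_words(data, &offsets);
    println!("{:?}", counter.get(&b"ab"[..])); // Some(2)
}
```

Because `bytes` is only a pointer-and-length view into `data`, a repeated word touches the mapped pages just to hash and compare them.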
The question is: given that all I need is to establish equality with a pre-existing key in the `HashMap`, what is the bare minimum required to "read in" the subslice to perform this lookup? For instance, do I have to create and manage a buffer? At the CPU level, the bytes of that subslice have to be loaded into registers, but what does that load-and-compare actually translate to?
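Roughly, a lookup with a borrowed key does two things, both reading the subslice where it lies: it hashes the bytes, then compares candidate stored keys against the slice using `[u8]` equality (for byte slices, typically a length check followed by a `memcmp`-style comparison). The `Borrow` contract requires `Vec<u8>` and `[u8]` to hash identically, which is what lets an owned key be found from a borrowed slice. The sketch below does those two steps by hand with `BuildHasher::hash_one` (stable since Rust 1.71); `would_match` is a hypothetical illustration, not how `HashMap` is implemented internally.

```rust
use std::collections::HashMap;
use std::hash::BuildHasher;

/// Hypothetical helper: true if a HashMap using `hasher` would treat the
/// owned key `stored` and the borrowed slice `probe` as the same key.
fn would_match<S: BuildHasher>(hasher: &S, stored: &Vec<u8>, probe: &[u8]) -> bool {
    // Step 1: hash both. Vec<u8> hashes exactly like [u8], so equal
    // bytes always produce equal hashes (required by the Borrow contract).
    let same_hash = hasher.hash_one(stored) == hasher.hash_one(probe);
    // Step 2: compare in place: length check, then a bytewise comparison.
    // No intermediate buffer is created.
    same_hash && stored.as_slice() == probe
}

fn main() {
    let mut counter: HashMap<Vec<u8>, usize> = HashMap::new();
    counter.insert(b"word".to_vec(), 1);

    // Stand-in for the memory-mapped file.
    let data: &[u8] = b"a word here";
    let probe = &data[2..6]; // borrowed subslice: b"word"

    let stored = counter.keys().next().unwrap();
    println!("{}", would_match(counter.hasher(), stored, probe)); // true

    // get_mut performs both steps internally, still without copying `probe`.
    if let Some(count) = counter.get_mut(probe) {
        *count += 1;
    }
    println!("{:?}", counter.get(probe)); // Some(2)
}
```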
Thanks to anyone with a mania for, and expertise in, speed reading.
- E