I have a use case where I store a HashMap<3rd_party_checksum_of_data, data>
. I then did some modifications:
- Because the checksums can collide, I replaced
data
withHashSet<data>
. - To prevent heap allocations for the default (non-colliding) case, I added the following type:
enum CollisionData { One(data), Multi(HashSet<data>) }
and replacedHashSet<data>
withCollisionData
. When adding a new entry to the map and an entry withOne(data)
already exists, it will be replaced withMulti(HashSet { old_data, new_data })
(pseudo init code). - To just optionally enable the "multiple results on collision" feature, I added a
bool
switch, which, if set tofalse
, causes theOne(data) -> Multi(HashSet { old_data, new_data })
code path never to be called, and thus theHashSet
andCollisionData::Multi
variant never to be constructed.
After those three changes, the type declaration is the following: HashMap<3rd_party_checksum_of_data, CollisionData>
. But now the program is a lot slower in comparison to the state before the changes, even if the switch is set to false
.
I created the following test program, which never constructs CollisionData::Multi
, to mimic the problem (in this example, 3rd_party_checksum_of_data
is just a counter from 0..10_000
and data
is the same value):
enum CollisionData<T> {
One(usize),
Multi(T)
}
// type Entry = usize; // 218 - 228 microseconds.
// type Entry = CollisionData<()>; // 248 - 265 microseconds.
// type Entry = CollisionData<usize>; // 247 - 258 microseconds.
// type Entry = CollisionData<std::collections::HashSet<usize>>; // 600 - 634 microseconds.
type Map = std::collections::HashMap<usize, Entry>;
#[inline(never)]
fn create_map(entries: usize) -> Map {
let mut map = Map::with_capacity(entries);
for i in 0..entries {
// map.entry(i).or_insert(i);
// map.entry(i).or_insert(Data::One(i));
// map.entry(i).or_insert(Data::One(i));
// map.entry(i).or_insert(Data::One(i));
}
map
}
fn main() {
const LOOPS: usize = 10_000;
let start = std::time::Instant::now();
for _ in 0..LOOPS {
#[allow(unused)]
let map = create_map(LOOPS);
}
println!("{}", start.elapsed().as_micros() / LOOPS as u128);
}
I added the runtime from my machine as comments in the code. One can see that the last variant with the HashSet
takes more than double the amount of time than the other variants. In my real-world use case, this problem, with the switch set to false
, causes the runtime to increase from 30 seconds to 25 minutes, even though the CollisionData::Multi
variant is never constructed.
I had hoped that using the bool
switch and never constructing a HashSet
would not be so expensive! Am I doing something wrong? I could use a generic type which uses either data
or HashSet<data>
, but that would complicate the code a lot.