Performance penalty for never used enum variant

I have a use case where I store a HashMap<3rd_party_checksum_of_data, data>. I then did some modifications:

  1. Because the checksums can collide, I replaced data with HashSet<data>.
  2. To prevent heap allocations for the default (non-colliding) case, I added the following type: enum CollisionData { One(data), Multi(HashSet<data>) } and replaced HashSet<data> with CollisionData. When adding a new entry to the map and an entry with One(data) already exists, it will be replaced with Multi(HashSet { old_data, new_data }) (pseudo init code).
  3. To just optionally enable the "multiple results on collision" feature, I added a bool switch, which, if set to false, causes the One(data) -> Multi(HashSet { old_data, new_data }) code path never to be called, and thus the HashSet and CollisionData::Multi variant never to be constructed.

After those three changes, the type declaration is the following: HashMap<3rd_party_checksum_of_data, CollisionData>. But now the program is a lot slower in comparison to the state before the changes, even if the switch is set to false.

I created the following test program, which never constructs CollisionData::Multi, to mimic the problem (in this example, 3rd_party_checksum_of_data is just a counter from 0..10_000 and data is the same value):

enum CollisionData<T> {
    One(usize),
    Multi(T)
}

// type Entry = usize; // 218 - 228 microseconds.
// type Entry = CollisionData<()>; // 248 - 265 microseconds.
// type Entry = CollisionData<usize>; // 247 - 258 microseconds.
// type Entry = CollisionData<std::collections::HashSet<usize>>; // 600 - 634 microseconds.

type Map = std::collections::HashMap<usize, Entry>;

#[inline(never)]
fn create_map(entries: usize) -> Map {
    let mut map = Map::with_capacity(entries);

    for i in 0..entries {
        // map.entry(i).or_insert(i);
        // map.entry(i).or_insert(Data::One(i));
        // map.entry(i).or_insert(Data::One(i));
        // map.entry(i).or_insert(Data::One(i));
    }

    map
}

fn main() {
    const LOOPS: usize = 10_000;

    let start = std::time::Instant::now();

    for _ in 0..LOOPS {
        #[allow(unused)]
        let map = create_map(LOOPS);
    }

    println!("{}", start.elapsed().as_micros() / LOOPS as u128);
}

I added the runtime from my machine as comments in the code. One can see that the last variant with the HashSet takes more than double the amount of time than the other variants. In my real-world use case, this problem, with the switch set to false, causes the runtime to increase from 30 seconds to 25 minutes, even though the CollisionData::Multi variant is never constructed.

I had hoped that using the bool switch and never constructing a HashSet would not be so expensive! Am I doing something wrong? I could use a generic type which uses either data or HashSet<data>, but that would complicate the code a lot.

Why don't use something like HashMap<checksum, Vec<data>> rather ? This way you don't store the collisions in the key but in in the value, which makes more sense and is very ergonomic.

I wasn't able to reproduce the numbers you're seeing (the difference is there but a lot smaller on my PC), but the size of CollisionData<usize> is 16 bytes while CollisionData<HashSet<usize>> is 56 bytes*, regardless of which variants are used. Thus the unused variant does have a relatively significant impact on the type as a whole.

An increase from seconds to minutes sounds like something else might be going on, but either way depending on your use-case you may want to consider a feature that enables the variant instead of a bool.

*checked with size_of in std::mem - Rust

3 Likes

Maybe I'm misinterpreting this:

  1. I don't want data duplicates, that is why I use a HashSet instead of a Vec
  2. I don't use a raw HashSet (or your Vec tip) but the CollisionData enum, to not have heap allocations in the default case of 1 element (modification #2 in the original post)

Maybe the growth in size is enough to cause the issue with a large enough data set. I will embed data in a 56 byte type and report back.
The switch is set from a command-line argument, a compile time feature would not be ideal.

But you are already storing duplicates in your hashset. A (this) hashset doesn't just store the hash.

If you don't want to allocate in the case of only one T, you can use HashMap<K, smallvec::SmallVec<T, 1>>, that's a much simpler design. And if you really only want to count collision, just add a counter to your struct, either independent from the map, or in the map like so HashMap<K, (T, usize)>.

1 Like

This seems to have been the cause. Storing a separate Vec<HashSet<data>>, and only storing the index into that vector in CollisionData::Multi brings the runtime down to 30 seconds again, even with the switch set to true (storing the collisions). Thanks for the hint!

1 Like

You are most likely right and it is too late for me to wrap my head around this. I will come back to your suggestion tomorrow. Thanks already!

New day, new me, but I still don't understand your suggestion..
I am storing duplicate keys for different data entries, but the data entries must not be duplicates.
As far as I understand it, the SmallVec would cause the same size issue that was mentioned by Heliozoa.

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.