HashMap performance

This post is about optimizing the performance of counting the occurrences of the values in a collection of u32 values.
Using the Rust standard library, this can be implemented as a map from each value to a counter. The standard library provides two kinds of maps: std::collections::HashMap and std::collections::BTreeMap.

Using the first one, I wrote the following Rust program:

fn main() {
    let mut m = std::collections::HashMap::<u32, u32>::new();
    for i in 0..200_000_000 {
        *m.entry(i % 100_000).or_insert(0) += 1;
    }
    println!("{}", m[&40000]);
}

It prints "2000".
If "HashMap" is replaced by "BTreeMap", an equivalent program is obtained.
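For completeness, here is that BTreeMap variant spelled out; only the type name changes:

fn main() {
    let mut m = std::collections::BTreeMap::<u32, u32>::new();
    for i in 0..200_000_000 {
        *m.entry(i % 100_000).or_insert(0) += 1;
    }
    println!("{}", m[&40000]);
}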
Then I wrote the following equivalent C++ program, which also uses a GCC-only library extension from the "__gnu_pbds" namespace:

#include <iostream>
#include <ctime>
#include <map>
#include <unordered_map>
#include <ext/pb_ds/assoc_container.hpp>
using namespace std;

template<typename T>
void f() {
    clock_t start = clock(); 
    T m;
    for (int i = 0; i < 200000000; ++i) {
        auto j = i % 100000;
        auto pos = m.find(j);
        if (pos == m.end()) m[j] = 1;
        else ++pos->second;
    }
    cout << m[40000] << " " << float(clock() - start)
        / CLOCKS_PER_SEC << " s" << endl;    
}
    
int main() {
    f<map<unsigned int, unsigned int>>();
    f<unordered_map<unsigned int, unsigned int>>();
    f<__gnu_pbds::cc_hash_table<unsigned int, unsigned int>>();
}

The running times I got on one computer are the following:

  • std::collections::HashMap: 8.9 seconds;
  • std::collections::BTreeMap: 12.6 seconds;
  • std::map: 20.2 seconds;
  • std::unordered_map: 2.61 seconds;
  • __gnu_pbds::cc_hash_table: 0.83 seconds.

It appears that the Rust program using BTreeMap outperforms the C++ program using map (which is implemented as a red-black tree), but also that the fastest Rust program, the one using HashMap, takes more than 3 times as long as the program using unordered_map, and more than 10 times as long as the program using cc_hash_table.
To show that this is not a defect in the test program but an inefficiency of the library, I wrapped the C++ collections in a C++ library, and then called that library from a Rust program through FFI.
Here is the C++ source code of the library:

#include <unordered_map>
#include <ext/pb_ds/assoc_container.hpp>
using namespace std;
static unordered_map<unsigned int, unsigned int> um;
static __gnu_pbds::cc_hash_table<unsigned int, unsigned int> ccht;
extern "C" void um_create() { }
extern "C" void ccht_create() { }
extern "C" void um_destroy() { um.clear(); }
extern "C" void ccht_destroy() { ccht.clear(); }
extern "C" void um_increment_count(unsigned int i) {
    try {
        auto pos = um.find(i);
        if (pos == um.end()) um[i] = 1;
        else ++pos->second;
    }
    catch (...) { }
}
extern "C" void ccht_increment_count(unsigned int i) {
    try {
        auto pos = ccht.find(i);
        if (pos == ccht.end()) ccht[i] = 1;
        else ++pos->second;
    }
    catch (...) { }
}
extern "C" unsigned int um_get_count(unsigned int i) {
    try {
        return um[i];
    }
    catch (...) { }
    return 0;
}
extern "C" unsigned int ccht_get_count(unsigned int i) {
    try {
        return ccht[i];
    }
    catch (...) { }
    return 0;
}

Here is the Rust source code of the program using that library:

use std::time::Instant;
#[link(name = "stdc++", kind = "static")]
#[link(name = "hash", kind = "static")]
extern {
    fn um_create();
    fn ccht_create();
    fn um_destroy();
    fn ccht_destroy();
    fn um_increment_count(i: u32);
    fn ccht_increment_count(i: u32);
    fn um_get_count(i: u32) -> u32;
    fn ccht_get_count(i: u32) -> u32;
}

fn main() {
    let t0 = Instant::now();
    unsafe {
        um_create();
        for i in 0..200_000_000 {
            um_increment_count(i % 100_000);
        }
        println!("{}", um_get_count(40000));
        um_destroy();
    }
    let d1 = t0.elapsed();
    println!("{}", d1.as_secs() as f64 + d1.subsec_nanos() as f64 / 1e9);

    let t1 = Instant::now();
    unsafe {
        ccht_create();
        for i in 0..200_000_000 {
            ccht_increment_count(i % 100_000);
        }
        println!("{}", ccht_get_count(40000));
        ccht_destroy();
    }
    let d2 = t1.elapsed();
    println!("{}", d2.as_secs() as f64 + d2.subsec_nanos() as f64 / 1e9);
}

And here is the Bash script to build the library and the program, and to run the generated program:

g++ -std=c++11 -fPIC -c -O3 hash2.cpp -o hash2.o && \
ar rcs libhash.a hash2.o && \
rustc -O -L. hash2.rs && \
./hash2

The resulting times are:

  • std::unordered_map: 2.69 seconds;
  • __gnu_pbds::cc_hash_table: 0.83 seconds.

The resulting times are the same as those of the C++-only program.
I don't know why the HashMap collection of the Rust standard library performs so poorly compared to unordered_map, and why unordered_map performs so poorly compared to cc_hash_table, but I think these differences should be taken seriously, either to improve the performance of the HashMap collection or to provide an alternative Rust collection.

8 Likes

My understanding is that by default Rust uses a pretty slow hash function which provides good protection from DoS attacks. More about it in the FAQ: https://www.rust-lang.org/en-US/faq.html#why-are-rusts-hashmaps-slow.

It would be interesting to know how ::std::collections::HashMap and unordered_map compare for the same hash function.

3 Likes

And I'd like to add that BTreeMap is faster than map mainly because the STL requirements for map forbid implementing it as a tree with a large branching factor. So we can't really compare Rust and C++ here, because they have different APIs. And while we are at it, here is an awesome long read about Rust's BTreeMap: Rust Collections Case Study: BTreeMap :slight_smile:

4 Likes

Using safe hashing by default is acceptable, but the Rust standard library needs to offer a really simple and compact way to say "don't use the safe hashing in this case".

Do benchmark FnvHasher and see if it's fast enough for your use cases, but there's more to hash table performance than just the hash function. Your hash table is 2 MB: 131,072 slots (the smallest power of two greater than 100k * 11/10) times (8-byte hash value + 4-byte key + 4-byte data). Rust's hash tables lay out the hashes, the keys, and the data in 3 separate subarrays, so every lookup requires a minimum of 3 L2 cache misses (assuming your L2 cache is smaller than 2 MB).
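For reference, a minimal sketch of the counting loop using FnvHasher, assuming the external fnv crate and its FnvHashMap alias (not part of std):

// Cargo.toml dependency assumed: fnv = "1" (external crate)
use fnv::FnvHashMap; // alias for HashMap with an FNV-based BuildHasher

fn main() {
    let mut m: FnvHashMap<u32, u32> = FnvHashMap::default();
    for i in 0..200_000_000u32 {
        *m.entry(i % 100_000).or_insert(0) += 1;
    }
    println!("{}", m[&40_000]); // should still print 2000
}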

2 Likes

You can already swap out the hashing algorithm rather easily. Do you mean you think there should be a second hasher in the standard library?

@carlomilanesi what's your rust version?

There is a recent improvement of hashing:

And there is another significant improvement that's possible:

After those two changes Rust hashing could be "fast enough". But when I tried it, swapping out hashing strategies in Rust was rather laborious, badly documented, and not easy for me (and I was just trying to hash a tuple of two f64, I think).

2 Likes

Here are my results with the latest nightly. "Simple" in this case is the same strategy used by unordered_map: just return the integer. We're actually faster.

Code: hash.rs · GitHub

2000
SIP13 Duration { secs: 6, nanos: 709803720 }
2000
SIP24 Duration { secs: 7, nanos: 654518957 }
2000
FNV Duration { secs: 5, nanos: 179056737 }
2000
Simple Duration { secs: 0, nanos: 897325662 }
And the output of the C++ program (./a.out):
2000 map 18.505 s
2000 unordered 1.9361 s
1 Like

It seems like you mean you were trying to implement a special hasher? This is a rather different problem from trying to swap out the hashing algorithm in a hash map.

If you have a hashing algorithm implemented using the interface defined in std, all you have to do to use it with HashMap is construct the map with the HashMap::with_hasher constructor.
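To make that concrete, here is a minimal sketch of a deliberately trivial (and DoS-unsafe) integer hasher plugged into HashMap through with_hasher; the IdentityHasher name and its byte-folding fallback are my own choices, not anything from std or from the gist linked above:

use std::collections::HashMap;
use std::hash::{BuildHasherDefault, Hasher};

// A deliberately trivial, DoS-unsafe hasher: the hash of a u32 is the value
// itself, roughly what libstdc++'s std::hash<unsigned int> returns.
#[derive(Default)]
struct IdentityHasher(u64);

impl Hasher for IdentityHasher {
    fn finish(&self) -> u64 {
        self.0
    }
    fn write(&mut self, bytes: &[u8]) {
        // Generic fallback: fold the bytes into the state
        // (keeps only the last 8 bytes, which is fine for a sketch).
        for &b in bytes {
            self.0 = (self.0 << 8) | u64::from(b);
        }
    }
    fn write_u32(&mut self, i: u32) {
        // u32 keys take this fast path.
        self.0 = u64::from(i);
    }
}

fn main() {
    let mut m: HashMap<u32, u32, BuildHasherDefault<IdentityHasher>> =
        HashMap::with_hasher(BuildHasherDefault::default());
    for i in 0..200_000_000u32 {
        *m.entry(i % 100_000).or_insert(0) += 1;
    }
    println!("{}", m[&40_000]);
}

The key piece is std::hash::BuildHasherDefault, which turns any Default + Hasher type into the BuildHasher value that with_hasher expects.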

1 Like

C++ does even more trickery with its templates, and for integers it won't even store the hash, making the table more compact in memory.

The final purpose was to have a fast hash map where the keys were a pair of f64 values. So I needed to do both things.

As a side note, rustc -O corresponds to rustc -C opt-level=2. It likely won't have much of an impact on this code, but you may want to experiment with -C opt-level=3.

Just out of curiosity, what was the challenge in defining an implementation of the Hasher trait?

Take a look at the "simple" module code in the link above:

That's very bad, and that's the basic version. If you want smarter hashing and you have some f64s, it gets worse.

I don't see what is so bad about that. You have to define default, write, and finish. These seem like the basic building blocks of a hasher; I don't see how they could be made simpler. What practical improvement to this API would you make?

I am not able to use that API: it's too complex and hard to use for me, it's too badly documented, and I think it requires too much user code.

This feedback isn't very easy to put into action. :frowning: What user code do you think is required, but superfluous?

The simple hasher in your example has code to convert a byte array into a u64; I think that should be a function in the standard library, but it's not really related to hashing per se. And it doesn't apply to your (f64, f64) tuple example anyway.
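As an aside, more recent Rust versions do ship that building block: u64::from_le_bytes converts a byte array into a u64, so the conversion can be written as a tiny helper (the bytes_to_u64 name below is just for illustration):

// Fold up to the first 8 bytes of a slice into a u64 (little-endian).
fn bytes_to_u64(bytes: &[u8]) -> u64 {
    let mut buf = [0u8; 8];
    let n = bytes.len().min(8);
    buf[..n].copy_from_slice(&bytes[..n]);
    u64::from_le_bytes(buf)
}

fn main() {
    assert_eq!(bytes_to_u64(&[1, 0, 0, 0]), 1);
    println!("{}", bytes_to_u64(b"abcdefgh"));
}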

Yes, let's drop this discussion, I'll just keep using the default hashing for now.

But in the Rust docs I'd like (much) better documentation of how to write code like that, with a few complete, longer examples: how to swap out the DoS-attack-resistant hashing for something terribly DoS-unsafe but with reasonable performance, how to hash all the built-in types (including the floating-point ones), and so on.
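To illustrate the floating-point case, here is one common workaround, sketched with a hypothetical PointKey wrapper (not from this thread) and assuming a Rust recent enough to have f64::to_bits: hash and compare the raw bit patterns, so that the Eq and Hash implementations stay consistent with each other.

use std::collections::HashMap;
use std::hash::{Hash, Hasher};

// f64 implements neither Eq nor Hash (because of NaN and -0.0),
// so wrap the pair and work on the raw bit patterns instead.
struct PointKey(f64, f64);

impl PartialEq for PointKey {
    fn eq(&self, other: &Self) -> bool {
        // Bitwise equality: NaN equals itself, but -0.0 differs from 0.0.
        self.0.to_bits() == other.0.to_bits() && self.1.to_bits() == other.1.to_bits()
    }
}

impl Eq for PointKey {}

impl Hash for PointKey {
    fn hash<H: Hasher>(&self, state: &mut H) {
        // Hash exactly the bits that eq() compares, keeping the Eq/Hash contract.
        self.0.to_bits().hash(state);
        self.1.to_bits().hash(state);
    }
}

fn main() {
    let mut m: HashMap<PointKey, u32> = HashMap::new();
    *m.entry(PointKey(1.5, -2.0)).or_insert(0) += 1;
    *m.entry(PointKey(1.5, -2.0)).or_insert(0) += 1;
    println!("{}", m[&PointKey(1.5, -2.0)]); // prints 2
}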

4 Likes

I sympathize with you that the HashMap documentation is a bit light on the default hashing algorithm used. It would be nice if it pointed to some alternative Hasher implementations, like rust-fnv. Though it probably shouldn't focus on how to implement Hasher.

1 Like