How can I get the fastest hashset?

Hello everyone,
Is it possible to create a fixed-size hashset/hashmap to store my data? The maximum size of the data can be specified, and I need very fast get/delete/insert operations.

But the default HashMap/HashSet is a heap-allocated object (as far as I know, the stack is always faster than the heap).

So I want to ask: does a fixed-size hashset/hashmap exist?

You might be putting the cart before the horse here.

A hashset uses heap allocations, yes, but these are amortized, so you only re-allocate infrequently; after that, inserting is just a case of finding the right bucket and copying the value across.

The concern people have with things like hashsets isn't that they use dynamic allocations, but that they are designed to spread accesses out across the entire backing buffer instead of putting similar items next to each other. This can have an effect on performance because your CPU cache likes sequential accesses while random access means you'll get cache misses and need to make a trip to main memory more frequently.

If what you are actually looking for is a fast way to read/insert/delete items into a collection with no duplicates then rule #1 is to write benchmarks using typical data from your application and get numbers for the various alternatives.

You should also get numbers that show your bottleneck is actually in the hashset and not somewhere else... it'll be annoying to spend hours tweaking hashset lookups when your actual problem is a bad algorithm that makes you do O(n³) lookups when O(n) would have been sufficient.

10 Likes

AFAIK, the amount of data that can be stored on the stack is limited by the stack size, so whether the original idea is feasible may depend on the total size of the structure.

Also, if your data is all known at compile time and read-only, you can use something like the phf crate to create a hashset/map which is perfectly laid out so there are no collisions.

5 Likes

Another common concern is the hashing speed, but that can be addressed by using an alternate hashing algorithm with the standard HashMap / HashSet. Unfortunately, I don't know the favored alternatives off the top of my head.

Like the heap allocations, though, this is only a practical issue in a few edge cases; best to profile first and only spend effort fixing actual issues.

3 Likes

It's not a read-only object; I just know the max size of the data.
And the question should be: how to get the fastest key-value object in Rust, lol.

Yeah, I use it in my trading program, and it needs to be fast. Actually, is there another key-value data structure available?

In the standard library you've also got BTreeMap.

Like @2e71828 mentioned, you can swap out the hashing algorithm for something that doesn't try to avoid DoS situations - I know the Rust compiler uses the fxhash crate internally for exactly this reason.
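To illustrate the mechanism, here is a std-only sketch of how any alternate hasher gets plugged into the standard `HashMap` via `BuildHasherDefault`. The FNV-1a hasher below is just a stand-in I wrote for the example; the real fxhash / ahash crates use different (faster) mixing functions, but you wire them in exactly the same way.

```rust
use std::collections::HashMap;
use std::hash::{BuildHasherDefault, Hasher};

/// A minimal FNV-1a hasher, implemented with std only for illustration.
/// Crates like fxhash or ahash provide their own `Hasher` types that you
/// plug in through the same `BuildHasherDefault` mechanism shown here.
struct Fnv1aHasher(u64);

impl Default for Fnv1aHasher {
    fn default() -> Self {
        // FNV-1a 64-bit offset basis.
        Fnv1aHasher(0xcbf2_9ce4_8422_2325)
    }
}

impl Hasher for Fnv1aHasher {
    fn finish(&self) -> u64 {
        self.0
    }
    fn write(&mut self, bytes: &[u8]) {
        // FNV-1a: xor the byte in, then multiply by the FNV prime.
        const FNV_PRIME: u64 = 0x0000_0100_0000_01b3;
        for &b in bytes {
            self.0 ^= b as u64;
            self.0 = self.0.wrapping_mul(FNV_PRIME);
        }
    }
}

/// A HashMap that uses the custom hasher instead of SipHash.
type FastMap<K, V> = HashMap<K, V, BuildHasherDefault<Fnv1aHasher>>;

fn main() {
    let mut m: FastMap<&str, u32> = FastMap::default();
    m.insert("btc", 1);
    m.insert("eth", 2);
    assert_eq!(m.get("btc"), Some(&1));
    println!("{:?}", m.get("eth"));
}
```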

Also, depending on how much data you are storing inside the set/map, a legitimate alternative is to keep things in a Vec and do a linear scan to find the right item. Bjarne Stroustrup gave a talk at CppCon investigating the relative performance of different collections, but I can't remember the title.
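As a sketch of that idea, here is a tiny set backed by a `Vec` (all names here are illustrative, not from any crate). For small collections, a linear scan over a contiguous buffer can beat hashing because every probe is a predictable, cache-friendly sequential read:

```rust
/// A tiny "set" backed by a Vec. For small n, scanning a contiguous
/// buffer is often faster than hashing. Illustrative sketch only.
struct VecSet<T: PartialEq> {
    items: Vec<T>,
}

impl<T: PartialEq> VecSet<T> {
    fn with_capacity(cap: usize) -> Self {
        VecSet { items: Vec::with_capacity(cap) }
    }

    fn contains(&self, value: &T) -> bool {
        self.items.iter().any(|x| x == value)
    }

    /// Insert if absent; returns true if the value was newly added.
    fn insert(&mut self, value: T) -> bool {
        if self.contains(&value) {
            return false;
        }
        self.items.push(value);
        true
    }

    /// Remove by swapping with the last element: O(1) after the scan,
    /// but element order is not preserved.
    fn remove(&mut self, value: &T) -> bool {
        match self.items.iter().position(|x| x == value) {
            Some(pos) => {
                self.items.swap_remove(pos);
                true
            }
            None => false,
        }
    }
}

fn main() {
    let mut s = VecSet::with_capacity(8);
    assert!(s.insert(42u64));
    assert!(!s.insert(42)); // duplicate rejected
    assert!(s.contains(&42));
    assert!(s.remove(&42));
    assert!(!s.contains(&42));
    println!("ok");
}
```

Whether this wins over a hashset depends entirely on the element count and size, which is why benchmarking with your real data matters.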

2 Likes

Now I am using ahash. Maybe I need to write a benchmark to test it. Thanks for your reply.

If you need the absolutely fastest speed, then general advice can only get you so far. The performance of each option will depend on the workload you're putting on the collection, and the only way to eke out the last few percent of performance is to try everything and benchmark it with as close to a real situation as you can manage. Criterion is a good way to start writing high-quality benchmarks.
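For a first rough look before setting up Criterion, a crude std-only timing harness can at least show the shape of the comparison (the numbers and sizes below are arbitrary; Criterion adds the warm-up, statistics, and outlier handling that a sketch like this lacks):

```rust
use std::collections::HashSet;
use std::time::Instant;

// Crude micro-benchmark sketch using only std. For trustworthy numbers
// (warm-up, statistical analysis, outlier detection) use Criterion.
fn main() {
    let n: u64 = 10_000;
    let set: HashSet<u64> = (0..n).collect();
    let vec: Vec<u64> = (0..n).collect();

    // Time n lookups in the HashSet.
    let t = Instant::now();
    let set_hits = (0..n).filter(|k| set.contains(k)).count();
    let hash_time = t.elapsed();

    // Time n lookups via linear scan of the Vec.
    let t = Instant::now();
    let vec_hits = (0..n).filter(|k| vec.contains(k)).count();
    let scan_time = t.elapsed();

    // Both approaches must agree on the result.
    assert_eq!(set_hits, vec_hits);
    println!("hashset: {:?}, vec scan: {:?}", hash_time, scan_time);
}
```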

2 Likes

Is there anything specific you can tell us about the data? There might be some common pattern or data access that you can take advantage of by creating your own data structure.

https://github.com/HFQR/RemClient/blob/db4a40a49f2da81a7045733dbe90476287383a94/src/rem.rs#L129

Hey, see the temp_info. This is the client project for my trading program; the low-level rem is not open source, but you can see the data types there.

And if you have any ideas, please contact me at somewheve@gmail.com.

I was a bit surprised not to see nohash_hasher mentioned here yet. I'm not sure what keys you're working with, but if they can reasonably be kept to 64 bytes or less (maybe [char; 16]?) you could forgo hashing computations entirely. Having your data on the heap may not be the biggest bottleneck here, as the Rust compiler is pretty good at optimizing.
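The core idea behind nohash_hasher can be sketched with std alone: when the keys are already well-distributed integers, "hashing" can simply be the identity function. This is my own illustrative version (the real crate adds a marker trait to prevent misuse with non-integer keys):

```rust
use std::collections::HashMap;
use std::hash::{BuildHasherDefault, Hasher};

/// A sketch of the nohash_hasher idea: pass an integer key straight
/// through as its own hash. Std-only illustration, not the crate itself.
#[derive(Default)]
struct IdentityHasher(u64);

impl Hasher for IdentityHasher {
    fn finish(&self) -> u64 {
        self.0
    }
    fn write(&mut self, _bytes: &[u8]) {
        // u64 keys hash via write_u64 below, so this path is never
        // taken for them; guard against accidental non-integer keys.
        unimplemented!("identity hasher only supports integer keys");
    }
    fn write_u64(&mut self, n: u64) {
        self.0 = n;
    }
}

/// A HashMap keyed by u64 that skips hashing entirely.
type NoHashMap<V> = HashMap<u64, V, BuildHasherDefault<IdentityHasher>>;

fn main() {
    // Instrument IDs as keys are a hypothetical example.
    let mut m: NoHashMap<&str> = NoHashMap::default();
    m.insert(1001, "BTC/USD");
    m.insert(1002, "ETH/USD");
    assert_eq!(m.get(&1001), Some(&"BTC/USD"));
    println!("{:?}", m.get(&1002));
}
```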

Then, depending on your data size and amount, you could also look into storage using arrayvec for a very limited, but potentially entirely stack-allocated option.

Further, if the keys could be defined at compile time, you could use an enum for those and I'm pretty sure that could be made to work with nohash_hasher.

2 Likes

Hey, I'll take a look at it and write a benchmark.
Thanks for your advice, lol.

If you know the max size in advance and don't want any heap-related slowdown during use, the easiest win is to initialize your set with HashSet::with_capacity(max_elements). Then there is only one allocation at the start, and your insertions and deletions will never reallocate. It'll be as fast as if it were on the stack!
(But that shouldn't stop you from also applying other optimizations hinted to in this thread)
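Concretely (the 10,000 below is an arbitrary stand-in for whatever maximum your data actually has):

```rust
use std::collections::HashSet;

fn main() {
    // Pre-size the set so inserts never trigger a rehash/reallocation.
    // 10_000 stands in for your application's known maximum.
    let max_elements = 10_000;
    let mut set: HashSet<u64> = HashSet::with_capacity(max_elements);
    let cap_before = set.capacity();

    for i in 0..max_elements as u64 {
        set.insert(i);
    }

    // The capacity never had to grow during insertion.
    assert_eq!(set.capacity(), cap_before);
    assert_eq!(set.len(), max_elements);
    println!("len = {}, capacity = {}", set.len(), set.capacity());
}
```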

1 Like

lol, always pursuing faster speed!

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.