How can I get the fastest hashset?

Hello everyone,
Is it possible to create a fixed-size hashset/hashmap to store my data? The maximum size of the data can be specified, and I need very fast get/delete/insert operations.

But the default HashMap/HashSet is a heap-allocated object (as far as I know, the stack is always faster than the heap).

So I want to ask: does a fixed-size hashset/hashmap exist?

You might be putting the cart before the horse here.

A hashset uses heap allocations, yes, but these are amortized, so you only re-allocate infrequently; after that, inserting is just a case of finding the right bucket and copying the value across.

The concern people have with things like hashsets isn't that they use dynamic allocations, but that they are designed to spread accesses out across the entire backing buffer instead of putting similar items next to each other. This can have an effect on performance because your CPU cache likes sequential accesses while random access means you'll get cache misses and need to make a trip to main memory more frequently.

If what you are actually looking for is a fast way to read/insert/delete items into a collection with no duplicates then rule #1 is to write benchmarks using typical data from your application and get numbers for the various alternatives.

You should also get numbers that show your bottleneck is actually in the hashset and not somewhere else... it'll be annoying to spend hours tweaking hashset lookups when your actual problem is a bad algorithm that makes you do O(n³) lookups when O(n) would have been sufficient.

10 Likes

AFAIK, the amount of data that can be stored on the stack is limited by the stack size, so whether the original idea is feasible may depend on the total size of the structure.

Also, if your data is all known at compile time and read-only, you can use something like the phf crate to create a hashset/map which is perfectly laid out so there are no collisions.

5 Likes

Another common concern is the hashing speed, but that can be addressed by using an alternate hashing algorithm with the standard HashMap / HashSet. Unfortunately, I don't know the favored alternatives off the top of my head.

Like the heap allocations, though, this is only a practical issue in a few edge cases; best to profile first and only spend effort fixing actual issues.

3 Likes

It's not a read-only object; I just know the max size of the data.
And the question should be: how to get the fastest key-value object in Rust, lol.

Yeah, I use it in my trading program, and it needs to be fast. Actually, is there another key-value data structure available?

In the standard library you've also got BTreeMap.

Like @2e71828 mentioned, you can swap out the hashing algorithm for something that doesn't try to avoid DoS situations - I know the Rust compiler uses the fxhash crate internally for exactly this reason.
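To illustrate the mechanism, here is a std-only sketch of how any alternate hasher gets plugged into the standard `HashMap` via `BuildHasherDefault`. The FNV-1a hasher below is just a stand-in I wrote for the example; the real fxhash / ahash crates use different (faster) mixing functions, but you wire them in exactly the same way.

```rust
use std::collections::HashMap;
use std::hash::{BuildHasherDefault, Hasher};

/// A minimal FNV-1a hasher, implemented with std only for illustration.
/// Crates like fxhash or ahash provide their own `Hasher` types that you
/// plug in through the same `BuildHasherDefault` mechanism shown here.
struct Fnv1aHasher(u64);

impl Default for Fnv1aHasher {
    fn default() -> Self {
        // FNV-1a 64-bit offset basis.
        Fnv1aHasher(0xcbf2_9ce4_8422_2325)
    }
}

impl Hasher for Fnv1aHasher {
    fn finish(&self) -> u64 {
        self.0
    }
    fn write(&mut self, bytes: &[u8]) {
        // FNV-1a: xor the byte in, then multiply by the FNV prime.
        const FNV_PRIME: u64 = 0x0000_0100_0000_01b3;
        for &b in bytes {
            self.0 ^= b as u64;
            self.0 = self.0.wrapping_mul(FNV_PRIME);
        }
    }
}

/// A HashMap that uses the custom hasher instead of SipHash.
type FastMap<K, V> = HashMap<K, V, BuildHasherDefault<Fnv1aHasher>>;

fn main() {
    let mut m: FastMap<&str, u32> = FastMap::default();
    m.insert("btc", 1);
    m.insert("eth", 2);
    assert_eq!(m.get("btc"), Some(&1));
    println!("{:?}", m.get("eth"));
}
```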

Also, depending on how much data you are storing inside the set/map, a legitimate alternative is to keep things in a Vec and do a linear scan to find the right item. Bjarne Stroustrup gave a talk at CppCon investigating the relative performance of different collections, but I can't remember the title.
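As a sketch of that idea, here is a tiny set backed by a `Vec` (all names here are illustrative, not from any crate). For small collections, a linear scan over a contiguous buffer can beat hashing because every probe is a predictable, cache-friendly sequential read:

```rust
/// A tiny "set" backed by a Vec. For small n, scanning a contiguous
/// buffer is often faster than hashing. Illustrative sketch only.
struct VecSet<T: PartialEq> {
    items: Vec<T>,
}

impl<T: PartialEq> VecSet<T> {
    fn with_capacity(cap: usize) -> Self {
        VecSet { items: Vec::with_capacity(cap) }
    }

    fn contains(&self, value: &T) -> bool {
        self.items.iter().any(|x| x == value)
    }

    /// Insert if absent; returns true if the value was newly added.
    fn insert(&mut self, value: T) -> bool {
        if self.contains(&value) {
            return false;
        }
        self.items.push(value);
        true
    }

    /// Remove by swapping with the last element: O(1) after the scan,
    /// but element order is not preserved.
    fn remove(&mut self, value: &T) -> bool {
        match self.items.iter().position(|x| x == value) {
            Some(pos) => {
                self.items.swap_remove(pos);
                true
            }
            None => false,
        }
    }
}

fn main() {
    let mut s = VecSet::with_capacity(8);
    assert!(s.insert(42u64));
    assert!(!s.insert(42)); // duplicate rejected
    assert!(s.contains(&42));
    assert!(s.remove(&42));
    assert!(!s.contains(&42));
    println!("ok");
}
```

Whether this wins over a hashset depends entirely on the element count and size, which is why benchmarking with your real data matters.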

2 Likes

Now I am using ahash. Maybe I need to write a benchmark to test it. Thanks for your reply.

If you need the absolutely fastest speed, then general advice can only get you so far. The performance of each option will depend on the workload you're putting on the collection, and the only way to eke out the last few percent of performance is to try everything and benchmark it with as close to a real situation as you can manage. Criterion is a good way to start writing high-quality benchmarks.
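For a first rough look before setting up Criterion, a crude std-only timing harness can at least show the shape of the comparison (the numbers and sizes below are arbitrary; Criterion adds the warm-up, statistics, and outlier handling that a sketch like this lacks):

```rust
use std::collections::HashSet;
use std::time::Instant;

// Crude micro-benchmark sketch using only std. For trustworthy numbers
// (warm-up, statistical analysis, outlier detection) use Criterion.
fn main() {
    let n: u64 = 10_000;
    let set: HashSet<u64> = (0..n).collect();
    let vec: Vec<u64> = (0..n).collect();

    // Time n lookups in the HashSet.
    let t = Instant::now();
    let set_hits = (0..n).filter(|k| set.contains(k)).count();
    let hash_time = t.elapsed();

    // Time n lookups via linear scan of the Vec.
    let t = Instant::now();
    let vec_hits = (0..n).filter(|k| vec.contains(k)).count();
    let scan_time = t.elapsed();

    // Both approaches must agree on the result.
    assert_eq!(set_hits, vec_hits);
    println!("hashset: {:?}, vec scan: {:?}", hash_time, scan_time);
}
```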

2 Likes

Is there anything specific you can tell us about the data? There might be some common pattern or data access that you can take advantage of by creating your own data structure.

https://github.com/HFQR/RemClient/blob/db4a40a49f2da81a7045733dbe90476287383a94/src/rem.rs#L129

Hey, see the temp_info. This is the client project for my trading program; the low-level rem is not open source, but you can see the data types there.

And if you have any ideas, please contact me at somewheve@gmail.com.

I was a bit surprised not to see nohash_hasher mentioned here yet. I'm not sure what keys you're working with, but if they can reasonably be kept to 64 bytes or less (maybe [char; 16]?) you could forgo hashing computations entirely. Having your data on the heap may not be the biggest bottleneck here, as the Rust compiler is pretty good at optimizing.
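The core idea behind nohash_hasher can be sketched with std alone: when the keys are already well-distributed integers, "hashing" can simply be the identity function. This is my own illustrative version (the real crate adds a marker trait to prevent misuse with non-integer keys):

```rust
use std::collections::HashMap;
use std::hash::{BuildHasherDefault, Hasher};

/// A sketch of the nohash_hasher idea: pass an integer key straight
/// through as its own hash. Std-only illustration, not the crate itself.
#[derive(Default)]
struct IdentityHasher(u64);

impl Hasher for IdentityHasher {
    fn finish(&self) -> u64 {
        self.0
    }
    fn write(&mut self, _bytes: &[u8]) {
        // u64 keys hash via write_u64 below, so this path is never
        // taken for them; guard against accidental non-integer keys.
        unimplemented!("identity hasher only supports integer keys");
    }
    fn write_u64(&mut self, n: u64) {
        self.0 = n;
    }
}

/// A HashMap keyed by u64 that skips hashing entirely.
type NoHashMap<V> = HashMap<u64, V, BuildHasherDefault<IdentityHasher>>;

fn main() {
    // Instrument IDs as keys are a hypothetical example.
    let mut m: NoHashMap<&str> = NoHashMap::default();
    m.insert(1001, "BTC/USD");
    m.insert(1002, "ETH/USD");
    assert_eq!(m.get(&1001), Some(&"BTC/USD"));
    println!("{:?}", m.get(&1002));
}
```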

Then, depending on your data size and amount, you could also look into storage using arrayvec for a very limited, but potentially entirely stack-allocated option.

Further, if the keys could be defined at compile time, you could use an enum for those and I'm pretty sure that could be made to work with nohash_hasher.

2 Likes

Hey, I'll take a look at it and write a benchmark.
Thanks for your advice, lol.

If you know the max size in advance and don't want any heap-related slowdown during use, the easiest win is to initialize your set with HashSet::with_capacity(max_elements). Then there is only one allocation at the start, and your insertions and deletions will never reallocate. It'll be as fast as if it were on the stack!
(But that shouldn't stop you from also applying other optimizations hinted to in this thread)
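Concretely (the 10,000 below is an arbitrary stand-in for whatever maximum your data actually has):

```rust
use std::collections::HashSet;

fn main() {
    // Pre-size the set so inserts never trigger a rehash/reallocation.
    // 10_000 stands in for your application's known maximum.
    let max_elements = 10_000;
    let mut set: HashSet<u64> = HashSet::with_capacity(max_elements);
    let cap_before = set.capacity();

    for i in 0..max_elements as u64 {
        set.insert(i);
    }

    // The capacity never had to grow during insertion.
    assert_eq!(set.capacity(), cap_before);
    assert_eq!(set.len(), max_elements);
    println!("len = {}, capacity = {}", set.len(), set.capacity());
}
```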

1 Like

lol, always pursuing faster speed!

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.