Storing large items in Map collections

Hi,

I've tried Googling/DDGing/Searching this forum for this, and haven't really found an authoritative answer, and the docs for most of the Map collections don't seem to obviously mention internal information about things like reallocation, detailed storage info, so apologies if the information is somewhere obvious and I haven't found it.

Does anyone have information to guide me on some of the common Map collections (BTreeMap, HashMap, IndexMap, SlotMap, etc - not sure which one I'll use yet, it sort of depends) in terms of how they handle allocating/reallocating/copying large(ish - say + 1400 bytes) V values, and if they take copies / realloc when inserting new or deleting existing items, or if they Box internally on the heap per-item (or node?) in a way which would not copy anyway?

I'm going to look into and test/benchmark this myself in the absence of any info (which is always good practice for specific workloads!), but I was wondering whether it would be better to Box the values first in some of these Maps to possibly prevent any value copying / reallocation of the large values, and if anyone had any previous knowledge in this area to help guide my initial approach.

i.e. might it be better to use HashMap<Box<Value>>, rather than HashMap<Value> directly (and for other Map collections) to reduce copying overhead, even if that introduces an extra pointer indirection, or would some of the Map collections not require that due to the way they work internally?

The use-case is a map with items that can be added and removed quite regularly, and at this point I don't (yet) want to have to go to the trouble of keeping a side-by-side Vec and SlotMap (say) in sync if there's no need to do that (I certainly don't want to be keeping indices up-to-date, but the SlotMap would help with that).

Thanks.

1 Like

I suggest not trying to answer this question at first since it will likely be answered by factors other than performance as you solidify your design and implementation.

  • Start with BtreeMap if you need ordered keys, or a HashMap if you don't or you don't know yet. Only use a SlotMap if you need its features, which is something you'll find out quickly enough if you don't already know.
  • Use unboxed values at first. The need for boxed values may arise if the values turn out to have variable sizes, or you end up having to share individual values among data structures or threads.

After you've solidified the design and have something implemented, you can fairly easily change the type of map and add boxing based on performance considerations, and these decisions should involve benchmarking of a real world workload. Rust is very nice for refactoring, because the strict type system makes it much less likely that changes will break things.

3 Likes

Sorry, been away...

Thanks, yeah, I guess I'm getting too far ahead of myself, I was just curious if there was an obvious path to avoid up-front because it would suffer from lots of copying...

I'll just go with BTreeMap for the moment and do some profiling once I get more of the infrastructure written.

Thanks again...

1 Like