Announcing tinyset


#1

Introducing tinyset, a create with size optimized sets.

I’ve been working on developing an improved set type, and now have something that others may find useful. TinySet is a variant of HashSet which is optimized for the case of a set of small keys such as integers. In this case hashing is fast (with fnv), and storing both keys and hashes of keys is a waste of space. In addition, to save space we specialize on the case where you can define a key that is invalid, typically the maximum value of the integral type. With these tweaks, which I define in the HasInvalid trait, we can have a set type that is about as fast as fnv::HashMap, but takes a lot less memory.

For u32 keys, we have three times less memory used for large sets, and about 20 times less memory used for sets of up to 4 elements, with no heap allocation for sets in that size range.

The API is not yet complete, and I will welcome contributions, feature requests, or bug reports!


#2

How many bytes on a 64 bit system for 4 elements?


#3

24 bytes to hold 4 u32 elements.

Any set of elements that fits in 16 bytes will not require a heap allocation, and will take 24 bytes on the stack (or wherever you put your TinySet).


#4

Announcing the release tinyset 0.0.10. This is a significant step in tinyset's development, which marks the addition of a very cool new set: Set64. This set allows you to efficiently store small integers of any integer type. Or for that matter any other type for which you can define a conversion to and from u64. There is no longer a need (for your sets) to choose u32 or u16 types just to save space. Storing integers in smaller form besides saving memory also makes Set64 very fast. As with all of tinyset's data structures, Set64 will avoid heap allocation when possible. It is far more successful than previous attempts, because it is able to opportunistically store more elements when they are small (which is a common case for integral types).

I would love to have feedback, as the API is still flexible.

I suppose memory-efficient sets are a bit of a niche feature, but for me they are an important niche for rust to be able to scale well. HashMap is very good under the constraints it is placed: it must work for arbitrary types, for which Eq and Hash may both be slow, and should default to secure hashing to avoid “asking for” DOS attacks. But it is huge for small sets of small data: 312 bytes for a set containing a single u8, and 536 bytes for a set with a single u64 (or usize). This makes it problematic for use as an element within a larger data structure such as a graph.


#5

Today I’m announcing tinyset 0.2.0. The major change in this release is to include a couple new hashmap data types. Both have keys that have the Fits64 trait indicating that the keys can be converted to and from a u64. Map64 can hold any value type and will always allocate the array of values on the heap (and only put keys on the heap for large maps). Map6464 has Fits64 values, and will do no allocation for small amounts of data.