This weekend I built a small prototype of a short string optimization (SSO).
The idea is to embed a custom string type into a container (say, a hash table) and let the container decide the size threshold below which the string data is stored inline in the container; longer strings are stored behind a raw pointer instead. The main use case is optimizing hash tables that use many short strings as keys.
The interface to use the embedded string (and a specialized hash table) looks like this:
    // A new string type called an embedded string, here on the stack.
    // Not very useful on its own. `u8` can be replaced by `_`.
    let estring: EString<[u8; 24]> = EString::new();

    // Default EString size. We can use this type alias so that we don't have
    // to specify the generic type argument <[u8; 16]> every time.
    // (A type alias cannot use `_`, so the element type is spelled out.)
    type EStringDefault = EString<[u8; 16]>;

    // If we use a weird embedded size like 0, the type is smart enough to
    // extend the size to hold the pointer. In my current implementation, the
    // minimal size is 16 (size_of::<*mut str>()). I may improve it to use
    // *mut u8, where the size is 8 instead of 16.
    let safe_estring: EString<[_; 0]> = EString::new();

    // Using EString with a hash table.
    let my_table = EHashMap::<EStringDefault, EStringDefault>::new();

    // Using EString with a hash table with a custom embedded size.
    // In this case, we assume most of our values fit in 250 chars.
    let my_table2 = EHashMap::<EStringDefault, EString<[_; 250]>>::new();
All of the above can be implemented in stable Rust.
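To make the inline-versus-pointer idea concrete, here is a minimal sketch in safe, stable Rust. It is not the prototype's code: it assumes const generics (`EString<N>` rather than `EString<[u8; N]>`), uses an enum with a `Box<str>` fallback instead of a raw pointer, and limits inline data to 255 bytes because the length is kept in a `u8`. The names `is_inline` and `as_str` are hypothetical. Because of the enum tag it will not reproduce the sizes mentioned above; it only illustrates the threshold decision and why such a type works as a hash table key.

```rust
use std::borrow::Borrow;
use std::hash::{Hash, Hasher};

/// Hypothetical sketch: stores up to `N` bytes inline, otherwise on the heap.
#[derive(Debug, Clone)]
pub enum EString<const N: usize> {
    /// Short string: the bytes live directly inside the value.
    Inline { len: u8, buf: [u8; N] },
    /// Long string: the bytes live behind a pointer.
    Heap(Box<str>),
}

impl<const N: usize> EString<N> {
    /// Creates an empty string stored inline.
    pub fn new() -> Self {
        EString::Inline { len: 0, buf: [0; N] }
    }

    /// Returns true if the data is stored inline (no heap allocation).
    pub fn is_inline(&self) -> bool {
        matches!(self, EString::Inline { .. })
    }

    /// Views the contents as a `&str`, whichever representation is used.
    pub fn as_str(&self) -> &str {
        match self {
            EString::Inline { len, buf } => std::str::from_utf8(&buf[..*len as usize])
                .expect("inline bytes were copied from a valid &str"),
            EString::Heap(s) => &**s,
        }
    }
}

impl<const N: usize> From<&str> for EString<N> {
    fn from(s: &str) -> Self {
        let bytes = s.as_bytes();
        // Inline only if the bytes fit in the buffer and the length fits in a u8.
        if bytes.len() <= N && bytes.len() <= u8::MAX as usize {
            let mut buf = [0u8; N];
            buf[..bytes.len()].copy_from_slice(bytes);
            EString::Inline { len: bytes.len() as u8, buf }
        } else {
            EString::Heap(s.into())
        }
    }
}

// Equality and hashing go through `as_str`, so they agree with `str`.
impl<const N: usize> PartialEq for EString<N> {
    fn eq(&self, other: &Self) -> bool {
        self.as_str() == other.as_str()
    }
}

impl<const N: usize> Eq for EString<N> {}

impl<const N: usize> Hash for EString<N> {
    fn hash<H: Hasher>(&self, state: &mut H) {
        self.as_str().hash(state)
    }
}

// Lets a `HashMap<EString<N>, V>` be queried with a plain `&str`.
impl<const N: usize> Borrow<str> for EString<N> {
    fn borrow(&self) -> &str {
        self.as_str()
    }
}
```

With this sketch, `EString::<16>::from("key")` stays inline, while `EString::<4>::from("a longer string")` falls back to the heap. A standard `HashMap<EString<16>, V>` then keeps short keys directly in its buckets, which is the effect the specialized hash table is after.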
I know there was a previous discussion about SSO here:
However, embedding the string type into another type wasn't the focus of that discussion.
That is why I am starting a new thread to discuss it.
To release the code, I need to get approval from my employer (Google).
Feel free to leave a comment if this idea interests you.