Container embedded string for short strings optimization


#1

Hi Rustaceans,

This weekend I built a small prototype for short string optimization.
The idea is to embed a custom string type into the container (say, a hash table) and let the container decide the threshold below which the string data is stored inline in the container; above it, a raw pointer is used for indirection. The major use case is to optimize hash tables that use many short strings as keys.

The interface to use the embedded string (and a specialized hash table) looks like this:

// A new string type, an embedded string that lives on the stack. Not very useful on its own.
// `u8` can be replaced by `_`.
let estring: EString<[u8; 24]> = EString::new();

// Default EString size. We can use this type alias so that we don't have to specify the
// generic type argument <[u8; 16]> every time. Note that `_` is not allowed in a type alias,
// so the element type is spelled out here.
type EStringDefault = EString<[u8; 16]>;

// If we use a weird embedded size like 0, the type is smart enough to extend the size to
// hold the pointer. In my current implementation, the minimal size is 16 (size_of::<*mut str>()
// on a 64-bit target). I may improve it to use *mut u8, where the size is 8 instead of 16.
let safe_estring: EString<[_; 0]> = EString::new();

// Using EString with a hash table
let my_table = EHashMap::<EStringDefault, EStringDefault>::new();

// Using EString with a hash table with custom embedded size.
// In this case, we assume most of our values fit in 250 chars.
let my_table2 = EHashMap::<EStringDefault, EString<[_; 250]>>::new();

All of the above can be implemented in stable Rust.
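For anyone who wants a concrete picture of the inline-or-indirect idea, here is a minimal sketch. It is not the prototype's code: `SketchString` is a hypothetical name, it uses a const generic `N` and a plain enum instead of the `EString<[u8; N]>` array parameter and raw pointer described above, and it is keyed into the standard `HashMap` rather than a specialized `EHashMap`. It only illustrates the threshold decision and how such a type can serve as a hash table key.

```rust
use std::borrow::Borrow;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

// Hypothetical sketch type: stores up to N bytes inline, otherwise
// falls back to a heap allocation. Not the EString from the post.
enum SketchString<const N: usize> {
    Inline { len: u8, buf: [u8; N] },
    Heap(Box<str>),
}

impl<const N: usize> SketchString<N> {
    fn new(s: &str) -> Self {
        // The length is kept in a u8, so inline storage is capped at 255 bytes.
        if s.len() <= N && s.len() <= u8::MAX as usize {
            let mut buf = [0u8; N];
            buf[..s.len()].copy_from_slice(s.as_bytes());
            SketchString::Inline { len: s.len() as u8, buf }
        } else {
            SketchString::Heap(s.into())
        }
    }

    fn as_str(&self) -> &str {
        match self {
            // The bytes were copied from a valid &str, so they are valid UTF-8.
            SketchString::Inline { len, buf } => {
                std::str::from_utf8(&buf[..*len as usize]).expect("valid UTF-8")
            }
            SketchString::Heap(s) => &**s,
        }
    }

    fn is_inline(&self) -> bool {
        matches!(self, SketchString::Inline { .. })
    }
}

// Equality and hashing go through the string contents, so an inline and a
// heap representation of the same text would compare and hash identically.
impl<const N: usize> PartialEq for SketchString<N> {
    fn eq(&self, other: &Self) -> bool {
        self.as_str() == other.as_str()
    }
}
impl<const N: usize> Eq for SketchString<N> {}
impl<const N: usize> Hash for SketchString<N> {
    fn hash<H: Hasher>(&self, state: &mut H) {
        self.as_str().hash(state)
    }
}
// Lets a map keyed by SketchString be queried with a plain &str.
impl<const N: usize> Borrow<str> for SketchString<N> {
    fn borrow(&self) -> &str {
        self.as_str()
    }
}

fn main() {
    let mut table: HashMap<SketchString<16>, u32> = HashMap::new();
    table.insert(SketchString::new("short key"), 1);
    table.insert(SketchString::new("a key that is longer than sixteen bytes"), 2);

    for (k, v) in &table {
        println!("{:?} inline={} value={}", k.as_str(), k.is_inline(), v);
    }
}
```

The enum carries a discriminant and a length byte, so it is less compact than a hand-packed layout with a raw pointer would be; the point here is only the threshold logic.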

I know there was a previous discussion about SSO here:


Yet embedding the string type into another type wasn't the focus there.
This is why I'm starting a new thread to discuss it.

To release the code, I need to get approval from my employer (Google).
Feel free to comment on this idea if it interests you.


#2

You can compare notes against smallvec.


#3

Thanks for sharing!