I made a new string interning crate:
It includes 3 features that I've needed in a couple situtations:
- It (optionally) includes a built-in global symbol table, so you can just say
GlobalSymbol::from("foobar") to intern things into the global table.
- It's well-suited to concurrent access; a symbol table is sharded to reduce lock contention. In my limited benchmarking, it's the fastest string interner under medium/high contention, and reasonably fast in no/low contention.
- It hands out stable
&'a strs. So the built-in global table can give out
&'static strs. This implementation is based in-part on matklad's blog post.
Maybe it's useful for you too!
How does this compare to ustr?
I've tried my hand at string interning and found it hard to beat
Ah, I wasn’t aware of ustr, looks great! I’ll include it in my benchmarks next time.
Some quick differences:
- My symbols are u32s, ustr’s are pointers.
- I use hashbrown; that might be faster in some cases.
Note that std's hashmap is also hashbrown, so unless you're using something that the std container doesn't support, it shouldn't make a difference.
Yes, but at first glance, I believe they are rolling their own hashtable somewhere in there. Also, I use raw_entry, which isn’t available in std.
u32 symbol ids is nice, but cached hashes is nicer . Esp if it lets you use the Ustr in map maps. They even have a
UstrSet. But this is more dependent on the use-case.
But just comparing the raw interning step,
ustr works really well even under contention. Looking forward to your updated benchmarks.
ustrs performance is definitely best-in-class! It's 1.5-2.0x faster at interning!
u32 too good to give up for my domain, since creating strings is ultimately not a bottleneck, but working with them might be. It would be really interesting to see if you could get best-of-both worlds!