Is there a production ready Trie crate?

#1

I’ve been searching for a crate with a generic trie implementation.

I have so far come across sequence-trie and radix-trie both from the same author. Radix-trie seems to be the more mature of the two but has a disclaimer regarding its production readiness.

Are there alternative crate(s) providing concrete trie implementations, preferably with an api close to the stdlib maps collection.

#2

Not a trie, but in case you are interested in text-processing specifically, and you know all the data upfront, you might want to use the fst crate: it builds a minimal automaton (compressed trie in a sense) from a sequence of byte strings.

#3

Thanks @matklad. the fst crate looks interesting. I went over the map builder and saw the values are not generic and also it has a constraint on keys being added in lexicographic order.
Does it mean if my values are strings I need to have some other map structure to store them?
I could have the keys sorted but I’m not sure what’s the approach when the values are not integers.

#4

Yes. fsts are only suitable for highly constrained use cases. If you have one of those use cases, then fsts are a boon. If not, then you need to look for something else or otherwise contort your implementation to make use of fsts.

Technically, fsts in theory can support string values directly, but the fst crate does not support it and there aren’t any plans to add it at this point.

Other than that, I think you’ll probably want to spend some time with the various trie crates listed on crates.io and do your own research about which one is best. Or otherwise, write your own.

#5

To give a concrete example, you can store values in a separate Vec<T> and store indices in the fst::Map itself. This is what I do in Rust analyzer:

fsts are only suitable for highly constrained use cases.

I am not sure I fully agree here :slight_smile: I use fst in RLS and rust-analyzer to store the index of all Rust symbols. It’s not really a constrained use-case: the amount of symbols is not too big, and I don’t use the “save the binary blob to disk” feature. But the fst APIs are pretty convenient for this use-case anyway.

1 Like
#6

To elaborate a bit more on what I had in mind, here are some constraints for folks that might not have used the fst crate before:

  1. Immutability. Once built, it can never change.
  2. Keys must be inserted in sorted order.
  3. Keys must sort naturally lexicographically. (Strings fit this nicely, but, for example, little endian integers do not! You can almost always find a path here, e.g., use big endian integers, but you have to come up with the encoding.)
  4. Willing to pay for slightly more expensive lookups (because the fst needs to be traversed and is stored in a compressed state).
  5. Willing to deal with the fact that keys must be &[u8] and values must be u64 (or otherwise absent).

Given that fsts are themselves just sets or maps, there are often a lot of choices in terms of which implementation to choose. The above specs out enough constraints to make fsts fairly niche, IMO. But yeah, fsts do have some nice consumer side APIs that go beyond normal sets/maps that can make them quite attractive.