The vast majority of the complexity in the fst crate is in reading and writing its serialization format. There are a number of compression-related tricks that take advantage of the structure of the FST and the way in which it is encoded. For example, states with exactly one transition that are encoded right next to the state pointed to by that transition are exceptionally common. As a result, they are encoded with a single byte. The "next" state identifier isn't encoded at all; instead, it's inferred from the location of the state in the FST's serialized form. There are many such optimizations in the serialization format, and they seem incompatible with building a persistent data structure, as far as I can see anyway.
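To make the space savings concrete, here is a toy illustration of the "next" optimization. This is not the fst crate's actual format; the encodings and the 5-byte naive layout are invented for comparison. The idea is that when a single-transition state's target is laid out physically adjacent to it, the target pointer can be omitted entirely and inferred from position:

```rust
// Naive encoding: each single-transition state stores its input byte
// plus a 4-byte absolute address of its target state (5 bytes total).
fn encode_naive(chain: &[u8]) -> Vec<u8> {
    let mut out = Vec::new();
    for (i, &b) in chain.iter().enumerate() {
        out.push(b);
        // Address of the next state in this hypothetical 5-bytes-per-state layout.
        let target = ((i as u32) + 1) * 5;
        out.extend_from_slice(&target.to_le_bytes());
    }
    out
}

// "Next" encoding: states are laid out so each target is physically
// adjacent, so only the input byte is written. The target address is
// never stored; a reader infers it as "the position right after this state".
fn encode_next(chain: &[u8]) -> Vec<u8> {
    chain.to_vec()
}

fn main() {
    let chain = b"monkeys";
    let naive = encode_naive(chain);
    let next = encode_next(chain);
    println!("naive: {} bytes, next: {} bytes", naive.len(), next.len());
    // naive: 35 bytes, next: 7 bytes
}
```

The catch for persistence: the inferred address is only valid for one specific physical layout, so any structural sharing between versions of the FST would invalidate it.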
Are FSTs themselves fundamentally incompatible with a persistent representation? I'm not sure. The way to implement such a data structure isn't immediately obvious to me. So if you really need this, I think you might start by creating a new crate (perhaps by forking fst) and seeing if it can be done.
You may have better luck than I did, but try looking into Elasticsearch/Lucene. They also use FSTs internally and support document updates. My understanding is that an updated document is deleted and re-indexed completely, which produces new FSTs. Querying then requires visiting all of the FSTs. When the number of FSTs gets too large, you start merging some of them.
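That strategy can be sketched without any FST machinery at all. Below is a minimal sketch of the segment approach, using `BTreeSet<String>` as a stand-in for an immutable FST segment; the `Index` type and its method names are illustrative, not from Lucene or the fst crate:

```rust
use std::collections::BTreeSet;

// Each segment is immutable once written; updates only ever add segments.
struct Index {
    segments: Vec<BTreeSet<String>>,
    max_segments: usize,
}

impl Index {
    fn new(max_segments: usize) -> Self {
        Index { segments: Vec::new(), max_segments }
    }

    // An "update" never mutates an existing segment: re-indexed documents
    // land in a freshly built segment.
    fn add_segment(&mut self, keys: impl IntoIterator<Item = String>) {
        self.segments.push(keys.into_iter().collect());
        if self.segments.len() > self.max_segments {
            self.merge_all();
        }
    }

    // Queries must consult every segment.
    fn contains(&self, key: &str) -> bool {
        self.segments.iter().any(|s| s.contains(key))
    }

    // Merging: pour the sorted keys of all segments into one new segment.
    // With real FSTs this would be a union of sorted streams fed into a
    // fresh builder, since FSTs can be built in one pass from sorted input.
    fn merge_all(&mut self) {
        let merged = self.segments.drain(..).flatten().collect();
        self.segments = vec![merged];
    }
}

fn main() {
    let mut idx = Index::new(2);
    idx.add_segment(vec!["apple".to_string(), "banana".to_string()]);
    idx.add_segment(vec!["cherry".to_string()]);
    // A third segment exceeds the limit and triggers a merge.
    idx.add_segment(vec!["banana".to_string(), "date".to_string()]);
    assert_eq!(idx.segments.len(), 1);
    assert!(idx.contains("apple") && idx.contains("date"));
    println!("segments after merge: {}", idx.segments.len());
}
```

The key property is that no FST is ever modified in place, which sidesteps the persistence problem entirely at the cost of multi-segment lookups and periodic merge work.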