I am using the excellent FST library by BurntSushi to implement custom search for my app and I do this in order to search for every word in the search term:
term_list.for_each(|term| {
let prefix_matcher = fst::automaton::Str::new(term).starts_with();
let union = fst::automaton::Levenshtein::new(term, 2)
.expect("build automaton")
.union(prefix_matcher);
let mut stream = self.fst.search(union).into_stream();
while let Some((_, v)) = stream.next() {
let (_, locs) = self.by_word_vec.get(v as usize).unwrap();
pre_filtered.extend(locs);
}
});
Now, the library provides a way to do unions between two search automatons (as demonstrated above and documented here), but I don't see how to do one full union between all the search terms (in both prefix and proximity search modes), so that I don't have to iterate over the words (which is a new search in the FST and makes it O(n) times slower when I have more words). I did experiment with hardcoding some functions that do unions between a fixed length list of words and it does seem to be faster than O(n).
My question is what would be the right solution here to create a "generic" automaton from a variable-length list of words (nothing crazy, can be capped at something like n<10)