Hey everyone, so I'm working on a library that processes text and extracts keywords efficiently using a trie.
The trie is defined as:
struct Node {
    clean_word: Option<String>, // if clean_word is `Some` it means that we've reached the end of a keyword
    children: fxhash::FxHashMap<String, Node>,
}
Then there is the struct:
pub struct KeywordProcessor {
    trie: Node,
    len: usize, // the number of keywords the struct contains (not the number of nodes)
}
It stores the trie and implements the method that iterates over the text and extracts the keywords.
It works by splitting the text into tokens/words and walking the trie: at each step it checks whether the current token exists as a key in the current node's HashMap.
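To make the traversal concrete, here is a minimal sketch of the idea, not the library's actual code. It assumes whitespace tokenization, longest-match semantics, and std's `HashMap` in place of `fxhash::FxHashMap` so it has no dependencies; `add_keyword` and `extract_keywords` are hypothetical free functions used only for illustration.

```rust
use std::collections::HashMap;

// Stand-in for the `Node` above; std `HashMap` is an assumption made
// to keep the sketch dependency-free.
#[derive(Default)]
struct Node {
    clean_word: Option<String>,
    children: HashMap<String, Node>,
}

// Hypothetical helper: insert a keyword, one token per trie level.
fn add_keyword(root: &mut Node, keyword: &str) {
    let mut node = root;
    for token in keyword.split_whitespace() {
        node = node.children.entry(token.to_string()).or_default();
    }
    node.clean_word = Some(keyword.to_string());
}

// Hypothetical helper: scan the text and collect the longest keyword
// that starts at each token position.
fn extract_keywords(root: &Node, text: &str) -> Vec<String> {
    let tokens: Vec<&str> = text.split_whitespace().collect();
    let mut found = Vec::new();
    let mut i = 0;
    while i < tokens.len() {
        let mut node = root;
        // (keyword, index of its last token) for the longest match so far
        let mut longest: Option<(&String, usize)> = None;
        let mut j = i;
        while j < tokens.len() {
            match node.children.get(tokens[j]) {
                Some(child) => {
                    node = child;
                    if let Some(word) = &node.clean_word {
                        longest = Some((word, j));
                    }
                    j += 1;
                }
                None => break,
            }
        }
        match longest {
            Some((word, end)) => {
                found.push(word.clone());
                i = end + 1; // resume after the matched keyword
            }
            None => i += 1,
        }
    }
    found
}
```

Because the lookup is a plain `HashMap::get` on the token, matching is case-sensitive: "new york" would not match a stored "New York".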
Now what I want to do is support case-insensitive matching, which can be achieved simply by changing this line:
struct Node {
- children: fxhash::FxHashMap<String, Node>,
+ children: case_insensitive_hashmap::CaseInsensitiveHashMap<Node>,
...
}
I tested it, and the code compiles.
But I want to make it configurable so the user can switch between the two, either dynamically, for example by adding a parameter case_sensitive: bool to KeywordProcessor::new(), or at compile time by making it part of the type, for example with a type parameter or with an enum wrapping the hashmap that has two variants.
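For the dynamic variant, one direction that avoids swapping the map type is to keep a single map and normalize the keys instead. Below is a minimal sketch under several assumptions: a new `case_sensitive` field, a hypothetical `normalize` helper, hypothetical `add_keyword`/`contains`/`len` methods, and std's `HashMap` in place of `fxhash::FxHashMap`. None of this is taken from the library.

```rust
use std::borrow::Cow;
use std::collections::HashMap;

#[derive(Default)]
struct Node {
    clean_word: Option<String>,
    children: HashMap<String, Node>, // assumption: std map instead of FxHashMap
}

pub struct KeywordProcessor {
    trie: Node,
    len: usize,
    case_sensitive: bool, // assumption: field added for this sketch
}

// Hypothetical helper: borrow the token as-is when case-sensitive,
// otherwise allocate a lowercased copy.
fn normalize(token: &str, case_sensitive: bool) -> Cow<'_, str> {
    if case_sensitive {
        Cow::Borrowed(token)
    } else {
        Cow::Owned(token.to_lowercase())
    }
}

impl KeywordProcessor {
    pub fn new(case_sensitive: bool) -> Self {
        Self { trie: Node::default(), len: 0, case_sensitive }
    }

    pub fn add_keyword(&mut self, keyword: &str) {
        let mut node = &mut self.trie;
        for token in keyword.split_whitespace() {
            let key = normalize(token, self.case_sensitive).into_owned();
            node = node.children.entry(key).or_default();
        }
        // Only count the keyword if this node was not already a keyword end.
        if node.clean_word.replace(keyword.to_string()).is_none() {
            self.len += 1;
        }
    }

    pub fn contains(&self, keyword: &str) -> bool {
        let mut node = &self.trie;
        for token in keyword.split_whitespace() {
            let key = normalize(token, self.case_sensitive);
            match node.children.get(&*key) {
                Some(child) => node = child,
                None => return false,
            }
        }
        node.clean_word.is_some()
    }

    pub fn len(&self) -> usize {
        self.len
    }
}
```

The trade-off is an extra allocation per token in case-insensitive mode, and `to_lowercase` is not full Unicode case folding, so the results may not match `CaseInsensitiveHashMap` for every input.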
I tried a bunch of solutions using enums and unions but couldn't get any of them to work. Any help would be appreciated.
This is the source code: GitHub - shner-elmo/flashtext2-rs: Flashtext implementation in Rust