How do I make my struct accept a Union of two types that have the same methods?

Hey everyone, so I'm working on a library that processes text and extracts keywords efficiently using a trie.

The trie is defined as:

struct Node {
    clean_word: Option<String>, // if clean_word is `Some` it means that we've reached the end of a keyword
    children: fxhash::FxHashMap<String, Node>,
}

Then there is the struct:

pub struct KeywordProcessor {
    trie: Node,
    len: usize, // the number of keywords the struct contains (not the number of nodes)
}

That stores the trie and implements the method that iterates over the text and extracts the keywords.
The way it works is that it splits the string into tokens/words and it explores the trie by checking if the token exists inside the HashMap.

Now what I want to do is to support case-insensitive matching, which can achieved simply by editing this line:

struct Node {
-    children: fxhash::FxHashMap<String, Node>,
+    children: case_insensitive_hashmap::CaseInsensitiveHashMap<Node>,
    ...
}

I tested it, and the code compiles.

But I want to make it configurable so the user can change between them, either dynamically; for example by adding the parameter case_sensitive: bool to KeywordProcessor::new(), or at compile time by making it part of the type like with a type parameter or using an Enum for the hashamp itself which has two variants.

I tried a bunch of solutions using Enums and Unions but I couldn't get any of them to work, any help will be appreciated.

This is the source code: GitHub - shner-elmo/flashtext2-rs: Flashtext implementation in Rust

Enums in Rust represent the sum type. You have to define Node as an enum because you must have a unique type:

In general the two ways in Rust are:

  1. Define a trait for all the functions you need on the hashmap, and then implement the trait for the two concrete hashmaps you're using. You'll end up using some sort of dynamic dispatching at some layer in your code for this.
  2. Define an enum with a variant for each of the two implementations, and have the implementation do a match to select the variant to use. This can also benefit from defining a common trait that is implemented by the two variants (see enum_dispatch), but that isn't strictly necessary.

Since you say you tried using an enum and it didn't work, can you post what you tried and what problems you encountered? The problems could be solvable, but we need to know the details.

2 Likes

In particular, it seems unlikely to me that if the problem cannot be modeled with enums, that it could be modeled using traits.

While traits allow for open set membership relative to enums, in general enums have far better ergonomics, making them easier to work with.

1 Like

Honestly I think it would be easier to define all the code inside a macro that then generates the methods for KeywordProcessor and KeywordProcessorInsensitive

This is not true as a general assertion:

  • Enums are incapable of representing type-level computation, which is trivial using traits
  • Enums incur a conditional match upon every use site, whereas generics continue to allow you to treat the value as a single entity.

And even specifically considering OP's requirements, "different concrete types with the same set of methods/behavior" is a telltale sign of generics. OP doesn't want "either an X hash map or a Y hash map", s/he wants an "any map that I can use to look up child nodes".

2 Likes

I surprised you didn't pick up on the context that I was talking about enums vs traits in use cases that they both cater to.

With traits that isn't free: what you gain in runtime performance, you lose in compile time.
And while it's true that you need to match on an enum, that's part of what makes them ergonomic.

Traits combined with generics are what I use when performance matters more than readability (I wouldn't call expressing computations with type level generics with traits readable BTW. Merely expressible, as the expressed logic gets obscured quite quickly for nontrivial type-level computations).

Enums are what I reach for when I want the logic to be expressed plainly as day.

It is not. Generics are one way to express it, yes. But as I said earlier, it's perfectly valid to do so using enums. I've done it in the past and stand by that modeling decision. It depends on what OP needs other than wrapping the type. As stated though, both enums and traits could work.

Another solution you see occasionally is reusing the same code file for multiple modules. In this case, it would look something like this:

trie_shared.rs:

struct Node {
    clean_word: Option<String>,
    children: super::HashMap<Node>,
}

// Everything else you’ve already written…

trie.rs:

pub mod case_sensitive {
    type HashMap<Node> = fxhash::FxHashMap<String, Node>;
    #[path = "path/to/trie_shared.rs"] mod shared;
    pub use shared::{Node, KeywordProcessor};
}

pub mod case_insensitive {
    type HashMap<Node> = case_insensitive_hashmap::CaseInsensitiveHashMap<Node>;
    #[path = "path/to/trie_shared.rs"] mod shared;
    pub use shared::{Node, KeywordProcessor};
}
3 Likes

That's a moving target. We were talking about usability, not compile times.

It's valid, but in the long term, it always ends up being generics anyway.

You want to choose between 2 concrete types today. Tomorrow, you discover 10 more use cases and their corresponding specific types. And the day after, you turn the code into a library and it occurs to you that you should really allow users to plug in whatever kind of map they want to use, not just one of the 10 you know about.

Enums are great for modelling primitive-like types where it is not so much the abstract properties but the concrete values that matter. serde_json::Value is an enum, because it's only ever going to be one of null, bool, int, float, string, array, or map. And you pretty much want to know which of these concrete types it is in order to make any sense of the data.

But "I want to insert a value for a key" is much more of an abstract property and general behavior than an inquiry into the identity of the map type. You don't care about what the map is. You care about what it does. And that's the definition of a trait, right there.

You moved the target by misrepresenting the costs of enums vs traits. So I evened it out by mentioning that traits aren't as free as suggested.

And that's a false assertion, since those uses I mentioned using enums rather than traits are still there, and going nowhere. There's no reason for them to be refactored.

That's not all that relevant to the discussion, is it. Nor is the rest of that paragraph about modeling. Especially since the answer is "then you refactor".

Now you're saying something that I can actually work with.
This is the whole crux of enums.

At least for the software I'm writing professionally (that's a parser, compiler and a VM) that's just so, so wrong.

For a webserver I wrote last year, it's also empirically wrong, as it affected the performance of its job in nontrivial ways. So it's not even a property of the domain of the software I' mostly write, but more general than that.

Ultimately it's up to the user what to do. And you coming from On High claiming that traits are "The One True Way" just is not helpful, accurate, or even true.

Wait what? I didn't know this was possible in Rust, I still I'm not sure exactly it's supposed to work, can you perhaps show an example?

Btw guys, the reason why I do not like Enum in this case is because Rust forces me to handle the concrete type every time I use it, but in my case the code to handle both is exactly the same, so it's very annoying.

I would have to copy and paste large chunks of code twice.

Also because if I use Enums, it won't be inside the Node, because it doesn't make sense to repeat the same Enum variant recursively within the trie.
So at that point I will have to make the Enum at the trie level. (To save a bit of memory and also performance)
Which will make me duplicate even more code

2 Likes

I wasn't misrepresenting anything. I was pointing out that your blanket statement was a gross oversimplification.

There's nothing in my post about "cost", either. Only you brought that up. I am still purely discussing the usability aspect.

It essentially instructs the compiler to copy-paste the contents of the same file as two distinct modules.

2 Likes

I'm trying to implement this solution, but I'm not able to resolve the HashMap namespace within the trie_shared.rs file.

Can you show how these two files integrate with lib.rs?

I created a new branch with the changes @2e71828 suggested, but I'm getting the following error:

   Compiling flashtext2 v0.1.0 (/home/shneor/Desktop/projects/rust/flashtext2-rs)
error[E0412]: cannot find type `HashMap` in module `super`
 --> src/shared.rs:8:22
  |
8 |     children: super::HashMap<Node>,
  |                      ^^^^^^^ not found in `super`
  |
help: consider importing this struct
  |
1 + use std::collections::HashMap;
  |
help: if you import `HashMap`, refer to it directly
  |
8 -     children: super::HashMap<Node>,
8 +     children: HashMap<Node>,
  |

warning: unused imports: `KeywordProcessor`, `Node`
 --> src/trie.rs:5:22
  |
5 |     pub use shared::{Node, KeywordProcessor};
  |                      ^^^^  ^^^^^^^^^^^^^^^^
  |
  = note: `#[warn(unused_imports)]` on by default

warning: unused imports: `KeywordProcessor`, `Node`
  --> src/trie.rs:11:22
   |
11 |     pub use shared::{Node, KeywordProcessor};
   |                      ^^^^  ^^^^^^^^^^^^^^^^

error[E0308]: mismatched types
  --> /home/shneor/Desktop/projects/rust/flashtext2-rs/src/shared.rs:16:47
   |
16 |             Node {clean_word: None, children: CaseInsensitiveHashMap::new()}
   |                                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ expected `HashMap<String, Node, BuildHasherDefault<...>>`, found `CaseInsensitiveHashMap<_>`
   |
   = note: expected struct `std::collections::HashMap<String, case_sensitive::shared::Node, BuildHasherDefault<FxHasher>>`
              found struct `CaseInsensitiveHashMap<_>`

error[E0308]: mismatched types
  --> /home/shneor/Desktop/projects/rust/flashtext2-rs/src/shared.rs:14:47
   |
14 |             Node {clean_word: None, children: FxHashMap::default() }
   |                                               ^^^^^^^^^^^^^^^^^^^^ expected `CaseInsensitiveHashMap<Node>`, found `HashMap<_, _, BuildHasherDefault<FxHasher>>`
   |
   = note: expected struct `CaseInsensitiveHashMap<case_insensitive::shared::Node>`
              found struct `std::collections::HashMap<_, _, BuildHasherDefault<FxHasher>>`

Some errors have detailed explanations: E0308, E0412.
For more information about an error, try `rustc --explain E0308`.
warning: `flashtext2` (lib) generated 2 warnings
error: could not compile `flashtext2` (lib) due to 3 previous errors; 2 warnings emitted

Any help/suggestions aprcciated

It looks like there's some tricky issues here with the way I originally suggested. Following the workaround from that issue means that something like this should work for you:

src/lib.rs:

// NB: No `mod shared` declaration at this level!

#[path = "."]
pub mod case_sensitive {
    type HashMap<Node> = fxhash::FxHashMap<String, Node>;
    mod shared;
    pub use shared::{Node, KeywordProcessor};
}

#[path = "."]
pub mod case_insensitive {
    type HashMap<Node> = case_insensitive_hashmap::CaseInsensitiveHashMap<Node>;
    mod shared;
    pub use shared::{Node, KeywordProcessor};
}

src/shared.rs:

struct Node {
    clean_word: Option<String>,
    children: super::HashMap<Node>,
}

// Everything else you’ve already written…
2 Likes

The problem here is the mod shared line in lib.rs; you should remove it.

This creates an unnecessary third shared module at the top level. In this copy, super::HashMap refers to a HashMap type defined in lib.rs directly— This doesn't exist, which causes the error you're seeing.

2 Likes

Thank you so much!
I applied the changes as you said in the last commit, and now everything works.

I'm really glad I found this solution because this is the only way to avoid all the Enum and Trait boilerplate...

If you have any articles you can link that explain how this works I'd definitely read them, thanks again

1 Like

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.