Using trait vs. wrapping object in enum

As a project to get familiar with Rust I wrote a regular expression search. (for which I got a lot of help on the code review forum; thanks!) It is now finished and working, but I am rethinking some of the design. It uses structs representing different types of nodes, wrapping them in enums rather than Boxes. I decided to refactor, replacing the Node enum with a trait and dealing with the structs directly rather than encased in the enum.

It is not really practical to use the trait for all access, that would require lots of methods which would do nothing on most types, so I want to recover the struct from the trait. I found this can be done with the Any trait, but as I was looking into it I saw How to match trait implementors on Stack Overflow, where the accepted answer says

You have two options here. If you know the set of structures implementing your trait in advance, just use enum, it's a perfect tool for this. They allow you to statically match on a closed set of variants:...This is by far the simplest way and it should always be preferred.

Is this the accepted view? From the beginning I thought traits seemed the natural way to do this but decided since my inexperience made it tougher I'd use enums and then see if it could be replaced with traits. I think I see my way through now to doing that, but it requires accessing the full structure from the trait. It can be done, but my question is should it, or is the code better left as it is?

Or is adding lots of methods to the trait, supplying default versions doing nothing, and overriding them only for the structs that use them?

Another thought I had was using unions - in that case accessing the objects is simple (as long as I know the type), but I get the feeling unions are supplied mainly as an aid to interacting with other languages, and are discouraged in regular Rust coding. Certainly they add back some of the danger that Rust is designed to remove from programming.

So that is 4 possible designs:

  • enums wrapping structs (leave it as it is
  • traits, somehow recovering the original struct from the trait
  • traits, adding lots of methods that only exist on one struct and do nothing on the others
  • unions

Is there a strong preference for any of these designs? Any that are clearly recommended against?

1 Like

If you're going to be downcasting a lot, type erasure (dyn Trait) was probably the wrong choice -- or at least, the wrong choice at the level where you're downcasting. It might still make sense to have an enum whose payloads all implement a given trait, in some circumstances.

If you're only ever going to support a fixed set of concrete types,[1] the best a trait will generally get you is some amount of reduction of code writing. In that case, you're basically using traits as some sort of macro system. That can be fine -- I do it sometimes -- but if it results in more code and not less, it's probably not helping you.

You can also go that route if it makes sense without using dyn Trait.

Another thing to consider for this particular crate (where you're making a lot of recursive calls but still want things tobe fast) is that if you go with dyn Trait, you'll probably take a speed hit unless the optimizer can devirtualize and inline your calls.


An alternative to downcasting is something like

trait Trait: AsRef<CommonSubstructure>

so that you can always access some underlying struct.

(But with a fixed set of types, you can just use composition without enforcing it with a trait, too.)

That definitely sounds like a poor fit. I only skimmed your code, but if there's a good candidate for traits it's going to something like the "everything but OrNode and None do this" parts, and not the "every type of node does something different" parts.

enums are tagged unions, i.e. unions that track which variant they are. Using untagged unions wouldn't gain you anything, would require unsafe, and if you get your manual tracking of which variant a value is wrong, you're probably UB.

Using unions here would be like switching to raw pointers, even though you don't have any borrow errors.


  1. in contrast with something consumers of your library might implement, say ↩ī¸Ž

2 Likes

Off topic but I suggest you get rid of static mut and don't allocate to indent.

Using unions is exactly the same as using enums but unsafe. If enums don't help with your architecture, then unions won't help either, and you will also have safety problems – a net loss.

I agree both are undesirable, but eliminating them isn't obvious (to me). pad() originally used a statically allocated string of blanks, but I don't like limiting the size of padding available, on the assumption that nesting can go pretty deep. I think I can design the strings out, but possibly at the cost of not using the Debug trait, a recommendation someone gave earlier. This would also eliminate string allocation - though in this program I don't think that will slow it down significantly.

For the static mut trace, is there any other way to save state between these calls, besides maybe having a state struct passed to every function call? "unsafe' makes me properly nervous - in this case I know it is perfectly safe since there is just a single synchronous thread, but I'd like to get rid of it. In this case I really don't see another way to accomplish the nested output, and that makes it so much easier to read and understand - before I put it in I found myself adding indentation by hand in a text editor to understand what was going on. Short of passing a state struct in every call, is there a nice way to do this>

The link I posted has no unsafe, doesn't allocate, doesn't have a limit you'll reasonably hit, and doesn't require passing the state around.

2 Likes

How embarrassing, I saw that reply in a browser preview which didn't display the link. I guess I'm still thinking in other languages, where locking is not needed for something like this. Thanks - that is another feature of the languages I'll get to try out, which helps the goal of this project. Printing the spaces rather than generating a String does mean not using the Debug trait to display it, which is fine with me - it is how it worked before it was changed at someone else's suggestion. Yours is better, though I don't think here the allocation takes a lot of time (and it only happens in debugging mode anyway) it is bad practice.

No, that's incorrect. Locking (or more precisely, synchronization, since atomics aren't locks) is absolutely needed in other languages, too – race conditions are considered UB or at least another kind of error in most high-level languages.

It's just that most languages aren't smart enough to catch this error at compile time, so they allow you to do the wrong thing that blows up at runtime. But accessing globals almost always requires some sort of synchronization.

Some languages like Python and JS requires global synchronization on every execution point to eliminate the need for manual synchronizations. Though logical locks are still make sense for these languages to handle concurrency well, most notably in cases like async code.

Yes, I still count those languages as "requiring synchronization". Because they do require synchronization, but it's automatic/implicit.

One less drastic refactoring I would recommend exploring is to pull the None variant out into things being Option<Node> Option<Path<'a>> etc, since None seems like rather a lot of the special cases.

enum_delegate — Rust library // Lib.rs can cover a lot of the trait call boilerplate for the rest.

Though in this instance, I'd be tempted to factor out the structs some, e.g.

/// Represents a single step for a node
pub struct Step<'a,T,C=()> {
    /// the String where this match starts
    string: &'a str,
    /// The length of the string in bytes. Important: this is not in chars. Since it is in bytes the actual matching string is string[0..__match_len__]
    match_len: usize,
    /// The node from phase 1
    node: &'a T,
    /// Information on children, if any
    children: C,
}

/// Represents children of an AndNode (0 or more nodes that all must match)
pub struct AllChildren<'a> (
    /// A vector of Paths saving the current state of this And node. Each entry is a **Path** based on the **nodes** member of the **AndNode** structure.
    /// When the Paths vector is filled this step for the And node has succeeded.
    Vec<Path<'a>>,
)

/// Represents children of an OrNode (0 or more nodes that one must match)
pub struct AnyChild<'a> {
    /// The OR node needs only a single branch to succeed. This holds the successful path
    path: Box<Option<Path<'a>>>,
    /// This points to the entry in **node.nodes** which is currently in the **child_path** element
    which: usize,
}

//////////////////////////////////////////////////////////////////

pub enum Path<'a> {
    Chars(Vec<Step<'a,CharsNode>>), Special(Vec<Step<'a,SpecialCharNode>>), Set(Vec<Step<'a,SetNode>>),
    And(Vec<Step<'a,AndNode,AllChildren<'a>>>), Or(Vec<Step<'a,OrNode,AnyChild<'a>>>)
}
type OPath<'a> = Option<Path<'a>>

which perhaps bears resemblance to how you might use unions.

Other pieces of the puzzle, which hooooopefully should optimize out fine:


#[derive(VariantFrom)]
pub enum NodeRef<'a> {Chars(&'a CharsNode), SpecialChar(&'a SpecialCharNode), And(&'a AndNode), Or(&'a OrNode), Set(&'a SetNode), }

impl<'a,T,C> Step<'a,T,C> {
  fn erase(&self) -> Step<'a,NodeRef,()> {
    Step { string: self.string, match_len: self.match_len,
           node: self.node.into_variant(), children: ()}
  }
}

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.