Encapsulating the state machine in "Encoding States and Behavior as Types"

A lot of my code takes the form of state machines that take input "events", so I very closely read the section in the Rust book on OO state-based design.

The last part of this section, Encoding States and Behavior as Types, demonstrates state-based design using a different type for each state. But after reading it I'm left wondering how it could actually be used in practice.

In the example, state transitions take place within the scope of a main function:

fn main() {
    let mut post = Post::new();

    post.add_text("I ate a salad for lunch today");

    let post = post.request_review();

    let post = post.approve();

    assert_eq!("I ate a salad for lunch today", post.content());
}

As I understand it, the rebinding of post here is equivalent to having multiple variables of different types:

fn main() {
    let mut post_as_draft: DraftPost = Post::new();

    post_as_draft.add_text("I ate a salad for lunch today");

    let pending_post: PendingReviewPost = post_as_draft.request_review();
    // post_as_draft no longer valid

    let final_post: Post = pending_post.approve();
    // pending_post no longer valid

    assert_eq!("I ate a salad for lunch today", final_post.content());
}

...correct?

But let's say you then want to encapsulate this further. How would you do it? Would you need to have a struct containing all of these types (or an enum that could be any of them)? How would another part of the code actually interact with this, practically? Would you just end up reinventing the implementation at the start of the chapter anyway?

My confusion here can be summed up as: what is the point of this pattern if it can only be used within the scope of a single function, where the abstraction is somewhat wasted? In other situations, the answer to "why not just use an if statement instead of this abstraction" is "it could be more complicated than the example", but in this specific case, I can't see how it could be more complicated without throwing away the abstraction itself. I expect I'm missing something, but what?

If you really need this further encapsulated, e.g. you want to have a HashMap which contains posts in different states, then a good solution is indeed to use an enum:

enum Post {
    Draft(DraftPost),
    PendingReview(PendingReviewPost),
    Published(PublishedPost)
}

Having an instance of this enum does require you to have if (or perhaps match) clauses to do anything useful with the post. However, it still offers you the same protection as before: you can never do something invalid for that state, such as displaying the content of a draft, because only PublishedPost has a method for that.

Now you can implement an as_post method on each of the ...Post structs, which wraps the struct in the matching variant. These methods are one-liners, so they are trivial to review, and you can still see from the signature of e.g. DraftPost::request_review that it always returns a PendingReviewPost. You just need to call as_post on the result before you put it back into your HashMap.
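Here is a minimal sketch of how that could look, assuming the three state structs each just hold a content String. The advance and published_content helpers and the u32 keys are made up for illustration; they are not from the book.

```rust
use std::collections::HashMap;

// The three state types, roughly as in the book's example.
pub struct DraftPost { content: String }
pub struct PendingReviewPost { content: String }
pub struct PublishedPost { content: String }

// Wrapper enum so posts in different states can share one collection.
pub enum Post {
    Draft(DraftPost),
    PendingReview(PendingReviewPost),
    Published(PublishedPost),
}

impl DraftPost {
    pub fn new() -> DraftPost { DraftPost { content: String::new() } }
    pub fn add_text(&mut self, text: &str) { self.content.push_str(text); }
    // The signature still documents the only possible transition.
    pub fn request_review(self) -> PendingReviewPost {
        PendingReviewPost { content: self.content }
    }
    pub fn as_post(self) -> Post { Post::Draft(self) }
}

impl PendingReviewPost {
    pub fn approve(self) -> PublishedPost { PublishedPost { content: self.content } }
    pub fn as_post(self) -> Post { Post::PendingReview(self) }
}

impl PublishedPost {
    pub fn content(&self) -> &str { &self.content }
    pub fn as_post(self) -> Post { Post::Published(self) }
}

// Hypothetical helper: move the post stored under `id` one state forward.
// Returns false if there is no post with that id.
pub fn advance(posts: &mut HashMap<u32, Post>, id: u32) -> bool {
    match posts.remove(&id) {
        Some(post) => {
            let next = match post {
                Post::Draft(d) => d.request_review().as_post(),
                Post::PendingReview(p) => p.approve().as_post(),
                // Already published: put it back unchanged.
                Post::Published(p) => p.as_post(),
            };
            posts.insert(id, next);
            true
        }
        None => false,
    }
}

// Hypothetical helper: only a published post has content to show.
pub fn published_content(posts: &HashMap<u32, Post>, id: u32) -> Option<&str> {
    match posts.get(&id) {
        Some(Post::Published(p)) => Some(p.content()),
        _ => None,
    }
}
```

The match lives in one place (advance), and everything inside each arm is still checked by the per-state types, so you can't accidentally call approve on a draft.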

All that said, make sure that you actually need that level of abstraction. For instance, if you only want a Vec of posts, then perhaps it is better to have three different Vec-s, one for each kind of post, avoiding the runtime-checks.


Thanks, this is helpful. I also found this: Pretty State Machine Patterns in Rust which builds on both ideas and incorporates From/To conversion. So I'm beginning to see how to generalise and encapsulate something like this.

In fact, a couple of the comments on this Reddit post are quite illuminating. It looks like this was more a way to demonstrate different ways to implement OOP principles with Rust, and less of a strong endorsement of that particular approach for state machine code.

Reading that post, I would very much recommend against using From in the way it suggests. From the docs of std::convert::From:

Used to do value-to-value conversions while consuming the input value.

I would not expect it to do business logic (of transitioning states), only trivial conversion logic. Also this leaves you with unclear into methods doing the transitions, instead of methods with clear names. And it doesn't really help you with any higher level abstraction, the only thing it enables is writing a method which accepts an impl Into<ThisState> - which makes sense when it's a conversion, not when it's business logic which then happens (almost) implicitly!

You could however replace my above-mentioned as_post methods with impl From<ThisState> for Post; then you can accept impl Into<Post> in methods and leave the conversion to them. This doesn't help avoid the match statements though.
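A rough sketch of what I mean, again assuming each state struct only holds a content String. The store and state_name functions are hypothetical, just there to show the impl Into<Post> parameter and the match you still need.

```rust
pub struct DraftPost { content: String }
pub struct PendingReviewPost { content: String }
pub struct PublishedPost { content: String }

pub enum Post {
    Draft(DraftPost),
    PendingReview(PendingReviewPost),
    Published(PublishedPost),
}

// Trivial wrapping conversions only: no state transition happens here.
impl From<DraftPost> for Post {
    fn from(p: DraftPost) -> Self { Post::Draft(p) }
}
impl From<PendingReviewPost> for Post {
    fn from(p: PendingReviewPost) -> Self { Post::PendingReview(p) }
}
impl From<PublishedPost> for Post {
    fn from(p: PublishedPost) -> Self { Post::Published(p) }
}

// The transitions keep their descriptive names.
impl DraftPost {
    pub fn new(text: &str) -> Self { DraftPost { content: text.to_string() } }
    pub fn request_review(self) -> PendingReviewPost {
        PendingReviewPost { content: self.content }
    }
}
impl PendingReviewPost {
    pub fn approve(self) -> PublishedPost { PublishedPost { content: self.content } }
}
impl PublishedPost {
    pub fn content(&self) -> &str { &self.content }
}

// Hypothetical storage function: accepts a post in any state.
pub fn store(posts: &mut Vec<Post>, post: impl Into<Post>) {
    posts.push(post.into());
}

// Reading a stored post back still needs a match.
pub fn state_name(post: &Post) -> &'static str {
    match post {
        Post::Draft(_) => "draft",
        Post::PendingReview(_) => "pending review",
        Post::Published(_) => "published",
    }
}
```

So the caller can write store(&mut posts, draft.request_review()) without wrapping anything by hand, and From is only doing the kind of conversion its docs describe.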

I'm inclined to agree with you; I greatly prefer a more functional approach to state machines and it seems to play to Rust's strengths more.

I will add that there are scenarios that you might not always think of as state machines where providing a compile-time guarantee is worth the extra types. The two scenarios that come to mind are cases where security matters and cases where a runtime check would be problematic. Or possibly just a scenario where the code flow is potentially confusing enough.

A security scenario might be a builder type which requires that some field (e.g. nonce or key) always be properly initialized. Using this approach you could ensure that one of a set of methods is called. You could of course achieve this at runtime with a flag and a check, but that might be more fragile, and of course incurs a runtime cost.
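As a sketch of the key case, under the assumption of a made-up CipherBuilder (none of these names come from a real crate): the builder is generic over a marker type, and build only exists once a key has been supplied.

```rust
// Marker states for the builder. All names here are hypothetical.
pub struct NoKey;
pub struct WithKey([u8; 16]);

pub struct CipherBuilder<K> {
    key: K,
    rounds: u32,
}

pub struct Cipher {
    key: [u8; 16],
    rounds: u32,
}

impl CipherBuilder<NoKey> {
    pub fn new() -> Self {
        CipherBuilder { key: NoKey, rounds: 10 }
    }

    // Supplying a key moves the builder into the WithKey state.
    pub fn key(self, key: [u8; 16]) -> CipherBuilder<WithKey> {
        CipherBuilder { key: WithKey(key), rounds: self.rounds }
    }
}

// Optional settings are available in every state.
impl<K> CipherBuilder<K> {
    pub fn rounds(mut self, rounds: u32) -> Self {
        self.rounds = rounds;
        self
    }
}

// build() only exists once a key has been set, so
// CipherBuilder::new().build() is rejected by the compiler.
impl CipherBuilder<WithKey> {
    pub fn build(self) -> Cipher {
        Cipher { key: self.key.0, rounds: self.rounds }
    }
}

impl Cipher {
    pub fn key(&self) -> &[u8; 16] { &self.key }
    pub fn rounds(&self) -> u32 { self.rounds }
}
```

Usage would be something like CipherBuilder::new().rounds(12).key(my_key).build(). There is no flag to check at runtime; forgetting the key is a type error.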

A code flow confusion scenario might be something like a builder that requires that certain changes be made before other ones (similar to the above), but you might be worried neither about security nor about runtime cost, but just want to create an API that helps your users navigate the constraints of using your code.

All my examples include builders because that's the scenario that comes to mind where you've frequently got multiple paths through your code (or you wouldn't be using the builder pattern), which might also have practical constraints.
