Efficient state machine inside a RefCell - possible?


#1

I have a state machine enum whose states have some fields in common, so in transitioning from one state to another I would like to consume the old state when producing the new one, and thus avoid cloning the fields unnecessarily.

However, the state is stored in a RefCell, so I can only get a &mut reference to it. This means I can’t move out the old fields to produce the new state.

I can’t figure out a way to do what I want with the current APIs. But if there were a function mem::transform_with like this, it would be possible:

use std::cell::RefCell;

mod mem {
    pub fn transform_with<T, F>(dest: &mut T, transformer: F)
    where
        F: FnOnce(T) -> T,
    {
        unimplemented!();
    }
}

enum State {
    Wibble(String),
    Wobble(String),
}

fn next(state: State) -> State {
    match state {
        State::Wibble(s) => State::Wobble(s),
        State::Wobble(s) => State::Wibble(s),
    }
}

pub fn main() {
    let cell = RefCell::new(State::Wibble("hello".to_owned()));
    let mut state = cell.borrow_mut();
    mem::transform_with(&mut *state, next);
}

Is this approach sound? Could such a method be added? Or, even better, can it be done with existing APIs?

(edit: transform_with was meant to return unit)


#2

If transformer panicked, dest would contain an uninitialized value. There’s a crate that implements your function by aborting on panic, but I can’t remember the name unfortunately.


#3

You probably already saw this, but RefCell has a (nightly-only) function:

fn replace_with<F>(&self, f: F) -> T 
where
    F: FnOnce(&mut T) -> T;

If whoever added this function explicitly decided against using an FnOnce(T) -> T, there’s probably a good reason for it, and that reason’s probably what @sfackler said (I’m just guessing, though.)


#4

Ah, I forgot about panics. I guess that’s a no-go for adding transform_with as a safe function in mem.

But I think an equivalent method could be added to RefCell though, as RefCell is single-threaded: if the transform function panics, the RefCell is going to be destroyed anyway.


#5

Threading has nothing to do with the safety issue here. The destructor for the T will run twice, which is not good if it’s, for example, freeing memory.


#6

Why would the destructor run twice? Is it not possible to set up the state before invoking the transformer such that it will only run once in the case of a panic?


#7

There is a T that is passed by value to the transform closure. When the closure panics and the thread unwinds, its destructor will run. There is a T in some other arbitrary location. A mutable reference to it was passed to replace_with. When the thread unwinds past the frame that owns the T, its destructor will run.

There’s no way for replace_with to tell whatever arbitrary location that was to not run a destructor.


#8

I agree in the case of the generic mem::transform_with, which appears to be a lost cause. But actually I was thinking of the RefCell::transform_with method that I proposed instead. I still think it could work, because in this case the RefCell logically owns the T; it’s not in another arbitrary location. It could pass its ownership to the transform closure, and then assume ownership of the result.


#9

It could be possible to implement in RefCell, but the point of that type is to allow for internal mutability. How is transform_with related to that?


#10

I’m not sure I understand your question, sorry. But to be clear, this is the signature of RefCell::transform_with that I was suggesting. Given that replace_with already exists, and it does pretty much the same thing but with different tradeoffs (i.e. replace_with lets you observe the original at the cost of having to construct a second value), it seems that transform_with could also make sense for RefCell?

use std::cell::RefCell;

trait RefCellExt<T> {
    fn transform_with<F: FnOnce(T) -> T>(&self, transformer: F);
}

impl<T> RefCellExt<T> for RefCell<T> {
    fn transform_with<F: FnOnce(T) -> T>(&self, transformer: F) {
        unimplemented!();
    }
}

enum State {
    Wibble(String),
    Wobble(String),
}

fn next(state: State) -> State {
    match state {
        State::Wibble(s) => State::Wobble(s),
        State::Wobble(s) => State::Wibble(s),
    }
}

pub fn main() {
    let cell = RefCell::new(State::Wibble("hello".to_owned()));
    cell.transform_with(next);
}

#11

replace_with doesn’t provide anything beyond what you could write yourself in safe code. It’s trivial convenience function:

    pub fn replace_with<F: FnOnce(&mut T) -> T>(&self, f: F) -> T {
        let mut_borrow = &mut *self.borrow_mut();
        let replacement = f(mut_borrow);
        mem::replace(mut_borrow, replacement)
    }

transform_with would have a very nontrivial implementation, and provide you with something you can’t get anywhere else.


#12

That’s why I want it. :wink:

I’m not sure yet if I want it enough to go write an RFC, but I think what I wanted here was confirmation that it would in theory be sound to implement such a method. After I failed to spot the issues with my more general function above, I just wanted to check my understanding. (Thanks for your help with that!)


#13

Note that something very similar ended with the RFC being closed, so be sure to address what was raised there if you do:

If you just want the function: https://docs.rs/take_mut/0.1.3/take_mut/fn.take.html


#14

Thanks, the discussion on that RFC was really useful. Interestingly more than one person posted with the state machine example.

It also presented some alternatives: (1) adding a new empty “poisoned” state, and using it with mem::replace, and (2) wrapping the state in Option and using Option::take. I had thought of (1) but not of (2), which I think I prefer.


#15

I really like the poison state idea (option 1) and I have been using it successfully. I am so impressed with this method that I even like the fact that Rust forced me to add this poison state (which I called State::Invalid).

What happened as my state machine developed, was that State::Invalid became a first class state that is handled in the state machine just like all other states. A reset event from State::Invalid is OK, as is an exit event. Now, after a terrible disaster caused by a coding error, I am just handling events and changing states as defined behaviour.


#16

If I understand you correctly, you suggest to add a “poison” mechanism directly into RefCell. This would require deeper changes to RefCell, beyond just adding a method. In fact, it would probably increase its size because the “poison” state has to be stored somewhere – so, even people not using the feature would pay for it. (Or maybe it could be folded into the read count?)

Moreover, you now have to handle this “poison” state everywhere. What happens if the RefCell is later borrowed again? (Can happen, e.g., if the panic is caught.) Also, notice that &mut RefCell<T> can be Send, so a panic in no way implies that no other thread can ever access the RefCell.


#17

Actually, I wasn’t being that sophisticated. I was just thinking that the single-threaded (i.e. Send but not Sync) nature of RefCell meant that there would be no one to observe the corrupted state. The thread that owns the RefCell is going to be destroyed, and (AIUI) no other threads could know about the RefCell. I gather that this is not true: can you give a specific sequence of events that would cause the corrupted state to be observed by another thread?


#18

You would still have to prevent the usually automatic drop call when the RefCell gets dropped; how does it know not to run?

I thought of something like this:

  • Thread A creates r: RefCell, then spawns a scoped thread and hands &mut r to the new thread B.
  • Thread B calls transform_with, which moves out the value from r, and then panics.
  • Thread A leaves the scope and gets access to r back, observing the corrupted state.

This assumes a scoped thread API that does not enforce forwarding panics from a subthread to the spawning thread. Turns out the few scoped thread APIs I found on crates.io all enforce forwarding panics. OTOH, I can’t think of a way in which not forwarding panics would be unsound (absent this example, where I’d put the blame on transform_with, not the thread API).

And anyway, you can catch panics with catch_unwind. AssertUnwindSafe is not unsafe to use, so this must not introduce UB (i.e., the UnwindSafe trait catches footguns, not UB). That way, I can certainly call transform_with, have the closure panic, catch the panic, and then observe the corrupted state.