Efficient state machine inside a RefCell - possible?

bchallenor · January 21, 2018, 1:15am

I have a state machine enum whose states have some fields in common, so in transitioning from one state to another I would like to consume the old state when producing the new one, and thus avoid cloning the fields unnecessarily.

However, the state is stored in a RefCell, so I can only get a &mut reference to it. This means I can't move out the old fields to produce the new state.

I can't figure out a way to do what I want with the current APIs. But if there were a function mem::transform_with like this, it would be possible:

use std::cell::RefCell;

mod mem {
    pub fn transform_with<T, F>(dest: &mut T, transformer: F)
    where
        F: FnOnce(T) -> T,
    {
        unimplemented!();
    }
}

enum State {
    Wibble(String),
    Wobble(String),
}

fn next(state: State) -> State {
    match state {
        State::Wibble(s) => State::Wobble(s),
        State::Wobble(s) => State::Wibble(s),
    }
}

pub fn main() {
    let cell = RefCell::new(State::Wibble("hello".to_owned()));
    let mut state = cell.borrow_mut();
    mem::transform_with(&mut *state, next);
}

Is this approach sound? Could such a method be added? Or, even better, can it be done with existing APIs?

(edit: transform_with was meant to return unit)

sfackler · January 21, 2018, 1:20am

If transformer panicked, dest would contain an uninitialized value. There's a crate that implements your function by aborting on panic, but I can't remember the name unfortunately.

quadrupleslap · January 21, 2018, 1:25am

You probably already saw this, but RefCell has a (nightly-only) function:

fn replace_with<F>(&self, f: F) -> T 
where
    F: FnOnce(&mut T) -> T;

If whoever added this function explicitly decided against using an FnOnce(T) -> T, there's probably a good reason for it, and that reason's probably what @sfackler said (I'm just guessing, though.)

bchallenor · January 21, 2018, 2:04am

Ah, I forgot about panics. I guess that's a no-go for adding transform_with as a safe function in mem.

But I think an equivalent method could be added to RefCell though, as RefCell is single-threaded: if the transform function panics, the RefCell is going to be destroyed anyway.

sfackler · January 21, 2018, 2:31am

Threading has nothing to do with the safety issue here. The destructor for the T will run twice, which is not good if it's, for example, freeing memory.

bchallenor · January 21, 2018, 2:42am

Why would the destructor run twice? Is it not possible to set up the state before invoking the transformer such that it will only run once in the case of a panic?

sfackler · January 21, 2018, 3:00am

There is a T that is passed by value to the transform closure. When the closure panics and the thread unwinds, its destructor will run. There is a T in some other arbitrary location. A mutable reference to it was passed to replace_with. When the thread unwinds past the frame that owns the T, its destructor will run.

There's no way for replace_with to tell whatever arbitrary location that was to not run a destructor.

bchallenor · January 21, 2018, 3:26am

I agree in the case of the generic mem::transform_with, which appears to be a lost cause. But actually I was thinking of the RefCell::transform_with method that I proposed instead. I still think it could work, because in this case the RefCell logically owns the T; it's not in another arbitrary location. It could pass its ownership to the transform closure, and then assume ownership of the result.

sfackler · January 21, 2018, 4:42am

It could be possible to implement in RefCell, but the point of that type is to allow for internal mutability. How is transform_with related to that?

bchallenor · January 21, 2018, 5:18am

I'm not sure I understand your question, sorry. But to be clear, this is the signature of RefCell::transform_with that I was suggesting. Given that replace_with already exists, and it does pretty much the same thing but with different tradeoffs (i.e. replace_with lets you observe the original at the cost of having to construct a second value), it seems that transform_with could also make sense for RefCell?

use std::cell::RefCell;

trait RefCellExt<T> {
    fn transform_with<F: FnOnce(T) -> T>(&self, transformer: F);
}

impl<T> RefCellExt<T> for RefCell<T> {
    fn transform_with<F: FnOnce(T) -> T>(&self, transformer: F) {
        unimplemented!();
    }
}

enum State {
    Wibble(String),
    Wobble(String),
}

fn next(state: State) -> State {
    match state {
        State::Wibble(s) => State::Wobble(s),
        State::Wobble(s) => State::Wibble(s),
    }
}

pub fn main() {
    let cell = RefCell::new(State::Wibble("hello".to_owned()));
    cell.transform_with(next);
}

sfackler · January 21, 2018, 5:23am

replace_with doesn't provide anything beyond what you could write yourself in safe code. It's trivial convenience function:

    pub fn replace_with<F: FnOnce(&mut T) -> T>(&self, f: F) -> T {
        let mut_borrow = &mut *self.borrow_mut();
        let replacement = f(mut_borrow);
        mem::replace(mut_borrow, replacement)
    }

transform_with would have a very nontrivial implementation, and provide you with something you can't get anywhere else.

bchallenor · January 21, 2018, 5:41am

That's why I want it.

I'm not sure yet if I want it enough to go write an RFC, but I think what I wanted here was confirmation that it would in theory be sound to implement such a method. After I failed to spot the issues with my more general function above, I just wanted to check my understanding. (Thanks for your help with that!)

scottmcm · January 21, 2018, 7:28am

Note that something very similar ended with the RFC being closed, so be sure to address what was raised there if you do:

If you just want the function: take_mut::take - Rust

bchallenor · January 21, 2018, 9:20am

Thanks, the discussion on that RFC was really useful. Interestingly more than one person posted with the state machine example.

It also presented some alternatives: (1) adding a new empty "poisoned" state, and using it with mem::replace, and (2) wrapping the state in Option and using Option::take. I had thought of (1) but not of (2), which I think I prefer.

newbie101 · January 21, 2018, 9:40am

I really like the poison state idea (option 1) and I have been using it successfully. I am so impressed with this method that I even like the fact that Rust forced me to add this poison state (which I called State::Invalid).

What happened as my state machine developed, was that State::Invalid became a first class state that is handled in the state machine just like all other states. A reset event from State::Invalid is OK, as is an exit event. Now, after a terrible disaster caused by a coding error, I am just handling events and changing states as defined behaviour.

RalfJung · January 21, 2018, 11:53am

If I understand you correctly, you suggest to add a "poison" mechanism directly into RefCell. This would require deeper changes to RefCell, beyond just adding a method. In fact, it would probably increase its size because the "poison" state has to be stored somewhere -- so, even people not using the feature would pay for it. (Or maybe it could be folded into the read count?)

Moreover, you now have to handle this "poison" state everywhere. What happens if the RefCell is later borrowed again? (Can happen, e.g., if the panic is caught.) Also, notice that &mut RefCell<T> can be Send, so a panic in no way implies that no other thread can ever access the RefCell.

bchallenor · January 21, 2018, 12:38pm

Actually, I wasn't being that sophisticated. I was just thinking that the single-threaded (i.e. Send but not Sync) nature of RefCell meant that there would be no one to observe the corrupted state. The thread that owns the RefCell is going to be destroyed, and (AIUI) no other threads could know about the RefCell. I gather that this is not true: can you give a specific sequence of events that would cause the corrupted state to be observed by another thread?

RalfJung · January 21, 2018, 1:12pm

You would still have to prevent the usually automatic drop call when the RefCell gets dropped; how does it know not to run?

I thought of something like this:

Thread A creates r: RefCell, then spawns a scoped thread and hands &mut r to the new thread B.
Thread B calls transform_with, which moves out the value from r, and then panics.
Thread A leaves the scope and gets access to r back, observing the corrupted state.

This assumes a scoped thread API that does not enforce forwarding panics from a subthread to the spawning thread. Turns out the few scoped thread APIs I found on crates.io all enforce forwarding panics. OTOH, I can't think of a way in which not forwarding panics would be unsound (absent this example, where I'd put the blame on transform_with, not the thread API).

And anyway, you can catch panics with catch_unwind. AssertUnwindSafe is not unsafe to use, so this must not introduce UB (i.e., the UnwindSafe trait catches footguns, not UB). That way, I can certainly call transform_with, have the closure panic, catch the panic, and then observe the corrupted state.

Topic		Replies	Views
How can I take and replace the value of a RefCell? help	13	1776	August 6, 2022
Question about RefCell::borrow_mut help	3	375	February 19, 2022
Mutate Option RefCell: Cannot borrow as mutable help	3	785	September 22, 2019
Take and assign to a RefCell help	3	351	July 24, 2022
Less awkward use of RefCell help	2	1051	January 12, 2023

Efficient state machine inside a RefCell - possible?

Related Topics