Move enum data when only have mutable reference

I have an enum which represents the state of my application, and each state carries some data that is relevant for this state. Say for example:

enum State {
    FirstState, // this has no data
    SecondState(String), // this carries a string as its data
    ThirdState {name: String, counter: u32},
}

Then I store the current app state in an instance current_state : State. When there is a state change then the current_state is changed to the next variant. Now here is the important point for the question: The data which is attached to the states is partially carried over. For example, one could image that when the state changes from the second to the third then the name of ThirdState is taken from the data of SecondState. That is, data moves when switching from one enum variant to the next.

Now here is the problem: In reality, the current_state is inside an tokio::sync::RwLock, because this is all inside a web API. But then the WriteGuard gives me only a mutable reference and I can't move data.

Here is a minimal (non)-working example. (Without RwLock, just using a mutable reference to illustrate the problem.)

struct VoteResults {
    choiceOne: u32,
    choiceTwo: u32,
    choiceThree: u32,
}

enum State {
    VoteResultsProcessed(VoteResults),
    VoteResultsPublished(VoteResults),
}

fn publish_voteresults(state: &mut State) -> Result<(),&'static str> {
    match state {
        State::VoteResultsProcessed(voteresults) => {
            state = State::VoteResultsPublished(voteresults);
            
            Ok(())
        }
        State::VoteResultsPublished(_) => Err("results already published")
    }
}

I have a mutable reference to the enum... so I feel like I should be able to do this variant change with a move of data. I mean, it's all "safe". But I don't see how to do it?

Any help is appreciated.

In general, if you've got a &mut to something, and that something implements Default, you can always std::mem::take out its contents (like the String in SecondState), mess with it as much as you need, and then drop in a whole new enum variant on top of the mutable reference in order to progress from one state to the next.

1 Like

I don't get why it needs something like that. Shouldn't what I do be possible in Rust, without messing with any Default stuff? It's all safe no?

In particular, what do I do if the data doesn't implement Default?

There must be something to tell Rust to do what I want (which is safe right?).

If it doesn't implement Default, you can always std::mem::replace the contents with some other value. But if the only access you have is a &mut through a lock, you're not the owner and can't leave the value uninitialized using safe code. There's got to be something there at all times.

1 Like

Suppose that your state transition function panics. Then it has taken ownership of the old state, but not yet produced a new state. What, then, should be the state another reader of the RwLock sees?

If the RwLock had a “doomed; has no value; cannot ever be locked for read or write again” state, then that would be okay. But it doesn't, so there has to always be a valid value at all times.

4 Likes

If you don't want to implement Default or provide another replacement value you can use either the take_mut or the replace_with crates, which have a bunch of alternatives. The "simpliest" (arguable though) approach they offer is respectively take_mut::take and replace_with::replace_with_or_abort, both of which will abort your program (i.e. it won't be recoverable in any way) if the closure passed panics.

The fact the only way to not impose other requirements is to abort is also the reason this is not in the standard library.

1 Like

Sorry but I still don't get where the problem is when my swap is "atomic" as in the example above.

I guess understand that there would be a problem when I first want to take out the voteresults of the VoteResultsProcessed, and then in a second step change the current_state to VoteResultsPublished(voteresults). Then the problem is that in between the two steps, current_state has no valid value. So then I guess I could use your std::mem::replace or so.

But recall from my example above that my swap is atomic (and this is really what happens in my actual codebase), see line 3 below:

match state {
    State::VoteResultsProcessed(voteresults) => {
        state = State::VoteResultsPublished(voteresults);
            
        Ok(())
    }
...

I don't see what's unsafe here, or why that shouldn't be possible?

On the more low-level for example, I would expect that the memory looks like this, when current_state resides at address 2:

Address 1: ...
Address 2: | Tag, indicating which enum variant we have | Union, which in this case is of "shape" VoteResults |
Address 3: ...
...

Now in my case, all I want is that Rust changes the tag while keeping the union. This is absolutely safe, no? And I feel like this should be allowed when having a mutable reference. So how can I tell Rust to do that?

(More generally, if what's stored in the union is not exactly the same for the different variants, then there might also be a slight modification of the union, but still if this is all done in one atomic step in the code then Rust should realize that it is safe and should allow me somehow to do that, no?)

Said code isn't considered atomic. You move the value in the pattern match, pass it to a constructor, and only after that assign the return value back into the "hole".

That said, the compiler isn't thinking in "atomic or not" terms for this code. You just can't move from behind a reference without unsafe in Rust.


Rust doesn't have an effect system that allows functions (like the State::VoteResultsPublished constructor) to be colored as panic-free.

Rust could be extended to track holes without an effect system by disallowing function calls or such and only doing a local analysis, ala variable initialization, but it would be pretty limiting. That said it would work for your example, with a little adjustment.[1]

    match *state {
        State::VoteResultsProcessed(voteresults) => {
            // Avoid (notional) function call
            *state = State::VoteResultsPublished { 0: voteresults };
            
            Ok(())
        }

Here's a blog post on the topic that covers this possibility and others in a bit more detail.[2]


  1. or making some compiler generated functions / notional-functions magic ↩︎

  2. The now-missing tweet in the last section was something along the lines of "chanting in the distance: panic = abort having semantic impact"" ↩︎

2 Likes

It would be sound (that is, not a source of UB) to change only the tag on an enum (that has #[repr(u8)] or other explicitly chosen representation), from one variant to another with the same fields. However, there isn't any built-in operation to do that, so you would have to write unsafe code to do it.

But your match does not express anything like that: it's a move out and move back of voteresults, and there's no special exemption for "I put it back where I found it right away".

2 Likes

Ok but don't you guys think that Rust should provide a way to make things my case possible?

My point is that there should be, no?

Considering only the language user’s perspective, yes. But there are other considerations that go into these decisions as well, like how a possible solution might interact with other language features, how problematic the existing workaround is, and how much complexity it would add to the compiler.

In this case, the language designers have determined that these other factors make developing this capability more trouble than it’s worth despite its obvious utility (see comments above, for example).

Ok last question. I think I want to do it with unsafe code, as everything else doesn't seem like a good option to me.

What is the "correct" way to do this with unsafe code?

I’m no expert here, but I’d consider accepting an &mut State to ensure that your shenanigans will be unobserved, using ptr::read to get an owned copy you can destructure/manipulate and then ptr::write to store the modified version back in place.

The danger is that you will have two entangled copies of the same thing, and you need to make sure that they collapse back into one by the time the &mut expires, even if a panic occurs that early-exits your function. If you don’t, something will get dropped twice leading to a double-free UB event. Copious use of ManuallyDrop can help with this in some cases, but the details will be dependent on exactly what you’re doing.

In any case, make sure to test it thoroughly with Miri. It won’t guarantee that your code is UB-free, but it can find a lot of common UB-causing mistakes.

The “correct” way to do anything with unsafe is not to do it. Or not to do it directly, at least.

Writing unsafe code is hard, thus you usually either encapsulate the logic into separate crate or have to explain why what are you doing is so unique that it would only happen once and then never agan.

And you, yourself, have complained:

After already getting the answer, no less:

In a sense that's how Rust gets new features: first they are implemented as crates, then they are promoted to standard library or to language feature, if they are used often. That's how Rust got try macro and then ? operator, among other things.

What you are doing is still at step 1: there are crates that make it possible to do, but they are not [yet] promoted to std and/or language itself.

Depending on how often they would be used this may or may not happen in the future.

1 Like
#[derive(Debug)]
pub struct VoteResults {
    choiceOne: u32,
    choiceTwo: u32,
    choiceThree: u32,
}

const PUBLISHED: u8 = 1;

#[derive(Debug)]
#[repr(u8)]
pub enum State {
    VoteResultsProcessed(VoteResults) = 0,
    VoteResultsPublished(VoteResults) = PUBLISHED,
}

impl State {
    pub fn replace_processed_with_published(&mut self) {
        match *self {
            Self::VoteResultsProcessed(_) => {
                // OK to proceed
            },
            _ => { return },
        }
        
        let ptr_to_discriminant: *mut u8 = std::ptr::from_mut(self).cast::<u8>();
        
        // SAFETY:
        // * The enum is repr(u8), which guarantees that the discriminant
        //   is a u8 located at the beginning.
        //   <https://doc.rust-lang.org/reference/type-layout.html#primitive-representation-of-enums-with-fields>
        // * We checked that the current variant is VoteResultsProcessed.
        // * VoteResultsProcessed and VoteResultsPublished have the same fields,
        //   and repr(u8) guarantees deterministic layout of those fields,
        //   so the new interpretation of the data is sound.
        // * PUBLISHED is the discriminant value of VoteResultsPublished.
        unsafe {
            ptr_to_discriminant.write(PUBLISHED)
        }
    }
}

fn main() {
    let mut data = State::VoteResultsProcessed(VoteResults {
        choiceOne: 100,
        choiceTwo: 200,
        choiceThree: 300,
    });
    data.replace_processed_with_published();
    dbg!(data);
}

However, think thrice before actually using this. Every time you write unsafe code like this, you're introducing reliance upon subtle properties of your code that the compiler can't check. You're adding not just the risk that the unsafe code is wrong, but the risk that future changes to the code silently break the properties the unsafe code is relying on. The compiler can't check anything in that // SAFETY comment.

As a general good practice, you should do whatever you can to not use unsafe code, even if it is circuitous, before resorting to writing unsafe code.

In this case, mem::replace() is a good solution; you can swap in a placeholder value to satisfy the requirement.

If you really, really, don't want to (or can't) make up a placeholder value, then replace_with is the second best option — it involves unsafe code, yes, but unsafe code that has already been written and used (and therefore tested) by many people, and which does not depend on the specific layout of State and is thus less fragile to code changes.

And if you really have to write your own unsafe code, write a thorough test suite and run it under Miri, then get it reviewed by experts.

2 Likes

I agree with OP that in a better world this should just work. It's similar to how this works:

let mut x = String::from("abc");
let y = x;
x = String::from("def");

As others have mentioned, the reason it doesn't work from behind a mut reference is that panic unwinding complicates this. Personally I think panic unwinding is a misfeature for this reason, and apparently I am not the only one.

A similar thing is that if this worked, then *a += b could just be defined to be equivalent to *a = *a + b for any a: &mut T, without the need for separate AddAssign trait. The only reason this doesn't work is that *a would be temporarily moved out of.

2 Likes

If you already have an enum variant that is trivial to construct (like State::FirstState in your example code above) then you can use mem::replace to temporarily swap FirstState for the current state. This is a simple and efficient solution with no unsafe code.

If you don’t have such a variant, you can even add a “dummy” variant for this purpose. (But the downside is that you have to deal with this extra variant every time you match on your enum.)

Okay, thanks everybody for your tips and suggestions. What I got from many people now is the following order/precedence of what I should do:

  1. Solve it with safe Rust by using an Option or using std::mem::replace with some dummy value in order to have something written in the enum while making my move.

  2. Solve it with a crate like replace_with which allows me pretty much to do what I want. It uses unsafe code but it is written by people with some knowledge and was tested thoroughly.

  3. Write it with unsafe code by myself. The problem is that I could make a mistake. Most importantly, when a Panic occurs during my unsafe stuff, then I need to react accordingly. (and this is what the devs of replace_with do for me)

Here some comments for the three variants and another question:

  1. I'm not gonna do that, both because it feels wrong to me, and because my code above is in fact a simplification, and my actual data attached to the enum variant is much more complicated and I cannot simply construct a dummy "default" variable for the data.

  2. When I read about the crate, I thought, that's it! However, here is my problem: I first do a match on the enum, and then (in a specific case) I want to modify the enum, and for the modification I want to take out the data. But when I call replace_with then the closure will receive the whole enum, whereas I only want the data attached to the variant that I already matched. So I have to do another dummy matching or so inside the closure, even though I already know which variant the enum has?! Again too ugly for me. If you know how to solve this more elegantly, please let me know.

  3. So I thought why not write my own unsafe code. Because what is actually not a simplification in my code above is the fact that I literally only want to transfer the data to the next variant, and not modify anything. So inside the first variant of the match I can do:

    unsafe {
        let voteresults_temp = std::ptr::read(voteresults);
        std::ptr::write(state,State::VoteResultsPublished(voteresults_temp));
    }
    

    Now my question is, is there a problem with this code? In particular, you (and the replace_with crate) are always talking about panics. So let me understand this. I think the two lines of my unsafe code simply cannot panic, right? They only copy some data. So does that mean I can use this code without worrying about panics, or can there maybe be other reasons why my program would panic in between the two lines? And also, even if the program panics between the two lines, would that even be a problem? I mean, std::ptr::read only copies data right? So I think there is in fact always a valid value in the enum at all times, no? And then, why could this whole problem not be generally solved with unsafe code like this?

The problem is that it can copy data that “shouldn’t” be copied. If voteresults_temp contains a String, for example, it will be dropped on panic freeing its buffer. Now, the corresponding String in voteresults contains a pointer to a freed buffer which will cause UB the next time that String gets accessed, whether that’s for reading, writing, or dropping.

That’s at least why this isn’t a general solution, even if it might be acceptable for your usecase (I haven’t thought through whether there are any other problems here…)