What is the idiomatic way to replace (in multiple steps) a struct field if the struct is behind a mutable reference?

I have a struct behind a mutable reference and I want to replace one of its fields. Please take a look at the following code:

struct Data {
    huge_vector_that_should_not_be_cloned: Vec<u8>, // not the only large member that should not be cloned
    many_more_fields: u32,
}

struct A {
    data: Data,
    some_more_fields: u32,
}

struct B {
    data: Data,
    some_more_fields: String,
}

enum AOrB {
    A(A),
    B(B),
}

impl AOrB {
    fn a_to_b(&mut self) {
        // Cannot use e.g. mem::take, since Data has no reasonable default implementation
        match self {
            AOrB::A(a) => {
                let data = a.data;  // do not destructure data, in reality it is the "base struct" containing the state of a whole application
                *self = AOrB::B(B {
                    data,
                    some_more_fields: "".into(),
                });
            }
            _ => panic!(),
        }
    }
}

fn main() {
    let data = Data {
        huge_vector_that_should_not_be_cloned: Vec::new(),
        many_more_fields: 0,
    };

    let mut a_or_b = AOrB::A(A {
        data,
        some_more_fields: 0,
    });
    a_or_b.a_to_b();
}

(Playground)

Errors:

   Compiling playground v0.0.1 (/playground)
error[E0507]: cannot move out of `a.data` which is behind a mutable reference
  --> src/main.rs:25:28
   |
25 |                 let data = a.data;
   |                            ^^^^^^
   |                            |
   |                            move occurs because `a.data` has type `Data`, which does not implement the `Copy` trait
   |                            help: consider borrowing here: `&a.data`

For more information about this error, try `rustc --explain E0507`.
error: could not compile `playground` due to previous error

In a_to_b, I am first consuming a, leaving self in an undefined state. Then I am replacing self with something that I have constructed from a, bringing it back into a defined state.

Since self is a mutable reference, noone should be able to observe self while being undefined. However, the compiler does not allow me to do this.

Is there a way to work around this? I would like to avoid implementing Default for A, because in my code, it has no reasonable Default implementation, so I would like to avoid mem::take. Also, I cannot just use clone, because the struct is too large. Moreover, I would like to treat Data as an atom, as in the real code there is much more large stuff than just one large vector that should not be cloned.

Ideally, I would like to only change the implementation of a_to_b for this to work. Otherwise, I could for example make the data member of A into an option to achieve what I want.

1 Like

Is it possible for you to do it like this (note that a_to_b() now takes self rather than &mut self):

impl AOrB {
    fn a_to_b(self) -> AOrB {
        match self {
            AOrB::A(a) => {
                let data = a.data;
                AOrB::B(B {
                    data,
                    some_more_fields: "".into(),
                })
            }
            _ => panic!(),
        }
    }
}

fn main() {
    let data = Data {
        huge_vector_that_should_not_be_cloned: Vec::new(),
        many_more_fields: 0,
    };

    let mut a_or_b = AOrB::A(A {
        data,
        some_more_fields: 0,
    });
    a_or_b.a_to_b();
}

Or how about this:

impl AOrB {
    fn a_to_b(&mut self) {
        match self {
            AOrB::A(ref mut a) => {
				let mut tmp = Vec::new();
				std::mem::swap(&mut tmp, &mut a.data.huge_vector_that_should_not_be_cloned);
                *self = AOrB::B(B {
                    data: Data {
						huge_vector_that_should_not_be_cloned: tmp,
						many_more_fields: a.data.many_more_fields,
					},
                    some_more_fields: "".into(),
                });
            }
            _ => panic!(),
        }
    }
}

fn main() {
    let data = Data {
        huge_vector_that_should_not_be_cloned: Vec::new(),
        many_more_fields: 0,
    };

    let mut a_or_b = AOrB::A(A {
        data,
        some_more_fields: 0,
    });
    a_or_b.a_to_b();
}
1 Like

True. I should have specified that the signature a_to_b is fixed by an external library. Also, Data should not be destructed, it is just an example. In general, to be easy to integrate, I would only change the body of a_to_b.

You can simply take the vector, and use the rest of the fields with FRU syntax, so you can do this (since Vec is Default):

impl AOrB {
fn a_to_b(&mut self) {
    match self {
        AOrB::A(a) => {
            let vec = mem::take(&mut a.data.huge_vector_that_should_not_be_cloned);
            *self = AOrB::B(B {
                data: Data {
                    huge_vector_that_should_not_be_cloned: vec,
                    ..a.data
                },
                some_more_fields: String::new(),
            });
        }
        _ => panic!(),
    }
}

Thanks! Well, it seems like the example allows some shortcuts that are not easily possible in real life. Specifically, the struct Data is holding the state of the whole application, and has many members of varying size that are expensive to clone. Not just the one vector I used here to keep the example small.

The struct Data should be taken as an atom that does not implement Copy or Clone, but can only be moved around.

Well, then what you want is not possible as-is. You can't move out from behind a reference, period. If you have nothing to put back there, then this can't be done.

As a last-resort solution, you could put everything behind an Option and use Option::take(), but that would require unwrapping the data every time you want to access it under normal circumstances.

Or you could just try and come up with a cheap-to-construct placeholder value for Data. If you are worried that it doesn't make sense as a Default impl, then put it behind a private inherent method, and use mem::replace() instead of mem::take().

2 Likes

Great thanks for clearing that up!

How about going unsafe?
Would it be idiomatic to use e.g. mem::zeroed together with mem::replace?

This would rescue me from the option variant.

That would probably be UB with a complicated struct, as there are several common types that must always be non-zero. References and Boxes, in particular, must always point to a valid target.

If you want to go down the unsafe route, you’re better off working with MaybeUninit but that has worse ergonomics issues than Option would— You’ll need to unsafely assert that the data is valid whenever you access it, and you’ll need to write a manual Drop implementation.

2 Likes

No, please absolutely don't do that. Replacing a value with another shouldn't require unsafe. If it causes such a hard time, then that's an indication that your data structures aren't designed correctly. It's probably time to refactor your data model.

Right, the zeroed struct would then probably be dropped, and there is at least one Rc where there might be problems if it is all zeroes...

This is a bit harder to discuss, except if you want to study my application in detail. But my constraints are partially given by the crate iced which seems to be one of the best UI crates available in Rust at the moment. And then by the structure of the application, which I attempted to represent in code.

The enum in the example is roughly representing the UI states. Some of the states include Data, some don't. Hence I decided to put the data struct into only those states that need it. I could easily move Data outside of the enum as an option, but architecture-wise it feels like it does not belong there.


But I am wondering: is the reason that my original code does not compile a restriction of current Rust, such that code like this will compile at some point in the future, or is there a bigger problem here that I do not see?

If there is a panic while the field is invalid, that would of course be a problem. Then the application enters an invalid state assuming the panic is cought. But is this something the Rust project is attempting to avoid?

No, it's not a question of Rust not being smart enough. A fundamental assumption of the language is that a reference always points to a valid, initialized value. This is highly unlikely to ever change.

1 Like

In that case, could you add an Updating variant to the enum to serve as a temporary value for mem::replace? That would let you take ownership of the data, make whatever changes you need, and then put it back.

3 Likes

Hm, but this is confusing. If you for example move a struct into a mutable reference, then during that move, the reference might point to something invalid, right? Arguably, move cannot panic though.

I at least feel like this transformation I am trying to do is not that strange that it would be easy to say it is always "wrong" in some sense.

I think yes that works. I cannot try it right now since I don't have access to the original code right now. But I think it should do it. Great! It is super simple, requires no code changes for updating code that does not pertain it. This changes everything, thanks!

Edit: Ok, it works, great!

A reference must never point to anything uninitialized, that is a fact. Another fact is that moving a value into the referent of a reference doesn't make anything uninitialized. I don't follow the rest of your argument.

That depends on the implementation of the move, right? If a struct a: X with say 5 fields gets moved into b: &mut X, then first the fields inside *b will be dropped one by one, right? After a field is dropped, it is undefined, isn't it? So during the move, some fields will be in an invalid state. The same holds if the fields get dropped during the move, i.e. the first field gets dropped, then the value gets moved over, then the second, and so on.

However, none of these invalid states is ever observed. So it is sound. And if one assumes that Drop does not panic, then there should also be no edge case where the move is somehow interrupted and then someone else can still observe the partial move.

Soundness is not a question of observation, it's a question of the formal (or less formal) language definition (or, in the lack of it, of the compiler's assumptions). UB is still UB even if you don't "observe" anything related to it. Since by the intention of the language designers, you as the programmer are not allowed to ever create dangling references, if you don't uphold that contract, it's unsound by definition and unconditionally, not by happenstance or conditioned upon observation.

However:

  1. I would assume that moving, being a language built-in, is special, and the compiler knows what it is doing when moving into a value behind a reference, so it doesn't define away its own ability to move/copy stuff around (which would be silly, to put it mildly). Quod licet Iovi, non licet bovi. I.e., the ability of the compiler to do something behind the scenes in limited, special circumstances is not a logically good argument for assuming that the programmer is also permitted to do the same things in general.

  2. One doesn't have to define moving into a value as drop-then move. You could just as well swap-and-drop instead. I.e., a completely valid conceptual/formal implementation would be:

    struct MultiField {
       foo: String,
       bar: Vec<u32>,
    }
    
    fn move_into_deref(place: &mut MultiField, new_value: MultiField) {
        let MultiField { foo, bar } = new_value;
    
        let old_foo = mem::replace(&mut place.foo, foo);
        drop(old_foo);
    
        let old_bar = mem::replace(&mut place.bar, bar);
        drop(old_bar);
    }
    

    (where mem::replace() is a stand-in for the atomic builtin behavior of the compiler with regards to returning a bitwise copy of the referent and then overwriting it with the new value.) Hence, nothing behind the reference is ever uninitialized during these operations.

You can't assume that, Drop can panic. However, the soundness of Drop doesn't depend on whether it panics. Since Drop::drop() is only ever called when nobody else owns the value anymore, the worst things a panicking drop can do are 1. panic the whole thread and 2. leak memory. Also note that a custom-written Drop::drop() implementation doesn't need to (and shouldn't) drop the fields of its &mut self argument; that's the job of the automatically-generated drop glue, which runs even later, once even Drop::drop() returned, so there is no possibility for it to create a situation whereby a value behind a reference is uninitialized.

2 Likes

Yes, Rust tries to avoid this problem. Apart from the solutions elsewhere in this thread, you can use the replace_with or take_mut crates to work around it, by either accepting an immediate abort if there is a panic while there is no value for the field or supplying a function that creates a new value.

Wrong. Rust is not C++. Move is not “smart”. Every type must be ready for it to be blindly memcopied to somewhere else in memory.

Even if drop does panic there are nothing to observe.

Physically it contains previous data, more or less. But logically it doesn't exist. Lifetimes are not contiguous in Rust. You can do something like this:

let s: String;
…
if … {
  s = String::from("foo");
} else {
  s = String::from("bar");
}

It's very similar to your example, only variable is create before it becomes valid. But as was already explained compiler is not designed to allow arbitrary manipulations of lifetimes.

Maybe some later, more complicated, language would allow that, but Rust currently, only allows such tricks when you are using language-provided constructs.