Function/method/idiom for in-place substitution using previous value


#1

Firstly, I am still learning Rust, so I may be misunderstanding or overcomplicating some of this.

Is there a function/method or idiom of in-place substitution of a value (through a &mut reference) with another value with transfer of ownership of some members of the original value (see example below)?

Consider a situation in which a value needs to be replaced, but some of the contents of the value is expensive to reallocate/duplicate. Ideally, ownership of the original value could be obtained and transferred into the new value, which would then be assigned to the original reference, but this would leave the original value invalid while this is in progress. Note that for some values (such as Strings and Vecs), empty values do not actually require any memory allocation, so an empty value can be std::mem::replaced in with relatively negligible cost, but this may not always be possible for all types.

Here is an example solution that I wrote (see mutate and try_mutate at the bottom):
https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=4d52935dec6b778cffb63ed7affff095
Note: it may not be technically correct. I have used unsafe Rust and am only new to this, so I may have done something stupid.
Note 2: I am using Strings as a placeholder for a data type that can be expensive to allocate, but obviously in this case the Strings could be replaced with empty strings temporarily.

Is there a better idiom or similar existing functions that would be more appropriate to use?


#2

I am not totally sure what you are trying to achieve.

Let’s take the mutate function:

  1. You replace the content of reference with uninitialized memory (which is unsound in any case, see below) and take the ownership of the old value;
  2. You pass the ownership of the old value to the function, which returns a value of the same type;
  3. You write the returning value inside the reference.

First of all, you need to think that if your function panics, you are unwinding the stack with an invalid object, so your program will crash (if you are lucky).

But, as I said before, I am not sure what you are trying to achieve. Why not just using a mutable reference to the function instead of the ownership of the object? But maybe I am missing something.


#3

The issue with passing a mutable reference instead of transferring ownership is that an object like a String would have to be copied, which would require a (potentially large) heap allocation which could be completely avoided if the ownership of the existing value were transferred.

And yes, there are issues with my current implementation, but it was more of a demonstration of what I was looking for than a working implementation.


#4

Setting mem::uninitialized on any non-trivial Rust value is super dangerous. Rust doesn’t allow such values to exist at any point, and it may try to use or free uninitialized memory, causing all kinds of undefined behavior and crashes.

mem::replace or mem::swap are safe and often very cheap. For types that store data on the heap, like Box, Vec, and String, the replacement only swaps pointers to the data.


#5

But passing a mut ref does not involve any kind of allocation. Can you be more clear about this point?

EDIT
The make things a bit more clear: try to create a simple struct that does not derive Clone nor Copy. You will see that you will not need any kind of allocation (other than passing a pointer on the stack or on a register).


#6

Consider the following code (using enum def’ns from the example):

    for a_or_b in &mut a_or_bs {
        *a_or_b = match *a_or_b {
            AOrB::A(ref s) => AOrB::B(s.clone()),
            AOrB::B(ref s) => AOrB::A(s.clone()),
        };
    }

It requires s.clone() for it to work, which requires the String to be reallocated. How would this be implemented without allocating a new String object at any point?


#7

We can take advantage of the fact that an empty string doesn’t allocate, and us a sentinel value

for a_or_b in &mut a_or_bs {
    *a_or_b = match std::mem::replace(a_or_b, String::new()) {
        AOrB::A(s) => AOrB::B(s),
        AOrB::B(s) => AOrB::A(s)
    }
}

#8

Okay, how would you resolve this example: Playground


#9

I was just using String as a placeholder for a values that could be expensive to copy.


#10

Ok, I didn’t see that note before.

In that case, I think this is safe

for a_or_b in &mut a_or_bs {
    // using uninitialized memory as a sentinal value
    // this is safe because there is no way to panic
    // between here and overwriting the uninitialized memory
    // and the uninitialized memory doesn't every get read
    let new_a_or_b = match std::mem::replace(a_or_b, unsafe { std::mem::uninitialized() }) {
        AOrB::A(s) => AOrB::B(s),
        AOrB::B(s) => AOrB::A(s)
    };
    // this is needed because it is unsound to drop uninitialized memory,
    // and a normal assignment drops the old value.
    unsafe { std::ptr::write(a_or_b, new_a_or_b) }
}

This is how you would do it in general.


#11

Note the considerations for using uninitialized memory

  • You cannot ever read uninitialized memory
  • Drops will read memory
  • Panics drop values
  • Normal assignments drop values
  • Therefore both of the above are banned when dealing with uninitialized memory.
    • Similarly you should ban calling other functions that you don’t own or audit regularly when dealing with uninitialized memory, so that there are no accidents when dealing with uninitialized memory.

#12

I was wondering about that. In my initial example, I used a similar solution, but I isolated it to a separate method which accepted a closure as a parameter. In that case, as far as I can tell, everything should work except panics. Is there a solution for that?

(The methods are at the bottom of the original example.)


#13

No, because panics will always drop values before you can do anything about it.

*actually I don’t think this is true, you may be able to catch the panic with catch_unwind or abort


The normal way to get ownership of a mutable value without unsafe is to have an Option that has an invariant that it will always be Some. Then you can use Option.take to get ownership of a value.

You could even create a type like below, which brings back nullable values, from languages like Java. But this has the advantage of being explicit.

enum Take<T: ?Sized> {
    Uninit,
    Value(T)
}

impl<T> Take<T> {
    pub fn take(&mut self) -> T {
        match std::mem::replace(self, Take::Uninit) {
            Take::Value(val) => val,
            Take::Uninit => panic!("Tried to use uninitialized memory!")
        }
    }
    
    pub fn set(&mut self, value: T) { self.replace(value); }
    
    pub fn replace(&mut self, value: T) -> Take<T> {
        std::mem::replace(self, Take::Value(value))
    }
}

impl<T: ?Sized> Deref for Take<T> {
    type Target = T;
    fn deref(&self) -> &Self::Target {
        match self {
            Take::Value(val) => val,
            Take::Uninit => panic!("Tried to use uninitialized memory!")
        }
    }
}

impl<T: ?Sized> DerefMut for Take<T> {
    fn deref_mut(&mut self) -> &mut Self::Target {
        match self {
            Take::Value(val) => val,
            Take::Uninit => panic!("Tried to use uninitialized memory!")
        }
    }
}

#14

It sounds like you want RFC1736 replace_with (closed) – read those comments for a lot of discussion about the hazards. Nonetheless, the take_mut crate tries to provide this functionality in a safe way.


#15

Another, perhaps easier approach, could be to wrap the expensive value in Rc or Arc to make cloning cheap. If the value is in Option, there’s also Option::take().


#16

@nikomatsakis wrote an article related to this topic:

After NLL: Moving from borrowed data and the sentinel pattern (Archived)


#17

@Ontonator sorry but I was in hurry and I did not have enough time to reply. Others already wrote what had to be written, but I still want to add some points.

My first point is that in your examples you are storing the same type in the different variants of the enum. It is just because it is a trivial example or it is intended? Because if you have all the enum variants that contain the same type, then you have a design issue. You should have a struct with the actual value and the stateless enum.

The second thing is related to what others have already said: you always need a valid state, otherwise your code is UB prone. Remember that in C++ you can only leave a move-d object in an unspecified state, but it must be valid because later a destructor will be called. I am saying this because, even if Rust has a different concept of move, there is no magic. Valid states are always needed for stack unwinding.

My trivial solution to your problem is similar to the Option one, but IMHO it can have some advantages. The solution is just adding an Empty state to your enum. And you can consider this state as internal, in the sense that it is undocumented and cannot be normally used.

This is how I would solve it. I also added another enum variant with a different inner type to have a valid use of a stateful enum. Notice the #[doc(hidden)] attribute, which should avoid an incorrect usage. It is also possible to use the #[non_exhaustive] attribute for the enum, but honestly I am not extremely enthusiast of that (excluding a variant from the exhaustiveness requirements would be better IMHO).

Don’t get me wrong however: if you know that your contained data is trivially droppable (I am not sure this is correct, I am mixing C++ and Rust terms :sweat_smile:) and that it does not have any invalid bit pattern, like [i32; 32] for instance, you can use the unsafe approach because it will be sound.

Now I have a question to others like @kornel, @KrishnaSannasi or @cuviper: do we have any way to detect if a type is trivially droppable and if it has an always valid property? Do we have any open RFC about them? I think that these feature would be very handy, especially when we will have specialization.


#18

Careful there. If LLVM figures out that the memory is uninitialized, it will summon nasal demons. You can think of LLVM’s bytes having 9 bits, with that evil bit set by uninitialized and being illegal value.

However, there is a way to see if type has a destructor: https://doc.rust-lang.org/core/mem/fn.needs_drop.html


#19

Actually, LLVM represents each byte as having 257 possible values, with the extra possible value being uninitialized, which from an LLVM perspective means meaningless unconstrained value. The optimization passes in LLVM are free to make any semi-self-consistent assumptions about that value and prune other code accordingly. (E.g., condition && undefined can be presumed to be a constant true or a constant false, at the compiler’s whim, with other code deleted accordingly.) Those assumptions will usually lead to unintended-by-you consequences.

Edit: deleted the “constant true term”.


#20

Further reading on how under is far more insidious than usually expected: