Function/method/idiom for in-place substitution using previous value


#21

It was just a trivial example.

Yeah, I didn’t consider panicking in my example, so I assumed it would be fine to leave an invalid state temporarily, as long as I returned it to a valid state before leaving the function. (Like how std::mem::replace has invalid state during the function after the value is copied out and before the replacement is copied in, but obviously the lack of a closure containing arbitrary code makes this safe.)

For some reason this bothers me. I think it’s the fact that although it is now guaranteed to be a valid state as far as Rust is concerned, if the code panics while the value is being manipulated, the program will still be in an invalid state because the value being manipulated should never be empty outside of the function. (If you argue that the code being run will never panic, then there shouldn’t be any problem with the initial solution using std::mem::uninitialized [correct me if I’m wrong].)

What if the implementation of the enum is not defined by me?

Am I wrong in saying that if I never use the value, it cannot cause any unintended consequences? Since the value is just a placeholder while the actual value is being manipulated, I don’t ever actually use the uninitialized value.


#22

Yep, that looks like what I was looking for. The issues mostly seemed to be regarding unwinding behaviour, which I completely forgot to take into account.


#23

Not using a value is tricky, e.g.

foo.a = b;

is a use of a if it has Drop (I’m not sure about non-drop case). That’s why ptr::write exists.


#24

I used std::ptr::write in my implementation.


#25

Yeah (if there is no chance of panic before you overwrite it)


#26

That’s not exactly true, because if you read at the lib sources you will find that that MaybeUninit is used in many ways in order to avoid drop of invalid objects in case of panics.

Sorry, I think i am misunderstanding what you are saying. Because there is nothing wrong in dropping A::Empty, but it is UB to drop A::A(String) if the string has invalid data in it (like after writing uninitialized()). The fact that in normal program flow you don’t expect to have A::Empty (and you can decide to panic in that case or consider it UB, it is just a matter of API) is not relevant to the correctness of stack unwinding.

I think that, unfortunately, this assumption can never be taken. I will try to make a very edge case example, but I think it can be relevant to the concept that “panic can always happen”. Imagine that you made a crate that uses this replace with uninit approach, and a user starts using it in a project. This person is quite smart and is able to write a compiler plugin that inserts a call to a monitoring function randomly inside the AST. He is pretty unlucky and a call have been inserted between the first and the second replace, therefore the content is in an undefined (and potentially invalid) state. If the monitoring function panics (because he is extremely unlucky and he finished the disk space while the program was trying to dump some data), the stack unwinding triggers UB and nasal daemons will come.

This is a pretty extreme case, but security bugs are generally corner cases maliciously used ad-hoc. Obviously, it is very hard to think in this way, but this is the reason why it is important to deeply consider the implications of using unsafe code.

This is a good point. IMHO the best way in this case is to use Option as @kornel suggested. I am not sure at 100%, but I think that using an Option on an enum without an empty state and using an enum with an empty states produces the same exact memory layout. Moreover, most of the time the CPU will predict a Some(enum), and you won’t probably pay almost anything for the conditional jump (probably things will become even better on cold evaluations when #[likely] and #[unlikely] will be usable and inserted in the many unwrap functions).

As @kornel already said, it is almost impossible to predict that the value will not be read, just because “panics happen”.


#27

I have not yet delved into unstable Rust, and in the stable GitHub branch, it looks like it uses std::ptr::read and std::ptr::write.

That was my mistake; I was thinking that it wouldn’t make sense in the context of the program, but of course in the case of a panic, the program is already in an invalid state and none of the values should be used.

That seems like good advice, but for the sake of argument, if I should take a case like this into account, when would it ever be okay to use std::mem::uninitialized in my code? If the answer is never, then why does it exist?
Edit: It’s planned for deprecation, so I guess it won’t exist at some point in the (relatively distant) future.

Sure, when this is possible, but sometimes (such as the case in the example), the values may come inside another structure such as a Vec, in which case you cannot simply convert every element into an Option without allocating a new Vec.

He did not say “panics happen”, but “if there is no chance of panic before you overwrite it”, which I could guarantee in my example. But assuming you couldn’t, could you not use std::panic::catch_unwind like @KrishnaSannasi suggested?


#28

Yes, actually I looked at the unstable branch. In the stable branch uninitialized is used and it is extremely risky for all the reason we are already talking. As you noticed, this API has been deprecated, and I think that it was there before MaybeUninit could be related to some historical reasons, in the sense that probably something like MaybeUninit was impossible until specific language changes (I am just hypothesizing, but I don’t know what the real issue was).

If you are saying that it’s another crate that gives you a Vec with many stateful enums (defined in the crate) that cannot be cheaply copied and that requires an “inplace” manipulation for any reason, the I would suggest that you are trying to fix a problem someone else created :sweat_smile:.

If the situation is real (in the sense that you are not trying to imagine an hypothetical scenario), I think that you should open an issue, explaining why you need to manipulate the data in that way and that an empty private state to the enum should implement something like take in order to own the inner data without involving copies.

If this is not possible as well, I think that you have only to options:

  • Pay the allocation cost to have a very safe code
  • Use the replace uninitialized method (updating to MaybeUninit as soon as it stabilize) and use the catch_unwind.

Unfortunately I never tried to use catch_unwind, therefore I don’t have any suggestion to give you about that. In any case, my points about “panics can always happen” must be taken into account but you can accept that your code will go UB in very edge cases, it only depends on how reliable your program/crate must be. For instance, if you have a WASM single player game, you can easily live with these problems. If you are distributing a crate used for financial transactions, probably it is better to pay the cost of an allocation :smile:.


#29

This did stem from a problem that I actually encountered, but I already managed to solve that. Most of the situations that I enquired about were just me trying to gain a better understanding of the issue for the future. Thank you for your patience in answering all of my questions.


#30

Plz take a look at take_mut, which provides safety by enforcing abort on panic during its operation.


#31

Yes, I saw that suggested earlier, along with a failed RFC which attempted to merge something similar into std. Another suggestion from that RFC was the replace_with crate, which, although not as popular, seems to perform better in a simple benchmark than take_mut by using Drop to run code while unwinding, rather than using catch_unwind.

Edit:
My benchmark results:

   Compiling take_mut_benchmark v0.1.0 ([redacted]/take_mut_benchmark)                                                                                          
    Finished release [optimized] target(s) in 7.14s                                                                                                                                         
     Running [redacted]/take_mut_benchmark/target/release/deps/main-e7f3f96b2173a2cc
take_mut                time:   [50.250 ns 51.757 ns 53.712 ns]                     
                        change: [-54.156% -53.189% -52.137%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
  6 (6.00%) high mild
  3 (3.00%) high severe

replace_with            time:   [35.005 ns 35.263 ns 35.572 ns]                          
                        change: [-56.265% -55.651% -54.989%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  3 (3.00%) high mild
  5 (5.00%) high severe

Edit 2:
I moved the variable initialisation out of the iteration in the benchmark to achieve frankly ridiculous results:

   Compiling take_mut_benchmark v0.1.0 ([redacted]/take_mut_benchmark)                                                                                          
    Finished release [optimized] target(s) in 6.87s                                                                                                                                         
     Running [redacted]/take_mut_benchmark/target/release/deps/main-e7f3f96b2173a2cc
take_mut                time:   [16.689 ns 16.785 ns 16.888 ns]                      
                        change: [-67.596% -66.830% -66.146%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
  6 (6.00%) high mild
  4 (4.00%) high severe

replace_with            time:   [11.801 ps 11.868 ps 11.942 ps]                          
                        change: [-99.967% -99.967% -99.966%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
  6 (6.00%) high mild
  4 (4.00%) high severe

#32

I’ve only had time to scan the rest of the thread, so sorry if I’m repeating what someone else has said, but I’ve seen mostly things about using pointers. In the Rust Programming Language Book, they use an Option to do a mutable swap. As an example, I have a modified version of your code.

TL;DR
Basically, you have your struct with a field, say

struct A {
    b: Option<B> // B is whatever you want to swap
    // Other fields...
}

// For this case, B is an enum with two types
enum B {
    This,
    That
}

Then, you have the function that you want to substitute it in

impl A {
    fn swap(&mut self) {
        // Takes the item in self.b out, replaces it with a None,
        // and returns the item wrapped in a Some
        // For more information, see documentation for Option
        let option_item = self.b.take();
        assert_eq!(None, self.b);
        
        // Unless you allow for otherwise, self.b should always be Some(B)
        if let Some(item) = option_item {
            self.b = match item {
                B::This => Some(B::That),
                B::That => Some(B::This)
        }
    }
}

Let me know if you have any questions.