Moving out of a variable and (re-)setting it later. Is this NULL at compile-time?

Continuing the discussion from Best practice with associated methods:

I was pointed to the .extend method, which somehow I wasn't aware of before.

In the past, I always used .append when I wanted to add the contents of a Vec to another Vec:

fn main() {
    let mut vec = vec!["Hello", "World"];
    {
        let mut prefix = vec!["Start", "Up"];
        prefix.append(&mut vec);
        // `vec` is empty here
        vec = prefix;
    }
    println!("{vec:?}")
}

Output:

["Start", "Up", "Hello", "World"]

But now I learned I can also write:

fn main() {
    let mut vec = vec!["Hello", "World"];
    {
        let mut prefix = vec!["Start", "Up"];
        prefix.extend(vec.into_iter());
        // `vec` doesn't exist here until we assign it again?
        vec = prefix;
    }
    println!("{vec:?}")
}

I was surprised that this second example worked, because after calling vec.into_iter(), the Vec stored in vec has been consumed. So what's the value of vec then? It doesn't exist. Yet this is allowed, and I can reassign the variable later.

I assume this is standard behavior and not something recently added to Rust? It feels a bit like a NULL value but at compile-time.


Note that this also works in loops (Playground).

Note that you can also write prefix.extend(vec).
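A minimal sketch that combines both notes. It uses a hypothetical helper, prepend_repeatedly, written only for illustration. Vec::extend accepts anything implementing IntoIterator, so the Vec can be passed directly. The move-then-reassign pattern also compiles inside a loop, because vec is re-initialized on every path through the loop body before anything reads it again.

```rust
/// Prepends `["Start", "Up"]` to `vec`, `times` times.
/// (Hypothetical helper, written only to illustrate the pattern.)
fn prepend_repeatedly(mut vec: Vec<&'static str>, times: usize) -> Vec<&'static str> {
    for _ in 0..times {
        let mut prefix = vec!["Start", "Up"];
        // `extend` takes any `IntoIterator`, so the `Vec` itself can be passed.
        // This moves `vec`, leaving the variable (logically) uninitialized.
        prefix.extend(vec);
        // Re-initializing `vec` on every path through the loop body is what
        // lets the next iteration (and the code after the loop) use it again.
        vec = prefix;
    }
    vec
}

fn main() {
    let vec = prepend_repeatedly(vec!["Hello", "World"], 2);
    // prints ["Start", "Up", "Start", "Up", "Hello", "World"]
    println!("{vec:?}");
}
```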


The value of vec on the stack doesn’t change at all by moving it, but the compiler does remember, statically or dynamically (via “drop flags”), that it logically no longer contains a value. Whenever a variable might (based on static analysis alone) no longer contain a value, you cannot move out of it (you’ll get a “use of moved value” kind of error). When a variable with drop glue needs to be dropped (either because it goes out of scope, or because a new value is assigned to it), the compiler can introduce drop flags (like extra bool variables on the stack) that record at run time whether the variable still contains a value; based on that flag, the destructor is either called or not. Drop flags can even be needed for individual fields of structs that you partially moved out of.
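Here is a minimal sketch of the dynamic case. It assumes a hypothetical Noisy type that counts its destructor calls in a Cell. Whether value is moved depends on a run-time bool, so the compiler cannot decide statically whether to drop it at the end of the scope. Conceptually it tracks this with a drop flag, as the Rustonomicon describes, and the destructor runs exactly once on either path.

```rust
use std::cell::Cell;

// Hypothetical type that counts how often its destructor runs.
struct Noisy<'a> {
    drops: &'a Cell<u32>,
}

impl Drop for Noisy<'_> {
    fn drop(&mut self) {
        self.drops.set(self.drops.get() + 1);
    }
}

/// Moves `value` away only if `move_it` is true. Whether `value` must still be
/// dropped at the end of the scope is only known at run time, so (conceptually)
/// the compiler keeps a hidden drop flag for it.
fn maybe_move(move_it: bool, drops: &Cell<u32>) {
    let value = Noisy { drops };
    if move_it {
        drop(value); // moved into `drop`; the destructor runs here
    }
    // At the end of the scope, the destructor runs only if `value` was not moved.
}

fn main() {
    let drops = Cell::new(0);
    maybe_move(true, &drops);
    maybe_move(false, &drops);
    // Each value is dropped exactly once, regardless of the branch taken.
    println!("destructor calls: {}", drops.get()); // prints "destructor calls: 2"
}
```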

Pretty much yes, although "doesn't exist" is a strong word. The storage backing the variable is still (logically) there, but it becomes (logically) uninitialized. That means pretty much the only thing you are allowed to do with it is re-initialize it, and I think it's pretty clear this should be allowed. What wouldn't be allowed is using it (e.g. taking a reference to it or consuming it again) while it's in the moved-from state.
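The same moved-from state exists at the level of individual fields. The sketch below uses a hypothetical Pair struct and swap_in_left helper. After a partial move, the untouched field can still be used, but the struct as a whole cannot until the moved-out field is re-initialized. This only works because Pair does not implement Drop; moving a field out of a type with a Drop impl is rejected.

```rust
// Hypothetical struct used only for this illustration. It must not implement
// `Drop`, otherwise moving a single field out of it would be a compile error.
struct Pair {
    left: String,
    right: String,
}

/// Takes `pair.left` out and puts `new_left` in its place.
fn swap_in_left(mut pair: Pair, new_left: &str) -> (String, Pair) {
    // Partial move: `pair.left` is now uninitialized, `pair.right` is still valid.
    let old_left = pair.left;
    // Using `pair` as a whole (or `pair.left`) here would fail with E0382,
    // but the field that was not moved can still be read:
    let _right_len = pair.right.len();
    // Re-initializing the moved-out field makes `pair` whole again.
    pair.left = new_left.to_string();
    (old_left, pair)
}

fn main() {
    let pair = Pair { left: "a".to_string(), right: "b".to_string() };
    let (old, pair) = swap_in_left(pair, "c");
    println!("{old} -> {} {}", pair.left, pair.right); // prints "a -> c b"
}
```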

This topic reminds me a lot of the replace_with crate. I wish a built-in version could be added, using a similar way of tracking drop flags; it would make this more ergonomic. But I guess RFC 1736 was never accepted for some reason.

Works in Rust 1.0.0.

Oh, thanks for the hint!

I think regarding partial moves, there were some improvements to Rust recently, if I recall correctly?

But my example above, is it sound to rely on the compiler understanding this case?

Yeah, I tested it, and Rust will (correctly) refuse my code if I attempt that. Very nice!

I'm amazed that Rust does the analysis here to check whether my code is correct. I sometimes use an Option and take a value out of it with Option::take. In the past, I have also used Default and mem::take. It looks like in some cases, like my example above, I don't need either, because the compiler does some amazing proving here.
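A small comparison of the three approaches, using hypothetical helper names. A plain move works when you own the variable, because the compiler tracks the moved-from state statically. Behind a &mut reference you cannot leave the referent uninitialized, since the owner might still use it. In that case, mem::take (which leaves Default::default() behind) or Option::take (which leaves None, checked at run time) keeps the place valid.

```rust
use std::mem;

/// Plain move: `v` is statically known to be moved-from until reassigned.
fn with_move(mut v: Vec<i32>) -> Vec<i32> {
    let mut out = vec![0];
    out.extend(v); // `v` is now (logically) uninitialized
    v = out; // re-initialize before the next use
    v
}

/// Only `&mut` access: moving out of `*v` would be rejected (E0507),
/// so `mem::take` swaps in an empty `Vec` instead.
fn with_mem_take(v: &mut Vec<i32>) {
    let old = mem::take(v);
    let mut out = vec![0];
    out.extend(old);
    *v = out;
}

/// `Option::take` leaves `None` behind; the "empty" state is checked at run time.
fn with_option_take(slot: &mut Option<Vec<i32>>) {
    let old = slot.take().unwrap_or_default();
    let mut out = vec![0];
    out.extend(old);
    *slot = Some(out);
}

fn main() {
    let a = with_move(vec![1, 2]);
    let mut b = vec![1, 2];
    with_mem_take(&mut b);
    let mut c = Some(vec![1, 2]);
    with_option_take(&mut c);
    // prints "[0, 1, 2] [0, 1, 2] Some([0, 1, 2])"
    println!("{a:?} {b:?} {c:?}");
}
```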

But this brings me back to the previous question: Can I rely on the compiler catching this case always? I guess I can.

Edit: Since it has worked since Rust 1.0.0, I think I can. Is there some documentation on it? I can also look it up myself later; I'm just curious.

As opposed to? UB in safe code is a compiler bug, period. Maybe I don't understand what is being caught.

The nomicon chapter I linked.

Uninitialized Memory

Checked - The Rustonomicon

Drop Flags - The Rustonomicon


Note that the Rustonomicon is a good read for any Rustacean. It covers way more than just unsafe Rust. For example, there’s also a lengthy chapter with more advanced ownership-related explanations, which mostly applies to safe Rust code, covering topics such as lifetime elision, HRTBs, or variance.

I meant if I can rely on the compiler understanding that moving out (inside a loop, for example) is okay. I'll look into @steffahn's links later. Thanks!

I’m not sure what this question is supposed to mean. What could happen if it was “not okay”? Safe Rust can’t lead to any memory unsafety / UB (unless you run into compiler bugs or unsound library code). Moving out of a variable essentially (logically) leaves it in an uninitialized state, so you cannot access its value anymore without risking UB; hence the compiler will always prevent this.

Yes. In safe Rust, there must exist no way to violate memory safety. At all. You are not supposed to be able to use uninitialized variables, or create dangling references by merely declaring the wrong lifetimes, or perform double-free by dropping a variable twice, etc. These are all errors that the compiler is expected to detect – if it doesn't, that is a soundness bug.

@quinedot @steffahn @H2CO3 I think you all misunderstood me. What I meant:

Can I rely on the compiler recognizing that my code is valid, and not emitting a compiler error, if some things are slightly different (or if I use a different compiler version/implementation)? Like:

fn main() {
    let mut vec = vec!["Hello", "World"];
    for _ in 0..3 {
        let mut prefix = vec!["Start", "Up"];
        prefix.extend(vec);
        // `vec` doesn't exist here until we assign it again?
        if true {
            vec = prefix;
        }
    }
    println!("{vec:?}")
}

Errors:

   Compiling playground v0.0.1 (/playground)
error[E0382]: use of moved value: `vec`
 --> src/main.rs:5:23
…

Of course, there should never be UB at runtime. My question was whether I can rely on the compiler understanding that moving out of the variable inside a loop is safe if I set it again afterwards.

Also compare:


P.S.: I see now that it's very similar to the example in the link @steffahn provided (Checked Uninitialized Memory).

The Nomicon says:

Of course, while the analysis doesn't consider actual values, it does have a relatively sophisticated understanding of dependencies and control flow.

I would have to hope my example is simple enough for the "sophisticated understanding of dependencies and control flow" to catch my case. That should be true for my very first example in this thread, but it might be difficult to judge in other cases, especially if some day there is more than one Rust compiler.

If things are “slightly different”, well… depends on what “slightly different” means. Regarding different compiler versions, the stability guarantees say that (with exceptions around soundness issues or quirks that nobody ever used) code that compiles will keep compiling in the future. Also, as far as I know, the logic around variable initialization hasn’t really changed in the past either (regarding the question of getting a compiler error when moving to an older compiler version), but don’t quote me on that. I don’t know what a different “compiler implementation” would be; there’s only rustc for Rust.

As you might have noticed, the “relatively sophisticated understanding” does handle all kinds of control-flow constructs, but it cannot reason about run-time data at all, e.g. if true, or your if foo {} if !foo {} vs. if foo {} else {} example.
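A sketch of how the loop example from earlier in the thread can be made to compile. It assumes a hypothetical keep parameter in place of the literal if true. The analysis ignores what a condition evaluates to, so it accepts the code only once vec is re-initialized in every branch, which an if/else guarantees.

```rust
/// Variant of the rejected loop, with a hypothetical run-time `keep` flag.
fn run(keep: bool) -> Vec<&'static str> {
    let mut vec = vec!["Hello", "World"];
    for _ in 0..3 {
        let mut prefix = vec!["Start", "Up"];
        prefix.extend(vec); // `vec` is moved-from from here on
        if keep {
            vec = prefix;
        } else {
            // Two separate `if keep {}` / `if !keep {}` blocks would be rejected,
            // because the analysis does not look at the values of conditions.
            vec = Vec::new();
        }
        // Both branches re-initialized `vec`, so the next iteration may move it again.
    }
    vec
}

fn main() {
    println!("{:?}", run(true));
    println!("{:?}", run(false)); // prints "[]"
}
```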

As far as I can tell, it can handle complex things like “a (previously potentially uninitialized) variable is initialized if it got initialized in all arms of a match that didn’t diverge”, and similarly it supports loops, realizing things like “a (previously potentially uninitialized) variable is initialized after a loop if it’s initialized before every possible loop exit point that can lead there (including the end of the loop body in a while/for loop, and every break statement in any loop … well and for while/for loops there’s also the case of the loop not running at all)”.
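The match case can be illustrated with a hypothetical parse_level function. level is declared without a value and assigned in every arm that does not diverge. The arm that returns early does not need to assign it, so level counts as initialized after the match.

```rust
/// Hypothetical parser used only to illustrate deferred initialization.
fn parse_level(input: &str) -> Result<u8, String> {
    let level: u8; // declared, but not yet initialized
    match input {
        "low" => level = 1,
        "high" => level = 9,
        // This arm diverges (`return`), so it does not need to assign `level`.
        other => return Err(format!("unknown level: {other}")),
    }
    // Every path that reaches this line has initialized `level`.
    Ok(level)
}

fn main() {
    println!("{:?}", parse_level("high")); // prints "Ok(9)"
    println!("{:?}", parse_level("medium"));
}
```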

There's a difference between a (formal) language definition (i.e. "Rust") and its implementation (i.e. "rustc").

Now this makes more sense in other contexts, such as Python:

Another example: LuaJIT is another implementation of the Lua programming language.

I know we have only one implementation for Rust currently, and maybe always will. But I still think it's good to have properties of the language well-defined rather than "if the (current) implementation compiles it, it will compile".

But I also see that documenting all of Rust's behavior and features is an ongoing process, which will likely take a long time.

I think advanced features such as the one discussed in this thread, but also others like temporary lifetime extension, require a lot of work to be documented properly and usefully.

Anyway, I'm happy these things exist, even if documentation isn't always finished/final (yet).

P.S.: But it may be worth noting that this may come at a price: higher complexity in describing what's allowed and what's not, which would also make it more difficult to create an independent yet compatible implementation of the language in the future.

As far as I know, Rust is currently fully defined by the behavior of rustc, and there’s no formal language definition beyond that. The Rust Reference does try to document (some major aspects of) the language, but it also states:

Finally, this book is not normative. It may include details that are specific to rustc itself, and should not be taken as a specification for the Rust language. We intend to produce such a book someday, and until then, the reference is the closest thing we have to one.


The details of complex language features, in particular type inference and the borrow checker, make it incredibly hard to create a formal language definition for Rust that covers all the (stable) capabilities of (current) rustc. It’s even harder since rustc is still in quite rapid development, regularly introducing new stable language or standard library features.

The rules for variable initialization seem rather straightforward in comparison to borrow-checking, drop checks, type inference, trait coherence, and such things. You don’t question whether it’s “okay” to rely on the compiler when it accepts your slightly complex use of references (or other types with lifetimes) either. If we eventually get Polonius done and stabilized, many people are going to use it, probably without even knowing that it’s a rather new feature / language change; similar to how NLL is now a stable and essential part of the compiler and the language.

A compiler-agnostic language definition usually also comes with a need for compiler support to check whether code is portable w.r.t. that language definition (i.e. not using any features beyond the minimal required compiler features). I think Rust users are “limited” enough by the capabilities of stable rustc compared to unstable features on nightly, because of the careful approach taken to avoid misfeatures and/or the need for future breaking changes. I don’t think there’s currently much interest in defining an even more severely limited “common ground” for an abstract Rust language that could be implemented by multiple compilers, and for which portable, compiler-agnostic libraries could then be written.

:flushed:

A non-normative reference. Okay, why not :wink:. (This just surprised me, but it makes sense.) Anyway, I have likely read that section in the past, but I wasn't aware of the implications or forgot about them again.

Please don't get me wrong: I like the way Rust proceeds, and I like how quickly it is evolving and giving us tools that we lack in other languages. Like I said earlier: I'm amazed that Rust does the analysis which allows me to move a value out inside a loop as long as I reassign the variable in the same iteration. In other languages, we have checks at runtime, including NULL pointer exceptions, issues with mutability, etc. So I really love Rust in that regard! I didn't mean to come across as criticizing Rust here.

I also see how other rules are more complex:

I just notice again and again that Rust is far more complex than it appears. (And even after thinking I understood the language more or less, I have to realize I'm only understanding a small fraction of it.) There's really a lot of good work done, and I think Rust is by far outstanding in the domain of programming languages (though I don't know that many languages, admittedly).

You can do some simple tests for things like this.

This compiles:

    let x;
    {
        x = 4;
    }
    dbg!(x);

But if you try to do

    let x;
    if true {
        x = 4;
    }
    dbg!(x);

then you'll get

error[E0381]: use of possibly-uninitialized variable: `x`
 --> src/main.rs:6:10
  |
6 |     dbg!(x);
  |          ^ use of possibly-uninitialized `x`

Similarly, you'll see different behaviour from while true { ... } and loop { ... }.
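A small sketch of the loop case, using a hypothetical first_even helper. found is declared without a value and assigned right before each break of a loop, so it is definitely initialized after the loop. A while loop's condition is treated like any other run-time condition, so even while true is assumed to possibly exit without running the body. That is why the same code written with while true fails with E0381.

```rust
/// Returns the first even number in `nums`, or 0 if there is none.
/// (Hypothetical helper used only for illustration.)
fn first_even(nums: &[i32]) -> i32 {
    let found; // initialized before every `break` below
    let mut i = 0;
    loop {
        if i == nums.len() {
            found = 0;
            break;
        }
        if nums[i] % 2 == 0 {
            found = nums[i];
            break;
        }
        i += 1;
    }
    // With `while true` instead of `loop`, this line would be rejected (E0381),
    // because the compiler does not assume the `while` condition is always true.
    found
}

fn main() {
    println!("{}", first_even(&[1, 3, 4, 6])); // prints "4"
}
```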

For debugging, all conditionals are treated as fallible when it comes to initialization checking and borrow checking and such.

And yet, if my understanding of the work on Iris and RustBelt is correct, there is a group attempting to do just that (currently at the level of a subset, I believe, but they've used the approach to find soundness bugs in the stdlib). cf. https://iris-project.org/pdfs/2021-rustbelt-cacm-final.pdf and related work at https://iris-project.org/ and RustBelt: Securing the Foundations of the Rust Programming Language (POPL 2018)
