Move Semantics Rust vs C++

I have three questions regarding move semantics in Rust.

Not long ago I learned what move semantics are in C++, and I realized that moves happen almost all the time in Rust, and by default. From what I understand, move semantics in C++ exist to reduce deep copying, or to reduce heap allocations, by transferring ownership from one variable to another. I understand that when you use the = operator in C++, there is always something copied. (I'm talking about debug builds only for now.)

Let me give an example in C++ first:

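#include <cstring>  // std::memcpy
#include <utility>  // std::move

struct MyStruct {
    int a;
    float b;
    bool c;
    float* d;

    MyStruct() : a(0), b(0.0f), c(false) {
        d = new float[10]();  // value-initialized, so the buffer starts zeroed
    }

    ~MyStruct() { delete[] d; }

    // Copy constructor: takes a const reference so const objects can be copied too
    MyStruct(const MyStruct& other) {
        a = other.a;
        b = other.b;
        c = other.c;
        d = new float[10];
        std::memcpy(d, other.d, sizeof(float) * 10);
    }

    // Move constructor: steals the buffer instead of allocating a new one
    MyStruct(MyStruct&& other) noexcept {
        a = other.a;
        b = other.b;
        c = other.c;
        d = other.d;
        // From what I understand, this line is the big thing that makes the move safe:
        // the moved-from object no longer owns the buffer
        other.d = nullptr;
    }
};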
int main() {
    MyStruct ms;
    // Deep copy, so it's like clone() in Rust
    MyStruct copy = ms;

    // Both structs are valid
}

Now I want to make a move, not a copy:

int main() {
    MyStruct ms;
    MyStruct copy = std::move(ms);

    // ms is now in a moved-from state (ms.d == nullptr)
}

From what I understand, moving ms into copy means shallow-copying the ms struct, including the pointer to the heap data, and setting ms's pointer to nullptr. This means that when ms is destroyed, delete[] does nothing but a check for null.

Now, in Rust this is almost the same example:

#[derive(Clone)]
struct MyStruct {
    a: i32,
    b: f32,
    c: bool,
    d: Vec<f32>,
}
fn main() {
    let ms = MyStruct { a: 19, b: 3.141, c: true, d: Vec::new() };
    let copy = ms.clone();

    // Both structs are valid
}

With move:

fn main() {
    let ms = MyStruct { a: 19, b: 3.141, c: true, d: Vec::new() };
    let copy = ms;

    // ms is invalid now
}

My first question is: How do move semantics actually work in Rust? In the same way as in C++?

Since the compiler knows that ms is invalid after the move, what I believe is that the compiler can do some small optimizations by not calling the deallocator on ms's heap data pointer in the Vec, because that heap data is moved. In C++, when the old variable is destroyed, the destructor is called and delete is called, which means a function call and an if statement inside that function. Rust should be free to not call any function. This means that, technically, Rust is more performant than C++. (C++ is slower than Rust by a function call and an if statement, not a big deal anyway.) My second question: Is this assumption true? Does rustc make these optimizations?

And one more thing. In C++, I think that doing a move is useless when you have a struct with no heap data, because it will do a shallow copy anyway, so it's the same as a deep copy:

// C++
struct MyStruct {
    long a1;
    long a2;
    long a3;
    // ... Lots of fields
};

// Rust
struct MyStruct {
    a1: i64,
    a2: i64,
    a3: i64,
    // ... Lots of fields
}

And my third question, which is tied to the first one: When you do a move in Rust, does it do a shallow copy? Even on large structs? I think not.

As I write this, I realize that maybe the second and third questions are irrelevant, because the code will be optimized in release builds anyway. I hope that I was clear enough... And I hope that this post helps a bit.


When a value is moved in Rust, that always happens via a simple memcpy of the bytes containing the value. The old value is not cleared or zeroed — it just isn't read again, and no destructor will run on it. The copy that happens inside a move is always shallow since if there are any pointers in the moved value, the memcpy will just copy the pointer but not look inside it.
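To make the "shallow memcpy" point concrete, here is a minimal sketch. The struct is a cut-down version of the one from the question, and buffer_survives_move is a name made up for this illustration. It compares the Vec's heap address before and after the move: the struct's bytes (pointer, length, capacity) are copied to the new location, but the heap buffer itself is never touched.

```rust
struct MyStruct {
    a: i32,
    d: Vec<f32>,
}

/// Hypothetical helper: moves a struct and reports whether the Vec's heap
/// buffer is at the same address before and after the move.
fn buffer_survives_move() -> bool {
    let ms = MyStruct { a: 19, d: vec![1.0, 2.0, 3.0] };
    // Raw pointer to the heap buffer; holding it does not borrow `ms`.
    let before = ms.d.as_ptr();

    // The move: the struct's bytes are copied into `moved`.
    // `ms` may not be used after this line, and no destructor runs on it.
    let moved = ms;

    let after = moved.d.as_ptr();
    before == after && moved.a == 19
}

fn main() {
    assert!(buffer_survives_move());
    println!("the move copied the pointer, not the heap data");
}
```

Uncommenting a use of ms after the move would be rejected at compile time, which is how the "it just isn't read again" part is enforced.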


The claim that "when you use the = operator in C++, there is always something copied" is incorrect. In C++ = is an operator that can be overloaded, so it can do literally anything (subject to the capabilities of the computer). In contrast, = in Rust is a language feature that always does what @alice said.

If you want a deep copy in Rust, there is the Clone trait. However, since that is a trait (like = in C++), it could do literally anything. But normally it just creates a deep copy.
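A small sketch of both halves of that statement. The derived Clone on a cut-down MyStruct allocates a fresh buffer, while Counter is a hypothetical type invented here whose hand-written Clone does something other than a faithful copy. Neither helper name comes from the thread.

```rust
#[derive(Clone)]
struct MyStruct {
    a: i32,
    d: Vec<f32>,
}

/// Hypothetical helper: true when the clone owns a different heap buffer
/// holding equal contents.
fn clone_is_deep() -> bool {
    let ms = MyStruct { a: 19, d: vec![1.0, 2.0, 3.0] };
    let copy = ms.clone(); // explicit and visible in the source
    copy.d.as_ptr() != ms.d.as_ptr() && copy.d == ms.d && copy.a == ms.a
}

/// Hypothetical type: Clone is ordinary user code, so it can do anything.
struct Counter {
    clones: u32,
}

impl Clone for Counter {
    fn clone(&self) -> Self {
        // Not a faithful copy: the clone records that it was cloned.
        Counter { clones: self.clones + 1 }
    }
}

fn main() {
    assert!(clone_is_deep());
    let c = Counter { clones: 0 };
    assert_eq!(c.clone().clones, 1);
}
```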


I know that = can be overloaded so that it doesn't do what it traditionally does. I wanted to say that = always does a copy by default, not when overloaded.

Ok. So this means that Rust is indeed a little more efficient than C++ in this regard. Right?

Is this true also with release builds? What if I have a large struct that I want to pass to a function by moving it there? I read somewhere that Rust will pass it by reference in the background for performance reasons.

Yes. Not only with regard to moving. Since Rust’s move semantics don’t require leaving the moved-out-of value in some sort of null-like state (I think for C++ they always talk about leaving it in a “valid but unspecified” state, which is often a null-like state though), you can use types that don’t have any null state. E.g. a smart pointer like Box<T> doesn’t have any null state, so you can dereference it without any null checks. Admittedly, not doing null checks is probably something you’ll sometimes do in C++ as well, but in those cases it’s a UB footgun; honestly I’m not too much into C++ in the first place, so correct me if I’m wrong, but I’d guess that “proper” handling of pointer types or types containing pointers usually does include some sort of null check for most operations.

And then there’s also the benefit that you can use the 0 value that’s never used to make other data types containing Box<T> more compact, the canonical example being Option<Box<T>>, which is kind of Rust’s version of a nullable (owning) pointer and still has size_of::<Option<Box<T>>>() == size_of::<usize>().
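The size claim is easy to check. This sketch uses i32 as an arbitrary payload type, and the helper name is made up for the illustration. The Option<usize> comparison is only there for contrast: usize has no spare bit pattern, so its Option needs extra room.

```rust
use std::mem::size_of;

/// Hypothetical helper: checks that wrapping a Box in Option costs no space.
fn option_box_is_pointer_sized() -> bool {
    size_of::<Box<i32>>() == size_of::<usize>()
        && size_of::<Option<Box<i32>>>() == size_of::<usize>()
}

fn main() {
    // None is stored as the null bit pattern that Box itself can never hold.
    assert!(option_box_is_pointer_sized());

    // Contrast: every bit pattern of usize is a valid value, so Option<usize>
    // has to store its discriminant somewhere else.
    assert!(size_of::<Option<usize>>() > size_of::<usize>());
}
```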


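By the way, I’d like to reiterate this code comparison: in the C++ example, MyStruct copy = ms; silently makes a deep copy, while the Rust example has to spell it out as let copy = ms.clone();.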

In my opinion deep copying is wayyy too easy and implicit in C++. I’m honestly questioning how you can ever be sure that your reasonably sized C++ codebases don’t contain lots of significantly expensive, unintentional deep copies. I’d be curious to hear about that from people who actually use C++ though, since I’m not one of them. Anyway, one general principle in Rust is that allocations, in particular from deep copies, are supposed to always be quite clearly visible and should not happen implicitly or by accident.

While moves being slightly more efficient in Rust may be a rather small advantage, the explicitness of .clone() seems like a possibly much more relevant performance benefit.


The compiler can of course optimize moves in various ways. However, it is guaranteed to behave as if it was moved with a memcpy.


I think that's still not really correct. IIRC, C++'s "default" behavior for operator= on a struct is just to delegate to each field's operator=. The compiler is of course free to observe that all a struct's members are trivially copyable, and therefore optimize this default implementation of operator= to a single memcpy. But this is not guaranteed. In fact this is a way in which C++ can be more efficient than Rust; if you have a data structure with 100 fields of which only 1 has contents that matter, you can overload operator= to only copy the one field and skip the rest. If you do that, it'll also be the behavior used by everything that contains that struct (unless you overload operator= again).

In Rust, this is more or less reversed: = always does a memcpy (or equivalent), so it is guaranteed that no type will be more work to move (or copy) than just that. But on the other hand, you can't make "extra cheap" overloads like C++ has. The compiler is, of course, still free to observe that some fields are not used later and omit those copies. But it's not guaranteed.

The important difference between Rust and C++ with regard to move semantics is destruction. This only comes into play with types that have destructors, which, in most well-written code, will also have overloads of operator=. So the "default" behavior doesn't apply in the cases you usually care about with regard to move semantics. In the cases where it does apply, i.e. for trivially copyable types, I'd expect C++ and Rust to perform the same, because identifying types that are trivially copyable is an easy compile time analysis.

Since C++ destructors "always" run, C++ types that want to have move semantics have to have a valid moved-from state that may be destroyed. Usually this is indicated by zeroing out something or setting a value to some sentinel, so the main body of the destructor can be skipped at runtime. If a particular variable is always moved out of, the compiler is (again) free to observe that and optimize away the check. But it's not guaranteed. In effect, types that have move semantics in C++ are implicitly Options, and you never actually move them around, you just take them (leaving None).
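The "implicitly Options" analogy can be written out in Rust. This is only a sketch of the analogy, not how anyone would write real Rust: Buffer and cpp_style_move are names invented here, and None stands in for the nullptr that a C++ move constructor leaves behind.

```rust
/// Hypothetical type modelling a C++ object with a nullable owning pointer.
struct Buffer {
    // `None` plays the role of the null pointer in a moved-from C++ object.
    data: Option<Box<[f32; 10]>>,
}

impl Buffer {
    fn new() -> Self {
        Buffer { data: Some(Box::new([0.0; 10])) }
    }

    /// Analogue of a C++ move constructor: steal the buffer and leave the
    /// source in a valid, empty state that is still safe to destroy.
    fn cpp_style_move(other: &mut Buffer) -> Buffer {
        Buffer { data: other.data.take() }
    }
}

fn main() {
    let mut ms = Buffer::new();
    let copy = Buffer::cpp_style_move(&mut ms);

    // `ms` is still a live object, exactly like a moved-from C++ value.
    assert!(ms.data.is_none());
    assert!(copy.data.is_some());
    // Both values are dropped here; dropping `ms` finds None and frees nothing.
}
```

A plain let copy = ms; needs neither the Option nor the empty state, because the source is simply never dropped.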

But in Rust destructors (i.e. drop glue) don't always run. So types with move semantics and types without are just the same: it's all a memcpy, and only the final resting place of an object needs to be dropped when it goes out of scope. In principle this means that copying/moving things with = can be more efficient, and objects may take up less space with niches, as steffahn also pointed out.

But wait -- how does the compiler keep track of when to run the destructors or not? Isn't it impossible in the general case to know whether a particular piece of code will be run or not at compile time? Well, yes - and I lied. Kind of. The compiler will do its level best to figure out at compile time which things need to be dropped and which don't. But if it can't, it will generate drop flags which are really just doing the same runtime liveness tracking that happens with manually written destructors in C++! The difference is that drop flags are part of the stack frame where the variable lives, and not part of the object itself, and they're emitted by the compiler, not written by you.
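Here is a sketch of a case where the compiler cannot know at compile time whether a variable has been moved out of. Noisy, consume and drops_after_maybe_move are names invented for this illustration. The only observable guarantee is that the destructor runs exactly once on either path; whether a drop flag is actually emitted is a compiler implementation detail.

```rust
use std::cell::Cell;

/// Hypothetical type that counts how many times its destructor runs.
struct Noisy<'a> {
    drops: &'a Cell<u32>,
}

impl Drop for Noisy<'_> {
    fn drop(&mut self) {
        self.drops.set(self.drops.get() + 1);
    }
}

/// Takes ownership; the value is dropped when this function returns.
fn consume(_n: Noisy<'_>) {}

/// Returns how many destructor calls happened in total.
fn drops_after_maybe_move(move_it: bool) -> u32 {
    let drops = Cell::new(0);
    {
        let n = Noisy { drops: &drops };
        if move_it {
            consume(n); // `n` is moved out on this path only
        }
        // End of scope: `n` needs dropping only if it was NOT moved above.
        // That is known only at run time, which is what drop flags track.
    }
    drops.get()
}

fn main() {
    assert_eq!(drops_after_maybe_move(true), 1);
    assert_eq!(drops_after_maybe_move(false), 1);
}
```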


To my mind, and I believe to most C++ programmers, that is a gross misuse of operator overloading. It makes the = operator do something other than what one would expect. It looks like it is making an assignment, but it is also doing a serious transformation, with data loss in the process.

Move semantics in C++ are indeed very hard to reason about. Which is why C++ has tests like:
std::is_move_constructible
std::is_move_assignable

And a slew of others that one can use in asserts to find out if one's code is moving as one expects.

The major difference is that move semantics are a language feature in Rust, while they are a coding convention in C++. In C++ the language provides 2 kinds of references (which are not object types, but are still part of the type system) and 5 value categories. It's up to programmers to implement move semantics correctly with them. If you misuse these complex tools, things may not work as expected, and it's your fault.

In Rust everything, including references, is passed by value. A move is defined as a single shallow bitwise copy and implies ownership transfer. Moved-out values are statically marked as inaccessible; even destructors can't touch them. If the type is trivial, like the primitive integers, you can implement the Copy trait to disable this compile-time marker and keep using the source value after it has been copied, but it doesn't affect the runtime semantics.
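A minimal sketch of the Copy case, with Point, by_value and copy_keeps_source_usable as made-up names. The assignment and the function call perform the same bitwise copy a move would; the only difference is that the compiler keeps treating p as usable afterwards.

```rust
#[derive(Clone, Copy, PartialEq)]
struct Point {
    x: i64,
    y: i64,
}

/// Takes the argument by value, like every Rust function parameter.
fn by_value(p: Point) -> i64 {
    p.x + p.y
}

/// Hypothetical helper: uses `p` after it was assigned and passed by value.
fn copy_keeps_source_usable() -> bool {
    let p = Point { x: 1, y: 2 };
    let q = p; // same bitwise copy as a move...
    let sum = by_value(p); // ...but `p` stays usable because Point is Copy
    q == p && sum == 3
}

fn main() {
    assert!(copy_keeps_source_usable());
}
```

Removing Copy from the derive list turns the second use of p into a compile error (use of moved value), while the copy performed at run time stays the same.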

