Why does Rust need to be able to move data around?

When people describe the function of Pin and Unpin, they point out that by default Rust is specified as being allowed to move data structures freely in memory.
Why is this the case?

Maybe I am wrong, but I thought that C and C++ guarantee that a data structure is always at the same address after instantiation.

So I wonder why Rust chose a different path.

2 Likes

I've found a nice answer on Stack Overflow that might give you an initial idea, or at least some material based on which you can do further research into affine type systems and Rust's implementation of one:

(the comments are also worth reading through IMO)

3 Likes

C and C++ have objects/structs that can be assigned to variables and moved just like in Rust. However, when an object is allocated on the heap, its address does not change unless it is reallocated -- this is true in C, C++ and Rust alike.

5 Likes

Put your data structure in an std::vector, insert a few more elements, and you’ll see that it has quite definitely moved around.

For what it’s worth, C++ has this entire thing called move semantics, including rvalue references, move constructors, move assignment operators, std::move, std::forward, iterator invalidation, dangling pointers, and who knows what.
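The same effect can be observed with Rust's Vec. Here is a minimal sketch; the function name `grow_and_observe` and the element type are just for illustration. It records the address of the first element before and after pushing past the capacity. The address usually changes, but that is not guaranteed (the allocator may grow the buffer in place), so the only thing the code relies on is that the capacity grew and the value survived.

```rust
/// Pushes `extra` more elements onto a one-element Vec and returns
/// (address before, address after, capacity before, capacity after).
fn grow_and_observe(extra: usize) -> (usize, usize, usize, usize) {
    let mut v: Vec<u64> = Vec::with_capacity(1);
    v.push(42);
    let addr_before = v.as_ptr() as usize;
    let cap_before = v.capacity();

    for i in 0..extra {
        // Once the capacity is exceeded, the Vec has to grow its buffer,
        // which may relocate every element it holds.
        v.push(i as u64);
    }

    let addr_after = v.as_ptr() as usize;
    let cap_after = v.capacity();
    // The value is intact, wherever it lives now.
    assert_eq!(v[0], 42);
    (addr_before, addr_after, cap_before, cap_after)
}
```

Calling `grow_and_observe(1000)` from `main` and printing the two addresses in hex will typically show different values.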

6 Likes

Moving does not mean that heap-allocated data is moved. Only the wrapper of this data (Vec, String, Box), which contains a pointer to the data, is copied to another wrapper instance, and the original is declared uninitialized afterwards. A move transfers the ownership via this pointer, exactly as in C++. But in C++ you have to write the move constructor/move assignment operator yourself.

1 Like

Rust just doesn't let you use things, or maintain references to things, after you've bitwise-copied them -- unless they implement Copy. We call the bitwise copy of a non-Copy type a move, and the inability to use the original place afterwards move semantics.
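A small sketch of that distinction, using a hypothetical `Point` type that implements Copy next to a String that does not:

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
struct Point {
    x: i32,
    y: i32,
}

fn move_and_copy() -> (String, Point, Point) {
    let s = String::from("hello");
    let t = s; // move: the (pointer, length, capacity) triple is bitwise-copied
    // println!("{s}"); // compile error: borrow of moved value `s`

    let p = Point { x: 1, y: 2 };
    let q = p; // the same bitwise copy, but `Point: Copy`, so `p` stays usable
    (t, p, q)
}
```

Both assignments do the same thing at the machine level; the only difference is whether the compiler lets you keep using the source afterwards.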

Those other languages have analogous actions -- moving data around while the original place is invalidated -- even if they don't call them moves. You can't create a data structure on the stack in a function, and return it to the caller, without moving it (modulo returning a dangling pointer).

Also note that Rust doesn't have a concept of the heap at the language level. Any guarantees about moving or not moving objects placed in the heap are particular to the type that manages the allocation, not the language.


I think the heap examples become easier to talk about if we get a bit more explicit. The standard way to put something on the heap is to Box it, so let's say we have our unboxed value: T and our boxed bx: Box<T>.[1]

When you move the value, the address of the T changes. But when you move the bx, you don't move the contained value; the address of the T doesn't change. You're just bitwise-copying a pointer (that has ownership of what it points to, and move semantics).
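To make the Box case concrete, here is a sketch with `[u8; 64]` standing in for T (an arbitrary choice). It compares the address of the boxed value before and after the Box itself is moved:

```rust
/// Returns the address of the boxed value before and after moving the Box.
fn box_move_keeps_address() -> (usize, usize) {
    let bx: Box<[u8; 64]> = Box::new([7u8; 64]);
    let before = &*bx as *const [u8; 64] as usize;

    let moved = bx; // only the pointer is bitwise-copied; `bx` is now unusable

    let after = &*moved as *const [u8; 64] as usize;
    (before, after)
}
```

The two addresses are equal. I have left out the unboxed counterpart on purpose: whether a plain local actually ends up at a new address after a move depends on what the optimizer does with it.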

Now, let's say we had a value_vec: Vec<T> and we sorted it. Even though they're on the heap, all the T values would get shuffled around in memory. Afterwards there's still a T at every address there was before,[2] but if you're distinguishing objects by their values, you would probably say that they have been moved.

But if you had a bx_vec: Vec<Box<T>> and sorted that, only the pointers get moved around. No values of type T get moved.
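Both sorting cases side by side, as a sketch with `i32` standing in for T. It follows the value 1 and checks where it lives before and after the sort:

```rust
/// Returns (the value in the Vec<i32> changed address,
///          the value in the Vec<Box<i32>> kept its address).
fn addresses_after_sort() -> (bool, bool) {
    // Vec<T>: the slots stay put, the values are shuffled between them.
    let mut value_vec: Vec<i32> = vec![3, 1, 2];
    let one_before = &value_vec[1] as *const i32 as usize; // 1 lives in slot 1
    value_vec.sort();
    let one_after = &value_vec[0] as *const i32 as usize; // 1 now lives in slot 0
    let value_moved = one_before != one_after;

    // Vec<Box<T>>: only the pointers are shuffled, the boxed values stay put.
    let mut bx_vec: Vec<Box<i32>> = vec![Box::new(3), Box::new(1), Box::new(2)];
    let boxed_before = &*bx_vec[1] as *const i32 as usize;
    bx_vec.sort();
    let boxed_after = &*bx_vec[0] as *const i32 as usize;
    let boxed_stayed = boxed_before == boxed_after;

    (value_moved, boxed_stayed)
}
```

Sorting happens in place without reallocating, so both results are `true`.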

So in the above paragraph, "allocated on the heap" corresponds to something like boxing every value of the type you don't want to move (Box<T>), and not just having a value on the heap somewhere (which would include Vec<T>).

But this still isn't the whole story as it relates to pinning. You can still move out values that are in a Box<T>. For example, you can swap in a new value. If you need to prevent that, you need a Box<T> that doesn't allow &mut Ts... or at least, not when values of type T might be sensitive to moving in this way.
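A sketch of moving a value out of a Box through a `&mut`, using `std::mem::replace` (which is a swap that hands back the old value). The String payloads are placeholders:

```rust
use std::mem;

/// Returns (the value moved out, the value now in the Box,
///          whether the Box's heap slot kept its address).
fn swap_out_of_box() -> (String, String, bool) {
    let mut bx: Box<String> = Box::new(String::from("old"));
    let addr_before = &*bx as *const String as usize;

    // `&mut *bx` is an ordinary `&mut String`, so the old value can be
    // moved out as long as something else is moved in.
    let old = mem::replace(&mut *bx, String::from("new"));

    let addr_after = &*bx as *const String as usize;
    (old, *bx, addr_before == addr_after)
}
```

The place is stable, yet the original value has been moved somewhere else. That is exactly the case a stable address alone does not rule out.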

And that's what Pin and Unpin are about. It's important for self-referential structs, like compiler-generated futures. The fact that there's some value at a specific address (Box<T>) isn't enough for those; the value itself cannot be moved. Pinning isn't having some stable address place to put values in. It's about preventing particular values from being moved.

The "stable place" property isn't a requirement for pinning, beyond the life of the pinned value. You can pin things that are on the stack. But you need things like Pin<Box<_>> in order to return ownership.
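Here is a rough sketch of both flavors of pinning. `NotMovable` is a made-up type that opts out of Unpin via `PhantomPinned`; real self-referential futures are generated by the compiler instead. It assumes a toolchain where `std::pin::pin!` is available (1.68 or later).

```rust
use std::marker::PhantomPinned;
use std::pin::{pin, Pin};

struct NotMovable {
    data: u32,
    _pin: PhantomPinned, // makes the type `!Unpin`
}

/// Heap pinning: ownership of the pinned value can be returned.
fn make_pinned(data: u32) -> Pin<Box<NotMovable>> {
    Box::pin(NotMovable { data, _pin: PhantomPinned })
}

/// Returns (address unchanged after moving the Pin<Box<_>>,
///          data read through the heap pin, data read through the stack pin).
fn pinned_address_is_stable() -> (bool, u32, u32) {
    let pinned = make_pinned(5);
    let before = &*pinned as *const NotMovable as usize;
    let moved = pinned; // moves the pointer wrapper, not the pinned value
    let after = &*moved as *const NotMovable as usize;

    // Stack pinning: fine inside this function, but the pinned value
    // cannot be returned from it.
    let on_stack = pin!(NotMovable { data: 6, _pin: PhantomPinned });
    // let r: &mut NotMovable = on_stack.get_mut(); // compile error: `NotMovable` is not `Unpin`

    (before == after, moved.data, on_stack.data)
}
```

Shared access through the pin still works. What is missing is a safe way to get a `&mut NotMovable`, which is what `mem::swap` or `mem::replace` would need.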


Outside of async/pinning, stable addresses[3] matter less in Rust for a variety of reasons. One of the big reasons is that references are generally short-lived. If you had a reference to a T that was in either a Vec<T> or a Vec<Box<T>>, for example, that reference would become invalid when you sorted the Vec. You would need unsafe to do something via the address after the sort -- and there's a decent chance even that is invalid, depending on the types involved.
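A sketch of what "short-lived" means in practice (the function name is hypothetical). The reference into the Vec has to be dead before the sort, and the compiler enforces that:

```rust
fn first_then_sort() -> (i32, Vec<i32>) {
    let mut v = vec![3, 1, 2];

    let first = &v[0];
    let seen = *first; // last use of `first`, so the shared borrow ends here

    v.sort(); // needs `&mut v`, which is fine now

    // println!("{first}"); // compile error: cannot borrow `v` as mutable
    //                      // because it is also borrowed as immutable
    (seen, v)
}
```

Uncommenting the `println!` keeps `first` alive across the sort and the program is rejected.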

I've also completely ignored the nuances around zero-sized types (but they're quite relevant if you start trying to do pointer-identity stuff in Rust).


  1. Box is not the only choice, but we'll use it to represent anything with the "stable address storing a single T" property. ↩︎

  2. there was no reallocation ↩︎

  3. and the concept of pointer identity ↩︎

7 Likes

There's definitely no such guarantee. What paragraph(s) of the respective Standards make you think so?

Thanks for all the C++ teaching. (I learned and used it mostly before 2011.)

My primary takeaway from "What are move semantics in Rust?" on Stack Overflow is:
A move can happen whenever a value is assigned to a new variable. Moves (that involve a memcpy) between two variables within a function are usually optimized away. But values that are moved out of a function (return values) or into another function in particular might benefit from moving. To give the compiler such freedom, moves are the default. Additionally, this provides the basis for efficient abstractions like growable arrays (vectors).

Another very common thing is to move a local variable to a struct field. That's often how struct fields with non-Copy types are initialized and assigned values.
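For example (a sketch with a made-up `User` struct):

```rust
struct User {
    name: String,
    tags: Vec<String>,
}

fn build_user() -> User {
    let name = String::from("ferris");
    let tags = vec![String::from("crab")];
    // Both locals are moved into the fields; `name` and `tags`
    // cannot be used on their own after this expression.
    User { name, tags }
}
```

The struct is then moved once more when it is returned to the caller.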
