Why the return value was not dropped

Ok, this point needs a bit of clarification: there are three kinds of memory resources within a program:

  • global / static memory, which is never freed;

  • global dynamically allocated and freed memory, a.k.a. the heap;

  • local dynamically allocated and freed memory, a.k.a. the stack.

StorageLive and StorageDead

A local in Rust thus uses the stack as its backing memory (or a CPU register as an optimization, but let's not consider that here).
When a local exits a scope / frame, it is "auto-freed" by the stack-frame shenanigans of the runtime. In the Rust model, which can be seen when looking at the Mid-level Intermediate Representation (MIR) of the language, allocating and deallocating these locals is respectively called StorageLive and StorageDead.

So when talking about memcpy, the thing is that a new local is taking a value: the local is StorageLive-stack-allocated, and it is then initialized with a value / a value is written to it (and if that value comes from another part of memory, such as another variable, that's when a (non-overlapping) memcpy happens).

Depending on whether the initial value was Copy or not, the original value may or may not get invalidated / Rust may forbid us from ever using that other value. One way of using a value is calling its destructor, so, when invalidated, a value will no longer run its destructor; and that's precisely the reason a type with Drop glue / with a destructor cannot be Copy: if it were, when each local left the scope, there would be multiple calls to the same destructor with the same data, which in the case of a destructor freeing heap-memory would lead to a double-free.
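
To make the Copy vs. move distinction concrete, here is a minimal sketch. `Noisy` and the two helper functions are hypothetical names made up for illustration; the type just counts how many times its destructor runs.

```rust
use std::cell::Cell;

/// Hypothetical type whose destructor bumps a shared counter.
struct Noisy<'a> {
    drops: &'a Cell<u32>,
}

impl Drop for Noisy<'_> {
    fn drop(&mut self) {
        self.drops.set(self.drops.get() + 1);
    }
}

/// Moves a `Noisy` from one local to another, then reports how many
/// destructor calls happened once both locals have left their scope.
fn drops_after_move() -> u32 {
    let drops = Cell::new(0);
    {
        let a = Noisy { drops: &drops };
        let _b = a; // move: `a` is invalidated, so only `_b` runs the destructor
        // drop(a); // would not compile: use of moved value `a`
    }
    drops.get()
}

/// `i32` is `Copy`: the source stays usable after the bitwise copy.
fn copy_keeps_source() -> (i32, i32) {
    let x: i32 = 42;
    let y = x; // copy: `x` is still valid
    (x, y)
}

fn main() {
    assert_eq!(drops_after_move(), 1);
    assert_eq!(copy_keeps_source(), (42, 42));
}
```

Adding `#[derive(Clone, Copy)]` to `Noisy` is rejected by the compiler, precisely because the type implements Drop.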

Drop glue

Now, a value (such as the one stored in a local variable) may be responsible for some dynamic resource, such as heap-memory, a network connection, a file handle, etc.

In that case, for the sake of convenience, of ergonomics (and thus of safety!), the resource is freed right before that value ceases to exist. That's why we say that such a value owns a resource, since without the owner the resource becomes inaccessible. So, something that owns something else has what is called a destructor, or drop glue in Rust parlance: a special method called when the value ceases to exist, in charge of closing / freeing / releasing the resources transitively owned by the value.
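
A small sketch of the "transitively owned" part, assuming a made-up `Handle` type standing in for a real resource (socket, file, ...) and a `Session` that owns two of them. `Session` has no `impl Drop` of its own, yet its drop glue still releases both handles when it goes out of scope.

```rust
use std::cell::RefCell;

/// Hypothetical stand-in for a resource; its destructor records the release.
struct Handle<'a> {
    name: &'static str,
    log: &'a RefCell<Vec<String>>,
}

impl Drop for Handle<'_> {
    fn drop(&mut self) {
        self.log.borrow_mut().push(format!("release {}", self.name));
    }
}

/// Owns two handles and has no `impl Drop`: the compiler-generated drop glue
/// drops the fields, in declaration order.
#[allow(dead_code)] // the fields are only held, never read
struct Session<'a> {
    socket: Handle<'a>,
    file: Handle<'a>,
}

fn run() -> Vec<String> {
    let log = RefCell::new(Vec::new());
    {
        let _session = Session {
            socket: Handle { name: "socket", log: &log },
            file: Handle { name: "file", log: &log },
        };
        log.borrow_mut().push("end of scope".to_string());
    } // `_session` ceases to exist here, and so do the handles it owns
    log.into_inner()
}

fn main() {
    assert_eq!(run(), ["end of scope", "release socket", "release file"]);
}
```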

Example: Box<i32> vs. &i32

  • Box<i32> is a pointer to a heap-allocated i32, and it owns that allocation, meaning that when a Box<i32> is about to die, it frees the i32 that had been heap-allocated, so that this (remember: global) memory may be reused elsewhere.

  • &'_ i32 is a pointer to an i32 that does not own the memory it points to; it just knows that it can dereference-read that i32, as long as that is not done outside the range of validity of the reference, also called the lifetime '_ of the reference.

So, in the following program:

let p: Box<i32> = Box::new(42);
let at_ft: &i32 = &*p;

we can represent the memory layout as follows:

So both p and at_ft are identical at runtime: they are just a number, the (memory) "address" of the heap-allocated 42.
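
As a quick check of the "just a number" claim, here is a sketch that compares the two pointers and their sizes (`same_address` is a name made up for the example):

```rust
use std::mem::size_of;

/// Returns true if the owning pointer and the borrowing pointer hold the same address.
fn same_address() -> bool {
    let p: Box<i32> = Box::new(42);
    let at_ft: &i32 = &*p;
    let owner_addr = &*p as *const i32 as usize;
    let borrow_addr = at_ft as *const i32 as usize;
    owner_addr == borrow_addr
}

fn main() {
    // Both are a single pointer-sized value at runtime.
    assert_eq!(size_of::<Box<i32>>(), size_of::<usize>());
    assert_eq!(size_of::<&i32>(), size_of::<usize>());
    assert!(same_address());
}
```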

But, at compile-time, i.e., when analyzed by Rust, p and at_ft have quite different semantics: at_ft, for instance, borrows p, so if (the local) p is moved or destroyed, then the 42 in the heap may cease to exist, so Rust then forbids at_ft from accessing the memory it points to. But other than that, at_ft has no interaction with the 42 in the heap, so multiple at_fts may exist without any issue whatsoever (&i32 : Copy):

let p = Box::new(42);
let at_ft = &*p;
let at_ft2 = at_ft;
assert_eq!(*at_ft, 42); // Fine
assert_eq!(*at_ft2, 42); // Fine

This is fine because even if there are multiple pointers to the heap-allocated 42, there is only one "dotted line" (which represents ownership and thus a potential drop) to it.

The three stack-allocated addresses (i.e., the number 0x55c9b3701a40 present three times in the stack) will all be stack-deallocated when the scope they were declared in ends. That's where you were right.

However, the heap-allocated 42 must be deallocated at most once. It is "fine" (memory-safe) to never deallocate it: that's called a memory leak, and its only problem is exhausting memory if it is allowed to happen an arbitrary number of times (e.g., within a loop); small leaks that happen a fixed and small number of times are fine, memory-wise. What is not acceptable is having multiple "dotted lines" to 42: that would lead to a double-free, which is undefined behavior.

If p: Box<i32> were Copy, then after a let p2 = p; we would have:

which would be UB after both p and p2 called their respective destructors.

Move semantics

That's when move semantics kick in: when doing let p2 = p;, at runtime it will be equivalent to copying the pointer, but at compile-time Rust will consider that the pointer p has been moved, i.e., it has been deactivated: it is no longer usable, and more importantly, it no longer owns the heap-allocated 42:

  • String is like a Vec<u8>,

  • Vec<T> is like a Box<[T]>, i.e., a pointer to a heap-allocated slice (sequence) of values of type T.


In your OP, when you were taking addresses, you were taking addresses of the locals, i.e., of the values in the stack. And as you can see in the diagram, p and p2 do not necessarily occupy the same place in the stack (it is not impossible either, since as an optimisation it may often be the case, but that, of course, cannot be relied upon). With .as_ptr() you would have got the pointers to the heap, which, like in the diagram, point to the same place in (heap) memory.
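
Here is a rough sketch of that difference, using a String rather than the Box of the diagram (`Addresses` and `observe_move` are names invented for the example). Only the equality of the heap addresses is something to rely on; the stack addresses are printed without any expectation.

```rust
/// Addresses observed around `let s2 = s;`.
struct Addresses {
    heap_before: usize,
    heap_after: usize,
    stack_before: usize,
    stack_after: usize,
}

fn observe_move() -> Addresses {
    let s = String::from("hello");
    let heap_before = s.as_ptr() as usize; // where the bytes live (heap)
    let stack_before = &s as *const String as usize; // where the local lives (stack)

    let s2 = s; // move: the (pointer, length, capacity) triple is copied over

    let heap_after = s2.as_ptr() as usize;
    let stack_after = &s2 as *const String as usize;

    Addresses { heap_before, heap_after, stack_before, stack_after }
}

fn main() {
    let a = observe_move();
    // The heap buffer is not touched by the move.
    assert_eq!(a.heap_before, a.heap_after);
    // The locals may or may not share a stack slot; nothing is guaranteed here.
    println!("stack: {:#x} -> {:#x}", a.stack_before, a.stack_after);
}
```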

I've been thinking about it. You keep saying that a container type is needed in Rust to get heap memory allocation. So what are all the container types in Rust?

I don't think it's possible to enumerate all, because anyone might write some more in the future by means of a crate, but the standard library has several of the most useful ones: Box, String, Vec, HashMap, BTreeMap, just to name a few.

Apart from implementing the Drop trait, do they have anything else in common?

What do you mean by that? From what point of view?

I mean, these containers can all be used to allocate data on the heap. Since they all have this capability, they must have something in common.

Well it's not that they are magical or somehow specially known to the compiler. They just… allocate on the heap. Their implementation is just written like that.

E.g., when you call Vec::push() on an empty and 0-capacity vector, the push method will call out to the global allocator, ask for some heap memory that is enough for a number of elements, and move the element being pushed to the first slot of that heap array.
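
You can watch this happen through the capacity. A minimal sketch (the helper name is made up; the exact capacity chosen after the first push is an implementation detail, so it is only checked to be at least 1):

```rust
/// Returns the capacity of a fresh `Vec<i32>` before and after its first push.
fn capacities_around_first_push() -> (usize, usize) {
    let mut v: Vec<i32> = Vec::new(); // no heap allocation yet
    let before = v.capacity();
    v.push(7); // first push: this is where the allocator gets called
    (before, v.capacity())
}

fn main() {
    let (before, after) = capacities_around_first_push();
    assert_eq!(before, 0);
    assert!(after >= 1);
    println!("capacity: {before} -> {after}");
}
```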

Heap allocation is not l'art pour l'art. Dynamic data structures usually need to perform heap allocation in order to be able to grow or otherwise reorganize themselves. You, as a programmer, shouldn't go "oh yeah, I want heap allocation today because it's a rainy Tuesday". Instead, you should choose a data structure appropriate for solving the problem you have. That data structure in turn will probably involve heap allocation, but that is not the point of container types. In fact, avoiding heap allocation as an optimization can be beneficial most of the time, as good-quality memory allocator algorithms tend to be quite slow, at least compared to a near-0-cost stack allocation.

(There are a few exceptions, e.g., when you don't want the address of an object to change upon moving, you can box it – in this case, it's exactly the heap-allocating nature of Box that matters. But this is a very niche use case.)
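
A sketch of that niche case, with a hypothetical helper: the Box itself gets moved into a Vec, but the 64 bytes it points to stay where they are.

```rust
/// Returns true if the boxed array keeps its address when the Box is moved.
fn boxed_address_is_stable() -> bool {
    let boxed: Box<[u8; 64]> = Box::new([0u8; 64]);
    let before = &*boxed as *const [u8; 64] as usize;

    // Moving the Box only moves the pointer, not the 64 bytes on the heap.
    let holder = vec![boxed];

    let after = &*holder[0] as *const [u8; 64] as usize;
    before == after
}

fn main() {
    assert!(boxed_address_is_stable());
}
```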

Raph Levien's container cheat sheet lists common containers from std with their memory layout.

Thanks a lot, it's really clear!

Thank you, nice pic!

The more low-level ones have in common that they interact directly with the allocator API (such as [Raw]Vec), and other collections are usually built on top of the low-level ones (such as String).

It is when they directly interact with raw allocations that the types implement Drop, but when types are built on top of such constructions they may not even implement Drop! In that case they "inherit" the drop glue of the stuff they contain / wrap.

  • For instance, String is a newtype wrapper / abstraction over a Vec<u8>, so there is no need to impl Drop for String, since when a String goes out of scope, so does the Vec it contains, which triggers the deallocating drop.
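
One way to see the "inherited" drop glue is std::mem::needs_drop. In the sketch below, `MyString` is a made-up newtype over Vec<u8> with no `impl Drop`; it still reports that dropping it matters. Keep in mind that needs_drop is documented as a conservative hint, so only the `true` answers are asserted.

```rust
use std::mem::needs_drop;

/// Hypothetical newtype over `Vec<u8>`, deliberately without an `impl Drop`.
#[allow(dead_code)] // only the type is inspected, no value is ever built
struct MyString {
    bytes: Vec<u8>,
}

fn main() {
    // `Vec<u8>` implements `Drop` itself...
    assert!(needs_drop::<Vec<u8>>());
    // ...and anything wrapping it gets drop glue for free.
    assert!(needs_drop::<MyString>());
    assert!(needs_drop::<String>());
    // A shared reference owns nothing, so there is nothing to run for it.
    println!("&i32 needs drop: {}", needs_drop::<&i32>());
}
```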
