Are dropping a value and releasing its allocated memory synonyms in Rust?

Hello everyone,

I would like to ask a question about dropping values in Rust (when the drop method in the Drop trait is automatically called) and I would appreciate if you could kindly make some clarification.

When I read the chapter 4 of The Book, about borrowing and ownership, I understood that once a value is being dropped, it is finished. For me from a C programming point of view, a call to the free() function allows to free the space allocated by a previous call to the malloc() function.

So when I read the chapter 4 of The Book, I thought that the drop method is a similar concept but in a more robust way thanks to the borrowing rules. But the ultimate goal is the same, that is, freeing resources ( = memory). Therefore I had understood (and apparently misunderstood) that :

A variable has been dropped in Rust = its allocated memory has been released

However, when I read the chapter 15 of The Book about smart pointers, I read for the first time about the Weak pointer. Here is what we can read in the module documentation of Rc smart pointer : std::rc

The downgrade method can be used to create a non-owning Weak pointer. A Weak pointer can be upgraded to an Rc, but this will return None if the value stored in the allocation has already been dropped. In other words, Weak pointers do not keep the value inside the allocation alive; however, they do keep the allocation (the backing store for the inner value) alive.

I think I'm completely confused by the last statement that I indicated in bold in the above mentioned quote. Based on this definition (or I should rather say what I understand from it), we can have a dropped value whereas its allocated memory is still valid for that value.

Therefore, my question is: what exactly happens when the drop() method is called? Is it just some kind of mark put on a value at runtime so that it cannot be used anymore, or does it actually release the allocated memory, which leads to the total destruction of that value?

Thanks in advance

There is a difference between memory allocated for a value, and allocated by the value.

Also, not all values are themselves allocated on the heap! Stack-allocated values can be dropped too, but that doesn't mean their memory is going away (it may be reused for other values though).

Dropping a value means that its destructor (drop method) is called, and that will (hopefully, if its destructor is correct) clean up any resources that the type itself allocated. This is not only memory. This may well be file descriptors, thread pools, database connections, etc.
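To illustrate that a destructor can release something other than memory, here is a minimal sketch. `Connection` and its `open_count` field are made-up names standing in for a real resource such as a file descriptor or a database connection; nothing here comes from a real library.

```rust
use std::cell::Cell;

// Hypothetical resource handle: `open_count` stands in for some external
// bookkeeping such as the number of open connections.
struct Connection<'a> {
    open_count: &'a Cell<u32>,
}

impl<'a> Connection<'a> {
    fn open(open_count: &'a Cell<u32>) -> Self {
        open_count.set(open_count.get() + 1);
        Connection { open_count }
    }
}

impl Drop for Connection<'_> {
    fn drop(&mut self) {
        // Release the "resource". No heap memory is involved here at all.
        self.open_count.set(self.open_count.get() - 1);
    }
}

// Returns the count while a connection is alive and after it was dropped.
fn count_during_and_after_scope() -> (u32, u32) {
    let counter = Cell::new(0);
    let during = {
        let _conn = Connection::open(&counter);
        counter.get()
    }; // `_conn` goes out of scope here, so its destructor runs
    (during, counter.get())
}

fn main() {
    let (during, after) = count_during_and_after_scope();
    println!("open while in scope: {during}, after scope: {after}");
}
```

The value lives entirely on the stack, yet dropping it still does observable work.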

For example, if you drop a String, its backing heap buffer will be freed by its destructor. However, if you declared the string itself on the stack, then the stack doesn't just "shrink" (a given stack frame in a given function call is of a constant size).

Similarly, if you put a String in a Box, and you overwrite it, then the allocated memory inside the Box itself doesn't go anywhere, because it will be overwritten by the new value, but the old String is still destroyed:

let mut s = Box::new(String::from("old"));

// here, old string is dropped,
// its heap buffer is deallocated,
// but the memory containing the String
// struct _itself_ is not deallocated.
*s = String::from("new");
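One way to check this claim is to compare the address of the String struct inside the Box before and after the assignment. This is only a sketch; `overwrite_in_box` is a helper name invented for the illustration.

```rust
// Returns whether the String struct stayed at the same heap address,
// together with the final contents.
fn overwrite_in_box() -> (bool, String) {
    let mut s = Box::new(String::from("old"));

    // Address of the (pointer, length, capacity) struct inside the Box.
    let before = &*s as *const String;

    // The old String is dropped and its buffer freed; the new String is
    // written into the same Box allocation.
    *s = String::from("new");

    let after = &*s as *const String;
    (before == after, (*s).clone())
}

fn main() {
    let (same_place, contents) = overwrite_in_box();
    println!("same place: {same_place}, contents: {contents}");
}
```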

There’s two kinds of memory that can be associated with a value. For example let’s take a variable x: Vec<i32>. Then there is the memory that holds the variable x itself, sometimes called the “shallow” data of the Vec, which will consist of 24 bytes on a 64-bit system (3*8 bytes, i.e. 3*64 bits), containing a pointer, a length value and a capacity value. And then there’s the (possibly) large chunk of heap memory that contains all the vector elements. More precisely, it contains the shallow data of the vector elements. If they have their own heap allocations associated, then that’s yet again elsewhere, e.g. in a Vec<Vec<i32>>.

Now, many types in Rust involve no data beyond the shallow data. E.g. every Copy type; but also more. Some types, including Vec, can also conditionally come with additional non-shallow (“deep”?) data, e.g. a Vec only comes with the heap allocation for its elements if it isn’t empty; an empty Vec on the other hand can (provided it’s zero-capacity) also be free of additional heap-data.
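The size of the shallow data can be inspected with `std::mem::size_of`. The sketch below assumes the current standard library layout of Vec (three pointer-sized fields); `shallow_words` is a helper name made up for this example.

```rust
use std::mem::size_of;

// Size of Vec<i32>'s shallow data, measured in pointer-sized words.
fn shallow_words() -> usize {
    size_of::<Vec<i32>>() / size_of::<usize>()
}

fn main() {
    // Pointer + length + capacity: 3 words, i.e. 24 bytes on a 64-bit system.
    println!("shallow size: {} bytes", size_of::<Vec<i32>>());
    println!("shallow size: {} words", shallow_words());

    // A freshly created Vec has no heap allocation yet.
    let empty: Vec<i32> = Vec::new();
    println!("empty capacity: {}", empty.capacity());

    // Once it has elements, there is "deep" data on the heap as well.
    let filled = vec![1, 2, 3];
    println!("filled len: {}, capacity: {}", filled.len(), filled.capacity());
}
```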

Now, a drop implementation will contain the custom logic to free (in the C sense) the heap memory associated with any deep data. The most direct analogue of simple malloc and free in C would be the usage of Box<T> in Rust. So let’s look at the specific behavior of Box, the type for (simple, single-value, single-owner) heap allocations.

The Drop implementation of Box<T> will[1] include a call to free() for the shallow memory of the contained value of type T, and it will also recursively call the destructor of the T value itself, if it has any. For Box<i32> there would be no such additional destructor, but e.g. for Box<Box<i32>> there would be an inner box to be de-allocated, too. By the way, the allocation(s) was/were created in the first place by the call to Box::new (as well as the construction of the value passed to Box::new).

Arc<T> and Weak<T> can now split up the two steps of calling the destructor of the inner T and deallocating the heap memory that comes from the Arc itself. E.g. for Arc<Box<i32>>, as soon as the last Arc is dropped, the Box is dropped and the memory holding the i32 is freed. As soon as the last associated Weak<Box<i32>> is dropped, the heap allocation that contains the (now dangling) pointer to the Box’s heap memory, alongside the reference-counters that come from Arc, is de-allocated.
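The first of those two steps can be observed from safe code. In this sketch, `Noisy` is a made-up type whose destructor sets a flag, so we can see that the inner value is destroyed when the last Arc goes away, even though a Weak is still around.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{Arc, Weak};

// Hypothetical type that records when its destructor has run.
struct Noisy {
    dropped: Arc<AtomicBool>,
}

impl Drop for Noisy {
    fn drop(&mut self) {
        self.dropped.store(true, Ordering::SeqCst);
    }
}

// Returns (flag before dropping the Arc, flag after, whether upgrade failed).
fn destructor_timing() -> (bool, bool, bool) {
    let flag = Arc::new(AtomicBool::new(false));
    let strong = Arc::new(Noisy { dropped: Arc::clone(&flag) });
    let weak: Weak<Noisy> = Arc::downgrade(&strong);

    let before = flag.load(Ordering::SeqCst);

    // Last strong pointer goes away: the destructor of Noisy runs now,
    // although `weak` still keeps the Arc's own allocation alive.
    drop(strong);

    let after = flag.load(Ordering::SeqCst);
    let upgrade_failed = weak.upgrade().is_none();
    (before, after, upgrade_failed)
}

fn main() {
    println!("{:?}", destructor_timing());
}
```

The second step (freeing the Arc's own allocation once the last Weak is gone) is not directly observable from safe code.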

If you drop a simple value like i32, no heap allocation is involved or freed. If you drop a let x: Vec<i32> variable, as opposed to a Box<Vec<i32>>, then the Vec’s shallow data was only saved in a stack variable, so for that data, no additional thing like “free” is needed, though stack space can be thought of as some (different) form of allocation, too; but it’s without any run-time overhead.

Rust also knows the concept of zero-sized data types. For example the “unit” type (). These are handled specially by types like Box and Vec, so that Box<()> or (even non-empty) Vec<()>s for example don’t come with any heap allocation at all, and thus dropping those will also result in no “free” calls.
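A small sketch of the zero-sized case, with `unit_vec_stats` being a name chosen just for this example. The reported capacity is what the Vec documentation describes for zero-sized element types.

```rust
use std::mem::size_of;

// Returns (size of the unit type, length, capacity) for a Vec<()> that
// had 1000 elements pushed into it.
fn unit_vec_stats() -> (usize, usize, usize) {
    let mut units: Vec<()> = Vec::new();
    for _ in 0..1000 {
        units.push(()); // nothing needs to be stored, so nothing is allocated
    }
    (size_of::<()>(), units.len(), units.capacity())
}

fn main() {
    let (elem_size, len, cap) = unit_vec_stats();
    println!("element size: {elem_size}, len: {len}, capacity: {cap}");

    // Boxing a zero-sized value is fine, too.
    let boxed: Box<()> = Box::new(());
    println!("boxed unit: {:?}", boxed);
}
```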


[1] except for zero-sized types T, see below


Here’s an illustration of heap and stack data with an operation similar to the one in @H2CO3's code example above, with fake (and shortened) addresses.

let x: Box<String> = Box::new(String::from("Hello!")); results in

x's data on the stack:
    0x369cf (1 pointer)

starting from position 0x369cf in the heap:
    0x47ad0 (1 pointer)
    0x00006 (one number for the String length)
    0x00006 (one number for the String capacity)

starting from position 0x47ad0 in the heap:
    'H' (UTF-8 'H', one byte)
    'e' (UTF-8 'e', one byte)
    'l' (UTF-8 'l', one byte)
    'l' (UTF-8 'l', one byte)
    'o' (UTF-8 'o', one byte)
    '!' (UTF-8 '!', one byte)
    (no trailing null)
*x = String::from("Hi!"); will now do

* allocate new heap data, e.g. at 0x58be1, and write the "Hi!" data to it
  This comes from the call to `String::from("Hi!")`

starting from position 0x58be1 in the heap (newly allocated):
    'H' (UTF-8 'H', one byte)
    'i' (UTF-8 'i', one byte)
    '!' (UTF-8 '!', one byte)
    (no trailing null)

* call (the equivalent of) free() on 0x47ad0 (with the length 6, or maybe slightly higher,
  depending on how much the String implementation did allocate)
  This call stems from the `Drop` implementation of String that was invoked by assigning to `*x`

* write the new information (shallow data of the String) to the heap at position 0x369cf

starting from position 0x369cf in the heap (modified):
    0x58be1 (1 pointer)
    0x00003 (one number for the String length)
    0x00003 (one number for the String capacity)

The stack data for `x` remain unmodified.

By the way, length and capacity of a Vec or String will generally only differ once you use the String (or Vec) as a buffer, incrementally adding elements, without knowing (or at least without telling the String) the full length in advance. Though besides that, it could also happen if the standard library implementation were to decide to avoid particularly small allocations, or allocations that aren’t a multiple of 2/4/…, or things like that; i.e. there’s no guarantee that a small difference between length and capacity couldn’t appear even in cases where the length was known in advance.
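A quick sketch of the buffer case, where length and capacity clearly differ. The capacity of 16 is an arbitrary number picked for the illustration, and `buffer_stats` is a made-up helper.

```rust
// Returns (length, capacity) of a String used as a pre-sized buffer.
fn buffer_stats() -> (usize, usize) {
    // Ask for room for at least 16 bytes up front, then only use 3 of them.
    let mut s = String::with_capacity(16);
    s.push_str("Hi!");
    (s.len(), s.capacity())
}

fn main() {
    let (len, cap) = buffer_stats();
    println!("buffer: len {len}, capacity {cap}");

    // When the length is known in advance the two are usually equal,
    // but only `capacity >= len` is actually promised.
    let hello = String::from("Hello!");
    println!("hello: len {}, capacity {}", hello.len(), hello.capacity());
}
```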


Hello, first of all, thank you very much for your time and your help.

There is a difference between memory allocated for a value, and allocated by the value.

I'm not sure to understand this statement when you make a difference between "for a value" and "by a value".

However, if you declared the string itself on the stack, then the stack doesn't just "shrink"

If I understand correctly the string description based on Chapter 4.1 in The Book, when you declare a String, only the tuple (pointer, length, capacity) goes on the Stack but the buffer including the actual data whose address is held by the pointer (1st component of the tuple) is always on the Heap. Therefore, I'm not sure to have understood when you said "if you declared the string itself on the stack"

Similarly, if you put a String in a Box, and you overwrite it, then the allocated memory inside the Box itself doesn't go anywhere

I'm not sure I understand what you mean by "memory doesn't go anywhere". What I understand by the concept of releasing memory (please correct me if I'm wrong) is that at a given moment some memory is allocated (= reserved) for a specific item in a program. If for some reason (for example the item goes out of scope), then the operating system, during the run time (or even after), releases this memory and "considers" this "again" to be part of the global available memory for any program (including the one currently being run)

// here, old string is dropped,
// its heap buffer is deallocated,
// but the memory containing the String
// struct _itself_ is not deallocated.
*s = String::from("new");

When you say

but the memory containing the String struct itself is not deallocated.

You mean that the part (Pointer, length, capacity) remains where it was, that is, on the stack, but updated with new information related to the newly allocated buffer on the Heap, is that right?

Hello

Thank you very much for your help and for this detailed description.

One question about the example you provided

at position 0x369cf in the heap:
    0x47ad0 (1 pointer)
    0x00006 (one number for the String length)
    0x00006 (one number for the String capacity)

I think the tuple (Pointer, Length, Capacity) goes always on the Stack but here I see that you wrote : "at position 0x369cf in the heap:" and you included all the three parts in the Heap.

Did I misunderstand your example?

The example features the (maybe, realistically speaking, not all that commonly used) type Box<String>, not just String. In a Box<String>, the pointer+length+capacity information is put onto the heap, due to the Box, and in turn only the pointer pointing to that heap location containing those 3 things is held on the stack (unless the Box<String> itself is placed somewhere else than in a local variable, in which case, that could end up in yet-another heap location, again!)

It is wrong to say in all generality that when you have a String, the (pointer, length, capacity) data lives on the stack. It can live on the stack; it will live on the stack if the String gets put into a local variable, or into a field of a local variable. But it can just as well land on the heap, too (at a different place than the string’s text data), if the String is put into a Box<String> or Arc<String> or Vec<String> or the like.

For example, with structs, like e.g. tuples, if you have (String, bool) in a local variable, then the String’s (pointer, length, capacity) info is still on the stack, as tuples (and structs) in Rust contain their fields directly without any pointer indirection. But e.g. a Vec<(String, bool)> then is comparable to a Vec<String> in that the (pointer, length, capacity) lands in the heap-data of the Vec.
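The "no pointer indirection" part shows up in the sizes of these types. This is a sketch that assumes the current standard library layouts; `words` is a helper invented for the example.

```rust
use std::mem::size_of;

// Size of a type measured in pointer-sized words.
fn words<T>() -> usize {
    size_of::<T>() / size_of::<usize>()
}

fn main() {
    // A String's shallow data: pointer + length + capacity.
    println!("String:              {} words", words::<String>());

    // A tuple contains the String directly, so it is at least that big
    // (plus the bool and some padding).
    println!("(String, bool):      {} bytes", size_of::<(String, bool)>());

    // A Box<String> in a local variable is just one pointer; the three
    // words of the String live behind it on the heap.
    println!("Box<String>:         {} words", words::<Box<String>>());

    // The same goes for a Vec: its own shallow data is three words no
    // matter how large the elements stored on the heap are.
    println!("Vec<(String, bool)>: {} words", words::<Vec<(String, bool)>>());
}
```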


I see. Thank you very much for pointing this out. I'm still quite a newbie compared to you in this field and I still have a very long way to go to grasp these concepts properly! :smile:

With regard to the Weak smart pointer in my original post when the documentation says:

Weak itself makes no guarantees about the value still being present. Thus it may return None when upgraded. Note however that a Weak reference does prevent the allocation itself (the backing store) from being deallocated.

If an UPGRADE returns NONE, this means that the corresponding Rc pointer managing the ownership has already been dropped, no? If by "the backing store" they mean the allocated buffer in the heap, then why the Weak pointer keeps preventing the deallocation of the buffer if there is no more value there?

These are one and the same thing.

When you declare a string like this:

let s = String::from("foo");

then the variable s is on the stack, that's the memory allocated for the string (i.e., the buffer pointer, capacity, length triplet), by the compiler, completely at compile time (by arranging the enclosing function's stack frame in a particular way). Meanwhile, the string allocates some bytes on the heap for the actual contents ['f', 'o', 'o']. This is the memory allocated by the string.

This is the case with the heap, but not with the stack. Within a given function, a stack frame has constant size. If you do the following:

let mut s = String::from("foo");
s = String::from("another value");

then the place of s will be exactly the same in memory (i.e., on the stack), it doesn't go anywhere, just because the heap buffer that the old value in that variable managed was dropped (due to the subsequent assignment).
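Here is a minimal sketch of how one could check this, comparing the address of the variable itself with the address of its heap buffer. `reassign` is a name made up for the illustration, and the raw pointers are only compared, never dereferenced.

```rust
// Returns (did the variable stay in the same place?, did the buffer move?).
fn reassign() -> (bool, bool) {
    let mut s = String::from("foo");
    let place_before = &s as *const String; // where `s` itself lives
    let buffer_before = s.as_ptr(); // where the bytes "foo" live

    // The new String is built first, then the old value is dropped
    // (freeing its buffer), then the new value is moved into `s`.
    s = String::from("another value");

    let place_after = &s as *const String;
    let buffer_after = s.as_ptr();

    (place_before == place_after, buffer_before != buffer_after)
}

fn main() {
    let (same_place, buffer_moved) = reassign();
    println!("variable in same place: {same_place}");
    println!("heap buffer changed:    {buffer_moved}");
}
```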

This is exactly the same as what happens in C, except that C doesn't have destructors so you leak memory if you blindly overwrite pointers. But the low-level mechanics of memory management and the distinction between the stack and the heap work in exactly the same way. There's no significant difference in the memory models of the two languages in this regard. (The significant differences are in the operational semantics, i.e., what you are allowed and not allowed to do according to the abstract machine, but that's something that goes beyond the scope of such simplified understanding.)

The above example would be in C:

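#include <stdlib.h>
#include <string.h>

typedef struct {
    char *buf;
    size_t cap;
    size_t len;
} String;

int main(void) {
    // let mut s = String::from("foo");
    size_t len = 3;
    size_t cap = len;
    char *ptr = malloc(cap);
    memcpy(ptr, "foo", len);
    String s = { ptr, cap, len };

    // s = String::from("another value");
    free(s.buf); // inserted automatically by the compiler

    size_t new_len = 13;
    size_t new_cap = new_len;
    char *new_ptr = malloc(new_cap);
    memcpy(new_ptr, "another value", new_len);

    // same variable, same place on the stack, new contents
    s = (String){ new_ptr, new_cap, new_len };

    free(s.buf); // also inserted automatically, when s goes out of scope
    return 0;
}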

In my example containing Box, the String (i.e., the pointer-capacity-length triple) is also on the heap, exactly because it's in a Box. Thus, in this case, the String itself is on the heap too, not only the buffer. In contrast, by "if you declared the string itself on the stack", I was referring to not putting the String in an additional Box.


Let’s do a Rc<String>, which will look something like

let x: Rc<String> = Rc::new(String::from("xyz"));
let y: Rc<String> = Rc::clone(&x);
let z: Weak<String> = Rc::downgrade(&x);

x's stack memory:
    0x147a (pointer)

y's stack memory:
    0x147a (pointer)

z's stack memory:
    0x147a (pointer)

heap from 0x147a:
    0x0002 (strong counter)
    0x0002 (weak counter)
    0x158b (pointer of String)
    0x0003 (length of String)
    0x0003 (cap of String)

heap from 0x158b:
    'x'
    'y'
    'z'

(the weak count is the one weak pointer z, plus a virtual counter of 1 for all the strong Rcs)
(implementation details only as far as I remember, maybe I’m misremembering how exactly the counters work, but that's besides the point)

If we now drop x, it will merely modify the counter

let x: Rc<String> = Rc::new(String::from("xyz"));
let y: Rc<String> = Rc::clone(&x);
let z: Weak<String> = Rc::downgrade(&x);
drop(x);

x's stack memory:
    0x???? (garbage or uninitialized; value logically no longer exists)

y's stack memory:
    0x147a (pointer)

z's stack memory:
    0x147a (pointer)

heap from 0x147a:
    0x0001 (strong counter)
    0x0002 (weak counter)
    0x158b (pointer of String)
    0x0003 (length of String)
    0x0003 (cap of String)

heap from 0x158b:
    'x'
    'y'
    'z'

If we also drop y, it will drop the String and thus deallocate the data around 0x158b, but it will not deallocate the “backing store” of this shared pointer, which contains the reference counts and the (memory for the) String’s shallow data

let x: Rc<String> = Rc::new(String::from("xyz"));
let y: Rc<String> = Rc::clone(&x);
let z: Weak<String> = Rc::downgrade(&x);
drop(x);
drop(y);

x's stack memory:
    0x???? (garbage or uninitialized; value logically no longer exists)

y's stack memory:
    0x???? (garbage or uninitialized; value logically no longer exists)

z's stack memory:
    0x147a (pointer)

heap from 0x147a:
    0x0000 (strong counter)
    0x0001 (weak counter)
    0x???? (garbage or uninitialized; value logically no longer exists)
    0x???? (garbage or uninitialized; value logically no longer exists)
    0x???? (garbage or uninitialized; value logically no longer exists)

[heap from 0x158b was freed]

Only once z is dropped, too, is the heap around 0x147a freed, too.

let x: Rc<String> = Rc::new(String::from("xyz"));
let y: Rc<String> = Rc::clone(&x);
let z: Weak<String> = Rc::downgrade(&x);
drop(x);
drop(y);
drop(z);

x's stack memory:
    0x???? (garbage or uninitialized; value logically no longer exists)

y's stack memory:
    0x???? (garbage or uninitialized; value logically no longer exists)

z's stack memory:
    0x???? (garbage or uninitialized; value logically no longer exists)

[heap from 0x147a was freed]

[heap from 0x158b was freed]

Note that “garbage or uninitialized” in the above pictures means that the value actually present in memory at that place at run-time is most likely actually completely unchanged (after all, doing any changes would be unnecessary overhead). But from a Rust programmer’s point of view it’s often safer to reason about such data conservatively as “uninitialized data”.
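The same sequence can be followed at run time through the public counter accessors. This is a sketch only; `snapshot` and `walk_through` are names invented here. Note that the public `Rc::weak_count` does not include the implicit extra count mentioned above, so it reports one less than the pictures.

```rust
use std::rc::{Rc, Weak};

// Returns (strong count, whether upgrade still works) as seen through `z`.
fn snapshot(z: &Weak<String>) -> (usize, bool) {
    (z.strong_count(), z.upgrade().is_some())
}

// Replays the steps from the pictures and records a snapshot after each.
fn walk_through() -> [(usize, bool); 3] {
    let x: Rc<String> = Rc::new(String::from("xyz"));
    let y: Rc<String> = Rc::clone(&x);
    let z: Weak<String> = Rc::downgrade(&x);
    println!("public weak count: {}", Rc::weak_count(&x));

    let start = snapshot(&z);
    drop(x); // only the strong counter changes
    let after_x = snapshot(&z);
    drop(y); // last strong pointer: the String is dropped now
    let after_y = snapshot(&z);

    [start, after_x, after_y]
    // `z` is dropped here, and only then is the Rc's allocation freed
}

fn main() {
    println!("{:?}", walk_through());
}
```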


As to why, well… the pointer the Weak points to must not be dangling; after all it’s being dereferenced when interacting with the Weak pointer in order to check things like “is the strong counter non-zero so I can still upgrade?” In order to keep less memory in-use the only alternative to the approach the standard library takes would be to allocate the counters independently from the value (i.e. the shallow data of the value). But that has its own overhead (more allocator invocations; you need to store the additional pointers somewhere, either via double indirection or by making the Rc and Weak twice as large on the stack, …), and typical use-cases don’t keep weak pointers alive for very long anyways; and for relevant use-cases you can still use Rc<Box<T>>, anyways.


I would like to thank both of you very much for your help and the time you spent for such nice and detailed presentation. I'm sure this can be very helpful for everyone trying to understand how these pointers actually work.

In particular this part was very interesting:

heap from 0x147a:
    0x0000 (strong counter)
    0x0001 (weak counter)
    0x???? (garbage or uninitialized; value logically no longer exists)
    0x???? (garbage or uninitialized; value logically no longer exists)
    0x???? (garbage or uninitialized; value logically no longer exists)

I didn't picture this contiguous block of memory, that is, the succession of the strong counter, followed by the weak counter followed by the actual data, in my mind, when I read about the Weak pointers for the first time in The Book, but now it makes sense.

Thanks a lot.


By the way, FYI, the implementation of Rc, especially as far as judging how the thing is laid out in memory is concerned, is actually very straightforward to understand from the source code of the standard library (unlike some other types, e.g. complex collections like HashMap or BTreeMap). If you look here, you’ll see

pub struct Rc<T: ?Sized> {
    ptr: NonNull<RcBox<T>>,
    phantom: PhantomData<RcBox<T>>,
}

so the Rc is basically just a single pointer (the PhantomData can be ignored)

and the thing it points to is

struct RcBox<T: ?Sized> {
    strong: Cell<usize>,
    weak: Cell<usize>,
    value: T,
}

where you can clearly see the strong and weak counters right next to the value. What this type doesn’t tell you, of course, is that the value behind a raw pointer (like NonNull<T>) is allowed to be invalid/uninitialized or partially invalid/uninitialized, so that Weak in particular (which also has the same kind of NonNull<RcBox<T>> field) can eventually point to an RcBox where the value has already been dropped and thus, logically, de-initialized.
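As a rough cross-check of that layout, one can compare sizes. `RcBoxModel` below is only a stand-in that copies the field list quoted above; it is not the standard library's private type, and the numbers assume the current implementation.

```rust
use std::cell::Cell;
use std::mem::size_of;
use std::rc::{Rc, Weak};

// Stand-in mirroring the fields of the quoted RcBox definition.
#[allow(dead_code)]
struct RcBoxModel<T> {
    strong: Cell<usize>,
    weak: Cell<usize>,
    value: T,
}

// Size of a type measured in pointer-sized words.
fn words<T>() -> usize {
    size_of::<T>() / size_of::<usize>()
}

fn main() {
    // Rc and Weak are each a single pointer on the stack.
    println!("Rc<String>:   {} word(s)", words::<Rc<String>>());
    println!("Weak<String>: {} word(s)", words::<Weak<String>>());

    // Two counters plus the String's three words: the five rows of the
    // "heap from 0x147a" pictures earlier in the thread.
    println!("model of RcBox<String>: {} words", words::<RcBoxModel<String>>());
}
```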


Thanks. Actually I like very much reading the source code behind Rust. Sometimes when I read the online documentation for items defined in the standard library, I also click on the Source link to see how things were actually implemented. Sometimes this is quite straightforward but sometimes there are so many dependencies and also loads of unsafe blocks that I cannot find the correct path of reading the src tree. Well, I think it takes time, study and practice more and more. Thank you for the link.

For exploring the standard library, the website https://stdrs.dev can sometimes be quite useful, which someone created to host the latest standard library docs, but with some tweaks for exploring the implementation details more easily; e. g. all private and hidden items are included in the docs. If you look at Rc there, you don't even need to go through the source at all in order to see the types of the fields of Rc as well as the definition of RcBox.

