How are pointers made

    let mut v = Vec::<i128>::with_capacity(16);

    let var_ref = &v;
    let ptr     = v.as_ptr();
    let ptr_ref = &ptr;

    println!("var_ref {:p}", var_ref);
    println!("ptr     {:p}", ptr);
    println!("ptr_ref {:p}", ptr_ref);

    v.append(&mut Vec::from_iter(0..100000)); // cause a reallocation
    println!("new_ptr {:p}", v.as_ptr());

Each memory location is a cell divided into an address part and a value part?

  • var_ref is the print of the v variable's address on the stack?
    • Obviously, var_ref itself also lives at a stack address, which is not printed here?
  • ptr is the print of the value to an address on the heap? (the head of vec)
  • ptr_ref is the print of the ptr variable's address on the stack?

After the reallocation,

  • new_ptr is the print of the temp variable's v.as_ptr() value to an address on the heap? (A new address since the reallocation moved the address).

Regarding the reallocation. What makes sure that we don't lose track of Vec after a move on the heap? Is it the allocator's job to update Vec's ptrs? Where is that done at? The stack is preconfigured with yet-to-be-run/evaluated operations at compile time? But the heap move is an unpredictable runtime operation. So the stack couldn't have been preconfigured with an operation to store/fetch this new ptr data. There has to be a static unmoving location on the heap as well where ptr addresses are kept and updated and are fetched from?

It's hard to tell what you are asking in the first part. A pointer is just like any other value in Rust, and you can take the address of a pointer, which has type pointer-to-pointer. ("Pointer" is being used in the loose, general sense – Rust has several pointer-like concrete types such as references, "raw" pointers, Box, etc. which are all distinct types but behave similarly in terms of providing indirection.)

No, not really. Each memory location is at an address, but the address itself isn't built into the values in memory. Just like your house/apartment has an address, but that's because the street by convention is divided into distinct numbers; your house doesn't intrinsically have a house number built into it.

The vector itself does that. If any mutating operation, such as pushing to the end, needs to reallocate, then the vector updates its own internal buffer pointer so that it points to the new buffer.

Sorry, I have no idea what you are asking. The vector's buffer is always on the heap. The vector's buffer pointer thus always points to the heap. When a vector re-allocates, there's no need to expand the stack. There's no need for the stack (or the heap) to know about how Vec works. Vec does the necessary internal pointer updating for itself.

This is not specific to Rust at all. You can read more about how dynamic arrays are usually implemented in basically any systems programming language.

Your assumptions about pointers are correct. Note that with optimizations enabled these things might change, as the optimizer will try to remove levels of indirection, inline or break apart structs, reuse stack space, etc.

Vec updates its heap pointer by itself when you call methods on it. It's Vec's job to talk to the allocator and to copy the data to a new allocation if needed.

Vec is approximately struct Vec<T> { data: *mut T, capacity: usize, len: usize }.

Thanks to exclusive ownership there is always only one place to update the data pointer when the Vec is mutated. You can't hold references into the Vec's heap buffer while the Vec is being mutated.
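A minimal sketch of that three-field picture, using only public Vec methods. The helper name vec_parts is made up for illustration, and the size comparison reflects how current standard library builds behave rather than a documented layout guarantee.

```rust
use std::mem::size_of;

/// Hypothetical helper: returns the (buffer pointer, length, capacity)
/// that a Vec keeps track of.
fn vec_parts<T>(v: &Vec<T>) -> (*const T, usize, usize) {
    (v.as_ptr(), v.len(), v.capacity())
}

fn main() {
    let mut v: Vec<i128> = Vec::with_capacity(16);
    v.push(7);

    let (ptr, len, cap) = vec_parts(&v);
    println!("buffer ptr {:p}, len {}, capacity {}", ptr, len, cap);

    // The Vec value itself is three machine words, no matter how many
    // elements live in the heap buffer it points to.
    println!("size_of::<Vec<i128>>() = {}", size_of::<Vec<i128>>());
    println!("3 * size_of::<usize>() = {}", 3 * size_of::<usize>());
}
```

The two sizes printed at the end should match, which is consistent with the Vec being just a pointer plus two integers.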

I understand there is nothing intrinsic about a memory location having an address. It's just a made-up abstraction. At the end of the day it's quantum mechanics inside microscopically molded material. The tiny silicon physical space occupied by electrons inside a RAM card doesn't have a plaque of an address. But it's emulated that way by programs.

In the first part, I am trying to figure out the distinctions of pointer data, as programs think of them. A pointer is just bits, like "everything" in a program. The bits "exist" somewhere in memory. So there is a program known distinction between data (the bits) and address (where those bits are). A pointer is also just bits of data.. and that data represents some address (numbers) somewhere. It doesn't "hold" the address. I'm just saying it's "textual" data of an address.

Now, that data of some address.. it exists at an address of its own.. So what I was trying to clarify was whether print, as I was using it, was printing the address of the pointer or the value of the pointer data (which represents another address). Was I using print accurately? It was hard for me to know with all the use of temp variables and references and as_ptr()s.

Regarding the Vec. I'm having a hard time understanding how the program does not lose track of dynamically changing addresses on the heap. Seeing as the operation steps (not the final computation/results) on the stack are known ahead of runtime.. When we define there is to be an access to a variable on the stack pointing to Vec.. There would have to be a predefined statically set operation that says "fetch vector pointer from _____???". What goes into the blank area? What does the compiler put there when it creates the binary? The blank area cannot be the statically known heap address of the Vec. Because it isn't statically known, it's runtime information. So what exactly goes there? And.. how does it get updated? You said it was Vec itself that does that? So then Vec must have a statically known bookkeeping location that starts its life on the stack and ends its life on the stack. But the data location it points to is on the heap. When Vec wants to grow, it changes the location of that data on the heap, and updates the aforementioned bookkeeping data on the stack? So a Vec has two parts living in two locations, bookkeeping on the stack and data on the heap?

(sorry if this is all a bit obtuse. I'm trying to understand rust code with some type of working model of memory management. Instead of just memorized rote rules of "if I do this, then this happens")

Okay, I think I understand. With struct Vec { data: *mut T, capacity: usize, len: usize }, the data: *mut T is a value that will exist and remain on the stack so long as that Vec is being used in scope. When the Vec grows, it allocates new space on the heap, puts the new and old data in that new space, and it does some kind of register operation that may or may not happen (depending on whether a reallocation occurred), a branch in the compiled assembly, that will update the value of that pointer? So any future reads of that pointer will point to the right place? This means that reads and writes to that pointer must happen in a very strict order?

I assume Box<T> also does this? So if I put a u8 into Box and add to that u8, underneath, the Box does the same pointer updating itself? If I put a Box<T> into another Box<T>, I'm basically duplicating this kind of heap allocation and pointer updating?

So there is no background runtime memory manager or central memory address db or anything like that? The heap memory management is all cleverly intertwined with the program's execution.

There's no separate thread of execution or anything, like there might be in a managed language with a garbage collector.

But it's all abstractions all the way down. By default[1] global allocations in Rust are done by calling into your OS via an API that looks roughly like

    fn alloc(size: usize, align: usize) -> *mut ();
    fn dealloc(ptr: *mut (), size: usize, align: usize);

What does your OS do in order to service heap allocation requests? Well, the answer is that it's complicated. At some level of the abstraction stack someone's maintaining a list of what addresses are available for allocation, and mapping virtual addresses to the actual physical RAM addresses, and likely many other things, because it's abstractions all the way down.
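For a concrete feel, here is a small sketch that talks to the allocator directly through std::alloc. In the real API the size and alignment are bundled into a Layout value rather than passed as two integers. The helper name roundtrip_through_heap is made up for illustration.

```rust
use std::alloc::{alloc, dealloc, handle_alloc_error, Layout};

/// Hypothetical helper: asks the global allocator for room for one u64,
/// writes `value` there, reads it back, and frees the memory again.
fn roundtrip_through_heap(value: u64) -> u64 {
    // Size and alignment of a u64, bundled into one value.
    let layout = Layout::new::<u64>();
    unsafe {
        // The allocator hands back an address; nothing more.
        let ptr = alloc(layout) as *mut u64;
        if ptr.is_null() {
            handle_alloc_error(layout);
        }
        ptr.write(value);
        let read_back = ptr.read();
        // Freeing needs the same layout that was used to allocate.
        dealloc(ptr as *mut u8, layout);
        read_back
    }
}

fn main() {
    println!("{}", roundtrip_through_heap(42));
}
```

Vec and Box do essentially this on your behalf, which is why ordinary code rarely needs the unsafe calls.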

Unless you're writing the OS, it's typically sufficient to assume the default OS allocator is good enough and treat it as a black box that spits out usable addresses when asked nicely.


  1. To customize the behavior and take a different strategy, you implement the GlobalAlloc interface and register it with #[global_allocator].
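A minimal sketch of what the footnote describes: a custom global allocator that only forwards to the system allocator and counts how often it is asked for memory. The names CountingAlloc, ALLOCATIONS and allocations_so_far are made up for illustration.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical allocator: forwards every request to the system allocator
/// and counts the number of allocation calls.
struct CountingAlloc;

static ALLOCATIONS: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOCATIONS.fetch_add(1, Ordering::Relaxed);
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
}

// Registers the allocator for the whole program.
#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

fn allocations_so_far() -> usize {
    ALLOCATIONS.load(Ordering::Relaxed)
}

fn main() {
    let before = allocations_so_far();
    // black_box keeps the optimizer from removing the unused allocation.
    let v = std::hint::black_box(Vec::<u8>::with_capacity(1024));
    let after = allocations_so_far();
    println!("allocation calls while creating the Vec: {}", after - before);
    drop(v);
}
```

Every Vec, Box and String in the program then goes through CountingAlloc, without any of those types knowing about it.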

I'm basically satisfied with this level of abstraction. I know there isn't really a physical thing called the "stack" and the "heap". It's all the same main memory. The distinction is in how a program is allowed to use certain blocks of that main memory. I can infer some things with this understanding alone. Like I haven't read this anywhere, but I assume when the program starts it requests from the OS a block of memory to be reserved for the program. The only way I can imagine it doing this correctly is if it knows the maximum bytes of memory it needs and the OS gives it a starting address location to work from all the way up to the contiguously linked maximum location. I think of this as the stack. When it needs memory the program can only know about at runtime, it requests more at runtime, and the OS does exactly the same thing for it. In a sense, everything is a "heap" allocation. The stack is just the one time heap allocation that occurs at the start of the program and remains true till the end of the program.

I don't want to understand allocators or OS operations too deeply. I would just like a working model of how program managed pointers play into all that.

The Vec itself can live anywhere, and it will be on the stack if you declare it as a variable, for example. It can also live on the heap if you put it in a container that keeps its data on the heap, or it can live in static memory if you put it in a static item, or wherever, really.

The underlying buffer of the Vec always lives on the heap, and the pointer inside the Vec points to that buffer. It is this pointer that gets updated when the vector reallocates. The (pointer, length, capacity) itself doesn't move anywhere else just because the buffer (which is an independent memory area) is reallocated.
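A small sketch of that distinction: the address of the Vec value stays put while the buffer pointer stored inside it may change. The helper name grow_and_observe is made up for illustration, and whether the buffer actually moves is up to the allocator, so the code only reports it.

```rust
/// Hypothetical helper: grows `v` by `extra` elements and returns
/// (handle address before, handle address after,
///  buffer address before, buffer address after).
fn grow_and_observe(v: &mut Vec<i128>, extra: usize) -> (usize, usize, usize, usize) {
    // Where the (pointer, length, capacity) triple itself lives.
    let handle_before = &*v as *const Vec<i128> as usize;
    // Where the heap buffer lives right now.
    let buffer_before = v.as_ptr() as usize;

    v.extend((0..extra).map(|i| i as i128));

    let handle_after = &*v as *const Vec<i128> as usize;
    let buffer_after = v.as_ptr() as usize;
    (handle_before, handle_after, buffer_before, buffer_after)
}

fn main() {
    let mut v = Vec::<i128>::with_capacity(16);
    let (h0, h1, b0, b1) = grow_and_observe(&mut v, 100_000);
    println!("handle before {:#x}, after {:#x}", h0, h1);
    println!("buffer before {:#x}, after {:#x}", b0, b1);
    println!("buffer moved: {}", b0 != b1);
}
```

The two handle addresses are always equal; the buffer addresses will usually differ after growth this large.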

Absolutely nothing like that. It's not necessary at all. The Vec "knows" where its own pointer, length, and capacity in memory (e.g. on the stack) are, because methods on Vec that manipulate its guts are passed self (the Vec itself) as the first argument.

Here's a very naïve, over-simplified (e.g. doesn't handle ZSTs), very incomplete, and probably very inefficient implementation of a custom Vector. You can study it to see how the memory management aspect works.
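The linked code did not survive in this copy of the thread, so what follows is a stand-in sketch in the same spirit, not the original. The name MyVec and the doubling growth policy are assumptions for illustration; it ignores zero-sized types and is not how the standard library is actually written.

```rust
use std::alloc::{alloc, dealloc, handle_alloc_error, Layout};
use std::ptr;

/// Toy growable array (hypothetical). Three plain fields, nothing else.
struct MyVec<T> {
    data: *mut T,
    capacity: usize,
    len: usize,
}

impl<T> MyVec<T> {
    fn new() -> Self {
        assert!(std::mem::size_of::<T>() != 0, "zero-sized types are not supported");
        MyVec { data: ptr::null_mut(), capacity: 0, len: 0 }
    }

    fn len(&self) -> usize {
        self.len
    }

    fn get(&self, index: usize) -> Option<&T> {
        if index < self.len {
            // SAFETY: every slot below `len` has been initialized by `push`.
            Some(unsafe { &*self.data.add(index) })
        } else {
            None
        }
    }

    fn push(&mut self, value: T) {
        if self.len == self.capacity {
            self.grow();
        }
        // SAFETY: after `grow`, `len < capacity`, so the slot is in bounds.
        unsafe { self.data.add(self.len).write(value) };
        self.len += 1;
    }

    fn grow(&mut self) {
        let new_capacity = if self.capacity == 0 { 4 } else { self.capacity * 2 };
        let new_layout = Layout::array::<T>(new_capacity).expect("capacity overflow");
        // SAFETY: the layout is non-zero-sized (T is not a ZST, capacity > 0).
        let new_data = unsafe { alloc(new_layout) } as *mut T;
        if new_data.is_null() {
            handle_alloc_error(new_layout);
        }
        if self.capacity != 0 {
            // SAFETY: both buffers hold at least `len` elements and are distinct.
            unsafe {
                ptr::copy_nonoverlapping(self.data, new_data, self.len);
                dealloc(self.data as *mut u8, Layout::array::<T>(self.capacity).unwrap());
            }
        }
        // The whole "pointer update" is these two field writes through &mut self.
        self.data = new_data;
        self.capacity = new_capacity;
    }
}

impl<T> Drop for MyVec<T> {
    fn drop(&mut self) {
        if self.capacity != 0 {
            // SAFETY: the first `len` slots are initialized, and the buffer was
            // allocated with exactly this layout.
            unsafe {
                for i in 0..self.len {
                    ptr::drop_in_place(self.data.add(i));
                }
                dealloc(self.data as *mut u8, Layout::array::<T>(self.capacity).unwrap());
            }
        }
    }
}

fn main() {
    let mut v = MyVec::new();
    for i in 0..10u32 {
        v.push(i * 10);
    }
    println!("len {}, capacity {}, last {:?}", v.len(), v.capacity, v.get(9));
}
```

Note that grow receives &mut self, so it already knows where the three fields live, whether that is the stack, the heap, or a static.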

That understanding is basically accurate! Just two asterisks to be aware of:

  • It's possible for a program to request more stack space from the OS. Since on a 64-bit machine the address space is typically much larger than the amount of memory you have access to, "heap" and "stack" addresses are typically located far apart in order to facilitate this.
  • Every time you create a new thread, this also creates a new "stack" region for the new thread. When the thread dies, its stack region is returned to the OS. Because of this, stack space isn't really any more permanent than heap; it's all just convention on how memory space is both allocated and used.

Also, stack usage is an estimate. It's not possible in general to know how much stack space an arbitrary program is going to use before executing it, so it just requests "probably enough" on startup. Because of the indirection that virtual memory provides, physical memory doesn't need to be assigned to the program until it writes to the memory, so unused stack space is quite cheap.
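As a rough illustration of the per-thread stack point, this sketch spawns a thread with an explicitly requested stack size and compares the address of one of its locals with a local on the main thread. The helper name local_address_on_new_thread and the 256 KiB figure are arbitrary choices for the example.

```rust
use std::thread;

/// Hypothetical helper: runs a closure on a fresh thread with the requested
/// stack size and returns the address of one of that thread's locals.
fn local_address_on_new_thread(stack_bytes: usize) -> usize {
    thread::Builder::new()
        .stack_size(stack_bytes)
        .spawn(|| {
            let local = 0u8;
            // Only the numeric address leaves the thread, never the reference.
            &local as *const u8 as usize
        })
        .expect("failed to spawn thread")
        .join()
        .expect("thread panicked")
}

fn main() {
    let main_local = 0u8;
    let main_addr = &main_local as *const u8 as usize;
    let thread_addr = local_address_on_new_thread(256 * 1024);

    println!("main thread local    {:#x}", main_addr);
    println!("spawned thread local {:#x}", thread_addr);
}
```

The two addresses typically land in clearly separate regions, since each thread gets its own stack area.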

Ownership and borrowing ensure that when Vec writes to self.data, there is no other use of data anywhere else. &mut is not just mutable, but statically guaranteed to be exclusive access.

There's only one copy of the Vec, with only one user of the pointer at that time, so it's as simple as writing the new pointer to self.data after the reallocation.

There's no magic there. No other process. Nothing difficult at all. It's simply updating a field of a struct, which is also just a regular struct with 3 fields.

There are no auto-updating magic pointers in Rust.
