Recursive types - confused on book paragraph

Here it says:

A value of a recursive type can have another value of the same type as part of itself. Recursive types pose an issue because Rust needs to know at compile time how much space a type takes up. However, the nesting of values of recursive types could theoretically continue infinitely, so Rust can’t know how much space the value needs. Because boxes have a known size, we can enable recursive types by inserting a box in the recursive type definition.

And the discussion hovers around:

enum List { // can't assign a size
    Cons(i32, List),
    Nil,
}

However, so can this structure suggested as solution:

enum List { // can't either, but it's on the heap.
    // size it takes on stack _is_ defined now.
    Cons(i32, Box<List>),
    Nil,
}

My reading of it, is something like: Rust needs to know at compile time how much space anything that will be on the stack takes; which means Rust will only look up to the first Box and then "disengage" (doesn't care whether it's recursive now.)

In other words, the problem with recursive types appears as long as it's supposed to be stack-allocated. So Box fixes it.

Its size is still unknown.

Or is there something else I should be thinking instead?

1 Like

The point is, that a struct type A containing itself as a subtype can not work, as it would be an infinite recursion. A Box in Rust is what is a plain pointer in C, with fixed known size -- 4 or 8 bytes typically on a 32 or 64 bit OS. So a Rust struct A can contain a Box<A> as a field. The explanation in the official book is a bit difficult to understand if you know a language like C, but might be really hard when you know no other systems programming languages. I hope that I have explained this topic a bit better in my own book, you might compare it, and tell me if I have to further improve the explanation.

And for your mention of the Stack: No, it is not really Stack-related. It is more about an indirection. For a struct type A containing itself as a field, it will never work, even when you try to allocate the outer struct already on the heap. A bag can not contain the same bag. But containing some kind of reference or pointer is possible. The official book made it not really clear, that a Box<T> is actually a plain pointer.

1 Like

You are thinking correctly. The Sized trait is used to distinguish values that have a known size and can be used as variables and parameters. And Box is Sized because it only contains a fixed sized pointer.

Sorry, it is more a fundamental, logic problem. The official book refers to the Stack vs. Heap to often :slight_smile: But OK, to create the Box in Rust, or a valid pointer in C, we would typically do a heap allocation.

And Box is Sized because it only contains a fixed sized pointer.

It is a pointer! There is no other metadata.

Or more formally:

Box<T> is the most basic smart pointer, providing ownership of data allocated on the heap. Conceptually, Box<T> is a simple struct that holds a raw pointer to heap-allocated data of type T.

I could be wrong but I don't think that's entirely true?

For Box<[T]>, doesn't it store the length of the slice as well?

PS: actually, Box specifically does not seem to from testing, but the &[..] does iiuc.

1 Like

I am quite sure it is true. When I read that section in October 2023, I did some Google and stack overflow search, as I was not too happy how Box<T> was introduced by the official book. Unfortunately I can currently not remember what Jim Blandy wrote about it in his book.

Of course, you could argue, that a struct containing just a pointer is still different from a plain pointer. It is like the NewType pattern, a struct containing just an int is basically just an int, the compiler can optimize it. I would just want to ensure, that people do not think a Rust box is a big entity with a lot of magic -- it is not more than a plain pointer. Some of Rust other smart pointers are more complicated.

For Box<[T]>, doesn't it store the length of the slice as well?

Rusts box type depends not on T! A dependency would make no sense. But I agree, a slice is a fat pointer, containing the pointer to the data and the length. So I would assume that a box containing a slice would be a pointer to a fat pointer? At least in principle -- the compiler might optimize it?

However, there is something closer to the question at hand that I don't quite agree on, but may be wrong as well.

Variables are stored in the stack. The stack needs Sized values. But for a reason, I think, which is that the stack can not allow for data to grow, at least not in Rust. Only to be mutated. I was also explained something related short ago.

So I argue that Box solves not only because Box is Sized —in the end, we till have an unsized structure overall and may be infinite. However, "Unsized" is allowed on the heap, and even though the data type as a whole has no fixed size (is not Sized), it now allows it.

I think @jumpnbrownweasel agrees (to an extent?) but I'm why this isn't sound to you? I don't quite see it as being a "logical" or different problem than that.

Sadly, I can not put it in more accurate words right now.

The point is, that an object, that contains itself, can not exist, as that would lead to a never ending recursion. You would never be able to store it somewhere, as it is in principle infinite in size. You can argue with stack and heap, but I would prefer to explain it as a abstract logical problem.

At least it become easy to understand, when you know that a box is just a pointer, as it is never a problem to have a struct with a field of Box (Pointer) type.

Isn't that infinite as well? The Pointer goes to a List, that goes to a Pointer...; the list still contains itself in terms of the type.

Yes. In first year at university we had to create linked lists in Pascal. It took me some weeks to understand it. The list type had a next field, which was a pointer to one more list element, or that pointer had the special value NIL (NULL in C) indicating there is no next element. So such a list could be infinite in in principle, but NIL indicates its end. The point is the indirection by use of pointers. For example, a struct containing two ints, with a field containing that exact same type. That would be a big recursive blub. The pointer indirection allows to have not one big allocation, but a chain of many, and an end indication. Unfortunately, with the Rust box, and the Cons example, it is a bit more difficult. I am sure, without the decades using other languages, I had no change to understand it for Rust. But I am not that smart.

1 Like

Actually I think you are right @StefanSalewski because this does compile:

fn main() {
    enum List<'a> {
        Cons(i32, &'a List<'a>),
        Nil,
    }
}

and it's not on the heap, at least not necessarily.

So it's about what is considered a "unit" or something similar. In the case of direct recursion the unit goes on forever.

With indirection what goes forever is like a chain of units. Somehow that's accepted.

Although I may be saying nonsense by now (or earlier.)

It's just about what is allocated as a single contiguous piece of memory - that's all the compiler needs to know for a value to be Sized. The size of a value never includes any memory that isn't actually part of the value itself, no matter what kind of pointer/reference/other indirection is being used.

If we have:

struct Foo {
   x: Box<[u8; 100000]>,
}

then we, as humans, know that the Box uniquely owns the thing it points to (because that's what a Box does), and so creating an instance of this struct is going to use a little bit more than 100000 bytes of memory in total, but that's not relevant to the compiler when it's determining the size - the compiler only needs to know how much space the actual Foo struct takes up, which is always the same: the size of one pointer.

2 Likes

Its, how to say... "reachable runtime memory capacity" is unknown at compile time, naturally. Its size in terms of what the compiler cares about -- how much inline space is required, the number returned from size_of -- that is known.

Incidentally, "capacity of all the memory I own at runtime" is approximately impossible to define in a general way. Like, in a way where some analysis gives you an answer that every programmer would agree was accurate or useful.

The problem with the OP isn't that the data structure was not Sized. The problem was it had infinite size. "Not Sized" usually means "dynamically sized", "has a size known at runtime".

Additionally, while it is true true that variables must be Sized, I suggest not thinking of what's under discussion here as "stack vs. heap" specifically. There's no such thing as a "stack data-type". Like, the book[1] says:

A String is made up of three parts, shown on the left: a pointer to the memory that holds the contents of the string, a length, and a capacity. This group of data is stored on the stack.

But if you have a HashMap<String, String> those are not "stored on the stack". They are stored in the heap. A String consists of those three things inline.

That being said, there are operations -- like moves, returning values -- that require the type to be Sized.

Unsized is allowed on the stack too!

Box<_> guarantees anything[2] it points to is on a heap, but that's just a library-level promise.

But again, the problem with the OP isn't about unsized types, it's about infinitely sized types. The List that compiles is not unsized! It always has a fixed size -- is Sized -- as is defined by the language / compiler. When we're talking about Sized, size_of, things the compiler reasons about... we are not talking about "reachable runtime memory capacity".

Or to take the recursion out of it: a Vec<T> is always a pointer, a length, and a capacity, and always has the size of three usizes. It manages a heap allocation at run time (dynamically), but that doesn't mean the Vec<T> type changes size, that values of type Vec<T> change size, that Vec<T> itself is dynamically sized.

The compiling List is always also the same fixed size. It may also manage a heap allocation when the value has the Box<_> containing element.

And indeed, you could build this list up entirely on the stack instead. It's still fine due to the indirection. It still has a finite, fixed size.

The compiler didn't look at the Box and go "ah, using the heap, okay then". The core language doesn't even have a concept of "the heap". The fix in both cases was that compiler was able to calculate a finite size.

The list potentially points to another list at run time.


  1. which heavily overemphasizes stack vs. heap ↩︎

  2. non-zero-sized ↩︎

2 Likes

There's no subtyping going on. I guess you meant as a field?

Box<T> is a pointer with no metadata if T: Sized.

Box<T> is a pointer and some pointer-sized metadata if T is not Sized.

Example.

The size of the Box<T> type, and what it consists of, most definitely does depend on T.

Same with any other unsizing coercion supporting pointers.

No, it is itself a wide pointer. What you're describing would be a Box<Box<[T]>> and need two allocations.

There's a concept of a ThinBox that puts the metadata on the heap beside the unsized value, but that's not the same as a pointer to a wide pointer either.

The size I refer to is not runtime size though, but type size. I think if you dont assume that compiler looks at contiguous memory, you reach that conclusion. Thats why @tornewuff's answer was quite helpful for me.

Maybe I mixed them up somewhere, but that was the idea. And later conceptualised as nesting vs chaining.

This is not quite true, a box is a fat pointer if it owns dynamically sized data, like a slice.

fn main() {
    use std::boxed::Box;
    println!("{}", size_of::<Box<i32>>());
    println!("{}", size_of::<Box<[i32]>>());
}

The first box has a size of eight bytes, and is just a pointer. The second box is a fat pointer with a size of 16 bytes. It stores the address and the length.

1 Like

I tried to run this while testing and failed.

Thanks! This is directly runnable Rust Playground

Yes, indeed.

There's a concept of a ThinBox that puts the metadata on the heap beside the unsized value, but that's not the same as a pointer to a wide pointer either.

This is not quite true, a box is a fat pointer if it owns dynamically sized data, like a slice.

OK, thanks for that information. Rust is really not that easy.

1 Like

Drawing things out may help conceptualize things, if still needed.

 List (::Cons)         List (::Cons)         List (::Nil)
+-----+--------+      +-----+--------+      +-----+--------+
| 13  | 0xdf90 | ---> | 42  | 0x3e01 | ---> | 17  | 0x0000 |
+-----+--------+      +-----+--------+      +-----+--------+

Layout not guaranteed, but all three values have the same size -- the size of the List type.

1 Like

You're right. I jumped into talking about the Sized trait to quickly without saying more about the specific problem. Sorry for the confusion.