I don't understand Box

First of all I want to say that this is probably the single best tech forum I've ever participated in as far as signal/noise ratio. You guys are great in answering questions, even stupid ones, thoroughly, quickly, and with no snobbery.

Now, how is it that I did C++ for 15 years and don't understand the point/need for boxing? (Is it because of that?)

What I don't get is that Rust tracks ownership and when an owned value goes out of scope it is freed. Shouldn't this mechanism be all that's needed, without needing to Box something?

Let's say I have a struct, call it A, an instance of which, call it a, is created by main() by calling A::new().

A in turn holds HashMap value, call it h, which is created using HashMap::new() in A::new().

Should either a or a.h be wrapped in a Box?

5 Likes

In a nutshell Box<T> is std::unique_ptr<T> from C++11. Usually Box is better than unique_ptr on performance/usability/catching bugs due to different language guarantees, but none of them affect fundamental use cases.

6 Likes

Box is semantically equal to a std::unique_ptr that can't be a nullptr.

If above sentence doesn't make you feel answered:

  • Box is required for recursive type.
// For example, you can't replace `Box<Foo>` with `Foo` here.
struct Foo {
   next: Option<Box<Foo>>,
}
  • Box is required to store dynamic sized type. For example, you can't store dyn SomeTrait, but you can store Box<dyn SomeTrait>.

  • Box reduces the size of complex struct. Carrying around a Box<[u8; 1000]> is definitely cheaper than a [u8; 1000].

12 Likes

I should've mentioned that my C++ experience ended 20 years ago. I don't recall using smart pointers much (they were new).

Can you please dumb it down a bit and tell me

Should either a or a.h be wrapped in a Box?

And why?

1 Like

Boxing a value has a couple different uses.

The simplest one is when you have a really big struct you can get significantly better performance in certain scenarios by boxing it. Boxing a value allocates space for it with the configured allocator, so when you move a Box around you're only moving around a pointer instead of the whole huge struct. You can get smaller code size this way, and sometimes it's actually faster even accounting for the fact that you're allocating memory.

Another reason you might use a box is for unsized types. Unsized types in Rust need to be behind some kind of indirection. Often you can use a reference, but if you need an owned trait objects or slice, the simplest option is to box it.

8 Likes

See my edits above.

And short answer to your question: No, you don't Box unless you need to. If you are not sure, then you don't need it.

5 Likes

There's obviously space for nuance here, but as a general parallel:

Use Box<Foo> in Rust where you would have used new Foo() in C++.

So certainly most of the time you don't, because you'd just use Foo without newing one. But the same kinds of reasons you'd heap-allocate in C++ you'd heap-allocate in Rust too.

(Box just makes it easier to get the deallocation part right, rather than needing to remember which kinds of pointers are for what.)

5 Likes

Apart from what others have mentioned, another important use case for Box is for defining recursive data types. Consider a (very simplified) definition of a regular expression's AST:

enum Regex {
    Char(char),
    Concat(Vec<Regex>),
    Alt(Vec<Regex>),
    ZeroOrMore(Box<Regex>),
    OneOrMore(Box<Regex>),
    ZeroOrOne(Box<Regex>),
}

The Concat and Alt cases are "automatically" boxed by virtue of using a Vec, but the ZeroOrMore (and similar) cases only have one child regex expression. You need some kind of indirection here to store the child expression because the child expression can itself contain another, e.g., ZeroOrOne.

If you just used ZeroOrMore(Regex) (try it!), then what would be the size of Regex in memory? Without boxing it (in some fashion), its size would be infinite. Indeed, the compiler will reject it:

error[E0072]: recursive type `Regex` has infinite size
 --> src/main.rs:3:1
  |
3 | enum Regex {
  | ^^^^^^^^^^
...
7 |     ZeroOrMore(Regex),
  |                ----- recursive without indirection
  |
help: insert some indirection (e.g., a `Box`, `Rc`, or `&`) to break the cycle
  |
7 |     ZeroOrMore(Box<Regex>),
  |                ++++     +

For more information about this error, try `rustc --explain E0072`.
error: could not compile `playground` due to previous error

(OK, I now see someone mentioned recursive types above. I missed that on my first read through, but I'll leave this post here as it shows another example.)

15 Likes

And to add on to that, you can think of Box<T> as a memory safe abstraction for malloc() if that makes you feel more comfortable. Most constructors for structs allocate on the stack. If you want the value (or part of it) on the heap instead, as you would with malloc() (or as mentioned by others, as you would with new T()), then box it.

4 Likes

The recursive case is interesting. You could use references with heap allocation for the child Regex to solve the compile problem as well, yes? But that would be much work than using Box.

For that, referenced Regexes must be stored somewhere else - be it another Box, Vec, or something like typed arena. If you need a "reference with allocation" meaning that the reference will manage this allocation, well, that's exactly the definition of Box.

4 Likes

What, concretely, do you mean when you say this?

&Foo isn't owning. "Reference" in the Java sense is boxing.

1 Like

And what else exactly do you mean by "reference with heap allocation", if not Box? Since references are not owning, what would own the referenced value?

What you are describing is Box.

2 Likes

You are right. I was further confused, by the use of constructors named "new."

I have

let mut p: HashMap<Key, Position> = HashMap::new()

p is a value, not a reference! There's actually nothing on the heap here!

I think I have it now.

2 Likes

Indeed, Rust doesn't have constructors. new() is just a regular function (a static method in C++ terms) that in this case returns a struct by value, which is stored on the stack.

3 Likes

Presumably the HashMap struct is on the stack but anything you insert into the HashMap is on the heap.

3 Likes

You are confusing heap allocation with indirection. Indirection doesn't imply heap allocation! It never did, not even in C and C++.

The following is perfectly valid C code:

int value = 42;
int *ptr = &value;

and so is the following:

int *ptr = malloc(sizeof(*ptr));
*ptr = 42;

The first one doesn't have any "heap" (dynamic) allocation, the second one does, yet both use pointers. The same is true in Rust. You can have references to any kind of memory, it doesn't matter.

In your example, the hash map is a value, it lives on the stack by default (as its place is a local binding), but the buffer of entries it manages is on the heap. (Otherwise it would be hard to make it growable and big enough for practical use…)

7 Likes

Look, as I said, my C++ experience was 20+ years ago. Since then it's been nothing but JVM languages.

I understand the difference between the stack and the heap perfectly well. It's Rust syntax and behavior I'm learning.

1 Like

I wonder, is it really necessary to understand heap and stack in order to understand Box (from a high level perspective)? I would say that a Box<T> simply wraps a type T in order to

  • reduce the size of Box<T> to a fixed (small) amount (compared to T which may be bigger or !Sized)
  • at the cost of adding a bit of runtime overhead when accessing the value through the box.
5 Likes

Agreed. I wonder if talking about the difference leads to more confusion rather than less. After all, it's never that "I want it on the heap" is my actual goal -- the goal is always something like "I want the return value to be smaller" or "I want to type-erase values" or "I want the address of the value to be stable despite moving the ownership" or ...

8 Likes