First of all I want to say that this is probably the single best tech forum I've ever participated in as far as signal/noise ratio. You guys are great in answering questions, even stupid ones, thoroughly, quickly, and with no snobbery.
Now, how is it that I did C++ for 15 years and don't understand the point/need for boxing? (Is it because of that?)
What I don't get is that Rust tracks ownership and when an owned value goes out of scope it is freed. Shouldn't this mechanism be all that's needed, without needing to Box something?
Let's say I have a struct, call it A, an instance of which, call it a, is created by main() by calling A::new().
A in turn holds HashMap value, call it h, which is created using HashMap::new() in A::new().
In a nutshell Box<T> is std::unique_ptr<T> from C++11. Usually Box is better than unique_ptr on performance/usability/catching bugs due to different language guarantees, but none of them affect fundamental use cases.
The simplest one is when you have a really big struct you can get significantly better performance in certain scenarios by boxing it. Boxing a value allocates space for it with the configured allocator, so when you move a Box around you're only moving around a pointer instead of the whole huge struct. You can get smaller code size this way, and sometimes it's actually faster even accounting for the fact that you're allocating memory.
Another reason you might use a box is for unsized types. Unsized types in Rust need to be behind some kind of indirection. Often you can use a reference, but if you need an owned trait objects or slice, the simplest option is to box it.
There's obviously space for nuance here, but as a general parallel:
Use Box<Foo> in Rust where you would have used new Foo() in C++.
So certainly most of the time you don't, because you'd just use Foo without newing one. But the same kinds of reasons you'd heap-allocate in C++ you'd heap-allocate in Rust too.
(Box just makes it easier to get the deallocation part right, rather than needing to remember which kinds of pointers are for what.)
Apart from what others have mentioned, another important use case for Box is for defining recursive data types. Consider a (very simplified) definition of a regular expression's AST:
The Concat and Alt cases are "automatically" boxed by virtue of using a Vec, but the ZeroOrMore (and similar) cases only have one child regex expression. You need some kind of indirection here to store the child expression because the child expression can itself contain another, e.g., ZeroOrOne.
If you just used ZeroOrMore(Regex) (try it!), then what would be the size of Regex in memory? Without boxing it (in some fashion), its size would be infinite. Indeed, the compiler will reject it:
error[E0072]: recursive type `Regex` has infinite size
--> src/main.rs:3:1
|
3 | enum Regex {
| ^^^^^^^^^^
...
7 | ZeroOrMore(Regex),
| ----- recursive without indirection
|
help: insert some indirection (e.g., a `Box`, `Rc`, or `&`) to break the cycle
|
7 | ZeroOrMore(Box<Regex>),
| ++++ +
For more information about this error, try `rustc --explain E0072`.
error: could not compile `playground` due to previous error
(OK, I now see someone mentioned recursive types above. I missed that on my first read through, but I'll leave this post here as it shows another example.)
And to add on to that, you can think of Box<T> as a memory safe abstraction for malloc() if that makes you feel more comfortable. Most constructors for structs allocate on the stack. If you want the value (or part of it) on the heap instead, as you would with malloc() (or as mentioned by others, as you would with new T()), then box it.
The recursive case is interesting. You could use references with heap allocation for the child Regex to solve the compile problem as well, yes? But that would be much work than using Box.
For that, referenced Regexes must be stored somewhere else - be it another Box, Vec, or something like typed arena. If you need a "reference with allocation" meaning that the reference will manage this allocation, well, that's exactly the definition of Box.
And what else exactly do you mean by "reference with heap allocation", if not Box? Since references are not owning, what would own the referenced value?
Indeed, Rust doesn't have constructors. new() is just a regular function (a static method in C++ terms) that in this case returns a struct by value, which is stored on the stack.
You are confusing heap allocation with indirection. Indirection doesn't imply heap allocation! It never did, not even in C and C++.
The following is perfectly valid C code:
int value = 42;
int *ptr = &value;
and so is the following:
int *ptr = malloc(sizeof(*ptr));
*ptr = 42;
The first one doesn't have any "heap" (dynamic) allocation, the second one does, yet both use pointers. The same is true in Rust. You can have references to any kind of memory, it doesn't matter.
In your example, the hash map is a value, it lives on the stack by default (as its place is a local binding), but the buffer of entries it manages is on the heap. (Otherwise it would be hard to make it growable and big enough for practical use…)
I wonder, is it really necessary to understand heap and stack in order to understand Box (from a high level perspective)? I would say that a Box<T> simply wraps a type T in order to
reduce the size of Box<T> to a fixed (small) amount (compared to T which may be bigger or !Sized)
at the cost of adding a bit of runtime overhead when accessing the value through the box.
Agreed. I wonder if talking about the difference leads to more confusion rather than less. After all, it's never that "I want it on the heap" is my actual goal -- the goal is always something like "I want the return value to be smaller" or "I want to type-erase values" or "I want the address of the value to be stable despite moving the ownership" or ...