Why doesn't Rust just heap allocate automatically?

It seems to me like the compiler is pretty good at detecting when a value has an unknown size and thus needs to be heap allocated, so why do we need the Box type? It seems like it could just heap allocate (wrap in a Box internally) a value whenever it detects that its type doesn't implement Sized, and maybe give a compile warning that the value is heap-allocated so that you don't do it by mistake. So what were the community's reasons for choosing the Box type?

Automatic heap allocations are against the goals of Rust - for a systems language, that's pretty much unacceptable, warnings or not. Box also represents unique ownership over heap allocated storage, as a type, so it serves that role. How would you represent that without a type?

Is there a more specific question/issue hiding behind your Box question?

Box is not the only, or even the most common, way to pass a pointer to a dynamically sized type. For example, types like str and [T] are usually passed as borrowed references like &str and &[T], or through specialized owned types like String and Vec<T>. Trait objects can be passed as &Trait or Rc<Trait> or others. Borrowed references like &Trait do not require a heap allocation.
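A minimal sketch of that point, assuming nothing beyond the standard library. The function names (`shout`, `total`, `describe`) are made up for illustration, and the trait object is written `&dyn Display`, which is how current Rust spells what this post calls `&Trait`. None of these calls needs a heap allocation for its argument.

```rust
use std::fmt::Display;

// Borrowed string slice: accepts literals and borrowed Strings alike.
fn shout(s: &str) -> String {
    s.to_uppercase()
}

// Borrowed slice: accepts arrays and borrowed Vecs alike.
fn total(xs: &[i32]) -> i32 {
    xs.iter().sum()
}

// Borrowed trait object: the referent can live anywhere, including the stack.
fn describe(x: &dyn Display) -> String {
    format!("<{}>", x)
}

fn main() {
    let on_stack = [1, 2, 3]; // a fixed-size array on the stack
    let n = 42; // a plain integer on the stack

    println!("{}", shout("hi"));
    println!("{}", total(&on_stack));
    println!("{}", describe(&n));
}
```

The arguments are only borrowed here; the returned `String`s are of course heap-allocated, but that is a property of `String`, not of passing the dynamically sized type.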

What are you expecting programmers to do with this warning, just ignore it? Warnings for standard, correct code are not helpful.

How would you represent that without a type?

It wouldn't need to be specifically represented, you would just write:

struct A {
    arr: [T]
}

Instead of:

struct A {
    arr: Box<[T]>
}

Same for function signatures, and all usage of the value would just remain the same, since Box already implements Deref.
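For comparison, here is roughly what the explicit version looks like in use. This is only a sketch: the element type `i32` and the helper `largest` are assumptions picked for illustration, since the struct above leaves `T` open. The slice methods are reached through `Deref`, so the `Box` only shows up in the type and at construction.

```rust
struct A {
    arr: Box<[i32]>,
}

// `iter` is a method on `[i32]`; it is reached through Box's Deref impl.
fn largest(a: &A) -> Option<i32> {
    a.arr.iter().copied().max()
}

fn main() {
    // The only place the heap allocation is spelled out is construction.
    let a = A {
        arr: vec![3, 9, 4].into_boxed_slice(),
    };

    println!("{} elements, largest {:?}", a.arr.len(), largest(&a));
    println!("second element: {}", a.arr[1]);
}
```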

The first definition would give a warning, for example: '[T] does not implement Sized but A (its parent) does, so [T] will be allocated on the heap; use #[allow(heap_alloc)] to disable this warning'.

Instead of worrying about boxing values or not, you could write the first draft of a program with #[allow(heap_alloc)] on, and then, once it works, enable the warnings and un-heap allocate wherever applicable.

Is there a more specific question/issue hiding behind your Box question?

No, I was just wondering, because sometimes boxing values can get a bit tiresome.

What are you expecting programmers to do with this warning, just ignore it? Warnings for standard, correct code are not helpful.

When applicable, it could be ignored or suppressed, or the code could be changed to be generic. Most of the time, using Box is a convenience, and code that utilizes this feature can be changed so that heap-allocation is not necessary.

The example you have is for a slice, which is something specific. How would you put a scalar value on the heap?

Hopefully you're not boxing all that much in Rust :)

The example you have is for a slice, which is something specific. How would you put a scalar value on the heap?

Why would you want to put a scalar value on the heap? If something can go on the stack, it should, in general.

If, for some reason, you need to heap-allocate an f32 when it could also be stack-allocated, you can use unsafe functions like the ones the implementation of Vec uses.

Hopefully you’re not boxing all that much in Rust :)

Only once in a while. The real reason it becomes tedious is that I feel like there could be a better way, even if it's not actually that much extra code. In any case, I'm better off trying to understand the reasoning behind the extra code than shrugging it off as "that's just how they made it". You brought up a valid point in your first comment: automatic heap allocation is indeed unheard of in most systems programming languages. But as long as I'm aware that the heap allocation occurs, it doesn't matter to me whether it's explicitly written in the code or communicated through a warning.

By scalar I meant a single element (non slice). For instance, your struct A itself counts as that. So does any arbitrary non-slice type.

You put it on the heap for a few reasons. For example, you might want an owned trait object (rather than a borrowed one). Or you may want to keep the size of an enum down (e.g. one variant is very large, so you put it on the heap instead).

For a non-scalar example, you may want to put a large array on the heap so that you don't blow the stack.
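Two of these cases can be sketched as follows. The `Shape` trait, `Square`, `make_shape`, and `big_buffer` are hypothetical names invented for this example. The buffer is built with `vec![...]` and converted to a boxed slice, which is one way to get a large zeroed allocation without first constructing the array as a local value.

```rust
trait Shape {
    fn area(&self) -> f64;
}

struct Square(f64);

impl Shape for Square {
    fn area(&self) -> f64 {
        self.0 * self.0
    }
}

// Owned trait object: the caller owns the heap allocation and can store or
// return it freely, with no borrow tying it to a stack frame.
fn make_shape(side: f64) -> Box<dyn Shape> {
    Box::new(Square(side))
}

// A large zeroed buffer that lives on the heap.
fn big_buffer(len: usize) -> Box<[u8]> {
    vec![0u8; len].into_boxed_slice()
}

fn main() {
    let shape = make_shape(2.0);
    let buf = big_buffer(16 * 1024 * 1024);

    println!("area = {}", shape.area());
    println!("buffer holds {} bytes", buf.len());
}
```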

There's another use of Box that we've not mentioned yet, which is to pass/receive raw pointers via FFI. You can get a raw pointer to Rust-allocated memory from a Box, pass that pointer via FFI, and then take ownership back when FFI returns a raw pointer back (i.e. put it back into a Box). This is an example of where Box is useful outside "pure" Rust domains, although the underlying concept is the same (unique ownership over a heap allocation).
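A rough sketch of that round trip, using `Box::into_raw` and `Box::from_raw` from the standard library. The `Counter` type and the three functions are hypothetical; in real FFI they would typically be `extern "C"` functions called from the foreign side, which is left out here so the example runs on its own.

```rust
struct Counter {
    hits: u32,
}

// Give up ownership: the Box is turned into a raw pointer that foreign code
// can hold on to. Nothing is freed here.
fn counter_new() -> *mut Counter {
    Box::into_raw(Box::new(Counter { hits: 0 }))
}

// Stand-in for work done while the pointer is "away" on the foreign side.
// Safety: `ptr` must come from `counter_new` and not have been freed yet.
unsafe fn counter_hit(ptr: *mut Counter) {
    unsafe { (*ptr).hits += 1 };
}

// Take ownership back: rebuilding the Box means the allocation is freed
// when `boxed` goes out of scope.
// Safety: `ptr` must come from `counter_new` and must not be used afterwards.
unsafe fn counter_free(ptr: *mut Counter) -> u32 {
    let boxed = unsafe { Box::from_raw(ptr) };
    boxed.hits
}

fn main() {
    let ptr = counter_new();
    let hits = unsafe {
        counter_hit(ptr);
        counter_hit(ptr);
        counter_free(ptr)
    };
    println!("hits = {}", hits);
}
```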

This, I think, is probably going to be where you will find the most resistance to your point of view. There is very good reason to care about this distinction!

Warnings are typically read once, if ever: the first time a translation unit successfully compiles. Prior to that (i.e. while attempting to get the unit to compile), the focus must be on hard errors, so warnings are often ignored. After that, the build system no longer has any reason to recompile the unit, so warnings are not displayed on subsequent builds. (Reprinting all relevant warnings is actually something I'm hoping Cargo may be able to do eventually.) If the project uses CI, and warnings are permitted in the integration branch, then the new warnings will be printed alongside hundreds or thousands of others in a log file that is very unlikely to be read. In the best case scenario, a tool is installed to automatically grep the log file to track the total number of warnings in the project over time.

But the moment when you quickly review warnings to decide whether they're acceptable is not actually a useful time to review heap allocations. The time to think about heap vs local allocation is when designing, writing, and reading the code--which is exactly when explicit language elements come into play.

I mentioned reading code. Early in my programming education, someone told me that a programmer's primary audience is not the compiler but other programmers. This was an enormously valuable insight, and I wish it were more commonly spread and adopted. The reason languages like Python and Rust are explicit is to ensure that the code can be understood simply by reading it. If an automatic boxing scheme were adopted, unless the reader has memorized the rule for automatic boxing (!), they wouldn't know whether something is boxed or not except by trying to compile the code and read the warnings (!!). Even if the rule is very simple, it is an unnecessary mental tax, and since understanding other people's code is difficult to begin with, language designers strive to eliminate such taxes.

So, explicit boxing it is.

Note, too, that making types the determinant of whether something is heap allocated or not is somewhat surprising behavior, at least to me. The closest thing I can think of is C#'s struct vs class distinction, and even there, the difference is explicit in the language.

What about writing an interpreter for a language like Python in which everything (including numbers) is boxed?

Ok, I get it: the Box type definitely makes it more obvious what code does, and that is probably more important than enabling programmers to write code fast (that would be the job of IDEs and text editors).

What about writing an interpreter for a language like Python in which everything (including numbers) is boxed?

All values are heap-allocated in Python because they can't go on the stack, given the requirements of the language. You could allocate them on a virtual heap allocated on the stack, but that would be pretty pointless (and they would still, in a way, be allocated on the/a heap). But you are right, this means there are situations where a scalar value has to be on the heap (another example is Vec).

Exactly - and low level languages like Rust should be capable of implementing interpreters for such languages, which requires the ability to box primitive values at will.
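As a rough illustration, a toy object model might look like the sketch below. Everything in it (`Object`, `Value`, `add`, `id`) is hypothetical and is not how CPython or any real interpreter is written. It also uses `Rc` rather than `Box`, because interpreter values are usually shared; the point is only that every value, numbers included, is explicitly placed behind a heap pointer.

```rust
use std::rc::Rc;

// Hypothetical object representation for a toy interpreter.
#[derive(Debug, PartialEq)]
enum Object {
    Int(i64),
    Float(f64),
    Str(String),
}

// Every value, even a small integer, is a reference-counted heap allocation.
type Value = Rc<Object>;

// Adds two numbers of the same kind; returns None for unsupported operands.
fn add(a: &Value, b: &Value) -> Option<Value> {
    match (&**a, &**b) {
        (Object::Int(x), Object::Int(y)) => Some(Rc::new(Object::Int(x + y))),
        (Object::Float(x), Object::Float(y)) => Some(Rc::new(Object::Float(x + y))),
        _ => None,
    }
}

// Rough analogue of an identity function: the address of the heap allocation.
fn id(v: &Value) -> usize {
    Rc::as_ptr(v) as usize
}

fn main() {
    let a: Value = Rc::new(Object::Int(2));
    let b: Value = Rc::new(Object::Int(3));
    let s: Value = Rc::new(Object::Str("two".to_string()));
    let alias = Rc::clone(&a);

    println!("{:?}", add(&a, &b));
    println!("{:?}", add(&a, &s));
    println!("same object: {}", id(&a) == id(&alias));
}
```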

The claim that Python does not have a stack is patently false. If Python code has no stack, why is it trivial to bag yourself a neat little stack overflow error by defining a recursive function that simply calls itself ad infinitum?

On top of that, what's the point of heap-allocating e.g. an integer? That is typically something you want on the stack, at least if you care even a little bit about non-trivial software running smoothly. Integers are just an example, there are of course other types.

You are right of course, but for all the faults my suggestion had, an inability to allocate values on the heap when they need to be heap-allocated was not one of them. The only difference would have been that the compiler would do this automatically whenever it was able to (which, in the case of using Box::new() and Box<_>, is pretty much always).

Fair enough--but you suggested using unsafe methods to force the allocation!

Certainly! I don't believe anyone was making that claim, though.

I have not personally tried to implement a Python interpreter, so I don't know exactly why this decision was made in CPython. One reason might be to facilitate keeping the distinction opaque to the user between integers that fit into the platform's "word" size and arbitrarily-sized integers (which can take up an arbitrary amount of space). Another reason might be to facilitate implementing the id function.

Fair enough–but you suggested using unsafe methods to force the allocation!

Yes, if you have some working code that allocates an f32 on the stack, but for some reason you want to put it on the heap, it seemed reasonable to me that this would be done through unsafe methods (or some special function), as that would be a pretty uncommon requirement. However, @vitalyd pointed out that someone might want to keep the size of an enum down by putting it (or its data?) on the heap, so that requirement probably isn't as uncommon as I thought it would be. In any case, the argument about readability and this enum thing have convinced me that automatic boxing is probably not a good idea for the Rust language. But if it is logically sound, it could become a feature of some future IDE.

You'd put its data on the heap. For example:

enum MyEnum {
    Int(i32),
    F64(f64),
    Str(String),
    // Big(SomeBigStruct), // instead of storing this inline, we put it on the heap
    Big(Box<SomeBigStruct>),
}

// An exaggeration, but it illustrates the point.
struct SomeBigStruct {
    payload: [i32; 1000 * 1000],
}

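Without the Box, every MyEnum value would be at least as large as SomeBigStruct, whichever variant it holds. If MyEnum were then moved around a lot (or embedded inside other types), performance would tank.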
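The size difference can be checked with `std::mem::size_of`. This is a sketch with two cut-down enums, `Inline` and `Boxed`, whose names are made up for the comparison; the exact numbers depend on the target, so none are claimed here.

```rust
use std::mem::size_of;

#[allow(dead_code)]
struct SomeBigStruct {
    payload: [i32; 1000 * 1000],
}

// The large variant is stored inline: every value is as big as the struct.
#[allow(dead_code)]
enum Inline {
    Int(i32),
    Big(SomeBigStruct),
}

// The large variant is boxed: the enum only has to hold a pointer.
#[allow(dead_code)]
enum Boxed {
    Int(i32),
    Big(Box<SomeBigStruct>),
}

fn main() {
    println!("SomeBigStruct: {} bytes", size_of::<SomeBigStruct>());
    println!("Inline:        {} bytes", size_of::<Inline>());
    println!("Boxed:         {} bytes", size_of::<Boxed>());
}
```

Only the types are measured, so nothing large is ever constructed on the stack when this runs.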

I would much rather deal with seeing Box<T> than have to read a bunch of code with #[allow(heap_alloc)], which seems more verbose than Box and less clear, or wonder if we are just ignoring warnings in some scenarios. Additionally, it seems to make sense that Box would follow the same convention as RefCell, Rc, and Arc.

EDIT: Haha, just realized this is a pretty old post. I wonder why Google recommended it to me in my news feed. Sorry!