`Box` should probably be called `Kite`

There has been a previous discussion about this: Why is `Box` called `Box`?

From what I've read, the term "boxing" is more like "wrapping" or "packaging", because primitives behave differently from reference types in Java. But in Rust and C++, primitives are "first-class" in the sense that user-defined types typically live on the stack just like int, so we don't use Box for that. We use Box<T> in Rust because we want to move an instance of T to the heap so that the size of T does not contribute to the size of the enclosing type. I'm sure you all know this, but just as a reminder, note how a struct A can be smaller than the B it owns if we use Box:

struct A {
    b: Box<B>,
}
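
To make the size difference concrete, here is a minimal sketch. The 1 KiB payload inside B is a made-up example (the original B is not shown), and the exact numbers depend on the target's pointer width:

```rust
use std::mem::size_of;

// Hypothetical payload type: 1 KiB stored inline.
#[allow(dead_code)]
struct B {
    data: [u8; 1024],
}

// A only stores a pointer to a heap-allocated B.
#[allow(dead_code)]
struct A {
    b: Box<B>,
}

fn main() {
    // B carries its whole payload inline.
    println!("size_of::<B>() = {}", size_of::<B>());
    // A is only as large as the pointer it holds.
    println!("size_of::<A>() = {}", size_of::<A>());
    assert!(size_of::<A>() < size_of::<B>());
}
```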

This is crucial in recursive data structures like:

struct List {
    value: i32,
    next: Option<Box<List>>,
}

Otherwise, it would not compile because something like std::mem::size_of::<List>() = size_of::<i32>() + size_of::<Option<List>>() would create an infinite regress.
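
As a rough sketch of how the boxed version behaves in practice (the `sum` helper and the three-element list are just illustrations, not part of the original example):

```rust
use std::mem::size_of;

struct List {
    value: i32,
    next: Option<Box<List>>,
}

// Walk the list by following each `next` pointer and add up the values.
fn sum(list: &List) -> i32 {
    let mut total = 0;
    let mut cur = Some(list);
    while let Some(node) = cur {
        total += node.value;
        // Option<Box<List>> -> Option<&List>
        cur = node.next.as_deref();
    }
    total
}

fn main() {
    // Build 1 -> 2 -> 3; every node after the first lives on the heap.
    let list = List {
        value: 1,
        next: Some(Box::new(List {
            value: 2,
            next: Some(Box::new(List { value: 3, next: None })),
        })),
    };
    println!("sum = {}", sum(&list));
    // The `next` field is pointer-sized, so List has a finite, known size.
    println!("size_of::<List>() = {}", size_of::<List>());
}
```

Replacing `Option<Box<List>>` with `Option<List>` makes the compiler reject the definition as a recursive type with infinite size.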

I find it unintuitive to call this "boxing". When we box something, we expect it to be larger (or at least heavier, if compressed) than the original, right? What Rust does instead is to put the value elsewhere and just hold on to a rope attached to it, like pulling a heavy cart by a rope: you don't carry the load itself, you only hold the rope. I also imagine it as walking a dog or flying a kite, which gives more of a sense of owning the value at the other end.

Yes, ultimately Java does the same thing as Rust when boxing a value, that is, putting the value on the heap. But Java does this only for lightweight primitives, whereas Rust usually does it for heavier objects. So this gives us some sense of how Java ends up doing the same thing, but for a different purpose. Java boxes a value so that it looks no different from other objects. Rust "boxes" something so that it's not too heavy to hold.

I'm not saying we should actually rename Box to Kite, since it's far too late for that now. Just some random thoughts :face_with_monocle:

12 Likes

That is one use case.

The most common reason I use Box<_> is to hold owned unsized values (Box<dyn Trait>, Box<Path>, etc).
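
A small sketch of what that looks like, in case it helps. The function names, the sample values, and the path are made up for illustration:

```rust
use std::fmt::Display;
use std::mem::size_of;
use std::path::{Path, PathBuf};

// A heterogeneous collection: each element is an owned trait object.
fn boxed_items() -> Vec<Box<dyn Display>> {
    vec![
        Box::new(42) as Box<dyn Display>,
        Box::new("hello"),
        Box::new(1.5),
    ]
}

// Format every item through the Display vtable.
fn describe_all(items: &[Box<dyn Display>]) -> Vec<String> {
    items.iter().map(|item| item.to_string()).collect()
}

// An owned, unsized Path without the spare capacity a PathBuf may carry.
fn boxed_path() -> Box<Path> {
    PathBuf::from("some/dir/file.txt").into_boxed_path()
}

fn main() {
    for line in describe_all(&boxed_items()) {
        println!("{line}");
    }
    println!("{}", boxed_path().display());
    // Boxes of unsized types are "fat": pointer plus vtable or length.
    println!("{}", size_of::<Box<dyn Display>>());
    println!("{}", size_of::<Box<Path>>());
}
```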


Anyway, interesting thoughts. I never really considered the name.

(Vec<_> versus arrays though...)

8 Likes

the term box has a long history in programming. from what I read somewhere on the internet, it originated from the traditions of diagram representation in programming teaching materials, especially on printed papers and books.

for example, a heap allocated value is commonly drawn inside a rectangle (a "box"), and an arrow to it (a "pointer") from a reference on the stack. something like this:

stack:                  heap:
ptr ────▶ ┌──────────┐
          │   42     │   ← this rectangle is literally called a "box"
          └──────────┘

anecdotally, I remember Alex Stepanov once said he regretted calling a dynamically sized array a vector in the first STL implementation, but now nothing can be done to fix this "mistake". I believe he said the term "vector" was suggested by his colleague, who said it was already an established term among other communities. I don't remember the details, maybe it was common lisp, or scheme, or the like.

9 Likes

If only they had used squares or more generally kites in their diagrams... :slightly_smiling_face:

2 Likes

Yes, he got the name from Lisp. But it was a total misnomer in C++. In Lisp, a "vector" does not change length. It's a 1D array of fixed length (while a Lisp "array" can be multidimensional).

6 Likes

That's a fair point, though you could argue that "boxing" and "unboxing" could have been named differently, like "wrapping" and "unwrapping". Also, I believe it doesn't appear in the language itself, Box being related to something entirely different (Spring or JavaFX), but I've not used Java in a long time and might be wrong about that.

I think each language comes with its entourage of idioms, as new languages don't often make common features look the same as other languages. It's visible in control statements, for instance. After all, no language can claim to be the one authority on names, especially considering those names have likely been used in many contexts since programming languages existed.

There are other debatable choices of names in Rust that I'll avoid mentioning here, but I guess that's how it is. One idiom that I find very annoying, because it's not just a name, is the range expression/pattern: for i in a..b excludes b, which is illogical and counter-intuitive. Moreover, there are other languages like Haskell, Kotlin, and Perl where b is inclusive, so there's a real risk that new Rust programmers introduce subtle errors from old habits. But it's impossible to backtrack now, and we get used to it after a while.

At least, all the important features have been discussed and thought through so that the language is sound, and it avoided the billion dollar mistake. It's more than what other more conventional languages can claim (I'm thinking precisely of Java and the reason it needs boxing/unboxing :wink: ).

The box is the heap part. The pointer is the stack part. std::boxed::Box is both. This causes confusion, even for experienced programmers, like I once had when learning Rust: "I don't understand Box"

See? Together, they look exactly like a kite. Would look better if it's square, though.

I second this. I was amazed when I found PartialEq and Eq are two different traits in Rust.

OK. Just think of it as a box kite (Box kite - Wikipedia) :slight_smile:

It's a bit odd but no more odd than many other names of things in other languages.

I like "vector".

1 Like

Java does this only for lightweight primitives because those are the only types that can exist unboxed. C# has more value types (i.e. structs) and it also uses the "boxing" terminology to mean putting them on the heap, exactly like in Rust.

The result of boxing in Rust is heavier overall. You're still paying the cost of storing the boxed data on the heap, plus the pointer to it and possibly some allocator metadata.

3 Likes

You haven't used it for 20 years??? You know a very different Java, then. Java 5 added autoboxing and auto-unboxing back then… and it's now mentioned prominently in the tutorial.

I'm sure that's where the term leaked into Rust… Mozilla had to deal with Java, one way or another, in the browser…

No, that's how things have to be. In all languages that have both, the form that goes from N to N-1 wins. That was established decades ago.

I don't think anyone even thought twice about it: that's just how things are done in computer science, and Rust is definitely not the language for mathematicians (Julia tries to fill that niche… with some success).

Thank god.

I hate it. It's stupid and wrong. It's also impossible to change, at this point.

I'm not so sure. the terms "boxed" and "unboxed" for heap allocated (and GC managed) values were used in functional languages even before java used them, including ML and Haskell. and we all know rust traces its lineage back to the ML family; specifically, OCaml had the most direct influence on the design of early rust.

also, for me personally, OCaml is the first language where I learned the term "boxed". I remember reading docs mentioning specific things like "unboxed int has 31 bits".

2 Likes

Possible. The major sticking point: by the time Rust was developed, the terms "Box" and "boxing/unboxing" were very well established in many, many, MANY languages.

I disagree, and famously so does Dijkstra. Half-open ranges are by far the most common use case and deserve the shortest syntax.
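
A quick sketch of why half-open ranges compose so nicely (the `split_at_mid` helper is made up for illustration): adjacent ranges share a boundary without overlapping, the length is just `end - start`, and an empty range needs no special case.

```rust
use std::ops::Range;

// Split 0..n into two adjacent half-open ranges that share `mid`
// as a boundary: no element is counted twice and none is skipped.
fn split_at_mid(n: usize) -> (Range<usize>, Range<usize>) {
    let mid = n / 2;
    (0..mid, mid..n)
}

fn main() {
    let (left, right) = split_at_mid(7);
    println!("{left:?} and {right:?}");
    // The length of a..b is simply b - a.
    println!("lengths: {} + {}", left.len(), right.len());
    // a..a is empty, while a..=a contains exactly one element.
    println!("{} vs {}", (3..3).count(), (3..=3).count());
}
```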

9 Likes

I can see the objections. Looking at "vector" with my limited vision maths spectacles on.

But what are the alternatives? "array", "dynamic_array", "dyn_array"... all clumsy and ugly. "list" might be the thing, except in the CS world "list" typically means a linked list or some other complex data structure. "sequence", maybe?

All in all "vector" is close enough and sounds good and technical.

1 Like

In an ideal world we would use vector for fixed-size vectors, like in math (and lisp). And then array would be left for dynamically sized arrays.

Would have worked fine… except now we've ended up with the exact opposite, and trying to flip the world around would just cause even more confusion.

It confuses every newbie to programming. But after more than three decades of use… it's too late to change it.

2 Likes

I admit that for a while, when learning Rust, I was quite confused with what Box<T> meant. I thought it was just a newtype wrapper. In retrospect it isn't confusing, I guess I just wasn't used to this notation since the word "Box" doesn't directly indicate there is an indirection, so something like HeapPtr<T> would have been easier to understand. But Box is fine I think.

It could have just been Array<T>, similarly to how String is used for dynamically growable strings. Or List<T> (like in Python).

I don't know what you mean by "technical". Array, List, DynArray are all at least as technically correct, and arguably more technically correct than Vec. Do you perhaps mean "cryptic", "mysterious"? That's not a desirable property for names.

I agree half-open ranges are clearly better than closed ranges, and personally think .. is fine in Rust, but I think the issue being raised here is that traditionally in math the ... notation typically means inclusive e.g. i = 1, ..., n (which is somewhat unfortunate). Similarly the ∑ sum notation in math is normally used with inclusive start and end indices (which is also unfortunate).

1 Like

Yes. But arrays in my world (C, C++, Pascal, Coral, PL/M...) are fixed size. Meanwhile "List" traditionally is some kind of linked list data structure. Things like "DynArray" bring us to the ugliness in naming of most things in C++.

Not exactly. It has an air of precision.

Who says a vector cannot be of variable length anyway? Who says the number of dimensions cannot be variable? In many calcs I have seen things get migrated from 2d to 3d to 4d as things proceed.

Seems I'm the only one who was not confused by "Vector" as a noob to C++.

1 Like

This may be a true statement about Scheme, but it is false about Common Lisp. (Scheme may arguably be the most noteworthy Lisp today, but from a historical perspective, saying Lisp and describing Scheme is like saying "C family languages" and describing only Java.) vector is defined as the subtype of array containing all one-dimensional arrays. Resizability is complicated but orthogonal to whether an array is a vector or not, and there is a push operation on vectors.

I am not very broadly informed beyond that, but I would expect that all Lisp dialects use "vector" to refer to some kind of collection with dense (in Rust terms, slice) storage rather than a linked list.

3 Likes

agreed. traditions do affect what "feels" natural for different people.

another famous example is array indexing. mathematicians think of array indices as subscripts or ordinals, so they think x[1], x[2], x[n], and y[1][1], y[m][n] are "natural", while computer scientists think of indices as offsets, so they think x[0], x[1], x[n-1] and y[0][0], y[m-1][n-1] are "natural".

so we have some languages with 0-based indexing, like C, Java, Python, Rust, etc; and we have other languages with 1-based indexing, like R, Matlab, Wolfram, Lua, etc; and interestingly, we also have languages that allow the user to declare the index range for arrays, such as Fortran and Pascal. finally, we have the odd one: Erlang, where tuples are indexed starting from 1, while arrays are indexed starting from 0.

2 Likes

And in Rust ... with three dots did use to mean inclusive. But it was considered too easy to confuse with .. so we got ..=. I guess the least confusing way would be to just have ..< and ..=. (At least RustRover actually renders a courtesy < after .. by default. Swift OTOH has ..< and ... which feels like a missed chance to make it right.)

3 Likes