Why is `Box` called `Box`?

I guess it’s possible to explain in terms of Rust syntax. Suppose there were no traits, then dictionary passing is a way to simulate certain aspects of what traits allow you to do.

E.g. looking at a simplified Iterator trait,

pub trait Iterator<Item> {
    fn next(this: &mut Self) -> Option<Item>;
    fn size_hint(this: &Self) -> usize;
}

and let’s use free-standing functions for maximal functional programming experience

fn next<Self_: Iterator<Item>, Item>(this: &mut Self_) -> Option<Item> {
    Iterator::next(this)
}

fn size_hint<Self_: Iterator<Item>, Item>(this: &Self_) -> usize {
    Iterator::size_hint(this)
}

then here’s a generic function that uses it

fn second<T, I: Iterator<T>>(mut i: I) -> Option<T> {
    let _ = next(&mut i)?;
    next(&mut i)
}

Then we could “simulate” traits in the following manner. Define a struct instead of a trait:

pub struct IteratorDict<Self_, Item> {
    pub next: fn(&mut Self_) -> Option<Item>,
    pub size_hint: fn(&Self_) -> usize,
}

fn next<Self_, Item>(dict: &IteratorDict<Self_, Item>, this: &mut Self_) -> Option<Item> {
    (dict.next)(this)
}

fn size_hint<Self_: Iterator<Item>, Item>(dict: &IteratorDict<Self_, Item>, this: &Self_) -> usize {
    (dict.size_hint)(this)
}

Here, the free-standing functions now expect an extra argument of type &IteratorDict<Self_, Item>, which replaces the Self_ Iterator<Item> bound that was previously in place. In the same way, we modify the second function:

fn second<T, I>(dict: &IteratorDict<I, T>, mut i: I) -> Option<T> {
    let _ = next(dict, &mut i)?;
    next(dict, &mut i)
}

This function now passes the dictionary argument explicitly to the calls to next.


In this frameworks, what are trait implementations? They’re basically constants:

Taking

type StaticChars = std::str::Chars<'static>;

impl Iterator<char> for StaticChars {
    fn next(this: &mut Self) -> Option<char> {
        std::iter::Iterator::next(this)
    }
    fn size_hint(this: &Self) -> usize {
        0 // yeah, we don’t really know a good size
    }
}

this translates to

const IMPL_ITERATOR_CHAR_FOR_STATIC_CHARS: IteratorDict<StaticChars, char> = IteratorDict {
    next: std::iter::Iterator::next,
    size_hint: |this| 0,
};

Then a non-generic function that calls next

fn foo(mut x: StaticChars) {
    next(&mut x);
}

uses this dictionary

fn foo(mut x: StaticChars) {
    next(&IMPL_ITERATOR_CHAR_FOR_STATIC_CHARS, &mut x);
}

Finally, a generic implementation is a generic function:

impl<T> Iterator<T> for std::vec::IntoIter<T> {
    fn next(this: &mut Self) -> Option<T> {
        std::iter::Iterator::next(this)
    }
    fn size_hint(this: &Self) -> usize {
        this.len()
    }
}

becomes

fn impl_iterator_for_into_iter<T>() -> IteratorDict<std::vec::IntoIter<T>, T> {
    IteratorDict {
        next: std::iter::Iterator::next,
        size_hint: |this| this.len(),
    }
}

And you can probably imagine how to use this when you need a dictionary.


The role of the trait resolver is to always find the correct way of constructing the necessary dictionary. In a generic function, this dictionary might need to be built using functions like impl_iterator_for_into_iter; a generic trait implementation with a trait bound becomes a function that transforms dictionaries:

struct Wrapper<I>(I);
impl<Item, I: Iterator<Item>> Iterator<Item> for Wrapper<I> {
    fn next(this: &mut Self) -> Option<Item> {
        next(&mut this.0)
    }
    fn size_hint(this: &Self) -> usize {
        size_hint(&this.0)
    }
}

becomes

fn impl_iterator_for_wrapper<I, Item>(dict: &IteratorDict<I, Item>) -> IteratorDict<Wrapper<I>, Item> {
    IteratorDict {
        next: |this| next(dict, &mut this.0),
        size_hint: |this| size_hint(dict, &this.0),
    }
}
// this code doesn’t compile, see below

This is the point where I notice that using fn pointers in Rust was inaccurate. Should’ve probably been something like Arc<dyn Fn(...) -> ...> instead to more properly simulate the function types of Haskell and allow capturing. But I hope you get the point.


The final step, to get the full picture, is in realizing that if all types had the same representation at run-time, e.g. every type here would implicitly be Box<T>, then all of the functions here monomorphize to the same code, i.e. monomorphization is no longer necessary. Everything does - under the hood - just work with pointers and dynamic function calls. In Haskell, generic type arguments are completely removed/stripped at some stage of compilation, just the way that lifetimes are removed in Rust.


Funnily enough, after having gone through this, I’m seeing more problems with ever getting support for powerful opt-in support for dictionary-passing in Rust, because I forgot about the fact that building up the necessary dictionaries can involve heap allocations, too. Well, actually, maybe building up dictionaries on the stack is feasible and allows most use-cases anyways…



For the record, polymorphic recursion means support for e.g. types like this

struct CompleteBinaryTree<T>(Box<CompleteBinaryTreeEnum<T>>);
enum CompleteBinaryTreeEnum<T> {
    Simply(T),
    DoubleUp(CompleteBinaryTree<Box<(T, T)>>),
} 

The idea behind such a type is that it either has either a T or a Box<(T, T)> or a Box<(Box<(T, T)>, Box<(T, T)>)>, etc... the type system can dictate that it’s going to be balanced, the number of DoubleUp wrappers at the start encodes the size.

Not only can you define such types, you can work with them. If you implement a trait Foo generically such that T: Foo implies Box<(T, T)>: Foo, then next thing you can do is implement CompleteBinaryTree<T>: Foo based on T: Foo, and it just works. This cannot work with monomorphization, because the implementation for CompleteBinaryTree effectively uses arbitrarily complicated trait implementations for all kinds of types T, Box<(T, T)>, Box<(Box<(T, T)>, Box<(T, T)>)>, and so on. Look at this Haskell code for example:

Compiler Explorer

Where

class SumIntegers self where
    sum :: self -> Integer

is basically a trait

trait SumIntegers {
    fn sum(&self) -> BigInt
}

and you write instance SumIntegers SOME_TYPE where ... instead of impl SOME_TYPE: SumIntegers { ... } (the where and indentation is like the block in Rust)

3 Likes