Why is Box<str> special?

It seems like in most cases Box<T> is just a pointer to heap-allocated memory. But Box<str> seems to be a special case where the str is "inlined" and the Box holds both a pointer and a length. So for example Box<u8> is 8 bytes but Box<str> is 16 bytes (assuming a 64-bit architecture).

I can see that this is a good optimisation, but how is it implemented? In the standard library, or as a special case in the compiler? Is it documented anywhere? I cannot find any mention of it in "the book", the reference, or the standard library documentation.

/Mikael

That is incorrect.
In all cases, Box<T> is just a simple pointer to heap-allocated memory. There are no exceptions.

A Box<str> does not inline anything; it is a simple pointer to a heap-allocated str. To draw a parallel: Box<u8> vs &u8 behaves the same as Box<str> vs &str.

What led you to incorrectly believe there is an exception is the fact that str is indeed a special type. It is what is called a Dynamically Sized Type: a type that can only exist behind some kind of pointer. The two major dynamically sized types you may know are str and [T]. Both of these types have a length decided at runtime, and this length is stored inside the pointers which point to them.
So for example &str, *const str, *mut str, Box<str>, Rc<str>, &[T], &mut [T], and Arc<[T]> are all 2 usizes long.
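
The sizes listed above can be checked against size_of::<usize>(). This is only a minimal sketch: the helper names word and dst_pointer_sizes are made up for the example, and [T] is instantiated with u8.

```rust
use std::mem::size_of;
use std::rc::Rc;
use std::sync::Arc;

/// Size of one machine word in bytes (hypothetical helper for this example).
fn word() -> usize {
    size_of::<usize>()
}

/// Sizes, in bytes, of several pointer types whose pointee is a DST.
fn dst_pointer_sizes() -> Vec<usize> {
    vec![
        size_of::<&str>(),
        size_of::<*const str>(),
        size_of::<*mut str>(),
        size_of::<Box<str>>(),
        size_of::<Rc<str>>(),
        size_of::<&[u8]>(),
        size_of::<&mut [u8]>(),
        size_of::<Arc<[u8]>>(),
    ]
}

fn main() {
    // Pointers to sized types are one word wide.
    assert_eq!(size_of::<&u8>(), word());
    assert_eq!(size_of::<Box<u8>>(), word());

    // Pointers to str and [T] also carry the length, so they are two words wide.
    for size in dst_pointer_sizes() {
        assert_eq!(size, 2 * word());
    }
    println!("every pointer to a DST checked here is {} bytes", 2 * word());
}
```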

Note: Dynamically Sized Types are usually referred to as DSTs, so I'll do so from now on.
Box<T: DST>s do indeed have something slightly special about them. Not in what they are made of (they are always just one pointer under the hood, no matter what), but in how they are constructed.
As I said before, DSTs can only exist behind a pointer, so you can't ever call Box::new, because it needs to take a T by value. That is why, for all DSTs, you need a special method to construct them, usually From::from.
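
As a rough illustration of the construction point, here are a few ways to build a Box<str> and a Box<[u8]> without Box::new. The function names are invented for the example; the From impls and into_boxed_str are from the standard library.

```rust
/// Build a Box<str> in three ways; none of them goes through Box::new.
fn make_boxed_strs() -> Vec<Box<str>> {
    let a: Box<str> = Box::from("hello"); // From<&str> for Box<str>
    let b: Box<str> = "hello".into(); // same impl, reached through Into
    let c: Box<str> = String::from("hello").into_boxed_str();
    vec![a, b, c]
}

/// The same idea for slices: copy a borrowed slice into a new heap allocation.
fn make_boxed_slice() -> Box<[u8]> {
    Box::from(&[1u8, 2, 3][..])
}

fn main() {
    for s in make_boxed_strs() {
        assert_eq!(&*s, "hello");
    }

    let slice = make_boxed_slice();
    assert_eq!(&*slice, &[1u8, 2, 3][..]);

    // Something like Box::new(*"hello") is rejected by the compiler,
    // because a str cannot be passed to a function by value.
}
```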

(There is also a mechanism called unsizing, which creates a box to a sized type and transforms it into a box to a DST, but it's not really worth getting into, as it doesn't apply to str.)
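
For the curious, a small sketch of unsizing with a boxed array turning into a boxed slice. The function name unsize_array is made up for the example.

```rust
use std::mem::{size_of, size_of_val};

/// Box a fixed-size array, then let the compiler unsize it into a boxed slice.
fn unsize_array() -> Box<[u8]> {
    // [u8; 3] is sized (the length is part of the type), so Box::new works.
    let sized: Box<[u8; 3]> = Box::new([1, 2, 3]);
    // The type annotation triggers the unsizing coercion:
    // the same allocation, but the pointer now also carries the length 3.
    let unsized_box: Box<[u8]> = sized;
    unsized_box
}

fn main() {
    let sized: Box<[u8; 3]> = Box::new([1, 2, 3]);
    let boxed_slice = unsize_array();

    // One word for the pointer to the sized array, two for the pointer to the slice.
    assert_eq!(size_of_val(&sized), size_of::<usize>());
    assert_eq!(size_of_val(&boxed_slice), 2 * size_of::<usize>());
    assert_eq!(boxed_slice.len(), 3);
}
```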

what led you to incorrectly believe there is an exception

I tried the following code:

let y: Box<&[u8]> = Box::new(&[1, 2, 3]);
println!("Size Box<&[u8]> {}", size_of_val(&y));
println!("Size Box<&[u8]> {}", size_of::<Box<&[u8]>>());

let x: Box<str> = "Hello".to_string().into_boxed_str();
println!("Size Box<str> {}", size_of_val(&x));
println!("Size Box<str> {}", size_of::<Box<str>>());

It prints:

Size Box<&[u8]> 8
Size Box<&[u8]> 8
Size Box<str> 16
Size Box<str> 16

Playing around in lldb and displaying memory seems to confirm the observation.

/Mikael

Whenever a pointer is to a DST, that pointer is larger in order to store the metadata of the DST (which for str is its length). This applies equally to all pointer types. So, in a sense, there is a special rule, but the special rule is not for Box<str>, it's for AnyPointer<AnyDst>.
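
One way to see the "AnyPointer<AnyDst>" rule is to measure pointer widths generically over T: ?Sized. This is just a sketch with made-up helper names (ref_words, box_words, rc_words). Trait objects (dyn Debug here) are not discussed in this thread; they are included because they are DSTs too, with a vtable pointer as their metadata instead of a length.

```rust
use std::fmt::Debug;
use std::mem::size_of;
use std::rc::Rc;

/// Width of &T in machine words (hypothetical helper).
fn ref_words<T: ?Sized>() -> usize {
    size_of::<&T>() / size_of::<usize>()
}

/// Width of Box<T> in machine words (hypothetical helper).
fn box_words<T: ?Sized>() -> usize {
    size_of::<Box<T>>() / size_of::<usize>()
}

/// Width of Rc<T> in machine words (hypothetical helper).
fn rc_words<T: ?Sized>() -> usize {
    size_of::<Rc<T>>() / size_of::<usize>()
}

fn main() {
    // Sized pointee: one word, whatever the pointer type.
    assert_eq!(ref_words::<u8>(), 1);
    assert_eq!(box_words::<u8>(), 1);
    assert_eq!(rc_words::<u8>(), 1);

    // str and slices: address plus length.
    assert_eq!(ref_words::<str>(), 2);
    assert_eq!(box_words::<str>(), 2);
    assert_eq!(rc_words::<[u16]>(), 2);

    // Trait objects: address plus vtable pointer.
    assert_eq!(box_words::<dyn Debug>(), 2);
}
```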

Note that a Box<&[u8]> is quite a different thing from a Box<[u8]> and a Box<str>.

I think you'll find that size_of::<Box<[u8]>>() == size_of::<Box<str>>().

You are comparing apples to oranges here. The correct comparison would be:

let y: Box<[u8]> = [1, 2, 3].to_vec().into_boxed_slice();
println!("Size Box<[u8]> {}", size_of_val(&y));
println!("Size Box<[u8]> {}", size_of::<Box<[u8]>>());
  
let x: Box<str> = "Hello".to_string().into_boxed_str();
println!("Size Box<str> {}", size_of_val(&x));
println!("Size Box<str> {}", size_of::<Box<str>>());

Size Box<[u8]> 16
Size Box<[u8]> 16
Size Box<str> 16
Size Box<str> 16

or

let y: Box<&[u8]> = Box::new(&[1, 2, 3]);
println!("Size Box<&[u8]> {}", size_of_val(&y));
println!("Size Box<&[u8]> {}", size_of::<Box<&[u8]>>());
  
let x: Box<&str> = Box::new("hello");
println!("Size Box<&str> {}", size_of_val(&x));
println!("Size Box<&str> {}", size_of::<Box<&str>>());

Size Box<&[u8]> 8
Size Box<&[u8]> 8
Size Box<&str> 8
Size Box<&str> 8

it's pretty obvious from the way the code is written

&T (or Box<T>, or any other type that is just a pointer) is never a DST. It is always sized: 1 usize big if T is sized, or 2 usizes big otherwise.
So, recursively, because &T/Box<T>/Rc<T>/*mut T/etc. is itself always sized, Box<&T>, *mut Rc<T>, &&T, or &mut *const T will always be 1 usize big. That is what we are observing here.
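
A short sketch of that recursion, with str and [u8] standing in for the inner T. The helper name nested_sizes is made up for the example.

```rust
use std::mem::size_of;
use std::rc::Rc;

/// Sizes, in bytes, of pointers whose pointee is itself a (sized) pointer.
fn nested_sizes() -> Vec<usize> {
    vec![
        size_of::<Box<&[u8]>>(),
        size_of::<Box<&str>>(),
        size_of::<&&str>(),
        size_of::<*mut Rc<str>>(),
        size_of::<&mut *const [u8]>(),
    ]
}

fn main() {
    let word = size_of::<usize>();

    // The inner pointers point to DSTs, so they are two words wide...
    assert_eq!(size_of::<&str>(), 2 * word);
    assert_eq!(size_of::<Rc<str>>(), 2 * word);

    // ...but they are sized themselves, so a pointer to them is one word wide.
    for size in nested_sizes() {
        assert_eq!(size, word);
    }
}
```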

Thanks everyone for pointing out my silly confusion around &[u8] and [u8] :wink:

It makes a lot of sense that DSTs and non-DSTs behave differently with regard to Box, now that you have explained it to me.

/Mikael