Why does Box<dyn T> work?

"Sized" and "pass by value" are not the same. Even if you allow dynamically-sized things to be passed by value (which does have advantages and there are indeed proposals for it), they still can't be treated like statically-sized types. In particular:

This still wouldn't work. Since dyn Traits created from objects of different size would have those respective (different) sizes, there would be no way for the Vec to store them by-value inline, in a contiguous allocation.

How would this be represented in memory? If this is supposed to mean a vector of pointers to the heap, then @steffahn is correct that using an explicit Box is better: Vec<Box<dyn Trait>>.

Rust has explicit memory management, and this is precisely what Box is for, so it's not neater to do it automatically behind the programmer's back.

You are right. I think I now know what usize really means. Thank you.

dyn Trait itself is sized, so it can fit in the contiguous allocation, just like in an array.

I do not think this is a bad idea, because dyn can serve as an indicator to the programmer.
So the steps when pushing a dyn Trait might be:

trait Tr {}
struct U;
struct V(u8);
impl Tr for U {}
impl Tr for V {}
let mut v: Vec<dyn Tr>;  // hypothetical: today's compiler rejects Vec<dyn Tr>
v.push(U);        // allocate a U, allocate a dyn Tr which is a fat pointer
                  // set up the fat pointer:
                  // 1. point the data pointer to the newly allocated U
                  // 2. point the vtable pointer to the vtable of Tr for U
v.push(V(0));     // same as U
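For comparison, here is a minimal sketch of how the same steps can be written in today's Rust, with the fat pointer spelled out as Box<dyn Tr>. The trait Tr and the types U and V come from the snippet above; the helper name build is made up for illustration.

```rust
trait Tr {}
struct U;
struct V(u8);
impl Tr for U {}
impl Tr for V {}

// Hypothetical helper: builds the vector from the post, with explicit boxing.
fn build() -> Vec<Box<dyn Tr>> {
    let mut v: Vec<Box<dyn Tr>> = Vec::new();
    // Box::new(U) moves the value behind a pointer (U is zero-sized, so no
    // real allocation is needed for it). The coercion to Box<dyn Tr> then
    // attaches the vtable of `impl Tr for U`, which yields the fat pointer.
    v.push(Box::new(U));
    // Same steps, this time with a one-byte heap allocation and the vtable
    // of `impl Tr for V`.
    v.push(Box::new(V(0)));
    v
}

fn main() {
    let v = build();
    println!("{} elements", v.len());
}
```

The vector itself only ever stores the two-word fat pointers, so every element slot has the same size.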

u8 takes a single byte, i32 takes 4 bytes, and String takes 24 bytes on a 64-bit machine. All three implement Display, so we have already found 3 different sizes that dyn Display can have. If we have a Vec<dyn Display>, how can we know the memory offset of its 10th element, which is necessary for the expression &v[10]? If all the elements in the vector are actually u8, the offset would be 10. But what if some of them are i32, or String, or some type in another crate deep in your dependency tree?

The memory layout would look like this:
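A quick way to check those sizes is std::mem::size_of_val, which accepts unsized values behind a reference. This is only a sketch; the helper name dyn_size is made up, and the String figure assumes the usual three-word layout.

```rust
use std::fmt::Display;
use std::mem::size_of_val;

// Hypothetical helper: size in bytes of the value behind a `dyn Display`
// reference. The size is read from the vtable at run time.
fn dyn_size(value: &dyn Display) -> usize {
    size_of_val(value)
}

fn main() {
    let a: u8 = 1;
    let b: i32 = 2;
    let c = String::from("three");
    println!("u8 as dyn Display:     {} byte(s)", dyn_size(&a));
    println!("i32 as dyn Display:    {} byte(s)", dyn_size(&b));
    println!("String as dyn Display: {} byte(s)", dyn_size(&c));
}
```

Since the answer differs per value, there is no single stride the Vec could use to index into its buffer.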

offset      element
----------------------------------------
+0x00      fat pointer for u8
+0x10      fat pointer for i32
+0x20      fat pointer for String
----------------------------------------

A drawback is that with Vec<dyn Tr> we cannot pre-allocate memory precisely, which might degrade performance.

Currently, dyn Trait doesn't introduce any indirection on its own. Is this a proposal to change the language's behavior?

They're proposing auto-boxing, basically, yes.

The big advantage is that heterogeneous containers would be more straightforward.

Simply put, this is a misconception. Box<T> is the same size as &T, not T, so of course the fact that Box<dyn T> is 16 bytes doesn't mean that dyn T is 16 bytes.
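The pointer sizes can be checked directly. A small sketch using only std::mem::size_of; the 16-byte figure corresponds to two machine words on a 64-bit target.

```rust
use std::fmt::Display;
use std::mem::size_of;

fn main() {
    let word = size_of::<usize>();

    // Thin pointers to a sized type: one machine word each.
    assert_eq!(size_of::<&u8>(), word);
    assert_eq!(size_of::<Box<u8>>(), word);

    // Fat pointers to a trait object: data pointer plus vtable pointer.
    assert_eq!(size_of::<&dyn Display>(), 2 * word);
    assert_eq!(size_of::<Box<dyn Display>>(), 2 * word);

    // size_of::<dyn Display>() would not compile: the pointee has no
    // statically known size, only the pointer to it does.
    println!("Box<dyn Display> is {} bytes", size_of::<Box<dyn Display>>());
}
```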

Just use what you want: Vec<&dyn T>, Vec<Box<dyn T>>, Vec<Arc<dyn T>>, ...

It is not, that's the entire point.

But heterogeneous collections are rare and almost always non-idiomatic in Rust. We shouldn't introduce automatic heap allocation just for that niche.

If you want to encapsulate operations so that they are hidden at the API level for the sake of ergonomics, then that's something that user / third-party libraries can already do quite well on their own. A toy example would be:

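// Nightly-only: `Unsize` is gated behind the unstable `unsize` feature.
#![feature(unsize)]
use ::core::marker::Unsize;
// Assumption: `#[extension]` comes from the third-party `extension-traits` crate.
use ::extension_traits::extension;

type MyVec<DynTrait /* : ?Sized */> = Vec<Box<DynTrait>>;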

#[extension(trait AutoBoxed)]
impl<DynTrait : ?Sized> MyVec<DynTrait> {
   fn my_push (
       self: &'_ mut Self,
       value: impl Unsize<DynTrait>,
   )
   {
       self.push(Box::new(value) as _)
   }
}

fn main ()
{
   let mut v: MyVec<dyn ::core::fmt::Debug> = vec![];
   v.my_push(42);
   v.my_push("Hello, World!");
   v.iter().for_each(|it| { dbg!(it); });
}
  • Playground

  • impl Unsize<DynTrait> is akin to impl Trait, but for some "generic trait" (it's the only form of "trait polymorphism" we have (on nightly), without macros). Otherwise, for some fixed trait (e.g., Any), you can simplify the signatures to use dyn Any and impl Any directly, etc.

  • (#[extension(trait …)] is a handy pattern for ad-hoc extension traits.)
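For a fixed trait, the same idea works on stable Rust without Unsize or any macro. The following is a sketch under that assumption, with Debug as the fixed trait; the extension trait is written out by hand, and the helper render is a made-up name used only to show the result.

```rust
use std::fmt::Debug;

// Hand-written extension trait (no macro); the name mirrors the toy example.
trait AutoBoxed {
    fn my_push<T: Debug + 'static>(&mut self, value: T);
}

impl AutoBoxed for Vec<Box<dyn Debug>> {
    fn my_push<T: Debug + 'static>(&mut self, value: T) {
        // Box the concrete value; it coerces to Box<dyn Debug> at the call.
        self.push(Box::new(value));
    }
}

// Hypothetical helper: formats every element through its Debug impl.
fn render(v: &[Box<dyn Debug>]) -> Vec<String> {
    v.iter().map(|it| format!("{:?}", it)).collect()
}

fn main() {
    let mut v: Vec<Box<dyn Debug>> = Vec::new();
    v.my_push(42);
    v.my_push("Hello, World!");
    for line in render(&v) {
        println!("{line}");
    }
}
```

The 'static bound is there because Box<dyn Debug> defaults to dyn Debug + 'static; borrowed data would need an explicit lifetime on the trait object.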

A more fully-fledged example would probably feature an actual newtype, so as to rename .my_push into .push().
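A newtype version could look roughly like the sketch below, again on stable Rust and with Debug as a fixed trait. The name DebugVec and its methods are invented for this example.

```rust
use std::fmt::Debug;

// Hypothetical newtype around the boxed vector.
#[derive(Default)]
struct DebugVec(Vec<Box<dyn Debug>>);

impl DebugVec {
    fn new() -> Self {
        Self::default()
    }

    // Because this is an inherent method on our own type, it can be
    // called `push` without clashing with Vec::push.
    fn push<T: Debug + 'static>(&mut self, value: T) {
        self.0.push(Box::new(value));
    }

    // Formats every element through its Debug impl.
    fn render(&self) -> Vec<String> {
        self.0.iter().map(|it| format!("{:?}", it)).collect()
    }
}

fn main() {
    let mut v = DebugVec::new();
    v.push(42);
    v.push("Hello, World!");
    println!("{:?}", v.render());
}
```

The cost of the newtype is that every Vec method you want to expose has to be forwarded by hand (or through Deref).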


The one area where we could improve ergonomics would be with a "coerce to Box<dyn Trait> if needed" operation, so that .my_push() could handle both impl Traits as well as already-box-erased Box<dyn Trait>s: right now Box<dyn Trait> may be an impl Trait in and of itself (e.g., Trait = Send or Trait = FnMut()) which would make the two impls overlap, with one of these two impls leading to double boxing.

  • In the general case, specialization is needed to tell those apart, and make it favor the more optimized "identity" implementation for Box<dyn Trait>.

  • But for the case where it's a trait under your control, you can make your trait Box-transitive only in the Sized case, letting the Box<dyn Trait> case not overlap with Box<impl Trait + Sized>: Playground.
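The double-boxing problem is easy to reproduce with Trait = FnMut(). In this sketch, push_boxed is a made-up stand-in for .my_push(), and the counter exists only to show that both closures still run.

```rust
use std::cell::Cell;
use std::rc::Rc;

// Hypothetical stand-in for `.my_push()` with the fixed trait FnMut().
fn push_boxed<F: FnMut() + 'static>(v: &mut Vec<Box<dyn FnMut()>>, f: F) {
    v.push(Box::new(f));
}

fn run() -> u32 {
    let counter = Rc::new(Cell::new(0u32));
    let mut v: Vec<Box<dyn FnMut()>> = Vec::new();

    // A plain closure: boxed exactly once.
    let c = Rc::clone(&counter);
    push_boxed(&mut v, move || c.set(c.get() + 1));

    // An already type-erased closure. The call is accepted because
    // Box<dyn FnMut()> itself implements FnMut(), so the generic F is
    // Box<dyn FnMut()> and the element becomes a box inside a box.
    let c = Rc::clone(&counter);
    let already_boxed: Box<dyn FnMut()> = Box::new(move || c.set(c.get() + 10));
    push_boxed(&mut v, already_boxed);

    for f in v.iter_mut() {
        f();
    }
    counter.get()
}

fn main() {
    println!("counter = {}", run());
}
```

Nothing breaks, but the second element costs an extra allocation and an extra indirection on every call.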

That's part of the memory layout, but what about the addresses where the pointers point to? What is their layout? Do they live on the heap, on the stack? Do you want to allow sharing them? Do you want them garbage collected?

If you want them on the stack, then you run into the problem of how to manage memory on the stack. What happens to the memory layout if you do v.remove(1)?

In Rust these questions are handled explicitly; it's not like in Java, where all objects live in the same place with automatic garbage collection. That is why you have various options for different ways of managing memory: Vec<&dyn T>, Vec<Box<dyn T>>, Vec<Rc<dyn T>>, Vec<*const dyn T>, etc.
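To make the difference concrete, here is a sketch of three of those options side by side, using Display as the trait. The function name demo and the sample values are made up.

```rust
use std::fmt::Display;
use std::rc::Rc;

fn demo() -> (String, String, usize) {
    // Borrowed: the vector does not own its elements; here they are locals.
    let n = 7_i32;
    let s = String::from("text");
    let borrowed: Vec<&dyn Display> = vec![&n, &s];

    // Owned: every element has its own heap allocation, freed with the Vec.
    let owned: Vec<Box<dyn Display>> =
        vec![Box::new(7_u8), Box::new(String::from("text"))];

    // Shared: reference counted, freed when the last Rc is dropped.
    let first: Rc<dyn Display> = Rc::new(1.5_f64);
    let shared: Vec<Rc<dyn Display>> = vec![Rc::clone(&first), Rc::clone(&first)];

    let b: Vec<String> = borrowed.iter().map(|x| x.to_string()).collect();
    let o: Vec<String> = owned.iter().map(|x| x.to_string()).collect();
    // `first` plus the two clones stored in `shared`.
    (b.join(" "), o.join(" "), Rc::strong_count(&shared[0]))
}

fn main() {
    let (borrowed, owned, count) = demo();
    println!("borrowed: {borrowed}, owned: {owned}, strong count: {count}");
}
```

In each case the element type of the Vec states who owns the pointee, which is exactly the information an implicit Vec<dyn T> would hide.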

I am not seriously proposing anything; this is just for fun.

In my understanding, Vec allocates memory on the heap, and the Vec itself is stack allocated (it roughly holds a pointer to the heap-allocated memory). The heap-allocated memory may be divided into 2 parts: the first part stores the array of fat pointers, and the second part stores the real objects.

remove(1) will remove the fat pointer and its associated object.

If you want it to remove the object from the heap, then the heap allocated memory has to be divided into more than 2 parts, because if the objects were consecutive in memory, then v.remove(1) would leave an empty hole.

If you want one heap-allocated array of pointers, and multiple separate heap allocations for the objects, then Vec<Box<dyn T>> says precisely that very nicely. There is no real value in hiding the Box from the type; it's good to have Box there, so that we know we're allocating each element on the heap.
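Here is a sketch of what v.remove(1) does with Vec<Box<dyn Tr>>. The type Noisy and its drop log are invented purely to make the deallocation observable.

```rust
use std::cell::RefCell;
use std::rc::Rc;

trait Tr {}

// Hypothetical element type that records its name when it is dropped.
struct Noisy {
    name: &'static str,
    log: Rc<RefCell<Vec<&'static str>>>,
}

impl Tr for Noisy {}

impl Drop for Noisy {
    fn drop(&mut self) {
        self.log.borrow_mut().push(self.name);
    }
}

fn demo() -> (usize, Vec<&'static str>) {
    let log = Rc::new(RefCell::new(Vec::new()));
    let mut v: Vec<Box<dyn Tr>> = Vec::new();
    for name in ["a", "b", "c"] {
        v.push(Box::new(Noisy { name, log: Rc::clone(&log) }));
    }

    // remove(1) shifts the later fat pointers down by one slot and returns
    // the removed Box. Dropping that Box frees the allocation of "b" only;
    // the allocations of "a" and "c" are untouched.
    drop(v.remove(1));

    let dropped_so_far = log.borrow().clone();
    (v.len(), dropped_so_far)
}

fn main() {
    let (len, dropped) = demo();
    println!("len = {len}, dropped so far = {dropped:?}");
}
```

Because each object lives in its own allocation, no hole is left behind: only the small pointer array has to be compacted.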
