Vector of arbitrary data types with same size?

I'm trying to implement a simple data structure that splits its elements into buckets (vectors) of common size:

pub struct Storage{
    //The bins of
    bins: HashMap<std::any::TypeId, Box<Vec<?: SomeTrait>>,

}

I'm making the assumption here that everything with the same type id will have the same size (and I don't actually know if that is a guarantee, given all the magic Rust does with memory optimization). I could use a trait object, but I want the data to be in contiguous memory. How would I go about this? I'm still new to the more advanced parts of Rust. Would I need to implement a custom allocator or something?

You could look at how anymap does it. It uses the Any trait and downcasts values.

The only problem with anymap is that the values are stored via indirection (in a box). What I would like to do is store everything in contiguous memory, and somehow communicate to Rust that all the data structures indeed have a common size. I could probably pull out a bunch of unsafe raw ptr stuff and get it done, but I would like to do something more idiomatic.

Rust can't perform optimizations that would break the layout of a type; that would be insane.

However, there is one technical exception: dynamically-sized types of course have varying sizes, but they can be handled behind indirection only, so you will only ever directly interact with pointers to them anyway. So this is not really an issue.

By the way, Box<Vec<…>> is completely useless. Vec<…> already holds a heap buffer, there's no need for one more heap allocation and one more level of indirection there.

In your original post, the guarantee/assumption was weaker: it was that everything with the same type ID has the same size, not that everything has the same size – which, by the way, is simply not true.
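
To make the distinction concrete, here is a minimal sketch using only the standard library. The helper `id_and_size` and the function `demo` are hypothetical names introduced for illustration; the point is that a type id identifies exactly one type, so one id implies one size, while different types may or may not share a size.

```rust
use std::any::TypeId;
use std::mem::size_of;

/// Hypothetical helper: the type id and the size in bytes of `T`.
fn id_and_size<T: 'static>() -> (TypeId, usize) {
    (TypeId::of::<T>(), size_of::<T>())
}

/// Hypothetical demo of the two claims discussed above.
fn demo() {
    // Same type id means same type, and therefore the same size.
    assert_eq!(id_and_size::<u32>(), id_and_size::<u32>());

    // Different types can happen to share a size while having different ids.
    let (u_id, u_size) = id_and_size::<u32>();
    let (f_id, f_size) = id_and_size::<f32>();
    assert_eq!(u_size, f_size);
    assert_ne!(u_id, f_id);

    // "Everything has the same size" is false in general.
    assert_ne!(size_of::<u8>(), size_of::<u64>());
}
```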

I mean that everything in a common bin (same type id) has the same size. I'm unsure why I boxed the Vec! I meant: HashMap<std::any::TypeId, Vec<..>>
The trivial way to do this would be Vec<Box> (or a reference), but this is pretty much what anymap does, and is unsuitable for my needs.
For each type id, I want to create a Vec that is able to store any value with the same type id (and actually own the data, not just a pointer to it). Is this possible?

What about something like this?

use std::{
    any::{Any, TypeId},
    collections::HashMap,
};

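// Maps each type id to a type-erased Vec<T> holding every value of that type.
#[derive(Default)]
pub struct AnyMap(HashMap<TypeId, Box<dyn Any>>);

impl AnyMap {
    pub fn push<T: 'static>(&mut self, value: T) {
        self.0
            .entry(TypeId::of::<T>())
            // Create the bin for T on first use.
            .or_insert_with(|| Box::new(Vec::<T>::new()))
            // The entry is keyed by T's type id, so this downcast cannot fail.
            .downcast_mut::<Vec<T>>()
            .unwrap()
            .push(value);
    }

    // Returns all stored values of type T as one contiguous slice.
    pub fn get<T: 'static>(&self) -> &[T] {
        match self.0.get(&TypeId::of::<T>()) {
            Some(items) => items.downcast_ref::<Vec<T>>().unwrap(),
            None => &[],
        }
    }
}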

I don't think you can do that in safe code. The closest you can get is actually similar to your boxed vector, which still seems wasteful, but required due to the (current) inability to create a dyn Trait from an unsized value. (Edit: @Michael-F-Bryan beat me to it.)

If you want to remove that overhead, you could implement your own casting mechanism using TypeId and raw pointers obtained from Vec.
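
A rough sketch of what that could look like, assuming you are willing to write unsafe code. `ErasedVec` and `drop_vec` are hypothetical names, not an existing crate's API. The idea is to keep the raw parts of a `Vec<T>` together with the `TypeId` of `T`, and to rebuild the typed view only after checking the id.

```rust
use std::any::TypeId;
use std::mem::ManuallyDrop;

/// Hypothetical type-erased vector: the raw parts of some `Vec<T>`
/// plus the `TypeId` of `T` used to check every access.
pub struct ErasedVec {
    type_id: TypeId,
    ptr: *mut u8,
    len: usize,
    cap: usize,
    /// Rebuilds the original `Vec<T>` and drops it (elements and buffer).
    drop_fn: unsafe fn(*mut u8, usize, usize),
}

/// Safety: the arguments must be the raw parts of a live `Vec<T>`.
unsafe fn drop_vec<T>(ptr: *mut u8, len: usize, cap: usize) {
    unsafe { drop(Vec::from_raw_parts(ptr as *mut T, len, cap)) }
}

impl ErasedVec {
    pub fn new<T: 'static>() -> Self {
        // ManuallyDrop keeps the Vec from freeing the buffer we take over.
        let mut v = ManuallyDrop::new(Vec::<T>::new());
        ErasedVec {
            type_id: TypeId::of::<T>(),
            ptr: v.as_mut_ptr() as *mut u8,
            len: v.len(),
            cap: v.capacity(),
            drop_fn: drop_vec::<T>,
        }
    }

    /// Pushes `value`, or hands it back if `T` is not the element type.
    pub fn push<T: 'static>(&mut self, value: T) -> Result<(), T> {
        if self.type_id != TypeId::of::<T>() {
            return Err(value);
        }
        // SAFETY: the id check proves the parts came from a `Vec<T>`.
        let mut v = ManuallyDrop::new(unsafe {
            Vec::from_raw_parts(self.ptr as *mut T, self.len, self.cap)
        });
        v.push(value);
        // The push may have reallocated, so store the parts again.
        self.ptr = v.as_mut_ptr() as *mut u8;
        self.len = v.len();
        self.cap = v.capacity();
        Ok(())
    }

    /// Returns the contiguous elements if `T` is the element type.
    pub fn as_slice<T: 'static>(&self) -> Option<&[T]> {
        if self.type_id != TypeId::of::<T>() {
            return None;
        }
        // SAFETY: same id check as above; `len` elements are initialized.
        Some(unsafe { std::slice::from_raw_parts(self.ptr as *const T, self.len) })
    }
}

impl Drop for ErasedVec {
    fn drop(&mut self) {
        // SAFETY: the parts always describe the Vec<T> chosen in `new`.
        unsafe { (self.drop_fn)(self.ptr, self.len, self.cap) }
    }
}
```

A `HashMap<TypeId, ErasedVec>` would then store each bin inline in the map, at the cost of maintaining the unsafe code yourself.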

I don't think you can implement what you want in safe Rust. Vec needs to know the size of its items at compile time, and the size is determined by the type of the items within. This means that, no matter what you store into the Vec, the size of the item cannot depend on the key in the HashMap, which defeats your approach. You would need a growable buffer which determines the size of its elements at runtime based on dynamic information.
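
For illustration only, a hand-rolled buffer of that kind might look roughly like the sketch below. `RuntimeBuf` and its methods are hypothetical names. The sketch assumes non-zero-sized elements and never runs element destructors, so it is only suitable for plain data; the caller is responsible for passing pointers to values of the right layout.

```rust
use std::alloc::{self, Layout};
use std::ptr::{self, NonNull};

/// Hypothetical growable buffer whose element layout is a runtime value.
pub struct RuntimeBuf {
    elem: Layout, // size and alignment of one element
    ptr: NonNull<u8>,
    len: usize,
    cap: usize,
}

impl RuntimeBuf {
    pub fn new(elem: Layout) -> Self {
        assert!(elem.size() > 0, "zero-sized elements are not handled here");
        // Pad the size so that consecutive elements stay aligned.
        let elem = elem.pad_to_align();
        // The dangling pointer is never dereferenced while `cap` is 0.
        RuntimeBuf { elem, ptr: NonNull::dangling(), len: 0, cap: 0 }
    }

    /// Layout of an allocation holding `n` elements.
    fn array_layout(&self, n: usize) -> Layout {
        let size = self.elem.size().checked_mul(n).expect("capacity overflow");
        Layout::from_size_align(size, self.elem.align()).expect("capacity overflow")
    }

    fn grow(&mut self) {
        let new_cap = if self.cap == 0 {
            4
        } else {
            self.cap.checked_mul(2).expect("capacity overflow")
        };
        let new_layout = self.array_layout(new_cap);
        // SAFETY: `new_layout` has non-zero size; when `cap != 0`, `ptr` was
        // allocated by this allocator with `array_layout(cap)`.
        let raw = unsafe {
            if self.cap == 0 {
                alloc::alloc(new_layout)
            } else {
                alloc::realloc(self.ptr.as_ptr(), self.array_layout(self.cap), new_layout.size())
            }
        };
        self.ptr = NonNull::new(raw).unwrap_or_else(|| alloc::handle_alloc_error(new_layout));
        self.cap = new_cap;
    }

    /// Copies one element's bytes from `src` to the end of the buffer.
    ///
    /// Safety: `src` must point to a valid value with this buffer's
    /// element layout, and the caller must not drop that value afterwards.
    pub unsafe fn push_raw(&mut self, src: *const u8) {
        if self.len == self.cap {
            self.grow();
        }
        unsafe {
            let dst = self.ptr.as_ptr().add(self.len * self.elem.size());
            ptr::copy_nonoverlapping(src, dst, self.elem.size());
        }
        self.len += 1;
    }

    /// Pointer to the element at `index`, if it exists.
    pub fn get_raw(&self, index: usize) -> Option<*const u8> {
        if index >= self.len {
            return None;
        }
        // SAFETY: `index < len <= cap`, so the offset stays in bounds.
        Some(unsafe { self.ptr.as_ptr().add(index * self.elem.size()) as *const u8 })
    }

    pub fn len(&self) -> usize {
        self.len
    }
}

impl Drop for RuntimeBuf {
    fn drop(&mut self) {
        if self.cap != 0 {
            // SAFETY: allocated with exactly this layout in `grow`.
            unsafe { alloc::dealloc(self.ptr.as_ptr(), self.array_layout(self.cap)) }
        }
    }
}
```

Reading a value back means casting the pointer from `get_raw` to the concrete type, which is only sound if the caller tracks which type each buffer holds.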

I don't know any such structure off the top of my head, but I wonder whether you really need it. The simplest way to do what you want would probably be HashMap<TypeId, Vec<Box<dyn Any>>>, with a runtime downcast based on the provided type_id. Of course, that assumes that you know in some other way which type actually has that type_id, since there is no mapping from type_ids to types.
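
A minimal sketch of that simpler layout, using only safe code. `BoxedBins` is a hypothetical name, and here the caller supplies the type as a generic parameter, which is one way of "knowing in some other way" which type belongs to a bin. Each element lives in its own heap allocation, so the values are not contiguous.

```rust
use std::any::{Any, TypeId};
use std::collections::HashMap;

/// Hypothetical container: one bin of individually boxed values per type id.
#[derive(Default)]
pub struct BoxedBins(HashMap<TypeId, Vec<Box<dyn Any>>>);

impl BoxedBins {
    /// Boxes `value` and appends it to the bin for `T`.
    pub fn push<T: 'static>(&mut self, value: T) {
        self.0
            .entry(TypeId::of::<T>())
            .or_default()
            .push(Box::new(value));
    }

    /// Iterates over the values in the bin for `T`, downcasting each box.
    pub fn iter<T: 'static>(&self) -> impl Iterator<Item = &T> + '_ {
        self.0
            .get(&TypeId::of::<T>())
            .into_iter()
            .flatten()
            .filter_map(|boxed| boxed.downcast_ref::<T>())
    }
}
```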

Thank you, guys! I think I'm going with @Michael-F-Bryan's approach. The pointer indirection is a little bit annoying, but shouldn't pose any runtime performance penalties for my purposes! Thank you!