Store vector of tuples as tuple of vectors

I was thinking about writing a struct say MyVec<T,U> which implements all the interfaces of Vec<(T,U)> (except for size changing things like Push() and Pop() which I don't care about at the moment) but internally stores the data not as an array of (T,U) tuples, but with two arrays, on array of T and a second array of U.

There might be efficiency gains of this approach in some circumstances, for example, if one wanted to use SIMD instructions.

My thoughts of a implementation for a two tuple was to allocate a region of memory = size(T) * N + size(U) * N, adding some padding if required to align U, where N is the requested vector size.

Then have a struct which contains the following

  1. A pointer to a region of memory allocated above
  2. A pointer to the start ot the U array in that region
  3. N, the length of the vector.
  4. (the capacity if I want to grow the vector, but I'm not worried about that at the moment).

My concern is how I could implement the Index operations. To return the proper result, I think I'd need to construct the result from the parts of the two arrays and return that. And in the case of a IndexMut, if that returned reference is modified, I'll need to write it back to the array, but I'm not sure how. Perhaps it could be done with a special tuple type that writes back to the array on drop but I'm not sure if I'd be able to make the lifetimes work for this.

So I'm basically just asking for ideas on this, or whether it has be done (and whether there's a better approach) or whether it can't be done.

2 Likes

If I understood you correctly, the search term here would be "Array of Structures (AOS)" versus "Structure of Arrays (SOA)". It's a popular topic and well trod, so reading up on it is pretty possible and recommended, I'd say. I don't know a lot about it though, so this remark is all I can give.

3 Likes

As you mention resizing could become a matter I would realy stick to the standard Vec. In your case I would build a new-type like this:

struct MyVec<T, U>(Vec<(T, U)>); // this is exactly what you described

impl<T, U> MyVec<T, U> {
    fn with_capacity(n: usize) -> Self {
        Self ( Vec::with_capacity(n) )
    }
    fn iter_t_u(&self) -> impl Iterator<Item=(&T, &U)> {
        self.0.iter()
    }
    fn iter_t(&self) -> impl Iterator<Item=&T> {
        self.0.iter().map(|(t, _)| t)
    }
    // in this version slice of T or slice of U can't be returned
    fn as_slice_t_u(&self) -> &[(T, U)] {
        &self.0
    }
    // and so on
}

Alternatively, for the struct of arrays solution it would be:

struct MyVec<T, U>(Vec<T>, Vec<U>);

impl<T, U> MyVec<T, U> {
    fn with_capacity(n: usize) -> Self {
        Self ( Vec::with_capacity(n), Vec::with_capacity(n) )
    }
    fn iter_t_u(&self) -> impl Iterator<Item=(&T, &U)> {
        self.0.iter().zip(self.1.iter())
    }
    fn iter_t(&self) -> impl Iterator<Item=&T> {
        self.0.iter()
    }
    // in this version slice of (T, U) can't be returned
    fn slice_t(&self) -> &[T] {
        &self.0
    }
    // and so on
}
1 Like

The first approach means the items of each type aren’t contiguous, and the second isn’t close to compatible with a standard vector of tuples. So I don’t think either answer my question but please correct me if I’m wrong. I basically want to abstract away exactly how the data is stored in memory.

Okay on re-reading your post I guess I figured out now what your specific requirements are:

  • Ts and Us should be stored sperate in a contiguous peace of memory.
  • The struct should implement Index::Output = (T, U).
  • For SIMD optimization you want access to either the T or the U array.

Is this right so far?

Maybe it would be better if you provide some more information about your use case, e.g. are T and U Copy?
Which use case is more likely to occure iterate over a single array or access my_vec[idx]?
Are you familiar with Rust's Iterators, the (current) limitations of arrays and the costs of index out of bounds checks?

1 Like

This is what entity component frameworks do. Existing implementations are probably mostly game-oriented, but maybe you'll find them useful.

3 Likes

I've attempted to do things like this numerous times. It has never worked out well for me. These days I almost always keep the vecs separate, or I put them in a type that unashamedly behaves like a struct of vectors (i.e. all operations operate on the complete set of data, not on indices). If I make a type that is supposed to look like a "container" of (T, U), it probably only implements FromIterator/IntoIterator and little else, to protect myself from falling into that time sink again.

Implementing Index<Output = (T, U)> will be impossible, plain and simple. This is an unfortunate aspect of the Index trait in rust. If you want random access, you'll need to have a method, like pub fn at(usize) or something. If you want code to be agnostic to whether it's using a Vec or this type, you can try to define and implement a trait that represents their common interface... but I wouldn't.

5 Likes