Data structure: two vectors vs one vector of two elements

I am building the following data structure: I have a list/vector of objects that each have two different versions (formatted and unformatted). The two obvious ways of representing this are: 1. One vector containing a struct that holds both versions. 2. Two vectors, each storing one of the versions. Important: I don't want an enum, I want to store both versions at all times.

So my question: which is more efficient? Memory-wise they are probably similar, but I imagine performance differs. I have wrappers that expose an iterator to the outside of the interface. I assume creating such iterators would be more expensive for option 1 (one vector with both versions), or does the compiler optimize all of this away anyway? And if I access one of the two versions more often than the other, does that affect performance depending on my approach?

Looking forward to your answers! Thanks!

The truth about performance is always: it depends on the concrete problem, implementation and hardware. You are describing a classic memory-layout consideration for an array of some record type, known as AoS vs SoA. The rule of thumb I learned at uni was: AoS (array of structs) is easier to read and implement, while SoA (struct of arrays) is more cache-efficient for the most common access patterns and easier for the compiler to vectorize, and therefore often more performant. But again, benchmark and look at the assembly if you want to be sure which makes more sense for you.
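
To make the two layouts concrete, here is a minimal sketch. The names `Entry`, `EntriesAos`, `EntriesSoa` and `to_soa` are made up for illustration, and I am assuming both versions are `String`s, which the question does not actually say.

```rust
// Hypothetical names throughout; both versions are assumed to be `String`.

/// Option 1, array of structs (AoS): each element holds both versions.
#[derive(Debug, Clone, PartialEq)]
pub struct Entry {
    pub formatted: String,
    pub unformatted: String,
}

pub type EntriesAos = Vec<Entry>;

/// Option 2, struct of arrays (SoA): one vector per version.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct EntriesSoa {
    pub formatted: Vec<String>,
    pub unformatted: Vec<String>,
}

/// Converts the AoS layout into the SoA layout, preserving element order.
pub fn to_soa(entries: EntriesAos) -> EntriesSoa {
    let (formatted, unformatted): (Vec<String>, Vec<String>) = entries
        .into_iter()
        .map(|e| (e.formatted, e.unformatted))
        .unzip();
    EntriesSoa { formatted, unformatted }
}
```

One caveat with this particular assumption: a `String` stores its bytes on the heap, so the layout choice only changes how the pointer/length/capacity triples are arranged, not where the text itself lives. The effect is likely to be more visible with small inline types.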

If you frequently need to iterate over just one version, option 2 will probably perform better, since every cache line that gets loaded contains only data you actually use. If it's mostly random access, I imagine there won't be a noticeable difference performance-wise.

As always, the only way to be sure of what performance differences you'll see is to measure them directly.
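
As a rough sketch of what "iterating over just one version" looks like in each layout (all names here are hypothetical, and `String` is an assumed element type):

```rust
// Hypothetical types; `String` is an assumption, not something from the question.
pub struct Entry {
    pub formatted: String,
    pub unformatted: String,
}

pub struct EntriesSoa {
    pub formatted: Vec<String>,
    pub unformatted: Vec<String>,
}

/// Option 1: walks the whole `Vec<Entry>` and projects out one field, so the
/// unused `unformatted` field of every element is stepped over as well.
pub fn formatted_aos<'a>(entries: &'a [Entry]) -> impl Iterator<Item = &'a str> + 'a {
    entries.iter().map(|e| e.formatted.as_str())
}

/// Option 2: walks only the `formatted` vector; `unformatted` is never touched.
pub fn formatted_soa<'a>(entries: &'a EntriesSoa) -> impl Iterator<Item = &'a str> + 'a {
    entries.formatted.iter().map(String::as_str)
}
```

Both functions only build a lazy adapter over a slice iterator, so creating the iterator itself should be cheap either way. Any difference would come from the memory that gets traversed, which is exactly the part that needs measuring.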

That's not the right question to ask. Don't try to squeeze out low-level performance hacks as your first thought, without thinking about all other implications, chiefly, correctness.

The two types mean different things. A pair of vectors allows the lengths to differ (and even whether both are supposed to be ordered according to the same business rule is up to interpretation). Meanwhile, a vector of pairs means that the 1-to-1 correspondence is enforced by the type, and the relationship is obvious.

Thus, I'd strongly advise you to go with the vector-of-pairs approach if what you logically need is a list of pairs. Don't worry about "performance" without measuring. If your biggest performance bottleneck turns out to be this particular choice of layout, then you might have bigger issues (and should probably consider re-organizing your code completely in an ECS anyway).
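
To illustrate the correctness point: with `Vec<(String, String)>` or a `Vec` of a two-field struct, the pairing holds by construction. With two vectors, your own code has to maintain it. A minimal sketch of what that could look like, assuming a hypothetical `Entries` wrapper with private fields and `String` contents:

```rust
/// Hypothetical two-vector wrapper. The fields are private, so `push` is the
/// only way to add data, and it always appends to both vectors together.
#[derive(Debug, Default)]
pub struct Entries {
    formatted: Vec<String>,
    unformatted: Vec<String>,
}

impl Entries {
    pub fn new() -> Self {
        Self::default()
    }

    /// Appends one logical element, keeping both vectors the same length.
    pub fn push(&mut self, formatted: String, unformatted: String) {
        self.formatted.push(formatted);
        self.unformatted.push(unformatted);
    }

    pub fn len(&self) -> usize {
        // The invariant is checked by hand; the type system does not know it.
        debug_assert_eq!(self.formatted.len(), self.unformatted.len());
        self.formatted.len()
    }

    pub fn is_empty(&self) -> bool {
        self.formatted.is_empty()
    }

    /// Returns both versions at `index`, or `None` if out of bounds.
    pub fn get(&self, index: usize) -> Option<(&str, &str)> {
        let formatted = self.formatted.get(index)?;
        let unformatted = self.unformatted.get(index)?;
        Some((formatted.as_str(), unformatted.as_str()))
    }

    /// Iterates over (formatted, unformatted) pairs in insertion order.
    pub fn pairs(&self) -> impl Iterator<Item = (&str, &str)> + '_ {
        self.formatted
            .iter()
            .map(String::as_str)
            .zip(self.unformatted.iter().map(String::as_str))
    }
}
```

Every method that mutates the collection has to be written with the invariant in mind, which is the extra burden compared to a plain vector of pairs.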
