Making a nested non-shared structure contiguous in memory

We say X is "simple" if X is:

  • a struct where all fields are SIMPLE or
  • an enum where all fields are SIMPLE or
  • Box<T> where T is SIMPLE or
  • Vec<T> where T is SIMPLE

The point is: if we own a SIMPLE object, nothing else can be referring to it's descendants (i.e. we're not using Arc / Rc / ...)

=====

Suppose now, we have a SIMPLE structure -- it might be 'fragmented' all over the heap. Is there a way to take a owned SIMPLE structure, and have it move pieces around so it becomes a continuous block in memory ?

=====

XY problem here is: I have some read-only structures I want to stuff into adjacent 4kb pages for better L1 cache locality

Seems to me to preserve memory locality you can:

  1. Not create those X stucts all over heap memory by wrapping them in Box, Rc, Arc or whatever. Instead keep them in a Vec where they will all be contiguous. Use indices into that Vec to access then rather than those smart pointers. Perhaps create that Vec with a known required capacity, if you know it.

  2. Instead of keeping all those fields in a struct and have a Vec of structs split them up. Have one Vec per field. ending up with a struct of Vecs rather than a Vec of structs.

All of which I gather is known as Entity Component System by the you generation and used much in game software, where performance is kink and cache locality is crucial.

https://ianjk.com/ecs-in-rust/

ECS is how people programmed before the scourge of Object Oriented Programming swept the world.

1 Like

If all fields were of the type T, I would be happy to put everything into a Vec<T>.

The problem if that we have types T1, T2, T3, ... so stuffing them into a Vec is going to require Vec<u8> and marshalling / unmarshalling, which I'm hoping to avoid.

It can be done, but it can be difficult and the best way depends on the specific structure of your type. It also depends on things like whether the vectors have a fixed size, and for vectors whose size is not fixed, whether there is some maximum length for that vector.

Would you go so far as claiming that a crate / procedural macro for doing this generically is unlikely to exist?

Indeed, I do not think such a macro exists at the moment.

What about abomonation? It's purpose is slightly different (serialization/deserialization), but seems to be able to achieve the goal – put the nested structure into continuous memory.

1 Like

That library doesn't provide any way to read the data besides putting it back into the fragmented type.

(emphasis mine) That wording may be misleading, as the "decoding" in abomonation doesn't really create a new object, but rather returns a reference &X that points into the decoded buffer.

So, eg. the workflow could be:

  1. Say you have struct X defined with regular vecs, boxes etc. (with appropriate abomonation derives applied)
  2. Create X's instance as usual.
  3. Encode into Vec<u8> (you now no longer need original instance).
  4. Decode into Abomonated<X> (which basically a wrapper around Vec<u8> that provides deref into &X). This steps adjusts all the internal pointers so that they point into that allocation.
  5. Now, use the Abomonated<X> in all cases you need read-only access to X.
1 Like

Oh my, that library is more cursed than what I initially thought.

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.