Cache line optimization methodology for structs

There seems to be some very simplistic thinking around memory optimization, as if added complexity (and optimization) were zero-cost and bug-free.

In business app development we had an FS-optimized temporal separation of transaction history under the hood. In Rust you have memory alignment and the cache line. NP.

Citing:

https://software.intel.com/en-us/articles/coding-for-performance-data-alignment-and-structures

"One can try to minimize this memory wastage by ordering the structure elements such that the widest (largest) element comes first, followed by the second widest, and so on."

"BKM: Touching only some elements at a time: An exception to this ordering of elements in structures is if your structures are bigger than a cache line, which is 64 bytes in case of Intel Xeon Phi coprocessor, and some loops or kernels touch only a part of the structure. In this case, it may be beneficial to keep the parts of the structure that are touched together in memory, which may lead to improved cache locality.

BKM: Splitting larger structures: If your structures are larger than a cache line with some loops or kernels touching only a part of the structure then you may consider reorganizing the data by splitting the large structure into multiple smaller structures which are stored as separate arrays. "

End quote.
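To make the first quoted guideline concrete, here is a minimal Rust sketch. It assumes `#[repr(C)]` so the compiler keeps declaration order (the default `repr(Rust)` gives no layout guarantee, and rustc is free to reorder fields itself). The struct and field names are hypothetical, and the sizes assume a typical 64-bit target such as x86_64.

```rust
use std::mem::{align_of, size_of};

// #[repr(C)] keeps declaration order, so narrow fields placed before
// wider ones force padding in front of each wider field.
#[repr(C)]
#[allow(dead_code)]
struct Interleaved {
    a: u8,  // offset 0, then 7 bytes of padding
    b: u64, // offset 8
    c: u8,  // offset 16, then 3 bytes of padding
    d: u32, // offset 20
    e: u8,  // offset 24, then 7 bytes of tail padding
}

// Same fields, widest first: only 1 byte of tail padding remains.
#[repr(C)]
#[allow(dead_code)]
struct WidestFirst {
    b: u64, // offset 0
    d: u32, // offset 8
    a: u8,  // offset 12
    c: u8,  // offset 13
    e: u8,  // offset 14, then 1 byte of tail padding
}

fn main() {
    println!("Interleaved: size {} align {}", size_of::<Interleaved>(), align_of::<Interleaved>());
    println!("WidestFirst: size {} align {}", size_of::<WidestFirst>(), align_of::<WidestFirst>());
}
```

On x86_64 this reports 32 bytes for `Interleaved` and 16 for `WidestFirst`, both with alignment 8.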

Whatever the rationale for keeping the split parts in separate arrays (I would like to know it), this is excessive.
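For reference, the split the Intel article describes looks roughly like the sketch below. The usual rationale is that a loop touching only some fields then streams only the arrays it needs, so every byte pulled into cache gets used; with the combined record, that loop also drags the cold bytes of every element through the cache. `Particle`, `Particles`, `step_aos`, and `step_split` are hypothetical names chosen for illustration, and the 80-byte size assumes `#[repr(C)]` on a 64-bit target.

```rust
use std::mem::size_of;

// Hypothetical record: 80 bytes under #[repr(C)], so it is larger
// than a 64-byte cache line.
#[repr(C)]
#[derive(Clone, Copy)]
struct Particle {
    pos: [f32; 3],  // hot: read and written every step
    vel: [f32; 3],  // hot: read every step
    name: [u8; 48], // cold: rarely touched
    id: u64,        // cold: rarely touched
}

// Array-of-structs update: each step pulls whole 80-byte records into
// cache even though only 24 bytes per record are used.
fn step_aos(ps: &mut [Particle], dt: f32) {
    for p in ps.iter_mut() {
        for i in 0..3 {
            p.pos[i] += p.vel[i] * dt;
        }
    }
}

// Split layout: one array per group of fields.
#[allow(dead_code)]
struct Particles {
    pos: Vec<[f32; 3]>,
    vel: Vec<[f32; 3]>,
    names: Vec<[u8; 48]>,
    ids: Vec<u64>,
}

// The update loop streams only the `pos` and `vel` arrays.
fn step_split(ps: &mut Particles, dt: f32) {
    for (p, v) in ps.pos.iter_mut().zip(ps.vel.iter()) {
        for i in 0..3 {
            p[i] += v[i] * dt;
        }
    }
}

fn main() {
    let one = Particle { pos: [0.0; 3], vel: [1.0, 2.0, 3.0], name: [0; 48], id: 7 };
    let mut aos = vec![one; 4];
    let mut split = Particles {
        pos: aos.iter().map(|p| p.pos).collect(),
        vel: aos.iter().map(|p| p.vel).collect(),
        names: aos.iter().map(|p| p.name).collect(),
        ids: aos.iter().map(|p| p.id).collect(),
    };
    step_aos(&mut aos, 0.5);
    step_split(&mut split, 0.5);
    println!("size_of::<Particle>() = {}", size_of::<Particle>());
    println!("aos[0].pos = {:?}, split.pos[0] = {:?}", aos[0].pos, split.pos[0]);
}
```

The cost is the extra bookkeeping: four vectors must stay the same length and in the same order, which is the added complexity in question here.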

I intend to apply this within a given CacheLineSize block, not at the struct/record level.

That is, sort the fields within a single cache line to minimize padding, rather than sorting the entire struct. Page or frame the struct more logically.
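A minimal sketch of that per-cache-line idea in Rust, assuming a 64-byte line: each frame is its own `#[repr(C, align(64))]` struct whose fields are sorted widest-first only within that frame, and the record is just a sequence of frames. `HotFrame`, `ColdFrame`, `Record`, and all field names are hypothetical.

```rust
use std::mem::{align_of, size_of};

// Assumed cache-line size of 64 bytes (true for current x86_64 parts);
// `align(64)` makes every frame start on a line boundary.
#[repr(C, align(64))]
#[allow(dead_code)]
struct HotFrame {
    id: u64,      // offset 0
    balance: i64, // offset 8
    flags: u32,   // offset 16
    kind: u16,    // offset 20
    state: u8,    // offset 22; bytes 23..64 are free for more hot fields
}

#[repr(C, align(64))]
#[allow(dead_code)]
struct ColdFrame {
    history: [u64; 6], // offset 0
    created: u64,      // offset 48
    region: u32,       // offset 56
    version: u16,      // offset 60; bytes 62..64 are tail padding
}

// The record is a sequence of frames: padding is decided per frame,
// not by sorting every field of the record together.
#[repr(C)]
#[allow(dead_code)]
struct Record {
    hot: HotFrame,   // cache line 0
    cold: ColdFrame, // cache line 1
}

fn main() {
    println!("HotFrame:  {} bytes", size_of::<HotFrame>());
    println!("ColdFrame: {} bytes", size_of::<ColdFrame>());
    println!("Record:    {} bytes, align {}", size_of::<Record>(), align_of::<Record>());
}
```

The trade-off is that every frame is rounded up to a full line, so unused bytes at the end of a frame are wasted unless more fields move into it. The alignment is also honored for heap allocations (`Box`, `Vec`), so each frame really does start on a line boundary.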

The same argument applies as rationale.

But really, you order your data by RELEVANCE! This is fundamental: load order, access frequency, temporal order, importance. It depends, doesn't it? ... But you tune it like everything else.

The first "frame" should have all you need. Sort of. BLOB pointers are just a persistent heap, but for gosh sakes put it all in the first frame or two. Am I right?
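A sketch of the "first frame has all you need" idea, assuming a 64-byte line and a 64-bit target: the fields most code paths read come first, rarely used data sits behind a `Box` (playing the role of the BLOB pointer), and compile-time assertions act as the tuning guard that fails the build if the hot part outgrows the first line. `Account`, `ColdData`, and their fields are hypothetical; `std::mem::offset_of!` requires Rust 1.77 or later.

```rust
use std::mem::{offset_of, size_of};

// Assumed cache-line size; 64 bytes on current x86_64 parts.
const LINE: usize = 64;

// Rarely read payload, kept on the heap behind one pointer.
#[allow(dead_code)]
struct ColdData {
    notes: String,
    history: Vec<i64>,
}

// Fields ordered by relevance: what the hot path reads comes first.
#[repr(C)]
#[allow(dead_code)]
struct Account {
    id: u64,
    balance: i64,
    limit: i64,          // overdraft allowed below zero
    status: u32,         // 0 = active (hypothetical encoding)
    cold: Box<ColdData>, // one pointer instead of the whole payload
}

// Tuning guards: the build fails if the hot fields, or the whole
// inline record, no longer fit in the first cache line.
const _: () = assert!(offset_of!(Account, status) + size_of::<u32>() <= LINE);
const _: () = assert!(size_of::<Account>() <= LINE);

impl Account {
    // Hot path: reads only inline fields, never follows `cold`.
    fn can_withdraw(&self, amount: i64) -> bool {
        self.status == 0 && self.balance - amount >= -self.limit
    }

    // Cold path: dereferences the pointer only when history is needed.
    fn history_len(&self) -> usize {
        self.cold.history.len()
    }
}

fn main() {
    let acct = Account {
        id: 1,
        balance: 100,
        limit: 50,
        status: 0,
        cold: Box::new(ColdData {
            notes: String::from("opened"),
            history: vec![100],
        }),
    };
    println!("size_of::<Account>() = {}", size_of::<Account>());
    println!("can withdraw 120: {}", acct.can_withdraw(120));
    println!("history entries: {}", acct.history_len());
}
```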

I note that, conceptually, this is clearly an abstraction level (or methodology) below the type/struct layout. It is cross-cutting in multiple ways.

As such, it will (naturally, and counterintuitively) sit at the meta (dictionary / IO / file system) level. Higher, thankfully?

Operating system support, framework, design level, or even core/compiler, depending on your view? Don't make me say crate.

Or... Not?

Regards, Dgh. See:

Rfc:

Crate for this:
