Any good books on cache intrinsics?

I'm interested in data-oriented design and I'm devoting a lot of the time I spend on personal projects to micro-optimization.

Something I'm working on now requires reading 4 chunks of a chunked 2D grid into the cache at a time. Each chunk is bigger than a single cache line, but if I can prefetch the entire contents of all 4 chunks, I expect to avoid cache misses in the operations that work on elements within that subgrid.

I see some interesting intrinsics in nightly, but the documentation isn't yet detailed enough for a non-expert to get started.

What I most want to do is just provide a range of memory to fetch all at once. The memory is laid out linearly inside each chunk, but not necessarily from one chunk to the next. So I would like to provide a range for each chunk as I glide across the overall 2D grid and work on 4-chunk blocks at a time.
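
To make the idea concrete, here is a minimal sketch of the kind of "prefetch this range" helper described above. It assumes an x86_64 target (where `_mm_prefetch` from `std::arch` is available) and a 64-byte cache line; `prefetch_chunk`, `prefetch_addr` and `CACHE_LINE` are hypothetical names for illustration, and on other targets the hint is simply a no-op. Whether issuing this many hints actually helps is something only a profiler can confirm.

```rust
/// Assumed cache-line size in bytes; 64 is typical on x86_64 but not guaranteed.
const CACHE_LINE: usize = 64;

/// Issues a single prefetch hint for the cache line containing `addr`.
#[cfg(target_arch = "x86_64")]
#[inline(always)]
fn prefetch_addr(addr: usize) {
    use std::arch::x86_64::{_mm_prefetch, _MM_HINT_T0};
    // A prefetch is only a hint: it never faults and never changes program state.
    unsafe { _mm_prefetch::<_MM_HINT_T0>(addr as *const i8) };
}

/// Fallback for other targets: no hint is issued.
#[cfg(not(target_arch = "x86_64"))]
#[inline(always)]
fn prefetch_addr(_addr: usize) {}

/// Walks the memory backing `chunk` one cache line at a time and hints each line.
/// Returns the number of cache lines the chunk spans (assuming `CACHE_LINE`).
pub fn prefetch_chunk<T>(chunk: &[T]) -> usize {
    let bytes = std::mem::size_of_val(chunk);
    if bytes == 0 {
        return 0;
    }
    let first = chunk.as_ptr() as usize;
    // Round down so an unaligned chunk still has its first and last lines covered.
    let mut addr = first & !(CACHE_LINE - 1);
    let end = first + bytes;
    let mut lines = 0;
    while addr < end {
        prefetch_addr(addr);
        lines += 1;
        addr += CACHE_LINE;
    }
    lines
}
```

For a 4-chunk block this would be called once per chunk, e.g. `for chunk in block { prefetch_chunk(chunk); }`, where `block` is whatever collection of slices the grid exposes.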

I welcome answers to my particular question, suggestions that I'm going about this the wrong way entirely, or general books I should read for more background in this area.

Sections 3, 6.2 and 6.3 of Ulrich Drepper's "What Every Programmer Should Know About Memory" would make a good starting point for general background. They describe how caches work on a modernish (Core 2 Duo era) CPU, and while the details have changed since then, the broad behaviour is the same.

You generally don't want to prefetch a large block of memory at a time - the more prefetches you issue, the more likely they are to end up ignored, or evicted before use, as the cache is reused for "useful" data. Instead, you should be using a profiler to guide you, and prefetching the smallest amount of data that gets you a performance boost - e.g. start each loop iteration by prefetching the items the next iteration will touch.
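
As a rough illustration of the "prefetch what the next iteration will touch" pattern, here is a minimal sketch. It assumes an x86_64 target for the actual hint (it compiles to a no-op elsewhere), and `prefetch_hint` and `sum_indexed` are made-up names for this example, not anything from your code. The access order comes from an index list, so the hardware prefetcher can't easily predict it.

```rust
/// Hints the CPU to load the cache line holding `item` (x86_64 only).
#[cfg(target_arch = "x86_64")]
#[inline(always)]
fn prefetch_hint<T>(item: &T) {
    use std::arch::x86_64::{_mm_prefetch, _MM_HINT_T0};
    // Only a hint: the CPU is free to ignore it, and it cannot fault.
    unsafe { _mm_prefetch::<_MM_HINT_T0>(item as *const T as *const i8) };
}

/// Fallback for other targets: no hint is issued.
#[cfg(not(target_arch = "x86_64"))]
#[inline(always)]
fn prefetch_hint<T>(_item: &T) {}

/// Sums the cells of `grid` in the order given by `indices`.
/// Panics if any index is out of bounds, like ordinary slice indexing.
pub fn sum_indexed(grid: &[u64], indices: &[usize]) -> u64 {
    let mut total = 0u64;
    for (i, &idx) in indices.iter().enumerate() {
        // Before working on this cell, hint the one the next iteration will read.
        if let Some(cell) = indices.get(i + 1).and_then(|&next| grid.get(next)) {
            prefetch_hint(cell);
        }
        total = total.wrapping_add(grid[idx]);
    }
    total
}
```

With so little work per iteration the hint may not have time to pay off; in practice you'd tune how far ahead to prefetch (one iteration, several, or not at all) based on what the profiler shows.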
