Reducing cached database pages

The database software I am writing has a cache for database pages, which allows them to be shared by processes running concurrent queries and avoids having to reload each page from secondary storage for every query. If the database is large, this can mean that after many queries which individually only access a small part of the database, the memory allocated by the program becomes excessive (perhaps leading to an out-of-memory condition if the swap file is not large enough, or to poor performance due to excessive swapping).

So it seems desirable to limit the memory used by the cache somehow. I am interested in approaches to this problem: how much memory to free up, and when. For example, would it (likely) be better to free the entire cache periodically, or just to trim it a bit? Also, is it worth trying to decide which pages to free? What are your thoughts?

You can just empty or shrink the cache as soon as it exceeds some pre-specified size. As for what to free, a common heuristic is "least recently used" (LRU), which means purging the pages that you haven't needed for the longest time (i.e., in ascending order of last access time).
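
As a sketch of that heuristic (the `CachedPage` type and field names here are hypothetical, not from any real codebase):

```rust
use std::collections::HashMap;

// A page tracked by the cache, stamped with the logical time of its last access.
struct CachedPage {
    last_access: u64, // monotonically increasing access counter
    bytes: usize,     // size of the page data
}

// Return page ids in eviction order: least recently used first
// (ascending last-access time), as the LRU heuristic prescribes.
fn lru_eviction_order(cache: &HashMap<u32, CachedPage>) -> Vec<u32> {
    let mut ids: Vec<u32> = cache.keys().copied().collect();
    ids.sort_by_key(|id| cache[id].last_access);
    ids
}
```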


If it were me, I'd let the user specify a maximum cache size (e.g. 50% of RAM by default) and evict pages one at a time in least recently used order. That way you keep as much data in memory as you can, while throwing away pages that weren't accessed recently. Unless you have some extra domain knowledge, how recently a page was touched is one of the few data points your database's caching algorithm will have access to.
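
A sketch of that policy: evict the least recently used page, one at a time, until the cache is back under a configurable byte limit (all names here are illustrative):

```rust
use std::collections::HashMap;

struct Entry {
    last_access: u64,
    bytes: usize,
}

// Evict LRU entries one at a time until total size fits under `max_bytes`.
// Returns the ids that were evicted, in eviction order.
fn evict_until_under(cache: &mut HashMap<u32, Entry>, max_bytes: usize) -> Vec<u32> {
    let mut evicted = Vec::new();
    let mut total: usize = cache.values().map(|e| e.bytes).sum();
    while total > max_bytes {
        // Find the least recently used entry (linear scan for clarity;
        // a real cache would keep an ordered structure instead).
        let victim = *cache
            .iter()
            .min_by_key(|(_, e)| e.last_access)
            .map(|(id, _)| id)
            .expect("cache non-empty while over limit");
        total -= cache.remove(&victim).unwrap().bytes;
        evicted.push(victim);
    }
    evicted
}
```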

The problem I see with clearing the entire cache is that the next few queries after a clear will all be cold - especially if certain parts of the database are accessed frequently and would otherwise always be in cache (e.g. a users table) - and it will take a while for the cache to warm up again.

If you graphed the query durations over time, I reckon you'd see something like this:

[diagram: query durations over time]

I don't have any numbers to back this theory up, but it'd be an interesting experiment to perform.


"a cache with bad policy is another name for a memory leak" - though they're talking about internal object caching, the concepts are pretty similar.

As a warning, I'm not an expert in DB design, but I've picked up a few things from poking into them.

You have the extra quirk that it's relatively pointless to cache raw disk pages yourself, unless your cache sits late in an expensive query, as the OS will LRU-cache disk accesses anyway.

Because of that, what you're really caching here isn't the disk contents but the time spent working out where on disk the data is. Keeping that information around would be very handy for policies like "minimize worst-case query times", and you might be better off keeping only the result index to reduce memory usage (possibly as a two-stage eviction).

Ideally you would be able to get memory-pressure information from the OS so you can be self-balancing, but desktop OSes (and especially Linux) are terrible at exposing it, in my experience at least. You might have more luck than me, though: PSI - Pressure Stall Information — The Linux Kernel documentation
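
For reference, Linux PSI exposes memory pressure as plain text in `/proc/pressure/memory`. A sketch of pulling out the 10-second average from such a line (the file format is documented in the kernel docs, but treat this parser as illustrative):

```rust
// Parse the avg10 value from a PSI line such as:
//   "some avg10=0.12 avg60=0.05 avg300=0.01 total=123456"
// On a real system the line would be read from /proc/pressure/memory.
fn parse_psi_avg10(line: &str) -> Option<f64> {
    line.split_whitespace()
        .find_map(|tok| tok.strip_prefix("avg10="))
        .and_then(|v| v.parse().ok())
}
```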


Thanks for this. In your first link, the post it links to ("Caching Implies Policy") refers to a garbage-collected system (I presume Microsoft .NET) and the issues that caused, so I am not sure whether that experience carries over to a Rust-based system.

I agree, it seems best to trim. I coded an LRU cache which keeps the memory used for the cache below a limit, using a doubly-linked list to track access order, with a function that trims the cache when the limit is exceeded.
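
(The original function isn't reproduced here, but a minimal sketch of such a trim loop might look like the following; a `VecDeque` stands in for the doubly-linked list, and all names are hypothetical.)

```rust
use std::collections::VecDeque;

// Pages in least-recently-used order: front = oldest, back = newest.
// A production cache would use an intrusive doubly-linked list so that
// touching a page is O(1); a VecDeque keeps this sketch simple.
struct LruCache {
    pages: VecDeque<(u32, Vec<u8>)>, // (page id, page data)
    used_bytes: usize,
    limit_bytes: usize,
}

impl LruCache {
    // Pop pages from the LRU end until memory use is under the limit.
    fn trim(&mut self) {
        while self.used_bytes > self.limit_bytes {
            if let Some((_, data)) = self.pages.pop_front() {
                self.used_bytes -= data.len();
            } else {
                break;
            }
        }
    }
}
```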

Certainly the specific reasons in the post don't apply in the same way, but the overall point that you need to actually measure performance to know if your cache is helping or hurting is absolutely still true.


Yes. I have done some experiments (I wouldn't exactly call them "measuring performance", as they are not very systematic or scientific) which appear to show roughly that:

(1) There can be benefits from having the data in the higher-level cache rather than in OS buffers. In one experiment I just did (summarising 50,000 records), I saw times of 20 ms compared to 40 ms.

(2) There seems to be a significant penalty if the data is in neither the high-level cache nor the OS buffers. The same test took over 1 second (after running a Rust build in an attempt to sabotage the OS file caching - though I got the same result after rebooting the server). But I am not entirely sure what I am seeing there.


This is a question for db internals people. Not sure if there is a good forum for that topic, but if so I would ask it there.

I have worked on db internals and can say that LRU is typical but far from ideal. When thinking about an ideal db cache I became convinced it would be an application of TinyLFU. The paper also describes the drawbacks of LRU for caching and some alternatives.
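
The core of TinyLFU is an approximate frequency counter (a count-min sketch) with small saturating counters that are periodically aged. A toy version of the counting part, assuming 4-bit-style counters capped at 15 as in the paper (the struct and its layout here are illustrative, not the paper's exact design):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// A toy count-min sketch in the spirit of TinyLFU: `depth` rows of
// saturating counters; an item's estimate is the minimum across rows,
// so hash collisions can only inflate the estimate, never shrink it.
struct FreqSketch {
    rows: Vec<Vec<u8>>, // counters capped at 15 (4-bit, as in the paper)
    width: usize,
}

impl FreqSketch {
    fn new(depth: usize, width: usize) -> Self {
        FreqSketch { rows: vec![vec![0; width]; depth], width }
    }

    fn index<T: Hash>(&self, item: &T, row: usize) -> usize {
        let mut h = DefaultHasher::new();
        row.hash(&mut h); // salt per row so each row hashes differently
        item.hash(&mut h);
        (h.finish() as usize) % self.width
    }

    fn record<T: Hash>(&mut self, item: &T) {
        for r in 0..self.rows.len() {
            let i = self.index(item, r);
            if self.rows[r][i] < 15 {
                self.rows[r][i] += 1;
            }
        }
    }

    fn estimate<T: Hash>(&self, item: &T) -> u8 {
        (0..self.rows.len())
            .map(|r| self.rows[r][self.index(item, r)])
            .min()
            .unwrap_or(0)
    }
}
```

TinyLFU then uses these estimates for admission: a new candidate only displaces a cached victim if its estimated frequency is higher.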

If you search around you'll find examples of people trying or thinking about using TinyLFU for db caching. For example, search for tinylfu in the open issues for the rust sled db.


You seem to be saying it's a cache of the query results, not the file pages? E.g. the cache is in front of the query engine, not behind it? That's what I was getting at earlier.

So you might want to measure the time spent in the query engine and the number of disk pages accessed if you're trying to figure out your cache policy. "Query time" on its own isn't likely to give you good results, with the OS disk cache muddying things up.

If nothing else, have an option to run with and without your cache, so you can tell what part of cold time to warm time improvement is due to your cache, rather than the OS's.

Thanks very much for this. I read the TinyLFU paper (fairly quickly). I notice that on "reset" they divide the counters by two; my immediate reaction was that maybe a less drastic reset would be better, say "reduce by 1/16" (using a binary fixed-point representation to maintain some level of accuracy).
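
The two aging policies being compared reduce to a shift versus a fractional decrement. As a sketch (the 1/16 variant is the suggestion above, not something from the paper):

```rust
// Halving reset, as described in the TinyLFU paper.
fn age_halve(count: u32) -> u32 {
    count >> 1
}

// Gentler aging: subtract 1/16 of the count. Note that with plain
// integers, counts below 16 would never decay, which is why a real
// implementation might keep fixed-point fraction bits for accuracy.
fn age_sixteenth(count: u32) -> u32 {
    count - (count >> 4)
}
```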

Yes, tuning that may be beneficial. I never implemented TinyLFU, but I imagined that the reset process would require tuning with some real-world testing. For example, without testing it is difficult to say how long it will take to iterate over all the aging values (edit: and update them).

I discovered this crate:

which includes an implementation of WTinyLFU.

Some other more sophisticated (than LRU) cache policies are implemented in


For now, I decided to go for a simple LFU (least frequently used) method for choosing which pages to remove, using a simple counter and a heap.
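
A sketch of that scheme (a use counter per page, and a min-heap ordered by count; all names hypothetical). One caveat with a plain heap is that counts keep changing after entries are pushed, so a real implementation would either rebuild the heap before evicting, as below, or use lazy invalidation:

```rust
use std::cmp::Reverse;
use std::collections::{BinaryHeap, HashMap};

// Pick the `n` least frequently used pages by pushing (count, id) pairs
// into a min-heap (BinaryHeap is a max-heap, so Reverse flips the order)
// and popping the smallest counts first.
fn lfu_victims(counts: &HashMap<u32, u64>, n: usize) -> Vec<u32> {
    let mut heap: BinaryHeap<Reverse<(u64, u32)>> =
        counts.iter().map(|(&id, &c)| Reverse((c, id))).collect();
    let mut victims = Vec::with_capacity(n);
    while victims.len() < n {
        match heap.pop() {
            Some(Reverse((_, id))) => victims.push(id),
            None => break,
        }
    }
    victims
}
```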

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.