Problems in implementing a cache on top of LRU data structure

thebluetropics · March 9, 2024, 9:25am

Hi, fellow rustaceans. I am trying to build some sort of caching in my web application. I am using an LRU data structure for the caching (lru::LruCache).

let mut cache = Mutex::new(LruCache::new(NonZeroUsize::new(4096)));

However, there are problems:

How do I ensure that each key-value pair have only one access at a time, both write or read.
If there's multiple access to the same key-value pair, they have to wait in queue.

The problem above will give you another problem. The cold (least used) key-value pair will be dropped once a new key-value pair is added. What if there's something that still accessing/modifying a least used key-value pair, and then the key-value pair get dropped?

Some of the solution that I can think of is either reject the access immediately or make the cold(least used) key-value pair hot(recently used).

But I am struggling to write these by myself, I need someone else's help. It can be a library, or you, guide me to the solution to these problems...

I appreciate any help on this, thanks :)

thebluetropics · March 9, 2024, 9:38am

Under the normal circumstances:

> mutex: lock the cache
> claim the key-value pair and set the status "modifying..."
> mutex: unlock the cache
> fetch from database
> mutex: lock the cache
> write database fetch result to the key-value pair
> remove the key-value pair "modifying..." status
> mutex: lock the cache

Abnormal circumstances:

> mutex: lock the cache
> claim the key-value pair for make the status "modifying..."
> mutex: unlock the cache
> fetch from database
> [!] here, the key-value pair removed because it's been cold. [!]
> mutex: lock the cache
> wtf, where's the key-value pair? (program confused here)

firebits.io · March 9, 2024, 11:21am

What do you mean by "claim"? If by it you mean to write to the cache, then the order of the operations in your example is wrong. You can't write to te cache without obtaining a mutex lock.

thebluetropics · March 9, 2024, 11:26am

Oh, it's a state in the key-value pair itself:

Mutex<LruCache<Uuid, User>>

struct User {
  // ...
  locked: bool,
  wait_queue: FifoQueue<BoxFuture<'static, ()>>
}

If you trying to "claim" it and found the entry is being locked, you need to put a Future in the wait_queue. There's a maximum to the wait_queue, for example 8.

"modifying..." state means the locked field of the struct is set to true.

firebits.io · March 9, 2024, 11:40am

I see. What I would recommend you is to perform all the necessary changes while you hold the lock. The cache shouldn't evict your key-value pair since no new key-value pair could be written while you hold the lock.

thebluetropics · March 9, 2024, 11:51am

I am sorry but, I don't think that's a good idea. Correct me if I wrong tho.

If you want me to hold the parking_lot::Mutex across await calls to the database, it will completely kills concurrency and parallelism... Obviously this is not what you mean.

If you want me to fetch the database and then obtain the lock after you got the fetched data, then write to the cache, and then unlock. But, something else might finish their database fetch and then write to the cache before this one.

Consider when there's two database fetch at the same time:

> lookup cache, found no desired key-value pair (cache miss)
> database fetch start
> [!] someone else finished db fetch and then set the key-value pair [!]
> database fetch complete
> mutex: lock the cache
> [!] "wtf, how did this key-value pair exists, I swear it's not there!" [!]

thebluetropics · March 9, 2024, 11:52am

That's why I use queue instead of unordered access to the key-value pair. Anyone can access the master lock of the cache in any order, but you have to wait in queue if you try to modify/read/write the cache entry (key-value pair)

firebits.io · March 9, 2024, 11:55am

Correct. That's why I recommended you on another thread to use Tokio's Mutex instead.

thebluetropics · March 9, 2024, 11:59am

I've always avoiding tokio::sync::Mutex whenever possible, because they(tokio documentation) officially said so.

I never know tokio::sync::Mutex is commonly used in this scenario or not. I'll try, though.

firebits.io · March 9, 2024, 12:16pm

Where does it say so?

From Tokio's Mutex documentation:

The feature that the async mutex offers over the blocking mutex is the ability to keep it locked across an .await point. This makes the async mutex more expensive than the blocking mutex, so the blocking mutex should be preferred in the cases where it can be used. The primary use case for the async mutex is to provide shared mutable access to IO resources such as a database connection. If the value behind the mutex is just data, it’s usually appropriate to use a blocking mutex such as the one in the standard library or parking_lot.
Note that, although the compiler will not prevent the std Mutex from holding its guard across .await points in situations where the task is not movable between threads, this virtually never leads to correct concurrent code in practice as it can easily lead to deadlocks.

You have outlined above that your use case demands to aquire the lock before fetching from the database. To me it's clear that you need a Mutex guard that can be held across await points.

system · June 7, 2024, 12:17pm

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.

Topic		Replies	Views
When is efficient to lock the Arc<Mutex<_>> in context with HTTP Request help	5	1251	January 12, 2023
[solved] Lifetime issues when trying to cache Futures help	3	1115	November 30, 2019
Having trouble caching database data help	9	592	April 12, 2020
Do i need a (c/c++) volatile variable in rust	8	3295	January 12, 2023
Async refreshing cache using Tokio: help help	8	1089	April 25, 2021

Problems in implementing a cache on top of LRU data structure

Related topics