Problems in implementing a cache on top of LRU data structure

Hi, fellow rustaceans. I am trying to build some sort of caching in my web application. I am using an LRU data structure for the caching (lru::LruCache).

let mut cache = Mutex::new(LruCache::new(NonZeroUsize::new(4096)));

However, there are problems:

  • How do I ensure that each key-value pair have only one access at a time, both write or read.
  • If there's multiple access to the same key-value pair, they have to wait in queue.

The problem above will give you another problem. The cold (least used) key-value pair will be dropped once a new key-value pair is added. What if there's something that still accessing/modifying a least used key-value pair, and then the key-value pair get dropped?

Some of the solution that I can think of is either reject the access immediately or make the cold(least used) key-value pair hot(recently used).

But I am struggling to write these by myself, I need someone else's help. It can be a library, or you, guide me to the solution to these problems...


I appreciate any help on this, thanks :)

Under the normal circumstances:

> mutex: lock the cache
> claim the key-value pair and set the status "modifying..."
> mutex: unlock the cache
> fetch from database
> mutex: lock the cache
> write database fetch result to the key-value pair
> remove the key-value pair "modifying..." status
> mutex: lock the cache

Abnormal circumstances:

> mutex: lock the cache
> claim the key-value pair for make the status "modifying..."
> mutex: unlock the cache
> fetch from database
> [!] here, the key-value pair removed because it's been cold. [!]
> mutex: lock the cache
> wtf, where's the key-value pair? (program confused here)

What do you mean by "claim"? If by it you mean to write to the cache, then the order of the operations in your example is wrong. You can't write to te cache without obtaining a mutex lock.

1 Like

Oh, it's a state in the key-value pair itself:

Mutex<LruCache<Uuid, User>>
struct User {
  // ...
  locked: bool,
  wait_queue: FifoQueue<BoxFuture<'static, ()>>
} 

If you trying to "claim" it and found the entry is being locked, you need to put a Future in the wait_queue. There's a maximum to the wait_queue, for example 8.

"modifying..." state means the locked field of the struct is set to true.

I see. What I would recommend you is to perform all the necessary changes while you hold the lock. The cache shouldn't evict your key-value pair since no new key-value pair could be written while you hold the lock.

1 Like

I am sorry but, I don't think that's a good idea. Correct me if I wrong tho.

If you want me to hold the parking_lot::Mutex across await calls to the database, it will completely kills concurrency and parallelism... Obviously this is not what you mean.

If you want me to fetch the database and then obtain the lock after you got the fetched data, then write to the cache, and then unlock. But, something else might finish their database fetch and then write to the cache before this one.

Consider when there's two database fetch at the same time:

> lookup cache, found no desired key-value pair (cache miss)
> database fetch start
> [!] someone else finished db fetch and then set the key-value pair [!]
> database fetch complete
> mutex: lock the cache
> [!] "wtf, how did this key-value pair exists, I swear it's not there!" [!]

That's why I use queue instead of unordered access to the key-value pair. Anyone can access the master lock of the cache in any order, but you have to wait in queue if you try to modify/read/write the cache entry (key-value pair)

Correct. That's why I recommended you on another thread to use Tokio's Mutex instead.

1 Like

I've always avoiding tokio::sync::Mutex whenever possible, because they(tokio documentation) officially said so.

I never know tokio::sync::Mutex is commonly used in this scenario or not. I'll try, though.

Where does it say so?

From Tokio's Mutex documentation:

The feature that the async mutex offers over the blocking mutex is the ability to keep it locked across an .await point. This makes the async mutex more expensive than the blocking mutex, so the blocking mutex should be preferred in the cases where it can be used. The primary use case for the async mutex is to provide shared mutable access to IO resources such as a database connection. If the value behind the mutex is just data, it’s usually appropriate to use a blocking mutex such as the one in the standard library or parking_lot.
Note that, although the compiler will not prevent the std Mutex from holding its guard across .await points in situations where the task is not movable between threads, this virtually never leads to correct concurrent code in practice as it can easily lead to deadlocks.

You have outlined above that your use case demands to aquire the lock before fetching from the database. To me it's clear that you need a Mutex guard that can be held across await points.

1 Like