Shared cache for a webserver

I have a webserver where I initially fetch the JSON Web Key Set from the authentication server to validate the tokens. After a while, or if I don't find a key in the set, I might want to update the data.

This seems simple at first, but I am struggling. What is the best practice? Is there a crate that can support me?

  • If I just put the cache behind a Mutex, all of the requests will be serialized at this point
  • If I notice that the data might be outdated, I only want to trigger one update of the data, not an update by every concurrent webserver request that notices that the data is outdated
  • If the data is in the process of being updated I might have all other requests wait for the real fresh data, but maybe this creates a dangerous queue? Or should I give old data to all concurrent requests, knowing they may also require the fresh key set data?

I might suggest the dashmap crate with regard to your request serialization concern. If you want a shared cache between servers, using a centralized Redis server for the cache is a common technique (or an open source alternative such as Valkey). It's unclear what you're saying with your second and third points tbh.

If I just put the cache behind a Mutex, all of the requests will be serialized at this point

When dealing with a potentially mutable resource that is shared between multiple requests, some form of synchronisation is always involved.

If I notice that the data might be outdated, I only want to trigger one update of the data, not an update by every concurrent webserver request that notices that the data is outdated

A common approach is to spawn a separate task/thread to handle the I/O resource and use message passing (e.g. by using channels). The managing task holds the key set, performs lookups and updates the key set when certain events occur.

If the data is in the process of being updated I might have all other requests wait for the real fresh data, but maybe this creates a dangerous queue?

Absolutely! There are ways to deal with such scenarios, especially when using channels:

  • Limit the buffer size of the channel used to communicate with the managing task.
  • Use a timeout when waiting for a response from the managing task.

In both cases your request handlers should return an error to the user.


Since the workload should be read-mostly, you can use an RwLock or ArcSwap.

  • If I notice that the data might be outdated, I only want to trigger one update of the data, not an update by every concurrent webserver request that notices that the data is outdated

Instead of updating it from requests you can set up a timer to regularly refresh it.

  • If the data is in the process of being updated I might have all other requests wait for the real fresh data, but maybe this creates a dangerous queue? Or should I give old data to all concurrent requests, knowing they may also require the fresh key set data?

You can prepare the new data and only briefly take the lock to replace it. Or use ArcSwap.


Instead of updating it from requests you can set up a timer to regularly refresh it.

That definitely sounds like a reasonable solution, and I will think about whether this works for me. It would simplify things a lot.

Of course it could happen that a token arrives that is signed with a key I don't have in my key set, and my last refresh was a while ago. In this case I might want to refresh immediately, to serve this request correctly with the updated keys.

Now it gets difficult with RwLock, ArcSwap or a short-lived lock, because I somehow have to coordinate things so that only one task triggers the update.

I don't know what's common procedure, but I'd hope key servers start publishing new keys for a while before they actually use them to mint signatures. If that is so, then a timer would be enough.
