How to access a large memory cache efficiently?

The situation:

I have a Tokio-based program that receives short text messages over a small number of persistent connections. Each message carries an owner id; there are about 5 million owners in the system. Only about 100,000 of the owners send messages heavily, each of which may send 5 messages per second (though a single owner can spike to thousands of messages a second in rare cases).

The job to be done:

Each owner has a list of rules describing what action to take in response to an incoming message. These rules are 100 bytes to 3 KB each. The plan is to store the rules for the 100,000 most recently active owners in a hashmap, with the rest kept in a database. The database will feed data into the hashmap when it is required, and data should expire from the hashmap if unused for a period.

My questions:

How do I efficiently access the hashmap with regard to ownership? The methods I know of are sharing it behind an Arc and message passing over an mpsc channel, though I have no idea which would be faster. Intuitively it seems simpler to have one task own the hashmap and hand out the data via message passing, but I'm unsure whether that is really sustainable when there are hundreds of thousands of requests per second.
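
To make the comparison concrete, here is a minimal sketch of the two approaches I'm weighing (the u64 owner ids and byte-blob rules are stand-ins, not anything from the actual system):

```rust
use std::collections::HashMap;
use std::sync::Arc;
use tokio::sync::{mpsc, oneshot, RwLock};

// Hypothetical stand-ins: owner ids are u64, rules are opaque bytes.
type OwnerId = u64;
type Rules = Arc<[u8]>;

#[tokio::main]
async fn main() {
    // --- Approach 1: shared ownership via Arc ---
    // Every connection task clones the Arc and takes a short read lock
    // per lookup; cache fills and rule updates take the write lock.
    let shared: Arc<RwLock<HashMap<OwnerId, Rules>>> = Arc::default();
    shared.write().await.insert(42, Arc::from(&b"rule bytes"[..]));

    let s = Arc::clone(&shared);
    let hit = tokio::spawn(async move { s.read().await.get(&42).cloned() })
        .await
        .unwrap();
    println!("arc lookup: {:?} bytes", hit.map(|r| r.len()));

    // --- Approach 2: one owner task plus mpsc ---
    // A single task owns the map outright; everyone else sends a request
    // carrying a oneshot channel for the reply.
    let (tx, mut rx) = mpsc::channel::<(OwnerId, oneshot::Sender<Option<Rules>>)>(1024);
    tokio::spawn(async move {
        let mut map: HashMap<OwnerId, Rules> = HashMap::new();
        map.insert(42, Arc::from(&b"rule bytes"[..]));
        while let Some((owner, reply)) = rx.recv().await {
            let _ = reply.send(map.get(&owner).cloned());
        }
    });

    let (reply_tx, reply_rx) = oneshot::channel();
    tx.send((42, reply_tx)).await.unwrap();
    println!("mpsc lookup: {:?} bytes", reply_rx.await.unwrap().map(|r| r.len()));
}
```

My worry with the mpsc version is that every lookup funnels through the single owning task, whereas the Arc version only contends on the lock.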

Second question: a hashmap probably isn't the most efficient tool for this job. Can you recommend a better one?

Are there any consistency guarantees that the hashmap needs to uphold with regard to the rules? Can the rules be updated?

I'd look into chashmap and evmap and see if either of them could suit your use case.
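
For context, evmap trades strict consistency for lock-free reads. A rough sketch (untested, assuming evmap 10; the types are placeholders):

```rust
// Cargo.toml: evmap = "10"
fn main() {
    // evmap splits the map into one write handle and cheaply cloneable
    // read handles, so readers never take a lock.
    let (rules_r, mut rules_w) = evmap::new();

    rules_w.insert(42u64, "rule: forward to queue A".to_string());
    // Writes stay invisible to readers until the writer calls refresh();
    // that window is where the eventual consistency comes from.
    rules_w.refresh();

    // `get` returns a guard over the bag of values stored for that key.
    let n = rules_r.get(&42).map(|rules| rules.len());
    println!("owner 42 has {:?} rule(s)", n);
}
```

Note that readers only see writes after refresh(), so it's only suitable if slightly stale rules are acceptable.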

The hashmap does need to support updates, but they will be extremely infrequent for most owners, perhaps once per month.

Regarding consistency, I'm not sure. The incoming data that the rules are applied to is not extremely important, but I would prefer not to handle it incorrectly (meaning rules not being applied) frequently. It would absolutely be okay if, after an owner's rules were updated, they did not take effect for a short period (even a few seconds, which is probably much longer than it would actually take to get the state correct in the hashmap).

Also look into dashmap and moka. Moka might be closest to what you want: it supports expiring entries. Unfortunately it's still pre-1.0.
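
Something like this (a sketch against moka's sync Cache; the capacity is the number from your post, and the idle window is a made-up placeholder):

```rust
use std::sync::Arc;
use std::time::Duration;
use moka::sync::Cache;

fn main() {
    // Cap at roughly the 100,000 hot owners, and evict entries that
    // go unused for a while (time-to-idle), matching the plan above.
    let cache: Cache<u64, Arc<[u8]>> = Cache::builder()
        .max_capacity(100_000)
        .time_to_idle(Duration::from_secs(10 * 60)) // placeholder expiry period
        .build();

    cache.insert(42, Arc::from(&b"rule bytes"[..]));

    match cache.get(&42) {
        Some(rules) => println!("hit: {} bytes", rules.len()),
        // On a miss you'd load from the database and insert the result;
        // moka also offers get_with(key, || ...) for populate-on-miss.
        None => println!("miss: load from the database, then insert"),
    }
}
```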

It actually sounds like you have a small enough amount of data that, with a little money (to pay for RAM) and a little cleverness (to not store it in an inefficient manner), you can keep ALL of it in memory: 5 million owners at up to 3 KB of rules each is at most ~15 GB, and likely much less at typical rule sizes. That will make everything pretty fast, and depending on how you do it you then just end up relying on the CPU's L1/L2/L3 cache hierarchy for speed of access to commonly accessed entries.
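
By "cleverness" I mean things like the sketch below (purely illustrative): key by a numeric owner id rather than a string, and keep each owner's rules as one contiguous blob so the dominant cost is the rule bytes themselves rather than per-entry allocation overhead.

```rust
use std::collections::HashMap;

fn main() {
    // All 5M owners in one map: u64 keys and a single Box<[u8]> per
    // owner keep the per-entry overhead small next to the rule bytes.
    let mut all_rules: HashMap<u64, Box<[u8]>> = HashMap::with_capacity(5_000_000);

    all_rules.insert(42, b"rule bytes".to_vec().into_boxed_slice());

    if let Some(rules) = all_rules.get(&42) {
        println!("owner 42: {} bytes of rules", rules.len());
    }
}
```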

Or, an embedded database like LMDB would also transparently keep less frequently accessed data on disk and frequently accessed data in RAM (or rather, it relies on OS mechanisms, such as the page cache, that do so).
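
For a flavour of what that looks like, a sketch using the lmdb crate (the lmdb-rkv bindings; the path, map size, and key encoding are all placeholder choices):

```rust
// Cargo.toml: lmdb = "0.8" (the lmdb-rkv bindings)
use lmdb::{Environment, Transaction, WriteFlags};
use std::path::Path;

fn main() -> Result<(), lmdb::Error> {
    std::fs::create_dir_all("/tmp/owner-rules").unwrap();

    // The map size is an upper bound on the file, not an allocation;
    // the OS page cache decides which pages actually stay in RAM.
    let env = Environment::new()
        .set_map_size(20 * 1024 * 1024 * 1024)
        .open(Path::new("/tmp/owner-rules"))?;
    let db = env.open_db(None)?;

    // Write one owner's rule blob under its big-endian id.
    let rules: &[u8] = b"rule bytes";
    let mut txn = env.begin_rw_txn()?;
    txn.put(db, &42u64.to_be_bytes(), &rules, WriteFlags::empty())?;
    txn.commit()?;

    // Read it back; hot pages are served straight from memory.
    let txn = env.begin_ro_txn()?;
    let read_back = txn.get(db, &42u64.to_be_bytes())?;
    println!("owner 42: {} bytes of rules", read_back.len());
    Ok(())
}
```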
