30-100mcs for a simple HashMap::get call sounds unbelievably high. Are you sure the problem is indeed with HashMap::get? How did you determine it? If that's the output of some standard profiler, could it be that the issue is really in the calling functions and you're just observing some botched debug symbols?
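One way to rule out profiler artifacts is to time the lookups directly with `std::time::Instant`. Here's a rough sketch; the `u64` keys and 32-byte values are placeholders for your actual types:

```rust
use std::collections::HashMap;
use std::time::Instant;

/// Time `map.get` over all `keys` and return the average nanoseconds per lookup.
fn avg_get_ns(map: &HashMap<u64, [u8; 32]>, keys: &[u64]) -> f64 {
    let start = Instant::now();
    let mut hits = 0usize;
    for k in keys {
        if map.get(k).is_some() {
            hits += 1;
        }
    }
    assert_eq!(hits, keys.len()); // sanity check: all keys should be present
    start.elapsed().as_nanos() as f64 / keys.len() as f64
}

fn main() {
    // Placeholder data: 20_000 entries, as in your post.
    let map: HashMap<u64, [u8; 32]> = (0..20_000).map(|i| (i, [0u8; 32])).collect();
    let keys: Vec<u64> = (0..20_000).collect();
    println!("avg {:.1} ns per HashMap::get", avg_get_ns(&map, &keys));
}
```

If this reports tens of nanoseconds while your profiler claims tens of microseconds, the profiler is misattributing time from somewhere else.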
Having to probe past tombstones left by deleted entries is certainly something that comes to mind, but if you're using the hashmap in an append-only way it shouldn't be much of an issue. And still, 100mcs is ridiculously high. You could scan your entire 20000-element hashmap in that time, and why would that happen? (Unless you're using some erroneous full-scan quadratic algorithm, but that looks like an unlikely error when you already have a hashmap.)
Does the issue persist if you change HashMap into a BTreeMap? They are API-compatible, so it should be easy to test. Are you using the default hash function? How large are the items, exactly? In your other post you suggest they are within 32 bytes; is this true for the actual code? If not, have you tried boxing the items before insertion to reduce their size and the memory pressure?
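To illustrate the boxing point (the `Big` type here is made up, just deliberately oversized):

```rust
use std::collections::BTreeMap;

// A deliberately large value type, standing in for your real items.
struct Big([u8; 256]);

fn main() {
    // Same get/insert API as HashMap, so the swap is usually a one-line change.
    let mut map: BTreeMap<u64, Box<Big>> = BTreeMap::new();
    // Boxing keeps each in-table entry at pointer size instead of 256 bytes
    // inline, shrinking the table's footprint and cache/page pressure.
    map.insert(1, Box::new(Big([0u8; 256])));
    assert!(map.get(&1).is_some());
    println!(
        "entry payload in table: {} bytes (boxed) vs {} bytes (inline)",
        std::mem::size_of::<Box<Big>>(),
        std::mem::size_of::<Big>()
    );
}
```

The tradeoff is an extra pointer chase on every access, so boxing only pays off when the inline values are large enough to hurt locality.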
EDIT: Actually, even your starting 5mcs per HashMap::get looks crazy slow. Maaaaaybe it's believable if your entire hashmap is always out of cache and is read from memory, but even then it's a stretch. Could it be that the issue is at some lower level? E.g. maybe your OS pages out the memory holding your hashmap, so every access incurs a page fault, possibly a disk read? That's more in the ballpark of your numbers. 5mcs for a page fault is more believable than for a simple memory access, and 100mcs for swapped-out memory is even more believable, particularly if you use an HDD.
Re-cloning the entire hashmap would touch all of its pages, and could cause the speedup that you observe. You said the slowdown happens after an hour, and your app possibly runs as a background process, which would make it prone to being swapped out.
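If that's what's happening, you don't need a full clone to get the effect; just iterating the map touches every bucket's pages. A sketch of a cheap "keep-warm" pass you could run from a timer (again, placeholder key/value types):

```rust
use std::collections::HashMap;

/// Read every entry so the OS sees the map's pages as recently used.
/// Iteration walks all buckets, pulling any swapped-out pages back in,
/// without the allocation cost of re-cloning the whole map.
fn touch_all(map: &HashMap<u64, [u8; 32]>) -> u64 {
    // Fold over the values so the reads can't be optimized away.
    map.values().map(|v| v[0] as u64).sum()
}

fn main() {
    let map: HashMap<u64, [u8; 32]> = (0..20_000).map(|i| (i, [1u8; 32])).collect();
    // In a real app, call this periodically (e.g. once a minute).
    let checksum = touch_all(&map);
    println!("touched {} entries, checksum {}", map.len(), checksum);
}
```

This only papers over the symptom, of course; the OS is still free to evict the pages again between passes.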
What are the specs of your hardware? Could you be running on something non-standard in 2024?
EDIT2: "after running for an hour, or after running a large game" --- yeah, that looks exactly like a swapping issue to me. A large game is extremely likely to kick passive processes from memory into swap. There is little you can do to deal with it. In general, the idea is that if your app runs often enough and touches its memory often enough that its performance matters, then it will not be swapped out, or at least that will happen rarely. If it runs only once an hour, then a 100mcs delay for accessing paged-out memory probably shouldn't matter either.
Of course, maybe your app is for whatever reason latency-critical, and must execute asap. In general, that's impossible to enforce unless you control the OS. If you do, you can look up your OS's documentation on real-time priorities and keeping pages resident in memory.
From an app, you can try to mlock pages to ensure they are never swapped out (mlock is POSIX, so it works on Linux and macOS; Windows has the similar VirtualLock). But the OS may ignore your request due to lack of privileges (e.g. RLIMIT_MEMLOCK on Linux) or some other config issue. Locking a few pages should generally work, but if you try to keep mega- or gigabytes resident, you may very well be denied.
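A minimal Linux sketch of that, declaring `mlock`/`munlock` directly rather than pulling in the `libc` crate, and treating refusal as non-fatal since the kernel may say no:

```rust
use std::ffi::c_void;

// Bind mlock/munlock from the C library (Linux/macOS; Windows needs VirtualLock).
extern "C" {
    fn mlock(addr: *const c_void, len: usize) -> i32;
    fn munlock(addr: *const c_void, len: usize) -> i32;
}

/// Try to pin a buffer's pages in RAM. Returns true on success.
/// Failure (RLIMIT_MEMLOCK, missing CAP_IPC_LOCK) should be non-fatal.
fn lock_buffer(buf: &[u8]) -> bool {
    unsafe { mlock(buf.as_ptr() as *const c_void, buf.len()) == 0 }
}

fn unlock_buffer(buf: &[u8]) -> bool {
    unsafe { munlock(buf.as_ptr() as *const c_void, buf.len()) == 0 }
}

fn main() {
    let data = vec![0u8; 4096]; // one page's worth of data
    if lock_buffer(&data) {
        println!("locked {} bytes into RAM", data.len());
        unlock_buffer(&data);
    } else {
        println!("mlock refused (check `ulimit -l` / privileges)");
    }
}
```

Note that locking a `Vec`'s buffer this way only helps while the buffer isn't reallocated; a growing HashMap moves its storage, so you'd have to re-lock after inserts.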