I've been trying to learn Rust for a little while. I've gotten through a good amount of concepts, but I just feel like I'm cheating some parts of Rust, like the borrow checker. I come from Go/Kotlin so all of this is still a bit hazy.
For example, I feel like I'm overusing Arc, and that I don't need to be. I spent a little while writing a really basic axum server as a little test, and I'd love any criticism to improve or for that code to be more idiomatic. I've run it through clippy of course.
Thank you so much for the feedback, this is insanely helpful.
I just have 1 question- is there a benefit to using the std (or parkinglot) mutex/rwlock over tokio's? Because I just thought tokio's was pretty much the same, except it didn't like lock up the main thread, as it had async support. I'm sure I'm just confused.
std’s and parking_lot’s sychronization primitives (mutexes, rwlocks, etc) can be a lot faster and lighter weight because they don’t need to support async. Tokio’s asynchronous mutex is actually implemented using a synchronous mutex that is locked before and after you start locking the async mutex, in order to co-ordinate the list of waiters.
My only concern is that this is async yet it's basically single-threaded. Everything blocks on the mutex. I didn't look too closely at all the route paths, so there might be some concurrency.. I generally don't like when I see a write-lock, followed by a read lock to act on the lock (lines 50..56).. it's a pointless ownership exchange that can potentially corrupt data (depending on the algorithm).. You should take the strongest lock you'll need and do a single lock+mutate+release. Also as the complexity goes up, it gets harder and harder not to deadlock (when you eventually need a second mutex).
What I'd recommend looking into is a actor pattern (think that's what it's called; maybe Monitor thread or some such thing).. Basically have a dedicated thread that acts on the contended resource. Have the worker threads send messages to it, and it replies to corresponding communication channels. In this fashion, the actor OWNs the data so it doesn't need to lock it maintain an Arc for the outer map.. If you want to "share" references to the values of the map in the response messages, then you'll need to still use an Arc there. You'll have to decide if it's better to:
A) Maintain the Arc inc/dec for each read of a VALUE (not the map itself, but entry values)
B) Clone the underlying contents (might be good if the caller is going to extend a string/vec based on what's read in the map anyway)
C) pass a handler that will be dispatched BY the actor thread on the borrowed read-only single-threaded handler.
Additionally, you can shard the hashmap to multiple actors to provide concurrency.. e.g. if you create 4 hashmaps, then take the hash of the key mod 4, you send a message to one of 4 channels. Then each of those channels can perform the C ( handle some complex read-modify-write in a consistent lock free way).. The calling / blocking thread is free to service other tokio IO events while waiting for the actor to do it's job.
Again, the main complaint is Mutex serialization, defeating any fake notion of concurrency; is better to make the code obvious what the async and single-threaded points are.. mutex's hide and pervert this distinction (e.g. deadlocks and IO threads blocked on logic gates blocked on synchronous-IO (like file open calls)). Harder to reason WHY you only have 12.5% CPU utilization when under full load... At least with the actor, it's pretty clear 99% of the work is in that thread - and sharding makes a very clear algorithmic response (albeit with extra complexity).
Thank you so much for the extremely detailed reply, it’s really valuable. I’m gonna need to read this like 5 more times to get a slight idea what’s going on.
One thing though- is it not worth it to read then only write if I need to? The lines you referenced, I just read then only write if required to. As writing will block reading or writing, but reading is more passive.
I’m still super new so I might be totally wrong, but thank you so much for the advice. I’ll definitely look into the pattern you recommended.
So I did a version with pure rust std:: but the send/sync bit me in the ass, requiring mutex's. So I converted over to crossbeam.. At which point, you can probably find some 3rd party that already does all this. But it's still useful (I think) from an educational perspective (both for the reader here and myself).
Note, you can throw away ALL the threading and async structures, and JUST use the hash-shard.
Note I cheated and didn't use the DbCmd with an explicit hash (because the way hash works in rust was minorly annoying coming from my Java world). If the DbCmd enum operated on a hasher directly, then it would simplify the DB API - clients wouldn't even need to know it's sharded.
For JUST hash gets/puts, this message-passing system will be LESS efficient.. where it could be more efficient is if the worker is doing complex async IO.. e.g. WHILE holding a lock, you need async activity. The monitor thread being the only thread that can read/write from the shard state (shard) can safely just do async/IO as if the entire program was single threaded..And the key is that IO blocking operations (saving state, loading state from disk) doesn'st block ANYBODY... the monitor thread just goes to sleep until either the next in-bound message or IO-completion. So reads that don't need disk-IO can complete while a prior command (that DOES block on IO) can progress, uncontended.
This is basically how a LevelDB/Cassandra would be implemented. A series of shards, each with IO-possible-back-ends.. Clients submit messages into a queue. The DB does whatever it needs to operate in a non-blocking way.
It's not TOO complicated in Rust, and things like mpsc::Channel and crossbeam::Channel both have VERY memory safe mechanisms, so you can code with fury.