I would like to ask for a best-practice hint regarding a problem. I'm currently porting an old Java app to Rust. This app has a central "state holder" class with getters and setters; synchronization and shared access are managed inside the Java app. A number of sources (REST API, native JNI callbacks) contribute to this "state holder", while some business logic reads these values, e.g. for reporting and internal flow control.
Now, the "state holder" struct is really huge: it consists of a lot of scalar variables, strings, arrays, etc. My first intention was to create a static struct in Rust and provide thread-safe setters and getters using Mutex or RwLock.
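A minimal sketch of that first idea, assuming a heavily trimmed stand-in for the Java class: `StateHolder`, its fields, and the setter/getter names are hypothetical. Each accessor takes the lock only for the duration of one field access, and getters return owned copies so the lock is released on return.

```rust
use std::sync::RwLock;
use std::thread;

// Hypothetical, heavily trimmed stand-in for the Java "state holder".
#[derive(Debug)]
struct StateHolder {
    counter: u32,
    status: String,
    samples: Vec<f64>,
}

// Global instance guarded by an RwLock. `RwLock::new`, `String::new` and
// `Vec::new` are all `const fn`, so no lazy initialization is needed.
static STATE: RwLock<StateHolder> = RwLock::new(StateHolder {
    counter: 0,
    status: String::new(),
    samples: Vec::new(),
});

// Setters: the write lock is held only for the single assignment.
fn set_status(s: &str) {
    STATE.write().unwrap().status = s.to_string();
}

fn add_to_counter(x: u32) {
    STATE.write().unwrap().counter += x;
}

fn push_sample(v: f64) {
    STATE.write().unwrap().samples.push(v);
}

// Getters: return owned values so the read lock is dropped immediately.
fn status() -> String {
    STATE.read().unwrap().status.clone()
}

fn counter() -> u32 {
    STATE.read().unwrap().counter
}

fn sample_count() -> usize {
    STATE.read().unwrap().samples.len()
}

fn main() {
    // Several "sources" (stand-ins for REST handlers / JNI callbacks) write concurrently.
    let handles: Vec<_> = (0..4)
        .map(|i| {
            thread::spawn(move || {
                add_to_counter(1);
                push_sample(i as f64);
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    set_status("running");
    // prints: counter = 4, samples = 4, status = running
    println!("counter = {}, samples = {}, status = {}", counter(), sample_count(), status());
}
```

This keeps each lock short, but a reader that needs several fields to be mutually consistent would have to take the lock once and copy everything it needs, which is where the size of the struct starts to matter.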
Meanwhile, after reading a lot about this topic, I'm not so sure anymore. I think a channel, fed by multiple clones of the sender (the data sources, as before) and one receiver that merges all the pieces into a static struct, would work too.
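The channel variant could look roughly like this sketch, using `std::sync::mpsc`. The `Update` enum, the reduced `StateHolder`, and `run` are hypothetical. Each source sends only the piece it changed, and a single receiver thread owns the struct and merges the updates into it.

```rust
use std::sync::mpsc;
use std::thread;

// Hypothetical update messages: each source sends only the piece it changed.
enum Update {
    AddToCounter(u32),
    SetStatus(String),
}

#[derive(Debug, Default)]
struct StateHolder {
    counter: u32,
    status: String,
}

impl StateHolder {
    // The single receiver merges each update into its owned copy.
    fn apply(&mut self, update: Update) {
        match update {
            Update::AddToCounter(x) => self.counter += x,
            Update::SetStatus(s) => self.status = s,
        }
    }
}

// Runs the merge loop until every Sender is dropped, then returns the final state.
fn run(num_sources: u32) -> StateHolder {
    let (tx, rx) = mpsc::channel::<Update>();
    let merger = thread::spawn(move || {
        let mut state = StateHolder::default();
        for update in rx {
            state.apply(update);
        }
        state
    });

    let sources: Vec<_> = (0..num_sources)
        .map(|i| {
            let tx = tx.clone(); // one Sender clone per data source
            thread::spawn(move || {
                tx.send(Update::AddToCounter(1)).unwrap();
                if i == 0 {
                    tx.send(Update::SetStatus("ready".to_string())).unwrap();
                }
            })
        })
        .collect();
    drop(tx); // drop the original Sender so the receiver loop can finish
    for s in sources {
        s.join().unwrap();
    }
    merger.join().unwrap()
}

fn main() {
    let state = run(4);
    println!("{:?}", state); // StateHolder { counter: 4, status: "ready" }
}
```

Writes never contend for a lock here, but the sketch leaves open how readers see the merged state while it is running; that is the part the rest of this thread addresses.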
What I can't judge is the performance penalty. My gut feeling is that a shared-memory construct would perform better, i.e. be faster. On the other hand, I like the nice send/recv pattern Rust provides.
Given the huge size of the shared state object, what would you suggest?
@2e71828 Thank you very much for your response. So a shared object instead of messages, right? Are you aware of any sample code demonstrating the pattern you mentioned? I quickly checked `im`, but I can't see how it could help me here. Could you elaborate? I need a mutable structure, at least mutable by one process at a time.
It’s sort of halfway in between the two. At a high level, it looks the same as the messaging solution: you have one process that accepts update requests through some kind of channel, applies them to the master copy of the state, and then sends new copies of the state to anyone that needs it (via channel or shared state).
The difference is in how the copies are made: Instead of eagerly copying all of the data, you increase the reference count of a shared pointer. When you want to update the state, you look at the reference counts and only make a copy of the parts that need to change if they’re currently shared.
That may sound complicated, but the standard library takes care of the details. The important part is that the copies act completely independently, even though they share data internally.
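To make the copy-on-write behaviour concrete, here is a small sketch with `std::sync::Arc` alone (the helper names `demo` and `unique_write_does_not_copy` are just for illustration). `Arc::make_mut` copies the data only when another `Arc` still points at it, and mutates in place when the pointer is unique.

```rust
use std::sync::Arc;

// Returns (what the original sees, what the snapshot sees,
// whether both still point at the same allocation after the write).
fn demo() -> (Vec<u32>, Vec<u32>, bool) {
    let mut original = Arc::new(vec![1, 2, 3]);
    let snapshot = Arc::clone(&original); // refcount 2: data shared, nothing copied
    assert!(Arc::ptr_eq(&original, &snapshot));

    // `original` is shared, so make_mut clones the Vec before handing out `&mut`.
    Arc::make_mut(&mut original).push(4);

    let still_shared = Arc::ptr_eq(&original, &snapshot);
    ((*original).clone(), (*snapshot).clone(), still_shared)
}

// With a refcount of 1, make_mut hands out `&mut` to the existing allocation.
fn unique_write_does_not_copy() -> bool {
    let mut only = Arc::new(vec![1, 2, 3]);
    let before = Arc::as_ptr(&only);
    Arc::make_mut(&mut only).push(4);
    std::ptr::eq(before, Arc::as_ptr(&only))
}

fn main() {
    let (orig, snap, shared) = demo();
    // prints: original = [1, 2, 3, 4], snapshot = [1, 2, 3], still shared = false
    println!("original = {orig:?}, snapshot = {snap:?}, still shared = {shared}");
    // prints: unique write copied nothing: true
    println!("unique write copied nothing: {}", unique_write_does_not_copy());
}
```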
For example, your state might look like this:
```rust
use std::sync::Arc;

// ComponentA and ComponentB must implement Clone:
// Arc::make_mut clones the inner value when it is shared.
#[derive(Clone)]
struct State {
    a: Arc<ComponentA>,
    b: Arc<ComponentB>,
}

impl State {
    pub fn inc_a(&mut self, x: u32) {
        // Clones `ComponentA` if necessary, and then updates it
        Arc::make_mut(&mut self.a).increment(x);
    }

    pub fn reset(&mut self) {
        Arc::make_mut(&mut self.a).reset();
        Arc::make_mut(&mut self.b).reset();
    }
}
```
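The snippet above leaves `ComponentA` and `ComponentB` undefined. A self-contained version might look like the following sketch; the component shapes (`ComponentA(u32)`, `ComponentB(String)`) are assumptions inferred from the debug output later in this thread, and `State::new`, `increment` and the component `reset` methods are hypothetical fill-ins.

```rust
use std::sync::Arc;

// Assumed component shapes. `Clone` is required because Arc::make_mut
// clones the component when it is shared.
#[derive(Clone, Debug, Default)]
struct ComponentA(u32);

#[derive(Clone, Debug, Default)]
struct ComponentB(String);

impl ComponentA {
    fn increment(&mut self, x: u32) {
        self.0 += x;
    }
    fn reset(&mut self) {
        self.0 = 0;
    }
}

impl ComponentB {
    fn reset(&mut self) {
        self.0.clear();
    }
}

// `Arc<T>: Default` when `T: Default`, so State can derive Default too.
#[derive(Clone, Debug, Default)]
struct State {
    a: Arc<ComponentA>,
    b: Arc<ComponentB>,
}

impl State {
    fn new() -> Self {
        Self::default()
    }
    fn inc_a(&mut self, x: u32) {
        // Clones ComponentA only if another State still shares it.
        Arc::make_mut(&mut self.a).increment(x);
    }
    fn reset(&mut self) {
        Arc::make_mut(&mut self.a).reset();
        Arc::make_mut(&mut self.b).reset();
    }
}

fn main() {
    let mut state = State::new();
    let clone = state.clone();
    for _ in 0..10000 {
        state.inc_a(1);
    }
    // `clone` still sees 0, and both copies still share the untouched `b`.
    assert_eq!(state.a.0, 10000);
    assert_eq!(clone.a.0, 0);
    assert!(Arc::ptr_eq(&state.b, &clone.b));
    dbg!(state, clone);
}
```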
Now, State.clone() is quite cheap: it copies the pointers to the two components’ data and increments their refcounts. The component data itself isn’t copied until one of the clones wants to update that component. The centralized state manager can then accept update requests over an mpsc channel, make changes to its local copy of State, and send a copy of the new state to everything else that needs it:
- If you choose to send the update via a channel, the only data sent is a few pointers.
- If you use a shared Mutex/RwLock, the lock only needs to be held long enough to copy the component pointers, not all of the state data.
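Putting the pieces together, a central manager could look like this sketch: sources send requests over `std::sync::mpsc`, the manager thread owns the master `State`, and it publishes a cheap clone into an `Arc<RwLock<State>>` after each change. The two-field `State`, the `Request` enum, and the function names are hypothetical simplifications.

```rust
use std::sync::mpsc;
use std::sync::{Arc, RwLock};
use std::thread;

// Minimal stand-in for `State`: each component sits behind its own Arc.
#[derive(Clone, Debug, Default)]
struct State {
    counter: Arc<u64>,
    log: Arc<Vec<String>>,
}

// Hypothetical update requests sent by the data sources.
enum Request {
    Add(u64),
    Log(String),
}

// The manager owns the master copy. After each update it publishes a clone
// (pointer copies plus refcount bumps) under a short write lock.
fn spawn_manager(rx: mpsc::Receiver<Request>, published: Arc<RwLock<State>>) -> thread::JoinHandle<()> {
    thread::spawn(move || {
        let mut master = State::default();
        for req in rx {
            match req {
                Request::Add(x) => *Arc::make_mut(&mut master.counter) += x,
                Request::Log(s) => Arc::make_mut(&mut master.log).push(s),
            }
            *published.write().unwrap() = master.clone();
        }
    })
}

// Readers hold the read lock only long enough to clone the pointers.
fn snapshot(published: &RwLock<State>) -> State {
    published.read().unwrap().clone()
}

fn run() -> State {
    let published = Arc::new(RwLock::new(State::default()));
    let (tx, rx) = mpsc::channel();
    let manager = spawn_manager(rx, Arc::clone(&published));

    let sources: Vec<_> = (0..3)
        .map(|i| {
            let tx = tx.clone();
            thread::spawn(move || {
                tx.send(Request::Add(10)).unwrap();
                tx.send(Request::Log(format!("source {i} done"))).unwrap();
            })
        })
        .collect();
    for s in sources {
        s.join().unwrap();
    }
    drop(tx); // last Sender gone: the manager loop ends
    manager.join().unwrap();
    snapshot(&published)
}

fn main() {
    let s = run();
    // prints: counter = 30, log entries = 3
    println!("counter = {}, log entries = {}", s.counter, s.log.len());
}
```

Because the published copy still shares the master's components, each update makes `make_mut` copy the one component being changed; the other components keep being shared.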
OK, I just saw the error; it was a different one. With `&mut self.a`, the problem is the missing `Clone` implementation. Annotating `ComponentA` with `#[derive(Clone)]` fixes it.
@2e71828 No need to apologize. Thanks for the snippet. I had already spotted the missing `increment` as the next issue.
It compiles. Hurray! Now I just need to wrap my head around it. Thanks, everybody, for the pointers. Very much appreciated. Sorry, I only have about a week of Rust experience.
It’s not exactly a basic technique, so it may take a little bit of effort to understand. One thing that could help is writing some custom Clone implementations that print something when they’re called.
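Following that suggestion, here is a sketch with a hand-written `Clone` for a hypothetical `ComponentA` that prints and counts each deep copy (the `DEEP_CLONES` counter is an illustrative addition). Cloning `State` only clones the `Arc`, so nothing is printed until `make_mut` actually has to copy a shared component.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

// Counts how often the component's data is actually deep-copied.
static DEEP_CLONES: AtomicUsize = AtomicUsize::new(0);

#[derive(Debug)]
struct ComponentA(u32);

// Hand-written Clone that announces itself instead of #[derive(Clone)].
impl Clone for ComponentA {
    fn clone(&self) -> Self {
        DEEP_CLONES.fetch_add(1, Ordering::SeqCst);
        println!("ComponentA::clone called (value = {})", self.0);
        ComponentA(self.0)
    }
}

// Deriving Clone here clones the Arc, not the ComponentA inside it.
#[derive(Clone, Debug)]
struct State {
    a: Arc<ComponentA>,
}

impl State {
    fn inc_a(&mut self, x: u32) {
        Arc::make_mut(&mut self.a).0 += x;
    }
}

fn deep_clones() -> usize {
    DEEP_CLONES.load(Ordering::SeqCst)
}

fn main() {
    let mut state = State { a: Arc::new(ComponentA(0)) };
    let snapshot = state.clone(); // prints nothing: only the Arc is cloned
    state.inc_a(1); // prints once: `a` is shared with `snapshot`, so it is copied
    state.inc_a(1); // prints nothing: `state.a` is now unique
    println!("state = {:?}, snapshot = {:?}, deep clones = {}", state, snapshot, deep_clones());
}
```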
I wrote some code with your statement in mind:
> Now, State.clone() is quite cheap: it copies pointers to the two components’ data and increments their refcounts. The component data itself isn’t copied until one of the clones wants to update that component
I was a bit surprised to see the clone showing the state at the moment of cloning. Honestly, I was expecting to see the current state (after the increments).
So now that the "publisher" part is clear (with increment and reset as templates), I was looking at the "subscriber" part. I can see that the state is updated correctly. Do I always have to create a fresh clone of the state whenever I want to examine the publisher's state?
```rust
fn main() {
    let mut state = State::new();
    let clone = state.clone();
    for _ in 0..10000 {
        state.inc_a(1);
    }
    dbg!(state, clone);
}
```
Gives:
```text
[src/main.rs:153] state = State {
    a: ComponentA(
        10000,
    ),
    b: ComponentB(
        "",
    ),
}
[src/main.rs:153] clone = State {
    a: ComponentA(
        0,
    ),
    b: ComponentB(
        "",
    ),
}
```
Yes, every clone acts as a snapshot of the state at the time it was made, just like a clone of any other object. The point of using Arc::make_mut is to keep the cost of cloning down so that it can be done frequently.
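So on the subscriber side, the pattern is simply to take a new snapshot whenever fresh values are needed. A sketch, assuming the shared state lives in an `RwLock` and using a hypothetical helper `latest` and a simplified two-field `State`:

```rust
use std::sync::{Arc, RwLock};

#[derive(Clone, Debug, Default)]
struct State {
    a: Arc<u32>,
    b: Arc<String>,
}

// Hypothetical subscriber helper: call it every time current values are
// needed. The read lock is held only while two Arc pointers are cloned.
fn latest(shared: &RwLock<State>) -> State {
    shared.read().unwrap().clone()
}

fn main() {
    let shared = RwLock::new(State::default());

    let first = latest(&shared);
    {
        let mut guard = shared.write().unwrap();
        *Arc::make_mut(&mut guard.a) += 1; // copies `a` because `first` shares it
    }
    let second = latest(&shared);

    // `first` is frozen at the moment it was taken; `second` sees the update.
    println!("first.a = {}, second.a = {}", first.a, second.a); // first.a = 0, second.a = 1
    // The untouched component `b` is still one shared allocation.
    println!("b shared: {}", Arc::ptr_eq(&first.b, &second.b)); // b shared: true
}
```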
@2e71828 I think I see the difference between my last approach and yours: while mine copies the entire data structure, yours, as you said, just copies pointers to one and the same underlying data. Am I right?
If so, I would favour your approach. I would just need to figure out how to protect it with a Mutex or RwLock.
That sounds about right. Yours holds the read lock while it copies all of the state data, and the write lock while it calculates the state changes. Since my suggestions were all about making clone inexpensive, your solution needs only a few small tweaks to serve as a base for mine.
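The asker's code isn't shown here, so this sketch assumes it was roughly an `RwLock` around a plain struct. It contrasts that with the tweak of putting the large component behind an `Arc`; `FlatState`, `CowState` and the helper functions are hypothetical.

```rust
use std::sync::{Arc, RwLock};

// Assumed shape of the "copy everything" approach: plain fields.
#[derive(Clone)]
struct FlatState {
    samples: Vec<u64>,
}

// The tweak: the large component sits behind an Arc.
#[derive(Clone)]
struct CowState {
    samples: Arc<Vec<u64>>,
}

// Read lock held while the whole Vec is copied: O(n) work under the lock.
fn snapshot_flat(shared: &RwLock<FlatState>) -> FlatState {
    shared.read().unwrap().clone()
}

// Read lock held only while one pointer is copied and a refcount bumped.
fn snapshot_cow(shared: &RwLock<CowState>) -> CowState {
    shared.read().unwrap().clone()
}

// make_mut copies the Vec only if some reader still holds an older snapshot.
fn push_cow(shared: &RwLock<CowState>, v: u64) {
    let mut guard = shared.write().unwrap();
    Arc::make_mut(&mut guard.samples).push(v);
}

fn main() {
    let flat = RwLock::new(FlatState { samples: vec![0; 1_000_000] });
    let cow = RwLock::new(CowState { samples: Arc::new(vec![0; 1_000_000]) });

    let f = snapshot_flat(&flat);
    let c = snapshot_cow(&cow);

    // The flat snapshot owns a separate buffer; the CoW snapshot shares it.
    let flat_shared = f.samples.as_ptr() == flat.read().unwrap().samples.as_ptr();
    let cow_shared = Arc::ptr_eq(&c.samples, &cow.read().unwrap().samples);
    // prints: flat shares data: false, cow shares data: true
    println!("flat shares data: {flat_shared}, cow shares data: {cow_shared}");

    push_cow(&cow, 42); // `c` still holds the old Vec, so this write copies it once
    // prints: snapshot len = 1000000, live len = 1000001
    println!("snapshot len = {}, live len = {}", c.samples.len(), cow.read().unwrap().samples.len());
}
```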
Additionally, if all of the state changes happen in the same thread, you can calculate them outside the lock and only take the lock long enough to store the results: since no other thread writes, no intermediate update can happen while you're applying the changes.
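A sketch of that single-writer variant, with a simplified hypothetical `State` and a hypothetical `single_writer_update` helper. It is only correct under the stated assumption that exactly one thread ever writes; with several writers, an update could land between the snapshot and the store and be overwritten.

```rust
use std::sync::{Arc, RwLock};

#[derive(Clone, Debug, Default)]
struct State {
    a: Arc<u32>,
    b: Arc<Vec<String>>,
}

// ASSUMPTION: this is the only thread that ever writes `shared`.
fn single_writer_update(shared: &RwLock<State>, inc: u32, msg: &str) {
    // 1. Cheap snapshot under a short read lock.
    let mut next = shared.read().unwrap().clone();
    // 2. Compute the changes without holding any lock. make_mut copies the
    //    touched components because `shared` still references them.
    *Arc::make_mut(&mut next.a) += inc;
    Arc::make_mut(&mut next.b).push(msg.to_string());
    // 3. Short write lock: just swap in the finished state.
    *shared.write().unwrap() = next;
}

fn main() {
    let shared = RwLock::new(State::default());
    single_writer_update(&shared, 2, "first");
    single_writer_update(&shared, 3, "second");
    let s = shared.read().unwrap().clone();
    println!("a = {}, b = {:?}", s.a, s.b); // a = 5, b = ["first", "second"]
}
```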
Wow again. I would never be able to write such code at the moment... Thanks for the input, I will check it out.
Unfortunately, my state updates definitely happen from different threads.
Many thanks for this valuable discussion. It helps me a lot to get a grip on the language (which, after the many others I have learned in my life, is sometimes astonishingly hard to grasp).
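With several writer threads, one option is to apply each update under the write lock, as sketched below; the other option discussed in this thread is to route all updates through a single manager thread over a channel. `State`, `update`, `snapshot` and `run` are hypothetical names. Thanks to `Arc::make_mut`, the write lock covers the change itself plus, at most, one copy of a component that some reader's snapshot still shares.

```rust
use std::sync::{Arc, RwLock};
use std::thread;

#[derive(Clone, Debug, Default)]
struct State {
    counter: Arc<u64>,
    names: Arc<Vec<String>>,
}

// Each writer applies its change while holding the write lock, so no update
// is lost even when many threads write concurrently.
fn update(shared: &RwLock<State>, f: impl FnOnce(&mut State)) {
    let mut guard = shared.write().unwrap();
    f(&mut *guard);
}

// Readers still only hold the read lock while cloning the pointers.
fn snapshot(shared: &RwLock<State>) -> State {
    shared.read().unwrap().clone()
}

fn run(writers: u64, per_writer: u64) -> State {
    let shared = Arc::new(RwLock::new(State::default()));
    let handles: Vec<_> = (0..writers)
        .map(|id| {
            let shared = Arc::clone(&shared);
            thread::spawn(move || {
                for _ in 0..per_writer {
                    update(&shared, |s| *Arc::make_mut(&mut s.counter) += 1);
                    // A reader-style snapshot interleaved with the updates.
                    let _view = snapshot(&shared);
                }
                update(&shared, |s| Arc::make_mut(&mut s.names).push(format!("writer {id}")));
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    snapshot(&shared)
}

fn main() {
    let s = run(4, 1000);
    // prints: counter = 4000, writers = 4
    println!("counter = {}, writers = {}", s.counter, s.names.len());
}
```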