A question regarding state sharing layout

Hi forum,

I would like to ask for a best-practice hint. I'm currently porting an old Java app to Rust. This app has a central "state holder" class with getters and setters; synchronization and shared access are managed inside the Java app. A bunch of sources (REST API, native JNI callbacks) contribute to this "state holder", while some business logic reads these values, e.g. for reporting and internal flow control.

Now, the "state holder" struct is really huge: it consists of a lot of scalar variables, strings, arrays etc. My first intention was to create a static struct in Rust and provide thread-safe setters and getters using Mutex or RwLock.

After having read a lot about this topic, I'm not that sure anymore. I think a channel, fed by multiple clones of the sender (the data sources as before) and one receiver that merges all the pieces together into a static struct, would work too.

What I cannot assess is the performance penalty. Intuitively, I would expect a shared-memory construct to be faster. On the other hand, I like the nice send/recv pattern Rust provides.

Given the huge size of the shared state object - what would you suggest?

TIA

I’d consider storing the substructures inside your state object as Arcs and using Arc::make_mut() for copy-on-write semantics:

  • The overhead of an update is proportional to how much data it modifies
  • Semantically, you preserve the idea of a central place that receives and processes updates
  • Downstream users can have a consistent state snapshot to work from without blocking writes

If you need generic data structures like lists or maps, the im crate provides ones that work on this model.

@2e71828 Thank you very much for your response. So a shared object instead of messages, right? Are you aware of any sample code demonstrating the pattern you are mentioning? I have quickly checked im, but I can't see how it could help me out here. Could you elaborate? I need a mutable structure, at least mutable from one process at a time.

TIA

It’s sort of halfway in between the two. At a high level, it looks the same as the messaging solution: you have one process that accepts update requests through some kind of channel, applies them to the master copy of the state, and then sends new copies of the state to anyone that needs it (via channel or shared state).

The difference is in how the copies are made: Instead of eagerly copying all of the data, you increase the reference count of a shared pointer. When you want to update the state, you look at the reference counts and only make a copy of the parts that need to change if they’re currently shared.

That may sound complicated, but the standard library code takes care of the details. The important part is that the copies act completely independently, even though they’re sharing data internally.
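To make the reference-count check concrete, here is a minimal sketch using a plain Arc<Vec<u32>> as a stand-in for one component. The function name copy_on_write_demo is made up for illustration; only Arc::make_mut, Arc::clone, Arc::as_ptr and Arc::ptr_eq come from the standard library.

```rust
use std::sync::Arc;

/// Returns (copied_while_unique, copied_while_shared).
/// Hypothetical helper, only here to illustrate the behaviour.
fn copy_on_write_demo() -> (bool, bool) {
    // One owner: the reference count is 1.
    let mut live = Arc::new(vec![1u32, 2, 3]);
    let before = Arc::as_ptr(&live);

    // Not shared, so make_mut hands out the existing allocation.
    Arc::make_mut(&mut live).push(4);
    let copied_while_unique = Arc::as_ptr(&live) != before;

    // Taking a snapshot only increments the reference count.
    let snapshot = Arc::clone(&live);

    // Shared now, so make_mut clones the Vec before modifying it.
    Arc::make_mut(&mut live).push(5);
    let copied_while_shared = !Arc::ptr_eq(&live, &snapshot);

    // The snapshot keeps the old contents; the live value moved on.
    assert_eq!(*snapshot, vec![1, 2, 3, 4]);
    assert_eq!(*live, vec![1, 2, 3, 4, 5]);

    (copied_while_unique, copied_while_shared)
}

fn main() {
    let (unique, shared) = copy_on_write_demo();
    println!("copied while unique: {}, copied while shared: {}", unique, shared);
}
```

The first update touches the data in place; only the update made while a snapshot exists pays for a copy.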

For example, your state might look like this:

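use std::sync::Arc;

// ComponentA and ComponentB are your own types. They must implement
// Clone (required by Arc::make_mut) and provide the increment/reset
// methods used below.
#[derive(Clone)]
struct State {
    a: Arc<ComponentA>,
    b: Arc<ComponentB>,
}

impl State {
    pub fn inc_a(&mut self, x: u32) {
        // Clones `ComponentA` if it is currently shared, and then updates it
        Arc::make_mut(&mut self.a).increment(x);
    }

    pub fn reset(&mut self) {
        // Each component is copied only if another clone still refers to it
        Arc::make_mut(&mut self.a).reset();
        Arc::make_mut(&mut self.b).reset();
    }
}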

(Playground)

Now, State.clone() is quite cheap: it copies the pointers to the two components’ data and increments their refcounts. The component data itself isn’t copied until one of the clones wants to update that component. The centralized state manager can then accept update requests over an mpsc channel, make changes to its local copy of State, and send a copy of the new state to everything else that needs it:

  • If you choose to send the update via a channel, the only data sent is a few pointers.
  • If you use a shared Mutex/RwLock, the lock only needs to be held long enough to copy the component pointers, not all of the state data.
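A rough sketch of such a state manager might look like the following. The Update enum, the tuple-struct components, and the run_manager and run_demo functions are assumptions made up for this illustration, not part of the original snippet; the published state is shared through an RwLock here, but sending local.clone() over another channel would work the same way.

```rust
use std::sync::{mpsc, Arc, RwLock};
use std::thread;

// Stand-in components (assumed for this sketch).
#[derive(Clone, Default)]
struct ComponentA(u32);
#[derive(Clone, Default)]
struct ComponentB(String);

#[derive(Clone, Default)]
struct State {
    a: Arc<ComponentA>,
    b: Arc<ComponentB>,
}

// Hypothetical update messages sent by the data sources.
enum Update {
    IncA(u32),
    SetB(String),
}

// The single place that owns the master copy and applies updates.
fn run_manager(updates: mpsc::Receiver<Update>, published: Arc<RwLock<State>>) {
    let mut local = State::default();
    // The loop ends once every Sender has been dropped.
    for update in updates {
        match update {
            // Only the touched component is copied, and only if shared.
            Update::IncA(x) => Arc::make_mut(&mut local.a).0 += x,
            Update::SetB(s) => Arc::make_mut(&mut local.b).0 = s,
        }
        // Publishing copies two pointers; the lock is held only briefly.
        *published.write().unwrap() = local.clone();
    }
}

fn run_demo() -> (u32, String) {
    let (tx, rx) = mpsc::channel();
    let published = Arc::new(RwLock::new(State::default()));

    let manager = {
        let published = Arc::clone(&published);
        thread::spawn(move || run_manager(rx, published))
    };

    // Several sources feed the same channel through cloned senders.
    let producers: Vec<_> = (0..4)
        .map(|_| {
            let tx = tx.clone();
            thread::spawn(move || {
                for _ in 0..100 {
                    tx.send(Update::IncA(1)).unwrap();
                }
            })
        })
        .collect();
    tx.send(Update::SetB("ready".to_string())).unwrap();
    drop(tx);

    for p in producers {
        p.join().unwrap();
    }
    manager.join().unwrap();

    // A reader takes a snapshot by cloning under the read lock.
    let snapshot = published.read().unwrap().clone();
    (snapshot.a.0, snapshot.b.0.clone())
}

fn main() {
    let (a, b) = run_demo();
    println!("a = {}, b = {:?}", a, b);
}
```

Because the published copy shares its components with the manager's local copy, the next update to a component clones just that component; everything else stays shared.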

Edit: Corrected code and provided playground link

Wow. Thank you very much for this comprehensive answer. I will definitely give it a try.

But it is somehow not working for me:

no method named make_mut found for struct std::sync::Arc<ComponentA> in the current scope
this is an associated function, not a method

I think I get the idea, but I still have a good way to go :)

It's Arc::make_mut(&mut arc), not arc.make_mut(). It is an associated function rather than a method so that it cannot clash with a make_mut method on the wrapped value, which would otherwise be reachable through the Arc.

Yes, I thought so too, but the nightmare continues :)

Tried various things, can't make it work ATM...

Even a combination with mut does not work for me.

Again, it's Arc::make_mut(&mut arc), not Arc::make_mut(&arc), as the error message hints.

OK, I only saw that there was an error, not that it was a different one. With &mut self.a, the problem is the missing Clone implementation. Annotating ComponentA with #[derive(Clone)] helps.

Thanks

My apologies; I posted that code snippet without testing. Here’s a playground link that compiles successfully.

@2e71828 No need to apologize :) Thanks for the snippet. The next thing I was missing was increment :)

It compiles. Hurray. Now I just need to wrap my head around it. Thanks everybody for the pointers. Very much appreciated. I'm sorry, my Rust experience is only a week or so.

It’s not exactly a basic technique, so it may take a little bit of effort to understand. One thing that could help is writing some custom Clone implementations that print something when they’re called.
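As a possible starting point for that experiment, here is a small sketch with a hand-written Clone for ComponentA that prints and counts each call. The A_CLONES counter, the tuple-struct ComponentA, and count_deep_clones are assumptions made up for illustration.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

// Counts how often the component data itself is cloned.
static A_CLONES: AtomicUsize = AtomicUsize::new(0);

#[derive(Debug, Default)]
struct ComponentA(u32);

// Hand-written Clone so every deep copy becomes visible.
impl Clone for ComponentA {
    fn clone(&self) -> Self {
        A_CLONES.fetch_add(1, Ordering::SeqCst);
        println!("ComponentA cloned (value = {})", self.0);
        ComponentA(self.0)
    }
}

#[derive(Clone, Debug, Default)]
struct State {
    a: Arc<ComponentA>,
}

impl State {
    fn inc_a(&mut self, x: u32) {
        Arc::make_mut(&mut self.a).0 += x;
    }
}

/// Returns how many times ComponentA was deep-copied during the run.
fn count_deep_clones() -> usize {
    let start = A_CLONES.load(Ordering::SeqCst);

    let mut state = State::default();
    state.inc_a(1); // not shared yet: updated in place

    let snapshot = state.clone(); // refcount increment only

    for _ in 0..10_000 {
        // The first iteration copies ComponentA because the snapshot
        // shares it; afterwards `state` owns its copy exclusively.
        state.inc_a(1);
    }

    assert_eq!(snapshot.a.0, 1);
    assert_eq!(state.a.0, 10_001);

    A_CLONES.load(Ordering::SeqCst) - start
}

fn main() {
    println!("deep clones of ComponentA: {}", count_deep_clones());
}
```

Cloning State never triggers the message; only the first update after a snapshot was taken does.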

Am I allowed to bother again?

I wrote some code with your statement in mind:

Now, State.clone() is quite cheap: it copies the pointers to the two components’ data and increments their refcounts. The component data itself isn’t copied until one of the clones wants to update that component.

I was a bit surprised to see that the clone shows the state at the moment of cloning. Honestly, I was expecting to see the current state (after the increments).

So now that the "publisher" part is clear (with increment and reset as templates), I was looking at the "subscriber" part. I see that the state is updated correctly. Do I always have to create a fresh state clone whenever I want to examine the publisher's state?

fn main() {
    let mut state = State::new();
    let clone = state.clone();

    for _ in 0..10000 {
        state.inc_a(1);
    }

    dbg!(state, clone);
}

Gives:

[src/main.rs:153] state = State {
    a: ComponentA(
        10000,
    ),
    b: ComponentB(
        "",
    ),
}
[src/main.rs:153] clone = State {
    a: ComponentA(
        0,
    ),
    b: ComponentB(
        "",
    ),
}

Yes, every clone acts as a snapshot of the state at the time it was made, just like a clone of any other object. The point of using Arc::make_mut is to keep the cloning cost down so that clone can be called frequently.

Cool. Thanks for the clarification. I'm fine with that. Hoping for great performance.

My last question on this topic: What is your opinion on this solution?

playground

@2e71828 I think I see the difference between my last approach and yours: while my approach copies the entire data structure, yours, as you said, just copies pointers to one and the same base data. Am I right about this?

If so, I would favour your approach. I would just need to figure out how to protect it with a Mutex or RwLock.

Thanks

That sounds about right. Yours holds the read lock while it copies all of the state data, and the write lock while it's calculating the state changes. As my suggestions were all about making clone inexpensive, it only takes a few small tweaks to adapt your solution to my approach.

Additionally, if all of the state changes happen in the same thread, you can calculate them outside the lock and only take the lock for long enough to copy the results in: there's no chance for some intermediate update to happen while you're applying the changes.
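For the case where several threads write, one possible arrangement is sketched below: writers hold the write lock while they apply a change, and readers hold the read lock only long enough to clone the component pointers. The SharedState wrapper, its methods, and the tuple-struct components are hypothetical names chosen for this sketch.

```rust
use std::sync::{Arc, RwLock};
use std::thread;

// Stand-in components (assumed for this sketch).
#[derive(Clone, Default)]
struct ComponentA(u32);
#[derive(Clone, Default)]
struct ComponentB(String);

#[derive(Clone, Default)]
struct State {
    a: Arc<ComponentA>,
    b: Arc<ComponentB>,
}

// Hypothetical handle that every thread can clone and keep.
#[derive(Clone, Default)]
struct SharedState(Arc<RwLock<State>>);

impl SharedState {
    /// Read lock held only while two pointers are copied.
    fn snapshot(&self) -> State {
        let guard = self.0.read().unwrap();
        guard.clone()
    }

    /// Write lock held for the update; ComponentA is copied only if
    /// some snapshot still shares it.
    fn inc_a(&self, x: u32) {
        let mut guard = self.0.write().unwrap();
        Arc::make_mut(&mut guard.a).0 += x;
    }

    fn set_b(&self, s: &str) {
        let mut guard = self.0.write().unwrap();
        Arc::make_mut(&mut guard.b).0 = s.to_string();
    }
}

fn run_demo() -> (u32, u32, String) {
    let shared = SharedState::default();
    let early = shared.snapshot(); // stays at the initial values

    // Several writer threads update the same state concurrently.
    let writers: Vec<_> = (0..4)
        .map(|_| {
            let shared = shared.clone();
            thread::spawn(move || {
                for _ in 0..1_000 {
                    shared.inc_a(1);
                }
            })
        })
        .collect();
    shared.set_b("ready");
    for w in writers {
        w.join().unwrap();
    }

    let latest = shared.snapshot();
    (early.a.0, latest.a.0, latest.b.0.clone())
}

fn main() {
    let (early_a, latest_a, latest_b) = run_demo();
    println!("early a = {}, latest a = {}, b = {:?}", early_a, latest_a, latest_b);
}
```

Business logic that works on a snapshot never blocks the writers, since it no longer holds any lock once snapshot() returns.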

Wow again. I would never be able to write such code at the moment... Thanks for the input, I will check that out.
Unfortunately, my state updates definitely happen from different threads, that's for sure.

Many thanks for this valuable discussion. It helps me a lot to get to grips with the language (which, after so many others I have learned in my life, is sometimes astonishingly hard to grasp).