Data sharing between one writer thread and many reader threads

What is the best solution in Rust for the case below?

There is a large struct, so cloning it is not an option.

The writer thread updates the struct from time to time, and it is the only thread that writes to the struct.

At the same time, many threads read the struct, and they don't care whether they see the latest version or not (dirty reads are allowed).

Thanks.

There should be only one copy of the struct in memory.

Your description sounds to me like I'd use an RwLock for that.

With RwLock, the size of the value has to be known at compile time. But the size of the value sometimes needs to be dynamic.

What would that setup look like? DSTs must be behind some pointer type in order to use them. If you store said pointer in the struct you put in the RwLock, the struct will be Sized again.
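For illustration, a minimal sketch of that point, assuming a made-up `Snapshot` struct and `make_shared` helper (neither is from the original code): the slice `[u32]` is unsized, but `Box<[u32]>` is a pointer with a known size, so the struct that holds it is `Sized` and can go into an `RwLock`, even though the payload length is chosen at runtime.

```rust
use std::sync::{Arc, RwLock};

// Hypothetical struct: the payload length is only known at runtime.
struct Snapshot {
    values: Box<[u32]>,
}

// Hypothetical helper: builds a shared snapshot holding `len` zeros.
fn make_shared(len: usize) -> Arc<RwLock<Snapshot>> {
    Arc::new(RwLock::new(Snapshot {
        values: vec![0u32; len].into_boxed_slice(),
    }))
}

fn main() {
    let shared = make_shared(3);
    // The writer can even swap in a payload of a different length.
    shared.write().unwrap().values = vec![1, 2, 3, 4, 5].into_boxed_slice();
    println!("len = {}", shared.read().unwrap().values.len());
}
```

A `Vec<u32>` field would work the same way; the boxed slice is used here only to make the "DST behind a pointer" part visible.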

Because dirty reads are allowed, the readers should not lock the data, to improve performance.

And only one thread writes to the data, so the data should be safe. Is it possible to go without a lock in this case?

You are correct, and I tried it on the playground. But the program's output is somewhat weird. I expected it to print a series of numbers from 1 up to some larger number, but it actually prints all 0s.

It looks like the two threads run one after the other, not at the same time.

playground

read() and write() lock the RwLock. You hold the lock for the whole runtime of the thread.

Yes, you need to acquire and drop the guard inside the loops to make the output more like what you desire. Playground.

Got it. The two threads share the same guard here.

So is it possible to have no guard here? Or what goes wrong if I don't use a guard?

If you have even one writer and one reader then you have to synchronize in some way, otherwise you'll get a data race, which is Undefined Behaviour (UB). If your program ends up executing UB then it no longer has meaning and can do anything. This is precisely what Rust is made to avoid. A RwLock is the most generic solution for this. Another generic solution would be swapping two copies of the same struct (see for example the left-right crate) but it seems you're not ok with copies.

If you want a more performant or flexible approach, you will need something more specific to your data structure (i.e. no generic solutions).

I would suggest you start with a RwLock and profile your code, then decide later whether it's a bottleneck for you and, if so, optimize.

The threads share the same RwLock (otherwise they would not be able to affect each other!), and only one of them can have a guard at any given time: if one tries to get one while the other is holding one, then the first one will wait until the second one releases it.

The problem in your playground code is that each thread only gets a guard once before the loop and then holds it for the whole loop. One thread will be able to get the guard at the start, but the other won't and will have to wait for the loop of the first thread to end. This is why they seem sequential.

What you should instead do is to obtain a guard only for a brief moment, just enough to perform the read or write you need. This way whenever a thread is not performing a read/write the other thread will get a chance to obtain a guard and perform its own read/write operation.
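Roughly, that pattern could look like the sketch below. The `run` helper and the counter are hypothetical, not the code from the playground link; the point is only that each guard lives for a single statement inside the loop.

```rust
use std::sync::{Arc, RwLock};
use std::thread;

// Hypothetical helper: one writer counts up to `ticks` while one reader polls.
// Returns every value the reader observed.
fn run(ticks: u64) -> Vec<u64> {
    let shared = Arc::new(RwLock::new(0u64));

    let writer = {
        let shared = Arc::clone(&shared);
        thread::spawn(move || {
            for _ in 0..ticks {
                // The write guard is a temporary: it is dropped at the end of
                // this statement, so the lock is released on every iteration.
                *shared.write().unwrap() += 1;
                thread::yield_now();
            }
        })
    };

    let reader = {
        let shared = Arc::clone(&shared);
        thread::spawn(move || {
            let mut seen = Vec::new();
            loop {
                // Copy the value out and release the read guard immediately.
                let value = *shared.read().unwrap();
                seen.push(value);
                if value == ticks {
                    return seen;
                }
                thread::yield_now();
            }
        })
    };

    writer.join().unwrap();
    reader.join().unwrap()
}

fn main() {
    let seen = run(1_000);
    println!("reader took {} samples, last = {:?}", seen.len(), seen.last());
}
```

The reader may see the same value several times or skip some values, but it never waits for the writer's whole loop to finish.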

The guard is the mechanism that ensures no data race (and thus UB) is happening. If you remove it you'll have to use some other mechanism of synchronization, or the Rust compiler will prevent you from mutating the shared data.

I get your idea.

If the reader asked for the latest data on every access, there would be a data race.

But the reader doesn't care about that. Dirty data is acceptable, so there should not be a data race.

It is like calculating a = b + c. The calculating thread copies a into a CPU register and does the calculation. At the same time, the reader thread accesses a in memory. If the reader cared about the latest value of a, it would have to wait for the calculating thread to write a back to memory after the calculation. But it does not care, so the reader should get a back directly.

When you say "the dirty data is acceptable", which of the following two do you mean:

  1. It does not matter whether this is the latest version of the data as-written, just that this is a version that was written.
  2. It is OK for the data to be a mix of previous writes and the current write - you're deliberately engaging in a data race, and have confirmed that it's safe to do this.

Both cases are OK.

Like random sampling: 50 points out of 100 will meet the requirement.

Like a running race: the recording thread records the time at which each player reaches a particular position, and the watching thread just gets the time and announces it. It is totally safe.

You might be looking for a sequence lock ("seqlock"). There is the seqlock crate for this, but see this open RFC for some information on how its implementation is not ideal (for unavoidable reasons). You also can't use it with non-Copy data.
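To make the idea concrete, here is a hand-rolled sketch of a single-writer sequence lock over two `u32` fields. It is not the `seqlock` crate's API; `SeqTime` and its methods are invented for this example. It sidesteps the data-race concern by storing the fields as atomics, and it uses `SeqCst` everywhere for simplicity rather than the weakest orderings that might suffice.

```rust
use std::sync::atomic::{AtomicU32, AtomicUsize, Ordering::SeqCst};

// Hypothetical single-writer seqlock over two u32 fields.
// The fields are atomics so that racing reads stay well-defined.
pub struct SeqTime {
    seq: AtomicUsize, // odd while a write is in progress
    low: AtomicU32,
    high: AtomicU32,
}

impl SeqTime {
    pub fn new(low: u32, high: u32) -> Self {
        SeqTime {
            seq: AtomicUsize::new(0),
            low: AtomicU32::new(low),
            high: AtomicU32::new(high),
        }
    }

    // Must only be called from the single writer thread.
    pub fn write(&self, low: u32, high: u32) {
        let s = self.seq.load(SeqCst);
        self.seq.store(s.wrapping_add(1), SeqCst); // mark "write in progress"
        self.low.store(low, SeqCst);
        self.high.store(high, SeqCst);
        self.seq.store(s.wrapping_add(2), SeqCst); // publish the new value
    }

    // Never blocks the writer; retries if a write overlapped the read.
    pub fn read(&self) -> (u32, u32) {
        loop {
            let before = self.seq.load(SeqCst);
            let low = self.low.load(SeqCst);
            let high = self.high.load(SeqCst);
            let after = self.seq.load(SeqCst);
            if before == after && before % 2 == 0 {
                return (low, high);
            }
            std::hint::spin_loop();
        }
    }
}

fn main() {
    let t = SeqTime::new(u32::MAX, 0);
    t.write(0, 1);
    println!("{:?}", t.read());
}
```

Readers never take a lock here; they just retry when a write overlaps their read, so they may return a slightly stale value but not a mix of two writes.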

So, given a time represented as:

struct Time {
    pub low_millis: u32,
    pub high_millis: u32,
}

and a current time value of:

Time { low_millis: u32::MAX, high_millis: 0 }

When this ticks next, it'll change to:

Time { low_millis: 0, high_millis: 1 }

If a player arrives when the tick happens, and you're happy with option 2 (data races), then you're happy for a player's time to be recorded as:

Time { low_millis: u32::MAX, high_millis: 1 }

which is 2^32 milliseconds (around 50 days) later than the actual time at which the event happened. If this is not OK, then you're not OK with option 2, and you're only OK with option 1.
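As a quick check of that arithmetic, the sketch below combines the two halves into one 64-bit millisecond count and measures how far the torn value is from the real one. `total_millis` is a hypothetical helper added for the example.

```rust
// Mirrors the two-field `Time` from the post above.
#[derive(Clone, Copy)]
struct Time {
    low_millis: u32,
    high_millis: u32,
}

// Hypothetical helper: combine both halves into one millisecond count.
fn total_millis(t: Time) -> u64 {
    ((t.high_millis as u64) << 32) | t.low_millis as u64
}

fn main() {
    let before = Time { low_millis: u32::MAX, high_millis: 0 };
    let torn = Time { low_millis: u32::MAX, high_millis: 1 };
    let error_ms = total_millis(torn) - total_millis(before);
    let days = error_ms as f64 / (1000.0 * 60.0 * 60.0 * 24.0);
    println!("error = {error_ms} ms, about {days:.1} days");
}
```

The difference is exactly 2^32 ms, which works out to roughly 49.7 days.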

Your addition example is a data race and thus UB. There's no guarantee that your program will do one memory load into a register, that the memory load has certain semantics, that the load and stores have a certain order and so on. In practice it will probably behave like you expect, but the compiler has no obligation to ensure your program will do that.

The correct way to get a program with those semantics is to use an atomic operation (see std::sync::atomic::AtomicU64 for example) and load/store from it with a relaxed memory ordering.
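A minimal sketch of that, assuming the whole timestamp fits in 64 bits so it can live in one `AtomicU64` (the `Clock` type and its methods are made up for the example):

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;

// Hypothetical clock: the whole millisecond count lives in one AtomicU64,
// so a reader gets either the old value or the new one, never a mix.
struct Clock {
    millis: AtomicU64,
}

impl Clock {
    fn tick(&self) {
        // Only the single writer calls this, so load + store is enough.
        let now = self.millis.load(Ordering::Relaxed);
        self.millis.store(now + 1, Ordering::Relaxed);
    }

    fn now(&self) -> u64 {
        self.millis.load(Ordering::Relaxed)
    }
}

fn main() {
    let clock = Arc::new(Clock { millis: AtomicU64::new(u32::MAX as u64) });
    let writer = {
        let clock = Arc::clone(&clock);
        thread::spawn(move || {
            for _ in 0..1_000 {
                clock.tick();
            }
        })
    };
    // May be stale, but it is always a value the writer really stored.
    let sample = clock.now();
    writer.join().unwrap();
    println!("sampled {sample}, final {}", clock.now());
}
```

This gives the "stale is fine, torn is not" behaviour without any lock, but only because the data is a single machine word.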

But of course you cannot use this exact approach with a bigger data structure, because there's no atomic type for that. So, let me repeat: any more performant/flexible approach depends on exactly which data structure you want to share, and what you want to read from and write to it. Without that information a RwLock is the best you can do.


By the way, "dirty data" is generally ok only in trivial situations. Imagine a Vec shared between two threads. A reader thread is trying to read the element at index 0 while the writer is trying to push a new element. Imagine this execution:

  • the reader reads the pointer of the Vec;
  • the writer sees there's not enough capacity, so it reallocates the vector;
  • the reader now has "dirty data", because the pointer points to deallocated memory, but that pointer is very much not safe to read from.

But we can avoid that situation with a different design.

struct Time {
    pub times: Vec<u32>,
}

On each tick, just push the time onto the Vec. Then the watching thread simply takes the last one.

I just want to write a program for this type of situation. :rofl:

struct Time {
    pub low_millis: u32,
    pub high_millis: u32,
}

In the case above, a lock is needed to keep the operation atomic. But my actual case is not like that.