Data race allowed array

I want to allocate an array of 10^9 u32's with the following properties:

  1. threads can write to this, as long as each u32 write is atomic

  2. threads can also read from this, while other threads are writing

  3. I don't care if a reader readers in the middle of a writer writing, as long as each u32 is 'atomic'

I want this to be as high performance as possible. Is Vec<Cell<u32>> the way to do, or is there something better ?

One more thing: reader 'reads' in blocks of 1024, where it just sends 1024 contiguous u32's to a tcp socket.

Cell does not support multiple threads. Perhaps you meant Vec<AtomicU32>?

4 Likes
  1. I meant Vec<Cell<u32>>, because I (stupidly) did not realize it does not support multiple threads.

  2. You're right looks like

fn store(&self, val: u32, order: Ordering)[−]
Unstable (integer_atomics #32976)
Stores a value into the atomic integer.

store takes an Ordering argument which describes the memory ordering of this operation.

from AtomicU32 is what I need.

  1. Is there something faster then Vec<AtomicU32>, or is this the way to go?

Well, note that this is essentially a Vec<u32> but all of the u32s require atomic instructions to do anything.

I'm not sure what point there is in this; you could get garbage data (a data race) which is probably unwanted. If you're completely sure this won't happen and have some kind of mutex around them, then you could use UnsafeCell<u32>.


I would probably save myself the headache and just go with atomic reads and writes however.

2 Likes

Think about an explosion particles being sent over UDP. If half the particles are from time t=0, and half are from t=1, it's not a big deal.

One reason you might want to do something like this if you're trying to fill an audio buffer from one thread while another is reading it for playback. You'd probably use some other synchronization mechanism to ensure that the reader doesn't read where the writer is currently writing, but overall you want writes to be as fast as possible and reading garbage data just means unexpected audio output, which is not a big deal.

If I understand correctly, they mean "as long as access to each u32 is synchronized, I don't care whether access to the whole array is synchronized". In other words: each entry in the array must be data-race-free, but the array as a whole may be not.

2 Likes

Yeah, sorry for not being clear on this. Formally, let S0, S1, R all be Vec, all of 10^9 elements.

Suppose some writer is trying to change the state from S0 to S1 and some reader reads R.

I require that for all i: R[i] = S0[i] || R[i] = S1[i]. So each u32 has to be the old value or the new value, but the overall R can be partly S0 and partly S1.

Data race is UB. And there's no such case "But UB is OK for my specific use case".

You can use Vec<AtomicU32> or Arc<[AtomicU32]> etc. It's recommended to start with SeqCst all the places as it's the strictest ordering. But if you can make it sure that the each access to the u32s are independent between each others AND other synchronization events like Mutex and channels, you can use Relaxed instead which is the weakest ordering. On archs with strong memory model like x86 they may not ne different in performance. But on with weak memory model archs like ARM you'll likely be observe the difference.

5 Likes

I agree this statement is true for Rust. Would GoLang or C allow this type of "data race okay in this limited situation" constraint I am going for ?

C/++, especially since the C99 and C++11 - versions they explicitly added the multithreading capabilities and behavior to the language - have same situation. Never use volatiles for interthread synchronization. It's only allowed within the 20th century.

Go doesn't have any UB conditions in its spec, though it mentions some operations may yields some unspecified value. It prevents some heavy optimization capabilities, but it's intentional as the heavy also implies longer compile time which is the problem the Go tries to avoid at all cost.

2 Likes

Race conditions are still considered bugs in Go. Which is why they have a lot of tooling built around TSan. Introducing the Go Race Detector - go.dev It's similar to the kind of testing you might do with loom.

1 Like

What you're looking for is a relaxed atomic read then. A data race read would literally return garbage: either 0, the first value, the second value, or some bitwise interpolation of the two. A relaxed atomic read has the same instruction as a regular read on some platforms as well.

But why would the ordering of the instructions matter if when they sample the value relative to other events doesn't matter to them?

2 Likes

This sounds exactly like Vec<AtomicU32>, with relaxed ordering. And on x86 I'd expect it to perform the same as non-atomic u32.

1 Like

Sorry I don't understand the question. If it doesn't matter, you can specify the weakest constraint (ordering) to give optimizer/cpu maximum flexibility.

2 Likes

Thanks everyone for helping me formulate this informally stated question into a formally consistent question. Good job @steffahn for somehow getting the answer off the bat.

That was my question :sweat_smile:. I believe we are going in circles, my apologies.

@Hyeonu

  1. Relaxed is weaker than SeqCst.

  2. Relaxed is strong enough for my requirements.

Is TearCell relevant here?

I would say no as it seems to be made explicitly for types that don't fit into one atomic, e.g. [u32; 16]. But OP wants to access u32 atomically, which fits into an atomic.

From docs on when this library might not be four you: "Your T (is POD and) fits inside one of the atomic types in core::sync::atomic. TearCell will literally be worse in every way than just performing Relaxed loads/stores to that type."