Data race allowed array

zeroexcuses · December 11, 2021, 1:43am

I want to allocate an array of 10^9 u32's with the following properties:

threads can write to this, as long as each u32 write is atomic
threads can also read from this, while other threads are writing
I don't care if a reader readers in the middle of a writer writing, as long as each u32 is 'atomic'

I want this to be as high performance as possible. Is Vec<Cell<u32>> the way to do, or is there something better ?

One more thing: reader 'reads' in blocks of 1024, where it just sends 1024 contiguous u32's to a tcp socket.

steffahn · December 11, 2021, 2:00am

Cell does not support multiple threads. Perhaps you meant Vec<AtomicU32>?

zeroexcuses · December 11, 2021, 2:12am

I meant Vec<Cell<u32>>, because I (stupidly) did not realize it does not support multiple threads.
You're right looks like

fn store(&self, val: u32, order: Ordering)[−]
Unstable (integer_atomics #32976)
Stores a value into the atomic integer.

store takes an Ordering argument which describes the memory ordering of this operation.

from AtomicU32 is what I need.

Is there something faster then Vec<AtomicU32>, or is this the way to go?

OptimisticPeach · December 11, 2021, 3:20am

Well, note that this is essentially a Vec<u32> but all of the u32s require atomic instructions to do anything.

I'm not sure what point there is in this; you could get garbage data (a data race) which is probably unwanted. If you're completely sure this won't happen and have some kind of mutex around them, then you could use UnsafeCell<u32>.

I would probably save myself the headache and just go with atomic reads and writes however.

zeroexcuses · December 11, 2021, 3:28am

Think about an explosion particles being sent over UDP. If half the particles are from time t=0, and half are from t=1, it's not a big deal.

skysch · December 11, 2021, 3:31am

One reason you might want to do something like this if you're trying to fill an audio buffer from one thread while another is reading it for playback. You'd probably use some other synchronization mechanism to ensure that the reader doesn't read where the writer is currently writing, but overall you want writes to be as fast as possible and reading garbage data just means unexpected audio output, which is not a big deal.

Cerber-Ursi · December 11, 2021, 3:56am

If I understand correctly, they mean "as long as access to each u32 is synchronized, I don't care whether access to the whole array is synchronized". In other words: each entry in the array must be data-race-free, but the array as a whole may be not.

zeroexcuses · December 11, 2021, 4:11am

Yeah, sorry for not being clear on this. Formally, let S0, S1, R all be Vec, all of 10^9 elements.

Suppose some writer is trying to change the state from S0 to S1 and some reader reads R.

I require that for all i: R[i] = S0[i] || R[i] = S1[i]. So each u32 has to be the old value or the new value, but the overall R can be partly S0 and partly S1.

Hyeonu · December 11, 2021, 4:33am

Data race is UB. And there's no such case "But UB is OK for my specific use case".

You can use Vec<AtomicU32> or Arc<[AtomicU32]> etc. It's recommended to start with SeqCst all the places as it's the strictest ordering. But if you can make it sure that the each access to the u32s are independent between each others AND other synchronization events like Mutex and channels, you can use Relaxed instead which is the weakest ordering. On archs with strong memory model like x86 they may not ne different in performance. But on with weak memory model archs like ARM you'll likely be observe the difference.

zeroexcuses · December 11, 2021, 4:45am

I agree this statement is true for Rust. Would GoLang or C allow this type of "data race okay in this limited situation" constraint I am going for ?

Hyeonu · December 11, 2021, 5:06am

C/++, especially since the C99 and C++11 - versions they explicitly added the multithreading capabilities and behavior to the language - have same situation. Never use volatiles for interthread synchronization. It's only allowed within the 20th century.

Go doesn't have any UB conditions in its spec, though it mentions some operations may yields some unspecified value. It prevents some heavy optimization capabilities, but it's intentional as the heavy also implies longer compile time which is the problem the Go tries to avoid at all cost.

parasyte · December 11, 2021, 5:12am

Race conditions are still considered bugs in Go. Which is why they have a lot of tooling built around TSan. Introducing the Go Race Detector - go.dev It's similar to the kind of testing you might do with loom.

OptimisticPeach · December 11, 2021, 5:24am

What you're looking for is a relaxed atomic read then. A data race read would literally return garbage: either 0, the first value, the second value, or some bitwise interpolation of the two. A relaxed atomic read has the same instruction as a regular read on some platforms as well.

But why would the ordering of the instructions matter if when they sample the value relative to other events doesn't matter to them?

droundy · December 11, 2021, 5:25am

This sounds exactly like Vec<AtomicU32>, with relaxed ordering. And on x86 I'd expect it to perform the same as non-atomic u32.

Hyeonu · December 11, 2021, 5:36am

Sorry I don't understand the question. If it doesn't matter, you can specify the weakest constraint (ordering) to give optimizer/cpu maximum flexibility.

zeroexcuses · December 11, 2021, 5:37am

Thanks everyone for helping me formulate this informally stated question into a formally consistent question. Good job @steffahn for somehow getting the answer off the bat.

OptimisticPeach · December 11, 2021, 5:38am

That was my question . I believe we are going in circles, my apologies.

zeroexcuses · December 11, 2021, 5:39am

@Hyeonu

Relaxed is weaker than SeqCst.
Relaxed is strong enough for my requirements.

vi0 · December 13, 2021, 8:30am

Is TearCell relevant here?

jan17 · December 13, 2021, 1:14pm

I would say no as it seems to be made explicitly for types that don't fit into one atomic, e.g. [u32; 16]. But OP wants to access u32 atomically, which fits into an atomic.

From docs on when this library might not be four you: "Your T (is POD and) fits inside one of the atomic types in core::sync::atomic. TearCell will literally be worse in every way than just performing Relaxed loads/stores to that type."

Topic		Replies	Views
Write to shared memory from multiple threads help	20	3262	February 16, 2020
Writing to same array across threads help	9	2952	January 12, 2023
UnsafeCell Sync help	11	346	December 24, 2022
Multithreading with multiple writes to same slice without lock help	20	3653	March 7, 2021
X86_64 linux: cas4 of AtomicU64	10	830	July 22, 2022

Data race allowed array

Related Topics