I meant Vec<Cell<u32>>, because I (stupidly) did not realize it does not support multiple threads.
You're right looks like
fn store(&self, val: u32, order: Ordering)[−]
Unstable (integer_atomics #32976)
Stores a value into the atomic integer.
store takes an Ordering argument which describes the memory ordering of this operation.
from AtomicU32 is what I need.
Is there something faster then Vec<AtomicU32>, or is this the way to go?
Well, note that this is essentially a Vec<u32> but all of the u32s require atomic instructions to do anything.
I'm not sure what point there is in this; you could get garbage data (a data race) which is probably unwanted. If you're completely sure this won't happen and have some kind of mutex around them, then you could use UnsafeCell<u32>.
I would probably save myself the headache and just go with atomic reads and writes however.
One reason you might want to do something like this if you're trying to fill an audio buffer from one thread while another is reading it for playback. You'd probably use some other synchronization mechanism to ensure that the reader doesn't read where the writer is currently writing, but overall you want writes to be as fast as possible and reading garbage data just means unexpected audio output, which is not a big deal.
If I understand correctly, they mean "as long as access to each u32 is synchronized, I don't care whether access to the whole array is synchronized". In other words: each entry in the array must be data-race-free, but the array as a whole may be not.
Yeah, sorry for not being clear on this. Formally, let S0, S1, R all be Vec, all of 10^9 elements.
Suppose some writer is trying to change the state from S0 to S1 and some reader reads R.
I require that for all i: R[i] = S0[i] || R[i] = S1[i]. So each u32 has to be the old value or the new value, but the overall R can be partly S0 and partly S1.
Data race is UB. And there's no such case "But UB is OK for my specific use case".
You can use Vec<AtomicU32> or Arc<[AtomicU32]> etc. It's recommended to start with SeqCst all the places as it's the strictest ordering. But if you can make it sure that the each access to the u32s are independent between each others AND other synchronization events like Mutex and channels, you can use Relaxed instead which is the weakest ordering. On archs with strong memory model like x86 they may not ne different in performance. But on with weak memory model archs like ARM you'll likely be observe the difference.
C/++, especially since the C99 and C++11 - versions they explicitly added the multithreading capabilities and behavior to the language - have same situation. Never use volatiles for interthread synchronization. It's only allowed within the 20th century.
Go doesn't have any UB conditions in its spec, though it mentions some operations may yields some unspecified value. It prevents some heavy optimization capabilities, but it's intentional as the heavy also implies longer compile time which is the problem the Go tries to avoid at all cost.
Race conditions are still considered bugs in Go. Which is why they have a lot of tooling built around TSan. Introducing the Go Race Detector - go.dev It's similar to the kind of testing you might do with loom.
What you're looking for is a relaxed atomic read then. A data race read would literally return garbage: either 0, the first value, the second value, or some bitwise interpolation of the two. A relaxed atomic read has the same instruction as a regular read on some platforms as well.
But why would the ordering of the instructions matter if when they sample the value relative to other events doesn't matter to them?
Sorry I don't understand the question. If it doesn't matter, you can specify the weakest constraint (ordering) to give optimizer/cpu maximum flexibility.
Thanks everyone for helping me formulate this informally stated question into a formally consistent question. Good job @steffahn for somehow getting the answer off the bat.
I would say no as it seems to be made explicitly for types that don't fit into one atomic, e.g. [u32; 16]. But OP wants to access u32 atomically, which fits into an atomic.
From docs on when this library might not be four you: "Your T (is POD and) fits inside one of the atomic types in core::sync::atomic. TearCell will literally be worse in every way than just performing Relaxed loads/stores to that type."