Ignoring potential data races on the vector itself, I agree that this code has defined behavior, so we can say a few things about it. We can say:
- that all threads see `flags` as all true initially (happens-before induced by spawn)
- that the parent thread will see `flags` as all false after all threads are joined (happens-before induced by join)
- that no thread will see a true value after it has seen a false value for `flags[i]` (transitivity)
I think that about covers it. Most notably, we cannot say:
- when a thread sees `flags[i]` become false, relative to other threads' progress
- anything about other atomic variables you may share between threads; in other words, loads and stores from/to those variables will have no ordering relationship to the loads and stores to `flags[i]` across threads

In particular, when considering all updates by all threads to all atomic shared variables, there may not exist a single sequential order that is consistent with the update order observed by every thread.
So the correctness of whatever algorithm these flags implement could be judged only once the rest of the code that accesses shared state is revealed (if there is such code).
It helps to be somewhat formal (it's not overdoing it, IMO) and stick with the definition of a data race as:
A program that has two conflicting evaluations has a data race unless either both conflicting evaluations are atomic operations or one of the conflicting evaluations happens-before another.
where "conflicting evaluations" are defined as:
When an evaluation of an expression writes to a memory location and another evaluation reads or modifies the same memory location, the expressions are said to conflict.
When adopting this view, we wouldn't call it a "data race." (A memory location, by the way, is an object of scalar type, or a maximal sequence of adjacent bit-fields, not simply a byte.)
So data races are undefined behavior; two atomic operations on the same memory location may conflict, but they do not constitute a data race and are instead governed by the memory model's guarantees. These guarantees can range from almost nothing (relaxed) to very little (sequential consistency).
Although I'm not 100% sure, using relaxed assignments to atomics probably has no purpose beyond removing the formal undefinedness the program would otherwise have. Most algorithms one might come up with require at least sequential consistency (IIRC, classic algorithms like Dekker's and Peterson's do).
An approach that's sometimes taken is to start with an implementation that uses default atomics (sequential consistency), and then reason about where it can be relaxed (for related operations, this would typically be acquire/release pairs). See this paper for an example of this technique in the context of a lock-free deque.