Write to shared memory from multiple threads

I'm having some trouble trying to figure out how to share mutable data between threads.
I have a Vec I want multiple threads to write to simultaneously. I don't care if the threads tries to write to the same index (data races are allowed in this context).
How do I go about doing this?

No, it is never allowed. Rust explicitly says that any data race is an instant UB (undefined behavior). You have to use an atomic operation even if the result doesn't matter.

I don't know what you want to achieve but here is an example of sharing a Vec between threads and threads "race" (but not a data race) for a write:

use std::sync::{Arc, atomic::{AtomicI32, Ordering}};
use std::thread;
fn main() {
    let v = Arc::new(vec![AtomicI32::new(1), AtomicI32::new(2)]);
    {
        let v = Arc::clone(&v);
        thread::spawn(move || v[0].store(2, Ordering::Relaxed));
    };
    {
        let v = Arc::clone(&v);
        thread::spawn(move || v[0].store(3, Ordering::Relaxed));
    };
    // The output is nondeterministic.
    for _ in 0..5 {
        println!("{}", v[0].load(Ordering::Relaxed));
    }
}
2 Likes

I was wondering about this problem the other day....

For example it is quite possible to create a cyclic FIFO buffer between two threads, a writer and a reader, without using any locks or mutexs etc that is thread safe. It's not even difficult, a handful of lines of code.

It is possible that one may want to update an array of values from one thread and use them in another but not be worried about relationship of those values to each other. Who cares if some values read are out of date a little bit with respect the others?

In such cases Rust will complain. You can probably do it using "unsafe". Which will cause people to raise their eyebrows.

I believe that's what @pcpthm is demonstrating using the relaxed atomic load/store. This way it may race but isn't a data race, so therefore not UB. Unless you mean something else, of course! :slight_smile:

2 Likes

uberjay

I think you are right.

The detail is in that "AtomicI32" I think.

If one is running on a 32 bit CPU one imagines that all loads and stores to integers are atomic. That is the nature of the hardware.

Might not be so if ones CPU is a 16 bitter for example.

or if the CPU is capable of unaligned loads and stores and the 32-bit value is not align 4.

Good point.

I was forgetting how much of the undefined behavior, that C and C++ are famous for, is implemented in hardware now a days.

So the scenario is something like a map highlighting where people are. Imagine splitting the world in 2'000'000 parts. A 0 represents an empty part and a 1 that someone's there. This will make it easy to highlight the map.
To not loop through all 7'500'000'000 people sequentially I want to split the work on multiple threads.
I said that data races were allowed in a sense, because it doesn't matter if two or more threads tries to set an index to one, the result will be the same either way.
I will try pcpthm's solution, but I predict the performance to be bad. Correct me if I'm wrong, but atomic is blocking (or spinlocking) so threads working on popular indices, like large cities, will starve each other and in worst case a thread might wait for a long time setting an index to 1 (that's already set to 1).
I'm new to Rust (mostly programming in C and C#) so this is all good stuff :slight_smile:

Can't you split the people by region, and split the world appropriately? split_at_mut helps you there, and that will allow you to avoid the races somewhat trivially.

Or you might create several world copies, have the threads work on those an merge them. vec![bool; 2_000_000] isn't really large, so maybe that works out.

Well I can't know if a person will move to another region. I can't really know where people are located before I run, so a split will be a bit wonky.
Merging several large arrays seems like a lot of work, but might be fast?
Edit: the actual implementation I wanted to do was some multithreaded rendering on the cpu, but this scenario boils down the core problem.

An atomic operation never blocks or spin-locks a thread. That is what atomic operations for. On x86-64, the relaxed store is literary the same instruction as the normal store evidence.

3 Likes

The rust documantation states that "Atomic types and operations are not guaranteed to be wait-free" so I assumed that one thread has to wait for the other when doing an atomic operation

"Wait-free" is a technical term of computer science and you cannot conclude anything from the naive meaning.
The sentence is mostly relevant for the CAS (compare-and-swap) atomic operation. Simple load or store are always "wait-free" if I'm correct.

1 Like

Data race is UB, and UB is a contract violation between the programmer and the optimizing compiler. The compiler has responsibility to produce correct executable from the given code if the program never touches UB condition during execution. The compiler optimize the code based on this contract, it usually elide or mess up the UB-touching path to make non-UB path faster. If the code unconditionally touches the UB case, it's perfectly legal for compiler to produce empty executable without any instructions, to make world's fastest program that can be executed within zero clock.

Wait-free is a rather strong guarantee. In lock-free system with N threads, on any point of time it's guaranteed at least one of the threads makes progress in a finite number of steps. In the other words, lock-free system is guaranteed to not makes deadlock condition. In wait-free system, it's guaranteed that every tasks will be completed in finite number of steps, so no task will ever be blocked by any other thread.

3 Likes

I don't know about any esoteric CS definitionsof "wait" and "wait free" but in the context of this discussion the practical question was: Would wrapping accesses to integers in Atomic have an impact on performance.

As pcpthm demonstates above accessing a 64 bit integer on x86-64 Atomically does not suffer any performance hit over accessing it with regular read/writes.

Which is what we would expect as x86-64 instructions can read and write 64 bit quantites in a single instruction. A single memory access.

But...what happens when you build that code for a 32 or 16 bit compiler? Then 2 or 4 reads or writes will be required for each access. They had better be done atomically. Hence be wrapped in some mutual exclustion mechanism.

There will be a performance hit in those cases.

Perhaps not the CS meaning of "wait" but for sure all other threads will be blocked for the short time it takes to write the different parts of that i64.

That is what Atomics are for. To ensure that your source witll behave the same no matter what platform.

Atomic operations also inhibit compiler optimizations that would result in incorrect code in the presence of data races. This means that they can have performance impact and are necessary for soundness and correctness even if you compile only on platforms where the atomic reads/writes use the same instructions as non-atomic ones.

12 Likes

In addition to mrbubeck's solid response, maybe this will help...

@Tannimun The key terms that no one else appears to have mentioned here are "data races" and "race conditions". As others have said, a data race is a precise technical term that's part of the language contract, and data races are always UB, so by definition there is no such thing as an acceptable data race. But race conditions are largely a matter of intent (do you need X to happen before Y, or is either order fine?), so I'm pretty sure what you actually meant to say in your opening post was that race conditions are fine in your code.

One of the simplest ways to have acceptable race conditions is if the racy action is idempotent, like the case you described: Both threads are trying to set an integer to 1, so it doesn't matter which one does it first because all orders lead to the same result anyway. But even in this scenario, the integer must be atomic to avoid a data race. In addition to the "theoretical UB" (we can't even define "all orders" if data races are involved), if Rust allowed you to compile this program with a non-atomic integer, a very possible outcome on real hardware would be both threads trying to write 1 to the integer "at the same time" such that the integer's final value ends up being something "random" value like 42 or 0xFFFFFF instead of the 1 you wanted (either because the hardware allows such things, or because the compiler "optimized" your incorrect code).

And yes, "lock-free" and "wait-free" are completely different terms that do not imply each other. Atomics are lock-free, but not necessarily wait-free. If you're not familiar with what "wait-free" means, then don't worry about it for now, because that is not important for writing correct Rust code. But you do need to understand why data races are bad to understand some of the Rust compiler's error messages and how to address them when multiple threads are involved.

3 Likes

Nitpick: lock-free and wait-free are related in the sense that wait-free is a strict superset of lock-free.

But as you said, it's an advanced topic, most definitely not to be tackled before having acquired a solid understanding of what "data races are UB" precisely means and why this UB exists.

Wait-free is a subset of lock-free

1 Like

Nice catch! :wink: