How to access a vector from multiple threads without a mutex?

How can I access a vector from multiple threads without a mutex? I know what I am doing and I don't have any meaningful race conditions.
For example, here is my C code using OpenCilk for multithreading:

void graph_trim(graph g) {
  bool *has_in = calloc(g->v, sizeof(bool));
  if (!has_in) {
    fprintf(stderr, "Couldn't allocate has_in array\n");
    exit(EXIT_FAILURE);
  }

  bool *has_out = calloc(g->v, sizeof(bool));
  if (!has_out) {
    fprintf(stderr, "Couldn't allocate has_out array\n");
    exit(EXIT_FAILURE);
  }

  // run for in parallel
  cilk_for(size_t v = 0; v < g->v; v++) {
    for (size_t j = g->in->com[v]; j < g->in->com[v + 1]; j++) {
      if (!g->removed[v] && g->in->unc[j] != v) {
        has_in[v] = true;
        has_out[g->in->unc[j]] = true;
      }
    }
  }

  cilk_for(size_t v = 0; v < g->v; v++) {
    if (!has_in[v] || !has_out[v]) {
      g->removed[v] = true;
      *(g->n_trimmed) += 1;
    }
  }

  free(has_in);
  free(has_out);
}

If you merely want to run some code in parallel on every element of an array/vector/slice, you can use Rayon's parallel iterators.


That could be a race condition depending on what values g->in->unc[j] takes on.

The increment is definitely a race condition.

In general, prefer iteration to indexing, and see if you can get this to work with rayon's par_iter_mut.

Incidentally, anything involving pointer-heavy graph-based data structures is a painful way to attempt learning Rust.


For every "I know what I am doing", there's a corresponding "well, actually". :grin:


The first race you mentioned is meaningless (as I said, "I don't have any meaningful races"), because I don't care which thread turns has_out[g->in->unc[j]] to true.

For the second race there is actually no race condition because I am using OpenCilk's reducer there.

The full context of this code can be found here.

That doesn't matter. Race conditions are undefined behavior (in both languages), regardless of whether you care about them. (I.e., if you have a race condition, the compiler has the right to turn your program into code that launches the nukes, etc.)


You can use atomics to avoid Mutex without UB for simple things like booleans.


So you are telling me that there is a case where this program prints i=0 instead of i=1?

#include <pthread.h>
#include <stdio.h>

int i = 0;

void *thread_function(void *arg) {
  i = 1;
  return NULL;
}
int main() {
  pthread_t thread1, thread2;

  pthread_create(&thread1, NULL, thread_function, NULL);
  pthread_create(&thread2, NULL, thread_function, NULL);

  pthread_join(thread1, NULL);
  pthread_join(thread2, NULL);

  printf("i = %d\n", i);

  return 0;
}
Yes, that'd be a conforming outcome (and so would be literally anything else). Given that the assignment i = 1 in thread_function is racy, the compiler can assume it never happens as-is (without synchronization, e.g. as in your code). So it might as well remove it altogether.


I will be glad if you can tell me what optimization flags to add to clang to get this code to not print i=1.

You probably can't get clang to optimize that program to not print i = 1, but the point is that if clang did so, then it would not be a bug in clang.

The correct way to do that kind of thing is to use relaxed atomic operations instead. Those compile down to exactly the same instructions as what you're currently doing, and it would not be a data race.


"Technically UB but I know it compiles ok [asterisk today on my machines with my compiler]" is a stance which is culturally rejected across practically all of the Rust community. UB is defined at the language level and UB (and unsoundness, where UB is possible to trigger) is to be avoided.

Moreover, if you have a Rust program that does not contain unsafe but contains UB (even if "just" technically), that means

  • One of your dependencies has unsound unsafe, and it's their fault, or
  • The compiler has a bug, and it's the compiler's fault

As a corollary, there is no way to have "harmless UB" (like a "meaningless data race") in safe Rust.

(And no one's going to want to "help" you do so with unsafe, due to the cultural rejection of "if it compiled ok this time we're good". [1] We'd rather help you write sound code.)

  1. Also there's a lot less people knowledgeable of how to achieve such things reliably, because almost no one is interested in doing so. ↩︎


When I'm using a lock, I guess I'm locking the entire vector. Could I lock a specific index of the vector, so that two threads wait only if they try to write to the same place?

use std::sync::{Arc, Mutex};
use std::thread;

fn main() -> std::io::Result<()> {
    let max_threads: usize = thread::available_parallelism()?.into();
    let has_in: Arc<Mutex<Vec<u8>>> = Arc::new(Mutex::new(vec![0; max_threads]));

    let mut threads: Vec<thread::JoinHandle<()>> = vec![];
    for i in 0..max_threads {
        let has_in = Arc::clone(&has_in);

        threads.push(thread::spawn(move || {
            let mut has_in = has_in.lock().unwrap();
            has_in[i] = 1;
            println!("{:?}", has_in);
        }));
    }

    for thread in threads {
        thread.join().unwrap();
    }

    Ok(())
}

Benign Data Races: What Could Possibly Go Wrong? | Intel® Software


If you don't want to use rayon, you can use thread::scope here to avoid the locking, and I'd chunk up the work instead of operating on individual u8.

With the version where you send the Arc, it works effectively serially due to the locking, which was probably your point.

Atomics for has_out still looks like a better fit to me for your OP.


Why do I have to drop the lock at the end? Isn't this done automatically?

It's a best-practice thing (or at least a habit of mine to be explicit about the scope of locks). The lock does drop automatically (like any other variable) at the end of the function, and in this case it didn't really matter. But in a more complicated function it can matter and/or be less obvious, e.g. as described in this issue or the last example in the documentation.

Note that you don't really need the Arc+Mutex for that. In fact you're just wrapping in a Mutex, then immediately calling lock to get a reference which you'll hold until the end of the program.


Is it ok to have Vec<AtomicBool>?

Yes, it is. This is more in line with what other commenters have been suggesting.
