let result_vec = Vec::<ResultEntry>::with_capacity(index.projects.len() as u32 as usize);
index.projects.par_iter().for_each(|p| {
    let res = ResultEntry {
        source_project: p.source_project.clone(),
        res: Vec::new(),
    };
    let res_unsafe_mut: &mut Vec<ResultEntry> = unsafe {
        std::mem::transmute(&result_vec)
    };
    res_unsafe_mut[p.number as usize] = res;
    counter.fetch_add(1, Ordering::SeqCst);
});
Rust (reasonably) screams at me about how unsafe this is and suggests using UnsafeCell; however, UnsafeCell is not Sync, and I need it to be. I'm reasonably certain that, since the index each thread modifies is known to be unique, there should be no data races: every thread writes to a different part of memory. I could disable this warning, but is there a more idiomatic way to express this?
And for good reason. Your code exhibits instant UB by transmuting an immutable reference into a mutable one, as well as by mutable aliasing (creating multiple simultaneous &mut references to the same Vec, one in each thread).
I see you are already using Rayon for high-level parallel iteration. Is there a reason you don't simply sort index.projects by .number, and zip it with result_vec.par_iter_mut()?
Better yet, don't write to a shared-memory vector. Just return the ResultEntry from the callback closure, replace for_each() with map(), and finally collect() the results (and sort them if needed).
I know it's UB, but I can structurally guarantee that these threads only ever access unique indices. The error is caused by the #[deny(mutable_transmutes)] lint.
I know, but the actual memory writes within that vector are known to be non-overlapping, so there is no data race. Is there a way to express this guarantee without unsafe? I tried your way, and it works! But I imagine there are scenarios where shared mutable access across threads in this manner is necessary or more performant.
The safety rules of references apply for the whole time a reference exists, not just when it is used. So holding two aliasing references to the same object (where at least one is mutable) is UB even if you only ever use one of them. The compiler is allowed to optimize code under the assumption that this never happens, which can lead to undefined behavior when it does.
You can avoid some of these requirements by using raw pointers, but there are still rules for how you can alias that I'm not sure I understand 100%.
For your use case, IIUC, just filling up the vector from an iterator is the canonical way to achieve what you want. You don't need any unsafe, and the compiler will almost certainly elide the bounds checks, meaning your binary will be essentially identical to what you would have written by hand with unsafe.
The way to statically express this without unsafe is to split up the vector into non-overlapping mutable borrows (e.g. iter_mut() or split_at_mut()) and write the elements through those: Playground
use std::sync::atomic::{AtomicU32, Ordering};
use rayon::prelude::*;

#[derive(Default, Clone, Debug)]
struct SourceProject;

#[derive(Default, Clone)]
struct ResultEntry {
    source_project: SourceProject,
    res: Vec<u32>,
}

#[derive(Clone, Debug)]
struct Project {
    number: u64,
    source_project: SourceProject,
}

struct Index {
    projects: Vec<Project>,
}

fn main() {
    let mut index = Index { projects: vec![] };
    let mut result_vec: Vec<ResultEntry> = vec![ResultEntry::default(); index.projects.len()];
    let counter = AtomicU32::new(0);

    index.projects.sort_by_key(|p| p.number);
    index
        .projects
        .par_iter()
        .zip(result_vec.par_iter_mut())
        .for_each(|(p, out)| {
            *out = ResultEntry {
                source_project: p.source_project.clone(),
                res: Vec::new(),
            };
            counter.fetch_add(1, Ordering::SeqCst);
        });
}
But this is unnecessary complication, given that .map().collect() can do the same without manually pre-allocating the results, zipping them, and assigning through the references.
If you need to fill the vector from multiple threads concurrently, then Mutex<Vec<_>> will almost certainly be good/fast enough. If it is not there are excellent concurrent data structures on crates.io. But I wouldn't reach for them unless you need them.
But a mutex, though necessary in the general case, isn't needed here and would incur extraneous synchronization overhead. We know the size of the allocation isn't changing, and each thread knows exactly which region to write to, without overlap.