UnsafeCell Sync

I have this code:

    let result_vec = Vec::<ResultEntry>::with_capacity(index.projects.len() as u32 as usize);

    index.projects.par_iter().for_each(|p| {
        let res = ResultEntry {
            source_project: p.source_project.clone(),
            res: Vec::new(),
        };
        let res_unsafe_mut: &mut Vec<ResultEntry> = unsafe {
            std::mem::transmute(&result_vec)
        };
        res_unsafe_mut[p.number as usize] = res;
        counter.fetch_add(1, Ordering::SeqCst);
    });

Rust (reasonably) screams at me about how unsafe this is and suggests using UnsafeCell; however, UnsafeCell is not Sync, and I need it to be. I'm reasonably certain that since the index each thread modifies is known to be unique, there should be no data races: every thread writes to a different part of memory. I could disable this warning, but is there a more idiomatic way to express this?

It's likely not just a warning, but a hard error.

And for good reason. Your code actually exhibits instant UB by transmuting an immutable reference to a mutable one, as well as by causing mutable aliasing (creating multiple simultaneous &mut references, one in each thread).

I see you are already using Rayon for high-level parallel iteration. Is there a reason you don't simply sort index.projects by .number, and zip it with result_vec.par_iter_mut()?

Better yet, don't write to a shared-memory vector. Just return the ResultEntry from the callback closure, replace for_each() with map(), and finally collect() the results (and sort them if needed).
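A minimal sketch of that map()/collect() version, using stand-in struct definitions based on the snippet above (the types and the build_results name here are assumptions for illustration, not the original code):

    use rayon::prelude::*;

    // Stand-in types: the real definitions live in the original project.
    #[derive(Clone, Debug)]
    struct SourceProject;

    #[derive(Debug)]
    struct ResultEntry {
        source_project: SourceProject,
        res: Vec<u32>,
    }

    #[derive(Debug)]
    struct Project {
        source_project: SourceProject,
    }

    fn build_results(projects: &[Project]) -> Vec<ResultEntry> {
        projects
            .par_iter()
            .map(|p| ResultEntry {
                source_project: p.source_project.clone(),
                res: Vec::new(),
            })
            // collect() on an indexed parallel iterator preserves the
            // input order, so no manual indexing or locking is needed.
            .collect()
    }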


I know it's UB, but I can structurally guarantee that these threads only ever access unique indices. The error is caused by the #[deny(mutable_transmutes)] lint.

That's not the point. What is UB is that you have overlapping mutable references to the result vector; I am not talking about the elements.

I know, but the actual memory writes within that vector are known to be non-overlapping, so there is no data race. Is there a way to express this guarantee without unsafe? I tried your way, and it works! But I imagine there are scenarios where shared mutable access across threads in this manner is necessary or more performant.

The safety rules of references apply for the whole time a reference exists, not just when it is used. So having two overlapping mutable references to the same object is UB even if you only ever use one of them. The compiler is allowed to optimize code assuming this never happens, which can lead to undefined behavior when it does.


You can avoid some of these requirements by using raw pointers, but there are still rules for how you can alias that I'm not sure I understand 100%.
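To illustrate the raw-pointer route, here is a hedged sketch: a tiny wrapper (SyncPtr and fill_squares are made-up names for this example) is unsafely asserted to be Send + Sync so the base pointer can be shared across Rayon threads, and each iteration writes to its own index. All of the safety obligations rest on the author of the unsafe impls and the SAFETY comment.

    use rayon::prelude::*;

    // Made-up wrapper asserting that sharing this raw pointer across threads
    // is sound *for this specific use*: every thread writes to a distinct,
    // in-bounds index, and nothing reads the buffer until the loop is done.
    struct SyncPtr<T>(*mut T);
    unsafe impl<T: Send> Send for SyncPtr<T> {}
    unsafe impl<T: Send> Sync for SyncPtr<T> {}

    fn fill_squares(n: usize) -> Vec<u64> {
        let mut out = vec![0u64; n]; // pre-initialized, so every slot is valid
        let base = SyncPtr(out.as_mut_ptr());

        (0..n).into_par_iter().for_each(|i| {
            // SAFETY: i < n, each index is written exactly once, and no
            // reference to `out` is used while these writes happen.
            unsafe { base.0.add(i).write((i as u64) * (i as u64)) };
        });

        out
    }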

For your use case, IIUC, just filling up the vector from an iterator is the canonical way to achieve what you want. You don't need any unsafe, as the compiler will almost certainly elide the bounds checks, meaning your binary will be practically identical to what you would have written with unsafe by hand.

The way to statically express this without unsafe is to split up the vector into non-overlapping mutable borrows (e.g. iter_mut() or split_at_mut()) and write the elements through those: Playground

use std::sync::atomic::{AtomicU32, Ordering};
use rayon::prelude::*;

#[derive(Default, Clone, Debug)]
struct SourceProject;

#[derive(Default, Clone)]
struct ResultEntry {
    source_project: SourceProject,
    res: Vec<u32>,
}

#[derive(Clone, Debug)]
struct Project {
    number: u64,
    source_project: SourceProject,
}

struct Index {
    projects: Vec<Project>,
}

fn main() {
    let mut index = Index {
        projects: vec![],
    };
    let mut result_vec: Vec<ResultEntry> = vec![ResultEntry::default(); index.projects.len()];
    let counter = AtomicU32::new(0);

    index.projects.sort_by_key(|p| p.number);
    index.projects.par_iter().zip(result_vec.par_iter_mut()).for_each(|(p, out)| {
        *out = ResultEntry {
            source_project: p.source_project.clone(),
            res: Vec::new(),
        };
        counter.fetch_add(1, Ordering::SeqCst);
    });
}

But this is an unnecessary complication, given that .map().collect() can do the same without needing to manually pre-allocate the results, zip them, and assign through the references.

TBH I find that highly unlikely.


If you need to fill the vector from multiple threads concurrently, then Mutex<Vec<_>> will almost certainly be good/fast enough. If it is not there are excellent concurrent data structures on crates.io. But I wouldn't reach for them unless you need them.
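For instance, a minimal Mutex<Vec<_>> sketch (collect_squares is a hypothetical example function), doing the work outside the lock and holding it only for the push:

    use std::sync::Mutex;
    use rayon::prelude::*;

    fn collect_squares(n: usize) -> Vec<u64> {
        let results = Mutex::new(Vec::with_capacity(n));

        (0..n).into_par_iter().for_each(|i| {
            let value = (i as u64) * (i as u64); // do the real work outside the lock
            results.lock().unwrap().push(value); // hold the lock only for the push
        });

        results.into_inner().unwrap()
    }

Note that this does not preserve input order; if order matters, the map().collect() approach is simpler.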

But using a mutex, though it would be necessary in general, is not necessary here and would incur extraneous synchronization overhead. We know the size of the allocation is not changing, and each thread knows exactly which region to write to, without overlap.

One correct way to implement this unsafely can be found here.
