Writing disjoint slices and reading the whole slice in multithreading

Hi everyone,

I am working on a program that takes mutable references to disjoint subslices of an array in a multi-threaded setting, while also reading the entire array in each thread.
A minimal example of the issue is below:

use std::sync::{Arc, Barrier, RwLock};
fn main() {

    let my_len = 1024usize;
    let num_threads = 2;
    let chunk_len = my_len / num_threads;
    let mut v_vec = vec![0f32; my_len];
    let v_ref_mut = v_vec.as_mut_slice();

    // needs to be commented out because of the borrow checker,
    // but I also need it to read the entire array
    // let z_rw = Arc::new(RwLock::new(v_ref_mut));
    // let pool_r = z_rw;

    // creating disjoint slices for the threads to write to
    let (x, y) = v_ref_mut.split_at_mut(chunk_len);
    let x_rw = Arc::new(RwLock::new(x));
    let y_rw = Arc::new(RwLock::new(y));
    let pool_w = vec![x_rw, y_rw];

    // a barrier to sync the threads between the write and read phases
    let barrier = Arc::new(Barrier::new(num_threads));

    std::thread::scope(|s| {
        for t_id in 0..num_threads {
            let c = Arc::clone(&barrier);
            let cur_pool = pool_w[t_id].clone();
            s.spawn(move || {
                c.wait();
                for i in 0..chunk_len {
                    cur_pool.write().unwrap()[i] += 2f32;
                }
                c.wait();
                // needs to be commented out because of the borrow checker;
                // I want read access to the entire slice of v_vec here,
                // e.g. pool_r below if the borrow checker allowed it
                // println!("num {:?}", pool_r.read().unwrap()[my_len-1]);
            });
        }
    });
    // dirty checking the result
    println!("0, 512 th elem: {}, {}", v_vec[0], v_vec[512])
}

As you can see above, I have 2 sections inside the multi-threaded part:

  1. writing to disjoint subslices of the vec v_vec in each thread
  2. having read access to the whole slice of v_vec in both threads

which are synchronized with barriers between them.

I can solve the issue by using a pointer wrapper struct and implementing Send for MyPtr(*mut T), roughly as sketched below. This works fine, since I make sure that the slices are disjoint and synchronization is done with barriers.
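For reference, this is the kind of wrapper I mean (a minimal sketch; I've specialized it to f32 to match the example above):

    // Wrapper that lets a raw pointer cross thread boundaries.
    // SAFETY: this is only sound because every thread writes to a
    // disjoint index range and the phases are separated by barriers.
    struct MyPtr(*mut f32);
    unsafe impl Send for MyPtr {}

    // inside each thread, writes then go through the raw pointer:
    // unsafe { *my_ptr.0.add(i) += 2f32; }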
But I want to avoid using unsafe if possible. Any ideas for a safe implementation?
Thanks

In principle, what you need is an RwLock-style lock wrapped around v_vec that allows you to lock it for writing and then also supports a split_at_mut()-style operation on the lock guard, so that all threads can write their parts and then acquire a read lock when all the writing is done. However, I'm not aware of any lock library that provides such an RwLock with multiple simultaneous (counted) write guards.
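If such a lock existed, I'd imagine its interface looking something like this (purely hypothetical: the names are invented and the bodies are stubbed, since I don't know of a crate that implements it):

    use std::marker::PhantomData;

    // Purely hypothetical interface, for illustration only.
    struct SplitRwLock<T> {
        data: Vec<T>, // plus, conceptually, a count of outstanding write guards
    }
    struct WriteChunk<'a, T>(PhantomData<&'a mut [T]>); // guards one disjoint range
    struct ReadGuard<'a, T>(PhantomData<&'a [T]>);

    impl<T> SplitRwLock<T> {
        // Lock for writing, split into two disjoint, individually droppable guards.
        fn write_split_at(&self, _mid: usize) -> (WriteChunk<'_, T>, WriteChunk<'_, T>) {
            unimplemented!("sketch only")
        }
        // Succeeds only once every outstanding write guard has been dropped.
        fn read(&self) -> ReadGuard<'_, T> {
            unimplemented!("sketch only")
        }
    }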

Assuming you can't find one based on that suggestion, the way I would suggest organizing your unsafe code is to get as close to that as you reasonably can. For example, write a struct that combines a barrier and a pointer to the vector's contents, so that the only way to get read access is to call a function that also waits on the barrier. That's not a complete safe abstraction, since it does not manage the writing, but it's illustrative of the direction you'd move in to achieve that.
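To make that concrete, a rough sketch of the read half (the type and field names are mine, and as said, it does not manage the writing):

    use std::sync::Barrier;

    // Read access is only reachable through a method that waits on the
    // barrier first, so no thread can read before all threads arrive.
    struct SyncedReader<'a> {
        barrier: &'a Barrier,
        data: *const f32, // points into the vector's contents
        len: usize,
    }

    // SAFETY: relies on the writers having finished (and stopped writing)
    // by the time the barrier opens; the type itself cannot enforce that.
    unsafe impl Send for SyncedReader<'_> {}

    impl SyncedReader<'_> {
        fn read(&self) -> &[f32] {
            self.barrier.wait(); // synchronize with the writer threads
            unsafe { std::slice::from_raw_parts(self.data, self.len) }
        }
    }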


Thanks a lot, that seems to be the solution, unless someone else proposes a completely safe way or crate.

I think I will provide a safe API around the logic you described in your first paragraph. Since it is small, I can efficiently test it in various scenarios with Miri.

A quick Google search turns up GitHub - austinstig/range-lock: A range lock implementation in Rust
The code looks like it has both read and write locks.


Also, here are some tips for writing simpler and more efficient code; they're unrelated to the synchronization/borrowing problem you actually asked about:

  • Because you are using scoped threads, you do not need any Arcs. Each Arc<T> can be replaced with just &T. (Arc is only needed when there isn't a scope, legible to the borrow checker, that everything is guaranteed to be unused after.)

  • You do not need to wrap x and y in Arc<RwLock>s to deliver them to the individual threads; you can have your thread-creating loop iterate over them, so they're moved into each thread.

Applying both of these to your code:

    ...
    let (x, y) = v_ref_mut.split_at_mut(chunk_len);
    let pool_w = vec![x, y]; // contains all the refs and is iterable

    let barrier = Barrier::new(num_threads); // no Arc needed
    let barrier = &barrier; // a plain shared reference that the closures can copy

    std::thread::scope(|s| {
        for cur_pool in pool_w { // iterate over the mut refs instead of id numbers
            s.spawn(move || { // `move` so each thread owns its `&mut [f32]`
                barrier.wait(); // directly using the copied `&Barrier`
                for i in 0..chunk_len {
                    cur_pool[i] += 2f32; // directly using `&mut [f32]`
                }
                barrier.wait();
                ...

And if you still needed thread IDs, you could get them like for (id, cur_pool) in pool_w.into_iter().enumerate().


Thanks for the advice.
I reproduced this small example from the original by semi-copying it :slight_smile: There, in some cases (depending on input parameters), the same RwLock has to be sent to multiple threads.
Also, it is used as part of another struct.

I will remove the Arc on the barrier.

Edit: Removed all the Arcs. I guess I was just too lazy to deal with the lifetime parameter notation :slight_smile:

Thanks, I will check it out. There is also another project with the same name, range-lock (published on crates.io).

Both crates have either read or write locks, but not an RwLock.

Since I need read access from multiple threads after the writing is done, they don't seem useful for my case. (A mutex-style write lock is held by only one thread.)

I found another one that has an RwLock per interval: crates.io: Rust Package Registry

But this one claims to require unstable features, even though the features I looked at have already been stabilized.

For the time being, I will write my own solution following these crates (adding some tests and Miri).

Another possible way to organize your code would be to, instead of having each individual thread give up its write access and gain read access, start new threads (or new tasks in a thread pool like rayon) in a new scope to perform the read operations. If the read tasks need per-task information from the write tasks, that information could be carried forward using a channel or a vector of OnceLocks.

This way, you don't need any locking or explicit synchronization at all, let alone custom locking.
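A minimal sketch of that structure, reusing the numbers from the original example (and leaving out the per-task information forwarding):

    use std::thread;

    fn main() {
        let my_len = 1024usize;
        let num_threads = 2;
        let chunk_len = my_len / num_threads;
        let mut v_vec = vec![0f32; my_len];

        // Write phase: disjoint exclusive access, so no locks are needed.
        thread::scope(|s| {
            for chunk in v_vec.chunks_mut(chunk_len) {
                s.spawn(move || {
                    for x in chunk.iter_mut() {
                        *x += 2f32;
                    }
                });
            }
        });

        // Read phase: the write scope has ended and returned its borrows,
        // so every thread may now take a shared reference to the whole vec.
        thread::scope(|s| {
            for _ in 0..num_threads {
                let v_ref = &v_vec;
                s.spawn(move || {
                    println!("num {:?}", v_ref[my_len - 1]);
                });
            }
        });
    }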

If I choose this method (which I tried), the thread handling in my code becomes too complex, because it has to be interleaved inside a bunch of loops: the parallelism is used inside multiple (nested) loops.

Also, I need to manage cache locality and keep the same slice in the same thread.
The only way I could think of to do this is to map the thread IDs (of the thread pool) to the IDs of the sub-slices of the array.

I prefer an additional interval-RwLock implementation over these complexities and performance issues.

But if I start the threads once and stay in the same thread for the entire run of the function, the only thing I need to do is use a function (based on some chunk_len and the total input dimensions) to calculate the appropriate indices of each slice, and then use those indices for the entire function; a sketch of such a helper is below.
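For illustration, a hypothetical 1-D version of that helper (the name and signature are just my sketch):

    // Hypothetical helper: map a thread id to the bounds of its sub-slice.
    fn chunk_bounds(t_id: usize, chunk_len: usize, total_len: usize) -> (usize, usize) {
        let start = t_id * chunk_len;
        let end = (start + chunk_len).min(total_len); // last chunk may be short
        (start, end)
    }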

I verified the performance issues by benchmarking.

Simpler version of my main code (in pseudocode):

    fn f(...) {
        loop1 i0..i1 {
            f1(...)
            loop2 j0..j1 {
                f2(...)
            }
        }
    }

In the current method, essentially the only thing I need to do is calculate i0, i1, j0, j1 for each thread and hand the sequential version of f to thread::spawn, roughly as sketched below.
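A sketch of that structure, with stub functions standing in for the real f, f1, f2 (all names and the bounds calculations here are placeholders):

    use std::thread;

    // Sequential version of f, run entirely inside one thread on its own
    // index ranges (the f1/f2 bodies are elided).
    fn f_seq(i0: usize, i1: usize, j0: usize, j1: usize) {
        for _i in i0..i1 {
            // f1(...)
            for _j in j0..j1 {
                // f2(...)
            }
        }
    }

    fn run(num_threads: usize, chunk_len: usize) {
        thread::scope(|s| {
            for t_id in 0..num_threads {
                // compute this thread's index ranges once, up front
                let (i0, i1) = (t_id * chunk_len, (t_id + 1) * chunk_len);
                let (j0, j1) = (0, chunk_len);
                s.spawn(move || f_seq(i0, i1, j0, j1));
            }
        });
    }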

If I were to use a thread pool, I'd have to insert threadpool.execute for each function used, say f1 and f2. Also, f2 in my case uses parallelism in 2 dimensions. So adding those additional loops (plus the mapping of slice indices to thread IDs) to manage the thread pool and parallelize each function inside the loops is less desirable than computing the indices once and staying in the same thread.
