Consuming dynamically sized mutable subslices with multiple threads

I have a large vector of some datatype. I need to perform some computation on each entry, altering the entry each time. Therefore, the input is a mutable slice of elements. To speed up the computation, I have parallelised it. Specifically, each thread takes a chunk of elements from the "queue" and then performs the computation on each element. The chunk size is determined dynamically.

Now the question is, how can a thread take a chunk of elements out of the input slice. Consider the following code (which does not compile):

extern crate crossbeam;
use std::sync::Mutex;

fn main() {
    let mut data = vec![0; 1024];
    let shared_data = Mutex::new(data.as_mut_slice()); // a large amount of data

    crossbeam::scope(|scope| {
        for _ in 0..4 {
            scope.spawn(|_| {
                loop {
                    let chunk_size = 10; // dynamically computed, different each iteration
                    
                    let local_data: &mut [i32] = {
                        let mut locked_shared_data = shared_data.lock().unwrap();

                        let (local_data, remaining_shared_data) = 
                                locked_shared_data.split_at_mut(chunk_size);
                        *locked_shared_data = remaining_shared_data;

                        local_data
                    };

                    // perform computations, reading from and writing to local_data
                    drop(local_data);
                }
            });
        }
    }).unwrap();
}

(Playground)

Errors
   Compiling playground v0.0.1 (/playground)
error[E0597]: `locked_shared_data` does not live long enough
  --> src/main.rs:18:33
   |
6  |     let shared_data = Mutex::new(data.as_mut_slice()); // a large amount of data
   |         ----------- lifetime `'1` appears in the type of `shared_data`
...
18 |                                 locked_shared_data.split_at_mut(chunk_size);
   |                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   |                                 |
   |                                 borrowed value does not live long enough
   |                                 argument requires that `locked_shared_data` is borrowed for `'1`
...
22 |                     };
   |                     - `locked_shared_data` dropped here while still borrowed

error[E0499]: cannot borrow `locked_shared_data` as mutable more than once at a time
  --> src/main.rs:19:26
   |
18 |                                 locked_shared_data.split_at_mut(chunk_size);
   |                                 ------------------------------------------- first mutable borrow occurs here
19 |                         *locked_shared_data = remaining_shared_data;
   |                          ^^^^^^^^^^^^^^^^^^ second mutable borrow occurs here
20 | 
21 |                         local_data
   |                         ---------- first borrow later used here

Some errors have detailed explanations: E0499, E0597.
For more information about an error, try `rustc --explain E0499`.
error: could not compile `playground` due to 2 previous errors

The input mutable slice is shared with a mutex between the threads. Whenever a thread needs more data, it locks the mutex and takes a subslice of the data. To avoid creating duplicate mutable references, it leaves only the remaining slice in the mutex. Example:

initial mutable slice inside the mutex:
shared_data == 0123456789

take a mutable subslice of length 3 out of the data:
shared_data == 3456789
local_data == 012

The thread can now work on local_data, and no other thread can get mutable references to it, since shared_data now points to a non-overlapping slice.

However, the compiler does not seem to be convinced by this. I get that the first error means that just because we overwrite the value inside the mutex, we cannot alter its associated lifetime. The second error is a bit confusing, since the "second mutable borrow" is actually overwriting the value inside the variable, as opposed to borrowing it?

Is there any way I can make this code compile in safe Rust? Or if not, in unsafe Rust?

The problem is that the data behind a mutex can only be accessed through a MutexGuard, which DerefMut's to the inner data. However, the signature of DerefMut::deref_mut() is

fn deref_mut(&mut self) -> &mut Self::Target

which, combined with lifetime elision rules, means that the lifetime of the resulting reference is tied to the lifetime of self, i.e., it's tied to the lifetime of the mutex guard. Since mutable references are invariant in the lifetime, you can't put a &mut 'long T mutable reference back through a &'short mut &'long mut T.

But you do need to mutably borrow the place in order to overwrite its value, so the outer layer of reference is borrowed mutably. (You have a mutable reference to the value, you don't have it by-value.)


Is your item type actually a primitive integer? If so, you could just replace it with its atomic counterpart and hand out shared references, working around variance. Also, you don't need extern crate anymore. Accordingly, this compiles and runs. (I also fixed an OOB indexing panic.)

1 Like

Consider using rayon, which already supports subdividing work on a mutable slice.

2 Likes

Thanks! Actually, the write part of my values is just boolean, so using atomic there is simple. I think it is a good idea.

On a second thought, it obfuscates the code a bit, since atomic would allow multiple threads to read and write the same element safely, which never actually happens. But still for my purposes totally enough.


Well actually, out of interest, if I would use pointers, instead of a slice, and then build the slice local_data manually from pointers like this:

use std::sync::Mutex;

fn main() {
    let mut data = vec![0; 1024];
    let shared_data = Mutex::new((0, data.as_mut_slice())); // a large amount of data

    crossbeam::scope(|scope| {
        for _ in 0..4 {
            scope.spawn(|_| {
                loop {
                    let chunk_size = 10; // dynamically computed, different each iteration
                    
                    let local_data: &mut [i32] = {
                        let mut shared_data_locked = shared_data.lock().unwrap();
                        let (offset, slice) = &mut *shared_data_locked;

                        let local_len = chunk_size.min(slice.len() - *offset);
                        if local_len == 0 {
                            break;
                        }
                        
                        let local_data = unsafe {
                            std::slice::from_raw_parts_mut(
                                slice.as_mut_ptr().add(*offset),
                                local_len,
                            )
                        };
                        
                        *offset += local_len;

                        local_data
                    };

                    // perform computations, reading from and writing to local_data
                    drop(local_data);
                }
            });
        }
    }).unwrap();
    
    println!("Finished");
}

(Playground)

This would not introduce undefined behavior, or would it? All the mutable slices are still non-overlapping, and the pointer in the mutex is never dereferenced.

Thanks! Last time I checked rayon couldn't do dynamic chunking, i.e. let a thread decide how large its next chunk should be. So now I have built everything without rayon and instead manually with crossbeam already.