Shared mutable access to a list of values

Hello all,

I'm writing a toy renderer to learn a bit about Rust. So far, I'm generating my image in a Vec<Color>, where Color is a simple RGB triplet of f64.

I'm trying to parallelize the rendering by splitting it into tiles of 32x32 pixels. The worker threads simply pick up an unrendered tile when they're finished with the previous one.

I'm having trouble writing the results from the threads. Each thread needs to write part of the result (1024 pixels for each tile), but the Vec cannot be shared mutably across threads.

In theory, it is largely fine - each thread ends up being responsible for different parts of the generated image. However, I don't know how to represent that properly in Rust.

Any idea what kind of structure or construct I can use to represent the image in memory, while still allowing multiple threads to write to it?

(In general, I want to avoid moving data around needlessly, so copying the tiles is not really a good option. I would prefer to write them in place directly.)

There's split_at_mut(index) which gives two independent mutable slices.
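
A minimal sketch of the split_at_mut idea, using only the standard library. The `Color` struct and the `fill_halves` helper are hypothetical names chosen for illustration; the point is only that the two halves are disjoint `&mut` slices, so each can be handed to its own thread.

```rust
use std::thread;

// Hypothetical stand-in for the RGB triplet described in the question.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Color {
    pub r: f64,
    pub g: f64,
    pub b: f64,
}

/// Fills the first half of `pixels` with `first` and the second half with
/// `second`, using one scoped thread per half.
pub fn fill_halves(pixels: &mut [Color], first: Color, second: Color) {
    let mid = pixels.len() / 2;
    // `split_at_mut` returns two non-overlapping mutable slices, so the
    // borrow checker accepts giving one to each thread.
    let (left, right) = pixels.split_at_mut(mid);
    // Scoped threads may borrow from the caller; they are all joined
    // before `thread::scope` returns.
    thread::scope(|s| {
        s.spawn(move || left.fill(first));
        s.spawn(move || right.fill(second));
    });
}
```

Calling `fill_halves(&mut pixels, red, blue)` writes into the Vec in place, with no copying and no locking.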

There's chunks_mut() which gives an iterator of such slices.
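
One way to combine chunks_mut() with the "workers pick up the next unrendered piece" scheme from the question is to put the iterator itself behind a Mutex and let each worker pull from it. This is only a sketch with standard-library threads: `render_chunks` is a hypothetical name, `f64` stands in for `Color`, and writing the chunk index stands in for real rendering.

```rust
use std::sync::Mutex;
use std::thread;

/// Processes `pixels` in chunks of `chunk_len` using `workers` threads.
/// Each worker repeatedly takes the next unprocessed chunk from a shared queue.
/// Panics if `chunk_len` is 0, because `chunks_mut` does.
pub fn render_chunks(pixels: &mut [f64], chunk_len: usize, workers: usize) {
    // The iterator of disjoint mutable slices doubles as the work queue.
    let queue = Mutex::new(pixels.chunks_mut(chunk_len).enumerate());
    thread::scope(|s| {
        for _ in 0..workers {
            s.spawn(|| loop {
                // The lock is held only while taking the next chunk,
                // not while "rendering" it.
                let next = queue.lock().unwrap().next();
                match next {
                    // Placeholder work: store the chunk index in every pixel.
                    Some((index, chunk)) => chunk.fill(index as f64),
                    None => break,
                }
            });
        }
    });
}
```

Each chunk is handed to exactly one worker, so the writes never overlap even though all workers share the same underlying buffer.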

And finally there's Rayon's par_chunks_mut() which operates on chunks of a Vec all in parallel.

(I've been informed I suck at memes and the order should be reversed)


I'll second this. It'll greatly simplify your code, and take care of the worker thread (work stealing) management. Something like:

use rayon::prelude::*;

let mut pixels: Vec<Color> = ...;
pixels
    .par_chunks_mut(1024)
    .for_each(|chunk| {
        // `chunk` is a `&mut [Color]` slice of up to 1024 pixels
        // (the last chunk is shorter if the length is not a multiple of 1024)
    });
...

Note however that par_chunks_mut(1024) will give you 1024 consecutive values, which probably isn't the 32x32 tile you asked for.
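
To make the layout issue concrete, here is a small sketch that assumes the image is stored row-major (row after row, `width` pixels each), which the thread has not confirmed. `tile_row_ranges` is a hypothetical helper that lists the index ranges a single square tile occupies in the flat Vec.

```rust
use std::ops::Range;

/// Returns, for the tile at tile coordinates (`tx`, `ty`), one index range
/// per tile row in a row-major buffer with `width` pixels per row.
/// Assumes the tile lies fully inside the image.
pub fn tile_row_ranges(width: usize, tile: usize, tx: usize, ty: usize) -> Vec<Range<usize>> {
    (0..tile)
        .map(|dy| {
            // Start of this tile row: skip whole image rows, then whole tiles.
            let start = (ty * tile + dy) * width + tx * tile;
            start..start + tile
        })
        .collect()
}
```

For a 64-pixel-wide image and 32x32 tiles, the tile at (1, 0) covers 32..64, then 96..128, and so on: 32 separate runs with gaps between them, which is why a single contiguous chunk cannot represent it.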

To achieve that tiling, you're first going to need some representation that holds a sort of "slice" of non-contiguous ranges in your Vec, so you can split into independent parts for each thread. If you don't already have anything like this, I suggest using ndarray which can give you mutable views on arbitrary splits of your data.

Then to parallelize this with rayon, you still have to tell it how to do these splits. ndarray does have an ndarray-parallel extension crate for rayon (which will be folded into ndarray proper on the next release), but it doesn't (yet) include a parallel version of their exact_chunks_mut that would make your life easy. But you could collect those tile references first and then iterate that in parallel, like:

let tiles: Vec<_> = data.exact_chunks_mut([32, 32]).into_iter().collect();
tiles.into_par_iter().for_each(|tile| {
    // do your rendering; `tile` is a mutable 32x32 view into `data`
});

If you don't want such an intermediate, you could use ArrayViewMut::split_at to progressively chop down the axes. You could then either write your own tiling ParallelIterator for your case, or use the split helper function. Or if you don't care about dressing it up like an iterator at all, you can just do your own splits and call join.


A simpler idea: you could just parallelize chunks of 32 entire rows, which should be contiguous, and then deal with 32 columns at a time in serial.

vec.par_chunks_mut(32 * num_columns).for_each(|rows| {
    // we have a 32xN chunk, so now render each 32x32 tile within
});
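
A rough sketch of that band-then-tile walk, written with standard-library scoped threads instead of Rayon so that it has no dependencies. Everything here is an assumption for illustration: `render_bands` and `shade` are hypothetical names, `f64` stands in for `Color`, the buffer is taken to be row-major, and its length is taken to be a multiple of `width`.

```rust
use std::thread;

/// Renders `pixels` (row-major, `width` pixels per row) in bands of `tile`
/// rows, one scoped thread per band. Inside a band, the pixels are visited
/// one `tile`-wide block at a time, in serial.
/// Panics if `width` or `tile` is 0.
pub fn render_bands<F>(pixels: &mut [f64], width: usize, tile: usize, shade: F)
where
    F: Fn(usize, usize) -> f64 + Sync,
{
    let shade = &shade; // shared by reference across the worker threads
    thread::scope(|s| {
        // Each band is a contiguous run of `tile` full rows.
        for (band_idx, band) in pixels.chunks_mut(tile * width).enumerate() {
            s.spawn(move || {
                let rows = band.len() / width; // the last band may be shorter
                let y0 = band_idx * tile;
                for x0 in (0..width).step_by(tile) {
                    let x1 = (x0 + tile).min(width); // clip the last tile
                    for dy in 0..rows {
                        for x in x0..x1 {
                            band[dy * width + x] = shade(x, y0 + dy);
                        }
                    }
                }
            });
        }
    });
}
```

This spawns one thread per band, which is fine for a sketch; the Rayon version above spreads the bands over a fixed pool with work stealing instead.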

Yeah, maybe. I was naively imagining the 32x32 tiles were flattened linearly into 1024 contiguous pixels, but perhaps that’s not the actual layout.

Thanks a lot everyone! I'll look into the various suggestions. Rayon and ndarray look particularly promising.

I was indeed thinking of square tiles. It does not matter that much for a toy project, but in an actual renderer square tiles are better than, for example, full lines because they help with locality: most rays in a tile tend to hit the same objects.
