Data-parallel for loop

Hey Rustaceans,

I hope you can help me.

I've finally found the time to dive into Rust and so far I'm liking it a lot. For my first project I chose to port smallpt, a tiny, very basic path tracer, from C++ to Rust.
By now I've successfully ported the whole thing, but I'm stuck on making it run on multiple processors. The path tracer essentially loops over all pixels and calculates them one at a time, so

for y in 0..height {
    for x in 0..width {
        let pixel_index = x + (height - y - 1) * width;
        <render pixel>
    }
}

and I would like to make it something along the lines of

for y in 0..height {
    parallel for x in 0..width {
        let pixel_index = x + (height - y - 1) * width;
        <render pixel>
    }
}

The pixel itself is just a POD struct with 3xf64 and the scene itself is completely immutable.

I've tried reading the introduction to Rust, but didn't find anything specifically about data parallelism. I've also googled the issue, but all the posts I found are two years old, so not really written in the same Rust that we have today. :wink:

Could anyone guide me towards a solution?

Best
Asger

Write the code in OpenCL and just call it from Rust :wink:

Rust is not really made for this kind of parallelization. Instead of executing for loops in parallel (OpenCL is perfect for that), it has the capability to run different tasks and let them communicate efficiently.
Most rendering algorithms split the whole image into a few blocks (you can use lines for simplicity) and then render these.
You can use ScopedThreadpool from the threadpool crate.

It's not true that Rust is not made for this! std::thread::scoped should handle this very well.

I'm sure there are others that can do this better than me, but please see my example (and give improvement suggestions).

Push thread creation to the top level, because threads are not lightweight. Sharing read-only data is effortless, and partitioning out the correct part of the mutable result array is done using the .chunks_mut() iterator.

Of course the inner loop should use iterators over the slices as well, but this is just an example.
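
A minimal sketch of the approach described above, assuming a current toolchain: the scoped-thread API in today's standard library is std::thread::scope (stable since Rust 1.63) rather than the std::thread::scoped named in this thread. Pixel, Scene, render_pixel and render are hypothetical stand-ins for the path tracer's types, not the example referred to above.

```rust
use std::thread;

// Hypothetical stand-ins for the path tracer's types (assumptions).
#[derive(Clone, Copy, Debug, Default, PartialEq)]
struct Pixel {
    r: f64,
    g: f64,
    b: f64,
}

struct Scene {
    brightness: f64,
}

// Placeholder for the real per-pixel work; it only reads the scene.
fn render_pixel(scene: &Scene, x: usize, y: usize) -> Pixel {
    Pixel {
        r: x as f64 * scene.brightness,
        g: y as f64 * scene.brightness,
        b: 0.0,
    }
}

fn render(scene: &Scene, width: usize, height: usize, n_threads: usize) -> Vec<Pixel> {
    let n_threads = n_threads.max(1);
    let mut image = vec![Pixel::default(); width * height];
    // Whole rows per thread, rounded up so every row is covered.
    let rows_per_chunk = (height + n_threads - 1) / n_threads;
    // chunks_mut panics on a chunk size of 0, hence the max(1).
    let chunk_len = (rows_per_chunk * width).max(1);

    // Threads are created once, at the top level; each one gets a disjoint
    // mutable slice of the image and a shared reference to the scene.
    thread::scope(|s| {
        for (i, chunk) in image.chunks_mut(chunk_len).enumerate() {
            s.spawn(move || {
                let first = i * chunk_len;
                for (offset, pixel) in chunk.iter_mut().enumerate() {
                    let index = first + offset;
                    let x = index % width;
                    // Inverse of pixel_index = x + (height - y - 1) * width.
                    let y = height - 1 - index / width;
                    *pixel = render_pixel(scene, x, y);
                }
            });
        }
    });
    image
}

fn main() {
    let scene = Scene { brightness: 0.5 };
    let image = render(&scene, 4, 3, 2);
    println!("{:?}", image[0]);
}
```

All spawned threads are joined when thread::scope returns, which is why the closures may borrow the scene and the image without any reference counting.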

Yes, but there is no keyword or function to automatically run loops in parallel with a balanced number of threads, the way OpenMP or C++'s experimental for_each does.

There's no built-in function, but the intention is that the language provides the tools required to write good libraries for it, e.g. I have simple_parallel that serves as a sketch/proof of concept of a few basic patterns. The thread pool in that library provides a way to distribute tasks over a limited number of threads, e.g. one might write the code in the OP (distributing work over 4 threads) as:

let mut pool = simple_parallel::Pool::new(4);

pool.for_(0..height, |y| {
    for x in 0..width { ... }
})

Either way, Rust is actually designed to be really great for doing this sort of data parallelism well. E.g. my simple_parallel library can operate on essentially arbitrary iterators (as long as the elements are safe to send between threads) and is free from data races and memory unsafety. (Well, I haven't verified the unsafe code in detail, but I'm pretty sure it's OK.)
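
For comparison, a std-only sketch of the same idea: a fixed number of workers pulling rows from a shared queue. The helper for_each_row is hypothetical and is not simple_parallel's API; it assumes std::thread::scope from a current toolchain.

```rust
use std::sync::Mutex;
use std::thread;

// Hypothetical helper (an assumption, not a library function): run `body` on
// every row of `image` using at most `n_threads` worker threads.
fn for_each_row<T, F>(image: &mut [T], width: usize, n_threads: usize, body: F)
where
    T: Send,
    F: Fn(usize, &mut [T]) + Sync,
{
    // Shared queue of (row index, row slice). The rows are disjoint, so
    // handing them to different threads is safe.
    let rows = Mutex::new(image.chunks_mut(width.max(1)).enumerate());
    let rows = &rows;
    let body = &body;

    thread::scope(|s| {
        for _ in 0..n_threads.max(1) {
            s.spawn(move || loop {
                // The lock is held only while taking the next row.
                let next = rows.lock().unwrap().next();
                match next {
                    Some((row, pixels)) => body(row, pixels),
                    None => break,
                }
            });
        }
    });
}

fn main() {
    let (width, height) = (4, 3);
    let mut image = vec![0.0f64; width * height];
    for_each_row(&mut image, width, 4, |row, pixels| {
        // Rows are stored bottom-up, as in the original post.
        let y = height - 1 - row;
        for (x, p) in pixels.iter_mut().enumerate() {
            *p = (x + y * width) as f64; // placeholder for rendering
        }
    });
    println!("{:?}", image);
}
```

Because each worker takes a new row as soon as it finishes the previous one, expensive rows do not stall the cheap ones, at the cost of one lock acquisition per row.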

std::thread::scoped is the basic building block that enables this properly, and it hasn't been in place in this capacity in Rust for more than a few weeks! So the tools around that will come.

Fantastic! Thanks for all the replies and the examples. I'll look them over and give feedback where appropriate. I figured I'd have to use a scoped thread in some way, but couldn't wrap my head around attempting that while also starting out with Rust's data sharing concepts.
Regarding OpenCL: the whole point of the current exercise is to learn Rust and learn some nice parallel memory handling practices that I can apply in C++ at work, so using OpenCL instead would be counter-productive to that goal. :smile: Data parallelism is also something that I use for practically all of my projects, so having a solid understanding of how to do it in Rust is crucial going forward.

This was exactly what I needed. Thank you!
For beginners, though, you could change one thing: on l. 16, instead of looping from 0 to chunk_length, loop from 0 to result_chunk.len(). That makes it clear to users that the result chunk actually has a size, and that chunks_mut is not guaranteed to return a slice of size chunk_length (the last chunk can be shorter). That it makes the code more robust to changes is just icing on the cake. :wink:
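
To illustrate, a small sketch with a slice whose length is not a multiple of the chunk size. The function fill_chunks and the numbers are made up for this example; only chunks_mut and the names chunk_length and result_chunk come from the discussion above.

```rust
// Hypothetical helper for illustration: fills `result` chunk by chunk and
// returns the length of each chunk it saw. Assumes chunk_length > 0
// (chunks_mut panics on a chunk size of 0).
fn fill_chunks(result: &mut [u32], chunk_length: usize) -> Vec<usize> {
    let mut lengths = Vec::new();
    for (chunk_index, result_chunk) in result.chunks_mut(chunk_length).enumerate() {
        lengths.push(result_chunk.len());
        // Loop to result_chunk.len(), not chunk_length: the last chunk may
        // be shorter, and indexing up to chunk_length would then panic.
        for i in 0..result_chunk.len() {
            result_chunk[i] = (chunk_index * chunk_length + i) as u32;
        }
    }
    lengths
}

fn main() {
    let mut result = vec![0u32; 10];
    let lengths = fill_chunks(&mut result, 4);
    println!("{:?}", lengths); // prints [4, 4, 2]
    println!("{:?}", result);
}
```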

That's a great improvement :slight_smile: