Almost no multithreading while using crossbeam channels + scoped threadpool with minifb

Hello, last time I was able to set up multithreading in my application with the awesome community's help using crossbeam channels with scoped_threadpool.

At that point, the code looked like:

    let mut pool = Pool::new(cpus);
    let (s, r) = unbounded();
    let mut tiles: Vec<Tile> = Default::default();
    let pixel_numbers = 0..(film.height * film.width);
    pool.scoped(|scope| {
        for i in pixel_numbers.step_by(TILE_SIZE) {
            let sender = s.clone();
            scope.execute(move || {
                let tile: Tile = Default::default();
                sender.send(tile).unwrap();
            });
        }
    });
    // Drop the original sender, which never sends anything,
    // so that the receiver does not wait on it forever
    drop(s);
    tiles.extend(r);

And it worked beautifully. To the best of my understanding, once the tiles.extend(r) line returns, all the workers have finished sending and we have the complete output. This took around 850 ms.
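
To illustrate why the drop(s) matters: iterating a receiver only ends once every sender, including the original one, is gone. Below is a minimal sketch of that behaviour. It assumes std::sync::mpsc and std::thread::scope as dependency-free stand-ins for crossbeam channels and scoped_threadpool (so it spawns one thread per tile rather than using a pool), and collect_tiles with its u32 "tiles" is a made-up helper, not code from the application.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical stand-in for the render loop: each "tile" is just its index.
fn collect_tiles(num_tiles: u32) -> Vec<u32> {
    let (s, r) = mpsc::channel();
    thread::scope(|scope| {
        for i in 0..num_tiles {
            let sender = s.clone();
            scope.spawn(move || {
                sender.send(i).unwrap();
            });
        }
    }); // the scope blocks here until every worker has finished

    // Without this drop, the original sender stays alive and
    // `extend` below would wait forever for more tiles.
    drop(s);

    let mut tiles = Vec::new();
    tiles.extend(r);
    tiles.sort_unstable(); // arrival order is not deterministic
    tiles
}

fn main() {
    let tiles = collect_tiles(8);
    println!("collected {} tiles", tiles.len());
}
```

Note that by the time the scope returns, every tile is already sitting in the channel, so the extend call only copies finished results.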

Now, I am trying to push the Tiles into a framebuffer to display on the screen as they are done, and it seems the minifb crate is extremely easy to use, so I did that instead of collecting the output in tiles as and when they are done:

    // tiles.extend(r);
    while window.is_open() && !window.is_key_down(Key::Escape) && !r.is_empty() {
        // Receive a tile from the renderer without blocking
        let finished_tile_result = r.try_recv();
        match finished_tile_result {
            Ok(finished_tile) => {
                // Now that we have the tile from the renderer, push it into tiles for the image,
                // then tone map it into the display buffer and show it on the screen
                tiles.push(finished_tile.clone());
                frame_buffer.splice(
                    finished_tile.start_index as usize
                        ..(finished_tile.start_index as usize + finished_tile.num_pixels),
                    do_tonemapping(finished_tile.pixels),
                );

                // dbg!(finished_tile.num_pixels);
                // We unwrap here as we want this code to exit if it fails.
                // Real applications may want to handle this in a different way
                window
                    .update_with_buffer(&frame_buffer, film.width as usize, film.height as usize)
                    .unwrap();
            }
            Err(_) => {}
        }
    }

However, this makes the program an order of magnitude slower, and the CPU is never fully utilized. What's worse, the time depends on how fast the screen can update, so if I limit the update rate to, say, 60 fps, it takes even longer.

So it seems I am ending up synchronized to the while loop, even though, from what I understand, the loop should run in the main thread while the workers render their respective Tiles. Once a tile is done, try_recv() should return it, and it would then be written into both the display frame_buffer (a Vec<u32>) and tiles. This should use at most one thread and should not block, yet something weird seems to be going on.

Can anyone please advise on what's going horribly wrong here, and maybe ways to fix it?

Also, I had another idea that instead of waiting to update the frame_buffer whenever a receiver is done, we keep it totally independent of the channels, and instead draw to the window at say, 60 fps by reading the tiles instead of writing to a separate frame_buffer. However, I am still not completely sure how to implement that in the current channels framework... Is it something that can be implemented easily?
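
One way this decoupled idea could look: on every frame, drain whatever tiles are ready without blocking, then present the buffer once, regardless of how many tiles arrived. The sketch below is only an illustration under assumptions. It uses std::sync::mpsc in place of crossbeam (whose TryRecvError has the same Empty and Disconnected variants), a single worker thread in place of the pool, and made-up names (Tile, drain_ready_tiles, present_at_fixed_rate); the comment marks where the minifb update would go.

```rust
use std::sync::mpsc::{self, Receiver, TryRecvError};
use std::thread;
use std::time::Duration;

const TILE_SIZE: usize = 4;

/// Hypothetical tile: where it starts in the frame buffer, plus its pixels.
struct Tile {
    start_index: usize,
    pixels: Vec<u32>,
}

/// Copies every tile that is ready right now into the frame buffer.
/// Returns false once all senders are gone and the queue is drained.
fn drain_ready_tiles(r: &Receiver<Tile>, frame_buffer: &mut [u32]) -> bool {
    loop {
        match r.try_recv() {
            Ok(tile) => {
                let end = tile.start_index + tile.pixels.len();
                frame_buffer[tile.start_index..end].copy_from_slice(&tile.pixels);
            }
            Err(TryRecvError::Empty) => return true,
            Err(TryRecvError::Disconnected) => return false,
        }
    }
}

fn present_at_fixed_rate(num_pixels: usize) -> (Vec<u32>, u32) {
    let mut frame_buffer = vec![0u32; num_pixels];
    let (s, r) = mpsc::channel();
    // Stand-in renderer: one thread that produces tiles slowly.
    let worker = thread::spawn(move || {
        for start in (0..num_pixels).step_by(TILE_SIZE) {
            thread::sleep(Duration::from_millis(20));
            let len = TILE_SIZE.min(num_pixels - start);
            s.send(Tile { start_index: start, pixels: vec![1; len] }).unwrap();
        }
        // `s` is dropped here, which disconnects the channel.
    });

    let mut frames = 0;
    loop {
        let still_rendering = drain_ready_tiles(&r, &mut frame_buffer);
        // The window update (e.g. update_with_buffer) would go here, once per frame.
        frames += 1;
        if !still_rendering {
            break;
        }
        thread::sleep(Duration::from_millis(16)); // roughly 60 fps
    }
    worker.join().unwrap();
    (frame_buffer, frames)
}

fn main() {
    let (frame_buffer, frames) = present_at_fixed_rate(8);
    println!("{} pixels presented over {} frames", frame_buffer.len(), frames);
}
```

With this shape the render time no longer depends on the update rate, because a slow frame just means more tiles get copied in the next one.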

Any advice would be appreciated! Thanks!

Pool::scoped blocks until all its threads are completed as you can see in the documentation. I guess you have to put the drawing logic into the closure passed to scoped, so that it can be run concurrently with spawned threads.
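
A rough sketch of what that restructuring might look like is below. It is not the poster's code: std::thread::scope and std::sync::mpsc stand in for scoped_threadpool and crossbeam so the example has no dependencies, and Tile, render_tile and render_while_drawing are hypothetical names. The point is only that the receive loop sits inside the scope closure, so the main thread consumes tiles while the workers are still rendering.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

const TILE_SIZE: usize = 4;

/// Hypothetical tile: where it starts in the frame buffer, plus its pixels.
struct Tile {
    start_index: usize,
    pixels: Vec<u32>,
}

/// Stand-in for the expensive per-tile render work.
fn render_tile(start_index: usize, len: usize) -> Tile {
    thread::sleep(Duration::from_millis(5));
    Tile { start_index, pixels: vec![0x00FF_FFFF; len] }
}

/// Returns the filled frame buffer and the number of tiles received.
fn render_while_drawing(num_pixels: usize) -> (Vec<u32>, usize) {
    let mut frame_buffer = vec![0u32; num_pixels];
    let mut received = 0;
    let (s, r) = mpsc::channel();

    thread::scope(|scope| {
        for start in (0..num_pixels).step_by(TILE_SIZE) {
            let sender = s.clone();
            let len = TILE_SIZE.min(num_pixels - start);
            scope.spawn(move || {
                sender.send(render_tile(start, len)).unwrap();
            });
        }
        // Drop the original sender so the loop below ends
        // once the last worker has sent its tile.
        drop(s);

        // Still inside the scope: this runs on the calling thread
        // while the workers are rendering.
        for tile in r {
            let end = tile.start_index + tile.pixels.len();
            frame_buffer[tile.start_index..end].copy_from_slice(&tile.pixels);
            received += 1;
            // The window update would go here.
        }
    });

    (frame_buffer, received)
}

fn main() {
    let (frame_buffer, received) = render_while_drawing(10);
    println!("{} tiles, {} pixels", received, frame_buffer.len());
}
```

In the original version the loop ran only after scoped had returned, which means after all rendering was already finished.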

Ah, I totally missed that part, thank you!

I am still trying to figure out where and how to put the drawing code. However, just for the sake of it, I tried replacing the scoped threadpool with a non-scoped threadpool. The thing is, there is an extremely big Scene object inside the closure that also has Trait objects, and it seems people usually surround it inside an Arc and make a clone of the Arc for the closure. But, doing that slows the program 50x, worse than using scoped_threadpool (the slowdown was 10x over there).
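
For reference, here is a minimal sketch of the Arc pattern described above, with a made-up Scene struct and sum_on_workers helper standing in for the real scene and render tasks. Arc::clone only increments a reference count and does not copy the Scene, so the clone itself is unlikely to explain a slowdown of that size; the cause may lie elsewhere. For a scene that holds trait objects, those would also need to be Send + Sync for this to compile.

```rust
use std::sync::Arc;
use std::thread;

/// Hypothetical stand-in for the large scene; the real one holds trait objects.
struct Scene {
    objects: Vec<f32>,
}

/// Each worker reads the shared scene through its own Arc handle.
fn sum_on_workers(scene: Arc<Scene>, workers: usize) -> f32 {
    let handles: Vec<_> = (0..workers)
        .map(|_| {
            // Bumps a reference count; the Scene itself is not copied.
            let scene = Arc::clone(&scene);
            thread::spawn(move || scene.objects.iter().sum::<f32>())
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).sum()
}

fn main() {
    let scene = Arc::new(Scene { objects: vec![1.0, 2.0, 3.0] });
    println!("total = {}", sum_on_workers(scene, 4));
}
```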

Still have a long way to go.
