Threading via WGPU

Not via WGPU. You only get one queue.

Was this meant to be a reply to my post over here?

Assuming so, I think you can still create a CommandEncoder per thread, fill it, and then submit them all together at the end? I could easily be missing something.


This rough pattern seems to work fine?

use std::sync::Mutex;
use wgpu::CommandEncoderDescriptor;

// so we know how many threads there are up front
let threads = rayon::ThreadPoolBuilder::new()
    .num_threads(2)
    .thread_name(|index| format!("Renderer Worker {}", index))
    .build()
    .unwrap();

// ... on render

// create encoder per worker thread
let encoders: Vec<_> = (0..self.threads.current_num_threads())
    .map(|_| {
        Mutex::new(
            wgpu.device
                .create_command_encoder(&CommandEncoderDescriptor::default()),
        )
    })
    .collect();


self.threads.scope(|scope| {
    // for example; in reality you would use scene.objects().par_iter().for_each() or the like
    scope.spawn(|_| {
        // Lock the current thread's encoder.
        // This should never block, so you could use UnsafeCell even...
        let mut encoder = encoders[self.threads.current_thread_index().unwrap()]
            .lock()
            .unwrap();

        // do some rendering....
    });

    // other rendering...

    anyhow::Ok(())
})?;

wgpu.queue.submit(
    encoders
        .into_iter()
        .map(|encoder| encoder.into_inner().unwrap().finish()),
);
image.present();
Ok(())

That just lets you feed items into the queue from two sources, not do them simultaneously.

What I want to do is overlap asset loading into GPU memory with rendering. That needs a transfer queue separate from the main queue. Right now, asset loading slows rendering by 2x-3x.
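For reference, here's roughly what wgpu would have to expose. In raw Vulkan (a sketch via the ash crate; instance, physical_device, and device are assumed to already exist) you pick a dedicated transfer-only queue family and submit uploads on it independently of the graphics queue:

use ash::vk;

// Find a queue family that supports transfer but not graphics; discrete
// GPUs usually expose one backed by the DMA engine.
let families = unsafe {
    instance.get_physical_device_queue_family_properties(physical_device)
};
let transfer_family = families
    .iter()
    .enumerate()
    .find(|(_, props)| {
        props.queue_flags.contains(vk::QueueFlags::TRANSFER)
            && !props.queue_flags.contains(vk::QueueFlags::GRAPHICS)
    })
    .map(|(i, _)| i as u32)
    .expect("no dedicated transfer queue family");

// After requesting that family in DeviceQueueCreateInfo at device creation:
let transfer_queue = unsafe { device.get_device_queue(transfer_family, 0) };
// Uploads submitted here run concurrently with work on the graphics queue.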

This was with the assumption that the majority of the CPU-side work is in the command encoding APIs, e.g. pass.set_bind_group() and the like, which was probably a little silly.
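It's worth measuring where the CPU time actually goes before picking an approach; a rough sketch with std::time::Instant (wgpu.device / wgpu.queue as in the snippets above):

use std::time::Instant;
use wgpu::CommandEncoderDescriptor;

let t0 = Instant::now();
let mut encoder = wgpu
    .device
    .create_command_encoder(&CommandEncoderDescriptor::default());
// record passes here: set_pipeline(), set_bind_group(), draw(), etc.
let t1 = Instant::now();
let commands = encoder.finish();
let t2 = Instant::now();
wgpu.queue.submit([commands]);
let t3 = Instant::now();
println!(
    "encode: {:?}  finish: {:?}  submit: {:?}",
    t1 - t0,
    t2 - t1,
    t3 - t2
);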

If it's instead in encoder.finish(), try something like:

let commands: Vec<_> = objects.par_chunks(chunk_size)
  .map(|chunk| {
    let mut encoder = wgpu.device
      .create_command_encoder(&CommandEncoderDescriptor::default());
    for object in chunk {
      // render ...
    }
    encoder.finish()
  })
  .collect();
wgpu.queue.submit(commands);

Or if it's in .submit(), you'll need to figure out ordering yourself, but otherwise this works:

objects.par_chunks(chunk_size).for_each(|chunk| {
  // ...
  wgpu.queue.submit([encoder.finish()]);
});

Unless wgpu is re-implementing command buffers on the client side or completely screwing up locks (very possible), that should be doing (CPU-side) rendering work in parallel.

GPU-side parallelism is more about declared resource dependencies, from what I understand; I don't know how good wgpu is at inferring those for you (probably not great?).
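For concreteness, a "declared resource dependency" at the raw Vulkan level is a pipeline barrier saying, e.g., that transfer writes to a buffer must finish before shaders read it; work with no such declared edge is free to overlap on the GPU. A sketch via ash (0.38-style setters; device, cmd_buf, and buffer are assumed locals) of what wgpu derives for you from usage tracking:

use ash::vk;

// Transfer writes to `buffer` must complete before vertex shaders read it.
let barrier = vk::BufferMemoryBarrier::default()
    .src_access_mask(vk::AccessFlags::TRANSFER_WRITE)
    .dst_access_mask(vk::AccessFlags::SHADER_READ)
    .src_queue_family_index(vk::QUEUE_FAMILY_IGNORED)
    .dst_queue_family_index(vk::QUEUE_FAMILY_IGNORED)
    .buffer(buffer)
    .offset(0)
    .size(vk::WHOLE_SIZE);

unsafe {
    device.cmd_pipeline_barrier(
        cmd_buf,
        vk::PipelineStageFlags::TRANSFER,
        vk::PipelineStageFlags::VERTEX_SHADER,
        vk::DependencyFlags::empty(),
        &[],
        &[barrier],
        &[],
    );
}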

But yeah, without the ability to request a separate queue you can't do anything about transfers at all. Sucks.
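The closest thing to a workaround inside wgpu that I know of: Queue is internally synchronized (Send + Sync on native), so a loader thread can at least issue write_buffer / write_texture calls while the render thread encodes and submits. It all still lands on the one queue, so the GPU won't overlap the copies with rendering, but it keeps the CPU-side staging work off the render thread. Rough sketch (queue / asset_buffer in Arcs; load_asset_bytes is a made-up loader):

use std::{sync::Arc, thread};

let queue = Arc::clone(&queue);
let buffer = Arc::clone(&asset_buffer);
thread::spawn(move || {
    // Hypothetical loader returning the raw bytes for an asset.
    let bytes = load_asset_bytes();
    // Safe to call concurrently with encoding/submission on other threads;
    // wgpu serializes it onto the single queue internally.
    queue.write_buffer(&buffer, 0, &bytes);
});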