Not via WGPU. You only get one queue.
Was this meant to be a reply to my post over here?
Assuming so, I think you can still create a CommandEncoder per thread, fill them independently, then submit them all together at the end? I could easily be missing something.
This rough pattern seems to work fine?
// imports assumed by this sketch
use std::sync::Mutex;
use wgpu::CommandEncoderDescriptor;

// so we know how many threads there are up front
let threads = rayon::ThreadPoolBuilder::new()
    .num_threads(2)
    .thread_name(|index| format!("Renderer Worker {}", index))
    .build()
    .unwrap();

// ... on render

// create one encoder per worker thread
let encoders: Vec<_> = (0..self.threads.current_num_threads())
    .map(|_| {
        Mutex::new(
            wgpu.device
                .create_command_encoder(&CommandEncoderDescriptor::default()),
        )
    })
    .collect();

self.threads.scope(|scope| {
    // for example, in reality you would use scene.objects().par_for_each() or the like
    scope.spawn(|_| {
        // Lock the current thread's encoder.
        // This should never block, so you could use UnsafeCell even...
        let mut encoder = encoders[self.threads.current_thread_index().unwrap()]
            .lock()
            .unwrap();
        // do some rendering....
    });
    // other rendering...
    anyhow::Ok(())
})?;

wgpu.queue.submit(
    encoders
        .into_iter()
        .map(|encoder| encoder.into_inner().unwrap().finish()),
);
image.present();
Ok(())
That just lets you feed work into the queue from two sources; it doesn't execute any of it simultaneously.
What I want to do is overlap asset loading into GPU memory with rendering, which needs a transfer queue separate from the main queue. Right now, asset loading slows rendering down by 2-3x.
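For context, every upload path wgpu gives you goes through that same single Queue the renderer submits to (copy commands recorded on an encoder end up on the same queue too). A minimal sketch, assuming uploads via Queue::write_buffer:

// Sketch: uploads and render submissions share wgpu's one Queue,
// so uploads contend with rendering instead of overlapping on a
// dedicated transfer queue.
fn upload_asset(queue: &wgpu::Queue, buffer: &wgpu::Buffer, bytes: &[u8]) {
    // This is the same queue render command buffers are submitted to;
    // there is no separate transfer queue to schedule this on.
    queue.write_buffer(buffer, 0, bytes);
}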
This was with the assumption that the majority of the CPU-side work is in the command encoding APIs, e.g. pass.set_bind_group() and the like, which was probably a little silly.
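For concreteness, this is the kind of per-object encoding work I mean, and what the "// render ..." placeholders below stand for. A minimal sketch; Object and its fields are made up for illustration:

// Hypothetical object type, just to show which calls count as
// "command encoding" work on the CPU side.
struct Object {
    pipeline: wgpu::RenderPipeline,
    bind_group: wgpu::BindGroup,
    vertex_buffer: wgpu::Buffer,
    vertex_count: u32,
}

// The calls I'd expect to dominate if encoding were the bottleneck.
fn encode_object<'a>(pass: &mut wgpu::RenderPass<'a>, object: &'a Object) {
    pass.set_pipeline(&object.pipeline);
    pass.set_bind_group(0, &object.bind_group, &[]);
    pass.set_vertex_buffer(0, object.vertex_buffer.slice(..));
    pass.draw(0..object.vertex_count, 0..1);
}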
If it's instead in encoder.finish(), try something like:
let commands: Vec<_> = objects
    .par_chunks(chunk_size)
    .map(|chunk| {
        let mut encoder = wgpu
            .device
            .create_command_encoder(&CommandEncoderDescriptor::default());
        for object in chunk {
            // render: record this object's passes into `encoder`...
        }
        encoder.finish()
    })
    .collect();
wgpu.queue.submit(commands);
Or if it's in .submit() itself, you'll need to figure out ordering (the map/collect version above preserves chunk order for free; the version below submits in whatever order chunks finish), but otherwise this works:
objects.par_chunks(chunk_size).for_each(|chunk| {
    let mut encoder = wgpu
        .device
        .create_command_encoder(&CommandEncoderDescriptor::default());
    // ... record `chunk` into `encoder` ...
    wgpu.queue.submit([encoder.finish()]);
});
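If you do need submission order preserved while still submitting eagerly, one option is a reorder buffer: tag each chunk's command buffer with its index, and submit in index order as they arrive. A sketch, reusing objects, chunk_size, and wgpu from above:

use std::collections::BTreeMap;
use std::sync::mpsc;

let (tx, rx) = mpsc::channel();
rayon::scope(|s| {
    s.spawn(|_| {
        objects
            .par_chunks(chunk_size)
            .enumerate()
            .for_each_with(tx, |tx, (index, chunk)| {
                let mut encoder = wgpu
                    .device
                    .create_command_encoder(&CommandEncoderDescriptor::default());
                // ... record `chunk` into `encoder` ...
                tx.send((index, encoder.finish())).unwrap();
            });
    });

    // Hold early arrivals until their turn comes up, so the queue
    // sees command buffers in chunk order.
    let mut pending = BTreeMap::new();
    let mut next = 0usize;
    for (index, buffer) in rx {
        pending.insert(index, buffer);
        while let Some(buffer) = pending.remove(&next) {
            wgpu.queue.submit([buffer]);
            next += 1;
        }
    }
});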
Unless wgpu is re-implementing command buffers on the client side or completely screwing up locks (very possible), that should be doing (CPU-side) rendering work in parallel.
GPU-side parallelism is more about declared resource dependencies, from what I understand; I don't know how good wgpu is at working those out for you (probably not great?)
But yeah, without the ability to request a separate queue you can't address transfers at all. Sucks.