Rust / wgpu / replace depth buffer

I have a Rust / wgpu program as follows:

  1. we have 3 buffers:
     - color_attachment: Vec<u8> of size width * height
     - depth_attachment: Vec<f32> of size width * height
     - aux: Vec<f32> of size width * height
  2. as part of normal rendering, the fragment shader writes to:
     - color_attachment: computed in frag shader
     - depth_attachment: unmodified; we don't do anything to this
     - aux: we also write an extra f32 per pixel
  3. now, after we render the scene, I want to REPLACE the depth_attachment with aux

Is this possible ?

pre-emptive questions:

why don't you overwrite depth in frag shader ?

I want triangles to be culled by depth. The value written to aux is a small modification of depth that does not change sorting relative to objects rendered by this shader; it only matters relative to objects rendered by other shaders.

Thus, for this particular pipeline, I want depth sorting/culling to use the depth output by the vertex shader. But after this pipeline, before the next shader runs, I want to replace the depth_attachment with the aux buffer (the real depth values).

you can render a quad and only store z-values. It's a common practice to have a quad to do such passes.

I am not sure I am understanding your suggestion correctly. Is the following correct ? (If not, where am I going wrong?)

So my problem is:

1. run_shader_a
2. need to replace depth buffer with aux_output from shader_a
3. run shader_b

You are suggesting for step 2, I just draw a really big quad that covers the screen. This quad does not write to the color buffer. This quad only draws to the depth buffer.

Then we pray that, in the fragment shader, the depth_buffer and aux_output line up perfectly?

This seems like a very convoluted way to do:

gpu_mem_cpy(dst = depth_buffer, src = aux_output); // if such a function existed.

GPUs are so parallel and asynchronous that they really dislike doing stuff that requires synchronizing and blocking the whole world while data is slowly and linearly copied from one buffer to another. They can also copy stuff between buffers very quickly and in parallel, but that's called "rendering". The quad method may literally be the most natural way to do it from the GPU perspective, at least if you're using the graphics pipeline. With compute shaders you have more latitude on how to move your data around.
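For what it's worth, the quad pass described above might look roughly like the sketch below. It is a sketch under assumptions, not tested against the asker's program: the names DEPTH_REPLACE_WGSL, aux_tex, vs_main, fs_main and fullscreen_triangle_vertex are made up for illustration; aux is assumed to be a single-channel float texture (e.g. R32Float) of the same size as the depth attachment, bound as a texture in this pass; and the pipeline is assumed to have no color targets, depth writes enabled, and the depth compare function set to Always. It uses one oversized triangle instead of a quad, which is a common variation of the same idea.

```rust
/// WGSL for a hypothetical "replace depth with aux" pass.
/// The fragment stage outputs only `frag_depth`, read from `aux_tex`
/// with an integer texel fetch (no sampler, so no filtering).
pub const DEPTH_REPLACE_WGSL: &str = r#"
@group(0) @binding(0) var aux_tex: texture_2d<f32>;

@vertex
fn vs_main(@builtin(vertex_index) i: u32) -> @builtin(position) vec4<f32> {
    // Vertices (-1,-1), (3,-1), (-1,3): one triangle covering the whole screen.
    let x = f32(i32(i & 1u) * 4 - 1);
    let y = f32(i32(i >> 1u) * 4 - 1);
    return vec4<f32>(x, y, 0.0, 1.0);
}

@fragment
fn fs_main(@builtin(position) pos: vec4<f32>) -> @builtin(frag_depth) f32 {
    // pos.xy is the pixel center (x + 0.5, y + 0.5); truncation gives (x, y).
    let texel = vec2<i32>(pos.xy);
    return textureLoad(aux_tex, texel, 0).r;
}
"#;

/// CPU mirror of `vs_main`, only here so the vertex math can be checked.
pub fn fullscreen_triangle_vertex(i: u32) -> [f32; 2] {
    let x = ((i & 1) as i32 * 4 - 1) as f32;
    let y = ((i >> 1) as i32 * 4 - 1) as f32;
    [x, y]
}
```

Such a pass would be drawn with 3 vertices and no vertex buffer, with the depth texture as the only attachment.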


Is there sample code I can crib for this ?

Given the problems I have been running into with texture sampling (blurring, off by 1, etc.), I do not trust my current abilities to get this pixel perfect. (And screwing up the depth buffer would be really bad.)
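On the pixel-perfect worry: blurring and off-by-one errors typically come from sampling with normalized UVs through a filtering sampler. An integer texel fetch (textureLoad in WGSL) involves neither, so if the copy pass reads aux that way, the mapping from fragment to texel is plain truncation. The small CPU-side model below only illustrates that arithmetic. It assumes pixel centers sit at half-integer framebuffer positions and that aux and the depth attachment have the same dimensions; all function names are hypothetical.

```rust
/// Framebuffer position of the center of pixel (x, y).
pub fn frag_pos(x: u32, y: u32) -> (f32, f32) {
    (x as f32 + 0.5, y as f32 + 0.5)
}

/// Models `vec2<i32>(pos.xy)` in WGSL: truncate toward zero.
pub fn texel_from_frag_pos(pos: (f32, f32)) -> (i32, i32) {
    (pos.0 as i32, pos.1 as i32)
}

/// Normalized UV that hits the center of texel (x, y). Only needed when
/// going through a sampler; forgetting the + 0.5 is a classic source of
/// blur and off-by-one results with filtering enabled.
pub fn uv_at_texel_center(x: u32, y: u32, width: u32, height: u32) -> (f32, f32) {
    (
        (x as f32 + 0.5) / width as f32,
        (y as f32 + 0.5) / height as f32,
    )
}

/// Checks that every pixel of a width x height target maps back to itself
/// when addressed by truncating its fragment position.
pub fn fetch_is_exact(width: u32, height: u32) -> bool {
    (0..height).all(|y| {
        (0..width).all(|x| texel_from_frag_pos(frag_pos(x, y)) == (x as i32, y as i32))
    })
}
```

With the integer fetch there is no UV computation at all, so the half-texel offset only matters if a sampler is used instead.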