Beginner CPU-based 3D Renderer

I've been working on a 3D renderer in my free time to better understand graphics (and practice Rust).

Repository link

I'm looking for feedback on project structure, algorithm efficiency, and anything else that stands out to you.

Some notes in case you want to run the program:

  • Screen width / height is configurable at the top of main.rs (current default is 2560x1600)
  • Scale is similarly configurable. Higher value = more downscaling (default = 2)
  • Run in release mode for best performance, although if compiling takes too long, you can change codegen-units in Cargo.toml to be a higher number

Thanks for checking out my project! All feedback is appreciated.

In your camera.rs, you're using Mat4::look_to_rh to construct a matrix from a direction vector, which you constructed from rotation angles. This round-trip throws out information unnecessarily and becomes numerically unstable and then undefined at the poles (looking directly up and directly down), which is presumably why you wrote

self.rotation.x = self.rotation.x.clamp(0.99 * -FRAC_PI_2, 0.99 * FRAC_PI_2);

Instead, use Mat4::from_rotation_x, Mat4::from_rotation_y, and matrix multiplication to directly construct a rotation matrix from your rotation angles. Then, you'll be able to have a full ±π/2 (±90°) of pitch angles available, and when you need direction vectors you can actually pull those out as the column vectors of the matrix.


In draw.rs,

pub struct Draw {
    length: usize,
    to_be_drawn: Vec<([usize; 2], [u8; 3])>,
    depth_buffer: Vec<f32>,
}
  • length seems like it would be better named width.

  • to_be_drawn has a complex type which you have to read further down to understand. How about making it more documented and in smaller pieces, with some type aliases?

    type Point = [usize; 2];
    type Rgb = [u8; 3];
    pub struct Draw {
        width: usize,
        to_be_drawn: Vec<(Point, Rgb)>,
        depth_buffer: Vec<f32>,
    }
    
  • It's quite wasteful (in memory and in computation time) to store a Vec of pixels with coordinates. It would be much more efficient to pass the pixels.frame_mut() into your drawing routines and write directly to that frame as you compute the pixels of each triangle. You might choose to make this part of Draw, meaning it is a temporary struct created each frame instead of a persistent one, and the depth buffer would be owned separately:

    pub struct Draw<'a> {
        width: usize,
        color_buffer: &'a mut [u8],
        depth_buffer: &'a mut [f32],
    }
    

In main.rs,

  • there is some code which is about triangle math and some code which is about windowing. I would suggest moving the math parts to a new module.

  • clip() could be written to do its work purely in-place instead of allocating a second vector every time it’s used, perhaps by looping like for i in (0..current.len()) so that additional triangles can be pushed to the end while the loop iterates over only the original ones (modifying one and pushing up to two more). Then its signature could be made

    fn clip(points: &mut Vec<([Vec4; 3], Vec3A)>) {...
    

    and if the input vector is reused, it would allocate nothing in the steady state.

  • In App::handle(), the *dx and *dy can be avoided by mentioning the reference in the pattern instead:

    &Event::DeviceEvent {
        event: DeviceEvent::MouseMotion { delta: (dx, dy) },
        ..
    } => self.camera.update_rotation(vec2(-dy as f32, dx as f32)),
    

    This makes no difference to the behavior of the program, but I find it much more elegant.

Finally, I think the code could overall use more comments explaining what it's doing — not reiterating the code, but summarizing it and giving rationales for various choices. I have enough background in computer graphics to understand most of the intent, but someone less experienced would be lost trying to build on this code.

7 Likes

Thanks for taking the time to give me feedback! I really appreciate learning from an experienced graphics programmer.

Besides the changes you mentioned for the draw queue and the clip function, is there anything else you’d recommend to maximize performance? Other than the obvious choice of switching to GPU, that is. Eventually, I plan to add support for textures/better lighting/postprocessing, but I want to make sure I’m optimizing the base before I complicate the code further.

The biggest thing I noticed already that will likely lead to immediate performance improvements is getting rid of the list of pixels and writing directly to the frame buffer.

The second thing you will want to do is improve the efficiency of your triangle-drawing loop. In particular, I notice that you're iterating over the triangle’s bounding box. This is wasteful, and there’s a simple trick to avoid it: every triangle can be split into either 1 or 2 triangles such that it has one horizontal edge and two diagonal edges, and then you can iterate over such a triangle tightly without any unnecessary iterations:

for y in y_min..y_max {
    for x in compute_x_range(..., y) {
        // set pixel (x, y)
    }
}

where compute_x_range is actually just a lerp between the range at the top and the range at the bottom (one of which is a point and one of which is not).

After those two algorithmic improvements, what’s left will be generic performance optimization work: examining how the code actually performs and improving it.

  1. Profile, to determine what parts are unexpectedly slow.
  2. Benchmark, to determine whether changes you make are actually improvements.
  3. Examine the assembly, to look for opportunities for improvement.

In particular, for step 3, you'll be looking to see whether your pixel-setting loop is taking advantage of loop unrolling and SIMD operations — things that get the maximum throughput from your CPU. (Throughput — or in triangle rendering speak, “fill rate” — is going to be your bottleneck, and the main thing that “not using a GPU” costs you.) The compiler may or may not be able to auto-vectorize (convert your loop to SIMD) implicitly; if it does not, then you can try using explicit SIMD operations.

But don't put the cart before the horse and start tweaking things first; first you need a benchmark so you can measure the results, and to read a profile so you know that you're working on the right area of the program.

3 Likes