(After optimization) Are these 3 patterns sort of equivalent for the compiler?

Hello all,

First of all, thanks for all your help! This is becoming a series now haha (check previous posts HERE and HERE )

So, I have been trying to develop a ray-tracer for my building simulation tool... but it is very slow for now. In an attempt to optimize it, yesterday I did something that changed things quite dramatically, let me explain briefly (check HERE for details).

So, ray-tracing involves calculating tons of ray/primitive intersections, of course. Hopefully, these primitives will be stored in an array.

Options 1a and 1b: The object-oriented approach

Since these Primitives are ACTUAL objects, I thought on using an object-oriented approach, so I could do something like:

let objects = Vec<Primitive>; // or Vec<dyn Primitive>
let ray = Ray::default();
for o in objects{ 
     let pt = o.intersect(&ray);
    //.... do something with this.
}

In my view, there are two approaches: Traits or Enums.

Enums

Enums allow static dispatch but don't make things very maintainable:

enum Primitive{
    Sphere(SphereData),
    Triangle(Triangle),
    //...
}

impl Primitive{
     pub fn intersect(&self, ray: &Ray)->Point3D{
            match self {
                   Self::Sphere(s)=>s.intersect(ray),
                   Self::Triangle(t)=>t.intersect(ray),
                  // ...
            }
     }
}

// And then I need to
impl SphereData{
     pub fn intersect(&self, ray: &Ray)->Point3D{...}
}
impl Triangle{
     pub fn intersect(&self, ray: &Ray)->Point3D{...}
}
/* ... and so on... but this is harder to maintain, as I need to add 
       manually all the implementations for any new method for the 
       Primitive enym
*/

Traits

Traits are cool because they ensure that all the functions in the different primitives have
the same signatures and names. However, this involves dynamic dispatching.

trait Primitive {
     fn intersect(&self, ray: &mut Ray)->Point3D;
}

impl Primitive for Sphere
     fn intersect(&self, ray: &mut Ray)->{...}
}

The second option: Functional programing?

(I am a mechanical engineer... don't ask me too much about programming paradigms)

So, a third option that I implemented yesterday—but that currently only support triangles—is as follows:

enum PrimitiveType{
      Sphere,
      Triangle,
      //....
}

struct Primitive{
      class: PrimitiveType,
      data: [f64;12]
}

// And then I need to do things like:
impl Primitive{
      fn intersect(&self, ray: &mut Ray)->{
             match self.class {
                     PrimitiveType::Sphere => sphere_intersect(&self, ray),
                     PrimitiveType::Triangle => sphere_intersect(&self, ray),
                     // ...
             }
      }
}
fn triangle_intersect(p: &Primitive)->Point3D{
     //....
}
fn sphere_intersect(p: &Primitive)->Point3D{
     //....
}

Discussion

So, the Trait option seems to be the most ergonomic, but I am not sure it allows for the level of performance required by a ray-tracer. I say this because of dynamic dispatch and maybe alignment things....?

Then, the second one does not need dynamic dispatch... but did not seem to be very efficient when I tried it some months ago.

Now, the final one seems to allow for very well packed data... but I am not sure why isn't it—internally, by the compiler—handled exactly the same as the Enum option. Is there any expected benefit to this? I saw big improvements yesterday, but this was not the only thing I did.

I would appreciate reading some discussion on this, as I have learned a lot in this project.

Best!

2 Likes

What do you mean by

?

Anyway, this is very usage-specific. Benchmark each option. The I think enum should be the fastest, and the third approach is about the same just less ergonomic.

Hi!

What I meant by the Enum option being the same(ish) as the Functional one is that—ultimately—they both are translated into equally-sized data blocks that have a discriminant and a set of numbers... am I wrong?

Yeah, I know I can test them... but I want to learn what is going on under the hood! Knowing how fast these are is just the tip of the iceberg

Best!

I was referring to the part you said it isn't handled the same way, because it indeed should (more or less).

It's very, very hard to predict optimizations. You can try to inspect the assembly, but even that may not teach you much. If you want to know how rustc handles this, check the LLVM it emits. If you want to know how LLVM may optimize this... Well, I can only wish you good luck. Try to look for LLVM materials (the best reference is the source code. It's well written but may be a lot to read and understand).

The enum_dispatch crate is an option to help reduce the boilerplate required for using an enum of objects which all implement a trait.

One other option is to make your container with Vecs of each possible type:

struct PrimitiveObjects {
    spheres: Vec<Sphere>,
    triangles: Vec<Triangle>,
    // and on...
}

You could then go a bit more functional:

fn intersect_type<P: Primitive>(objects: Vec<P>, ray: &mut Ray) { /* ... */ }

Then you just have to call the same function on each contained Vec of objects.

For adding primitive objects to the container you could pass them as an enum:

fn add_primitive(&mut self, object: PrimitiveType) {
    match object {
        PrimitiveType::Sphere(sphere) => self.spheres.push(sphere),
        PrimitiveType::Triangle(triangle) => self.triangles.push(triangle),
        // and on...
    }
}

Thanks for your response, but—as I explain in another topic—I am not sure how to use separate arrays for different types while maintaining the efficiency of the scene transversal.

I ended up transforming everything into a type Triangle = [f64;9] and working functionally.

That enum_dispatch_crate looks very useful. Thanks for sharing!

Ahh, I should have looked up what BVH actually does... You might be able to work it by creating a separate BVH indexing collection holding shape selection enums which contain just the needed index:

enum ShapeVecIdx {
    Sphere(usize),
    Triangle(usize),
    // etc...
}

// It sounds like wider tree structures are helpful for BVH, so this might
// be a point to consider decoupling your BVH tree from the primitive struct
type PrimitivesBVH = Custom16AryTree<ShapeVecIdx>;

// Then application of rays can use the BVH tree order to improve efficiency
impl Primitives {
    fn apply_ray(&self, bvh: PrimitivesBVH, ray: &mut Ray) -> Point3D {
        for index in bvh.iter() {
            match index {
                Sphere(idx) => // run sphere_intersect on self.spheres[idx]
                // and return if point found, etc...
            }
        }
    }
}

Though at this point you're back having to match an enum for every index operation to select and apply a ray to an object. But, if your set of objects does not need to be mutated once rendering starts, you can pass as many references of your base primitives as you want out to separate threads for creating BVH trees.

But, really, if your various object types are all similarly sized (i.e. the biggest primitive isn't >10x the size of the smallest), then you may not see any benefits from going to the struct-of-arrays layout. Though rendering scenes tend to have a lot of objects, so efficient memory mapping of your base object store may be worthwhile...

1 Like

The enum version allows you to have a Vec<Primitive>, while the dynamic traits version does not. There you can either have a Vec<Box<dyn Primitive>> or a Vec<&dyn Primitive>. The Box variant makes each primitive a separate allocation, which will probably be detrimental to performance.

The reference version can be combined with the actually storing the primitives themself in a PrimitiveObjects kind of thing as suggested by @PFaas . This will give good "packing" and almost as good memory locality as the enum version, but then you have to keep track of the lifetimes (the vecs in the PrimitiveObjects can not be modified why the vec of references exist). But if you have a large number of total objects and do some sort of filtering while building the vec of references, this may more efficient than alternatives. Certanly way more efficient than repeatedly copying selected objects to relevant vecs.