Why is there such a huge speed difference between these QuadTree implementations?

Okay, so I've been learning Rust for a few days now and started working on creating a QuadTree implementation. I've made 3 different ones over the course of a few days and the speed differences (measured via Criterion.rs) are immense. I have no clue what the cause of this is.....

This is my first try. I've omitted some helper struct and focus only on the Insertion part now.

Iteration 1
1,000 insertions: 368us; 10,000 insertions 3.7333ms

pub struct QuadTree {
    boundary: Rectangle,
    points: Vec<Point2D>,
    north_east: Option<Box<QuadTree>>,
    south_east: Option<Box<QuadTree>>,
    south_west: Option<Box<QuadTree>>,
    north_west: Option<Box<QuadTree>>
}

impl QuadTree {

    const MAX_CAPACITY: usize = 4;

    pub fn new(p0: Point2D, width: f32, height: f32) -> Self {
        QuadTree {
            boundary: Rectangle {
                p0,
                width,
                height
            },
            points: Vec::new(),
            north_east: None,
            south_east: None,
            south_west: None,
            north_west: None
        }
    }

    pub fn insert(&mut self, point: Point2D) -> QuadTreeResult {
        if !self.boundary.contains(&point) {
            return QuadTreeResult::Err
        }

        if self.points.len() < QuadTree::MAX_CAPACITY && self.is_leaf() {
            self.points.push(point);
            return QuadTreeResult::Ok
        }

        if self.points.len() >= QuadTree::MAX_CAPACITY || !self.is_leaf() {
            self.subdivide();

            if self.north_east.as_mut().unwrap().boundary.contains(&point) {
                return self.north_east.as_mut().unwrap().insert(point)
            } else if self.south_east.as_mut().unwrap().boundary.contains(&point) {
                return self.south_east.as_mut().unwrap().insert(point)
            } else if self.south_west.as_mut().unwrap().boundary.contains(&point) {
                return self.south_west.as_mut().unwrap().insert(point)
            } else {
                return self.north_west.as_mut().unwrap().insert(point)
            }
        }
        QuadTreeResult::Err
    }

    fn subdivide(&mut self) -> QuadTreeResult {
        if self.is_leaf() {
            let p0 = &self.boundary.p0;
            let new_width = self.boundary.width / 2.0;
            let new_height = self.boundary.height / 2.0;

            self.north_east = Some(Box::new(QuadTree::new(p0.offset(new_width, 0.0), new_width, new_height)));
            self.south_east = Some(Box::new(QuadTree::new(p0.offset(new_width, new_height), new_width, new_height)));
            self.south_west = Some(Box::new(QuadTree::new(p0.offset(0.0, new_height), new_width, new_height)));
            self.north_west = Some(Box::new(QuadTree::new(p0.offset(0.0, 0.0), new_width, new_height)));

            let old_points = mem::replace(&mut self.points, Vec::new());
            for p in old_points {
                if let QuadTreeResult::Err = self.insert(p) {
                    return QuadTreeResult::Err
                }
            }
        }
        QuadTreeResult::Ok
    }

    fn is_leaf(&self) -> bool {
        return self.north_east.is_none() && self.south_east.is_none() && self.south_west.is_none() && self.north_west.is_none()
    }
}

Okay, so I thought to try enums... just because...

Iteration 2
1,000 insertions: 16.791ms 10,000 insertions 1.755s

pub enum QuadTree {
    Leaf {
        boundary: Rectangle,
        points: Vec<Point2D>
    },
    Root {
        ne: Box<QuadTree>,
        se: Box<QuadTree>,
        sw: Box<QuadTree>,
        nw: Box<QuadTree>
    }
}

impl QuadTree {

    const MAX_CAPACITY: usize = 4;

    pub fn new(boundary: Rectangle) -> Self {
        QuadTree::Leaf {
            boundary,
            points: Vec::new()
        }
    }

    pub fn count(&self) -> usize {
        match self {
            QuadTree::Leaf { boundary: _, points} => {
                return points.len()
            },
            QuadTree::Root { ne, se, sw, nw } => {
                return ne.count() + se.count() + sw.count() + nw.count()
            }
        }
    }

    pub fn insert(&mut self, point: Point2D) -> Result<(), String> {
        match self {
            QuadTree::Leaf { boundary, points} => {
                if !boundary.contains(&point) {
                    return Err(String::from("Boundary doesn't contain point"))
                }
                else if points.len() == QuadTree::MAX_CAPACITY {
                    self.subdivide();
                    return self.insert(point)
                } else {
                    points.push(point);
                    return Ok(())
                }
            },
            QuadTree::Root { ne, se, sw, nw } => {
                if ne.insert(point).is_ok() {
                    return Ok(())
                } else if se.insert(point).is_ok() {
                    return Ok(())
                } else if sw.insert(point).is_ok() {
                    return Ok(())
                } else if nw.insert(point).is_ok() {
                    return Ok(())
                }
                return Err(String::from("Point couldn't be inserted in any sub-tree"))
            }
        }
    }

    fn subdivide(&mut self) {
        match self {
            QuadTree::Leaf { boundary, points} => {
                let new_width = boundary.width / 2.0;
                let new_height = boundary.height / 2.0;

                let mut new = QuadTree::Root {
                    ne: Box::new(QuadTree::new(Rectangle::new(boundary.p0.offset(new_width, 0.0), new_width, new_height))),
                    se: Box::new(QuadTree::new(Rectangle::new(boundary.p0.offset(new_width, new_height), new_width, new_height))),
                    sw: Box::new(QuadTree::new(Rectangle::new(boundary.p0.offset(0.0, new_height), new_width, new_height))),
                    nw: Box::new(QuadTree::new(Rectangle::new(boundary.p0.offset(0.0, 0.0), new_width, new_height)))
                };
                for p in points {
                    new.insert(*p).unwrap();
                }
                mem::replace(self, new);
            }
            _ => {}
        }
    }
}

Speed differences between 1 and 2:
1,000: ~45x
10,000: ~450x

This is an enormous difference and I don't know why that is. I have an idea, and that is maybe that version 2 always allocates new QuadTree's when a new Root is made.

But other than that I have no clue....

3 Likes

Have you tried profiling? If you’re on linux, try running your benchmark/workload under perf - it should give a good idea of where time is spent.

@vitalyd: Woop woop, I fixed it!

It is the first time using a profiler for rust, but I found the culprit. The problem was the:

Result<(), String>

in iteration 2 and how I instantiated it:

return Err(String::from("Boundary doesn't contain point"))

This part got called here:

if ne.insert(point).is_ok() {
    return Ok(())
} else if se.insert(point).is_ok() {
    return Ok(())
} else if sw.insert(point).is_ok() {
    return Ok(())
} else if nw.insert(point).is_ok() {
    return Ok(())
}

So everytime a Root enum inserted a point worst-case it had to allocate 3 times a new string. This was visible in the flame graph of QuadTree::insert because of the fact that 60% of the time was spent on <alloc::string::String as core::convert::From<&str>>::from.

Replacing the error types with:
Result<(),&'static str>

And it is solved! It is comparably fast now.

change:
time:   [-95.218% -95.127% -95.024%] (p = 0.00 < 0.05)
thrpt:  [+1909.8% +1952.0% +1991.1%]
Performance has improved.

EDIT:
After some more profiling I found out that iteration 2 shows exponential performance and 1 has linear performance. I found out that the cause was in the subdivide method where always all sectors were instantiated. By moving to an Option for the sectors and only instantiating them when they needed to be populated I have now similar performance for both iterations. Iteration 2 behaves a little better even.

I made another one where all the structures like Rectangles are reused since they are almost always similar. This is a little slower, but should reduce memory usage. Need to profile that though....

change:
time:   [-97.029% -97.015% -97.002%] (p = 0.00 < 0.05)
thrpt:  [+3235.7% +3250.1% +3265.9%]
Performance has improved.

13 Likes

Extra :100: points for using Criterion.

3 Likes