Using `BTreeSet::range` to find nearby coordinates

I'm trying to find and delete duplicate vertices in a list. Duplicate in this case means coincident within a certain tolerance like f32::EPSILON or something similarly small. I thought I'd use a BTreeSet with a custom entry type along with the nice range() API to implement a way to efficiently search for coincident points instead of taking the naive loop-in-a-loop approach. However, I can't get the element ordering to work properly. Here's my code so far:

use nalgebra::Point3;
use std::{cmp::Ordering, collections::BTreeSet, ops::Bound};

#[derive(Debug, Clone, Copy)]
struct SortMe {
    pos: Point3<f64>,
}

impl SortMe {
    fn new(x: f64, y: f64) -> Self {
        Self {
            pos: Point3::new(x, y, 0.0),
        }
    }
}

impl Eq for SortMe {}

impl PartialEq for SortMe {
    fn eq(&self, other: &Self) -> bool {
        self.pos == other.pos
    }
}

impl PartialOrd for SortMe {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(&other))
    }
}

impl Ord for SortMe {
    fn cmp(&self, other: &Self) -> Ordering {
        let x = self
            .pos
            .x
            .partial_cmp(&other.pos.x)
            // A failed comparison will fall through to comparing the Y coordinate
            .unwrap_or(Ordering::Equal);

        let y = self
            .pos
            .y
            .partial_cmp(&other.pos.y)
            .unwrap_or(Ordering::Equal);

        x.then(y)
    }
}

And my test case:


fn main() {
    let minus_x = SortMe::new(-1.0, 0.0);
    let plus_x = SortMe::new(1.0, 0.0);
    let minus_y = SortMe::new(0.0, -1.0);
    let plus_y = SortMe::new(0.0, 1.0);

    let top_left = SortMe::new(-1.0, -1.0);
    let top_right = SortMe::new(1.0, -1.0);
    let bot_left = SortMe::new(-1.0, 1.0);
    let bot_right = SortMe::new(1.0, 1.0);

    let center = SortMe::new(0.0, 0.0);

    let set = BTreeSet::from([
        minus_x, plus_x, minus_y, plus_y, top_left, top_right, bot_left, bot_right, center,
    ]);

    let radius = 0.1;

    // Bounding box around `center` point
    let tl = SortMe::new(-radius, -radius);
    let br = SortMe::new(radius, radius);

    let items = set.range((Bound::Included(tl), Bound::Included(br)));

    let matching = items.collect::<Vec<_>>();

    dbg!(&matching);

    // Should be empty, but contains 3 elements!
    assert!(matching.is_empty());

    // Actually contains this:
    // [
    //     SortMe {
    //         pos: [
    //             0.0,
    //             -1.0,
    //             0.0,
    //         ],
    //     },
    //     SortMe {
    //         pos: [
    //             0.0,
    //             0.0,
    //             0.0,
    //         ],
    //     },
    //     SortMe {
    //         pos: [
    //             0.0,
    //             1.0,
    //             0.0,
    //         ],
    //     },
    // ]
}

Rust playground link

In the example code, all the points except center lie outside its bounding box, so no values should be returned for the given range I've created (bounding box with "radius" 0.1). Am I misunderstanding how BTreeMaps work? Why are some of the points matched with a range that doesn't include them?

By the way, this all works perfectly with a single f64 (i.e. single axis). I currently have a function that uses a BTreeSet to find points with close X coordinates, then a naive loop to filter for nearby Y coordinates which works ok but I dunno, it's inelegant. It seems that the combination of two axes makes things start to fail but I feel like it should work.

Your Ord implementation is saying that the x coordinate should be compared first, and only if that fails or is equal then the y coordinate is used. This means that a point with x=0 will always be included between points with x=-0.1 and x=0.1, not matter what their y value is.

Fundamentally, the issue is that you cannot express your bounding box condition with a total order, and BTreeSet requires a total order. You'll likely want to use some other data structure specialized for 2D or 3D, for example a quad-tree.

5 Likes

Ah... there was the obvious thing I was missing. Thanks for pointing that out! Someone recommended a k-d tree to me, specifically kiddo, which seems like a good option. Failing that I'll try a quadtree as you suggest :slight_smile:

f32::EPSILON is not what you want here; it’s too small for most applications. For example, 2.0 + f32::EPSILON == 2.0; for values ≥ 2 it represents no tolerance at all. Even though “epsilon” is commonly used as a name for such tolerances, you should not take f32::EPSILON as being appropriate for that purpose; it is actually the much more specific concept of machine epsilon. It’s a quantity that relates to the possible rounding error in individual operations, which will often be much smaller than the total rounding error in an algorithm, and it also doesn't make sense to use it as an absolute rather than relative error.

You ned to pick a tolerance that suits your actual application.

(And yes, the documentation should be improved on this point.)

4 Likes

FWIW this in a machining context, so the epsilon will actually be something like 0.001 (1 micron) or thereabouts.

There's also rstar (an R* Tree implementation), and its locate_in_envelope (and related) method: RTree in rstar - Rust

if you don't want to use an external library, you can use BTreeMap<OrdF64, BTreeMap<OrdF64, SortMe>> where the keys are the x and y coordinates respectively and OrdF64 is a wrapper over f64 so it implements Ord.