Bounding points of a point cloud - any good iter form?

John_Nagle · August 16, 2021, 2:52am

Here's a simple problem - find the limits of all the points in a point cloud, efficiently.
The playground above shows the obvious solution, but it is procedural and has mutable variables. Is there some iterator-based solution that isn't slower?

There's a standard iterator function for max or min value of an array, but here I want both max and min for each element of each point, and those standard functions don't help.

quinedot · August 16, 2021, 5:17am

Probably as fast with suitable inlining (on mobile, didn't test)

John_Nagle · August 16, 2021, 5:43am

That's very elegant. I should look at the generated code for that.

It's one of those things which can be very fast if the compiler does all the possible optimizations, and really slow if it doesn't.

2e71828 · August 16, 2021, 7:41am

Some terminology that might help in searching the internet for algorithms:

You're trying to find the axis-aligned minimum bounding box of the point cloud.
A related, but much harder, problem is finding the vertices of the convex hull of the point cloud. Unlike the bounding box corners, these will all be members of the input set.

jjpe · August 16, 2021, 8:36am

Out of interest:

This sounds like a 3D version of the Convex Hull problem. Is it?

newpavlov · August 16, 2021, 8:55am

For maximum efficiency you would have to use SoA for storing your points and probably work with SIMD intrinsics directly. For example, in this code generated assembly processes points one by one, while a more efficient approach would be to process 8 points in parallel (number of f32s fitting into a single YMM register) and you would extract final min/max values from result registers only after all points have been processed.

I am not sure why compiler is unable to do it itself, changing f32 to i32 results in roughly the same loop.

UPD: Here how SIMD-based code may look: Compiler Explorer. It's a bit simplified (e.g. it does not handle tail points) and untested, but should convey the main idea. Note that the main loop (see .LBB0_10) in the generated assembly processes not 8, but 16 points per iteration. It allows CPU to use instruction-based parallelism, since _mm256_min_ps and _mm256_max_ps have latency of 4 cycles, but throughput of 0.5, i.e. you can execute up to 8 of such instructions in parallel. This number depends on CPU, e.g. by using -C target-cpu=native on godbolt the main loop will process 32 points per iteration.

system · November 14, 2021, 8:56am

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.

Topic		Replies	Views
Max and min of Vec<Vec3> help	4	240	April 14, 2024
Rust iterators optimally help	7	595	September 21, 2022
Improve this code without guessing capacity help	12	574	January 12, 2023
Optimization of Algorithm to Fill Empty Space code review	21	1502	August 7, 2020
Could you advice on how to optimize the code style in rus help	2	236	March 18, 2022

Bounding points of a point cloud - any good iter form?

Related Topics