Bidirectional transitive closure, lots of back references

Ref: Rust Playground

This is a bidirectional transitive closure algorithm. It computes whether you can get there from here for a big grid of rectangular regions: if two rectangles share some edge space, they are reachable from each other. The algorithm computes collections of VizGroup, sets of regions which are all reachable from each other.

The trick is doing this entirely in safe Rust. It uses RefCell, borrow() calls, and Weak pointers for its non-owning back pointers. As is typical with Rust, getting the borrow plumbing right was hard, and then it just worked.

So was this the right way to organize this algorithm? Comments?

If you run this with cargo test -- --nocapture you can see the output. The Playground doesn't seem to offer that option, unfortunately.

(Use case: this is for a support tool for my metaverse client for Second Life/Open Simulator. It is a rule of the grid that you can only see regions you can reach. Some areas are full of checkerboards of isolated private regions, none of which can see each other. Others are big land areas, sometimes connected only by narrow strings of regions. I'm working on an impostor system for long-distance visibility, and I need to do some pre-computation to decide which impostors are visible from where.)

Using Rc<RefCell<T>> and Weak<RefCell<T>> to represent a graph structure is generally a bad idea and non-idiomatic. Pervasive borrow-checker problems are a symptom of such non-idiomatic code.

Instead, use node IDs to describe edges in your graph.
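A minimal sketch of that ID-based approach (the Graph, add_node, and neighbors names are invented for illustration, not taken from the thread's code): the graph owns all node payloads in one Vec, and an edge is just a pair of indices, so no Rc, Weak, or RefCell is needed.

```rust
// Hypothetical sketch: nodes live in one Vec owned by the graph,
// and edges are plain index pairs instead of Rc/Weak pointers.
struct Graph {
    nodes: Vec<String>,         // node payloads, owned in one place
    edges: Vec<(usize, usize)>, // edge = pair of node indices
}

impl Graph {
    fn new() -> Self {
        Graph { nodes: Vec::new(), edges: Vec::new() }
    }

    fn add_node(&mut self, payload: &str) -> usize {
        self.nodes.push(payload.to_string());
        self.nodes.len() - 1 // the index is the node ID
    }

    fn add_edge(&mut self, a: usize, b: usize) {
        self.edges.push((a, b));
    }

    // Neighbors of `id` in an undirected graph: scan the edge list.
    fn neighbors(&self, id: usize) -> Vec<usize> {
        self.edges
            .iter()
            .filter_map(|&(a, b)| {
                if a == id { Some(b) } else if b == id { Some(a) } else { None }
            })
            .collect()
    }
}
```

Mutation then only ever needs one `&mut Graph`, so the borrow checker can verify everything statically.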

Reading spatial algorithm code without data visualization really hurts my brain.

All these intertwined links are really messy. I would rather write the same algorithm using a "struct of arrays" style data structure: store all the regions, blocks, and viz groups in separate arrays, and use indices instead of direct pointers to link them together. With integer indices, interior mutability also becomes much simpler: Cell<usize> is far easier to use than Rc<RefCell<Node>>.
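A sketch of that struct-of-arrays layout (the field and method names are invented): parallel arrays hold the per-region data, and the mutable group link is a Cell<usize>, which can be reassigned through a shared reference with no run-time borrow at all.

```rust
use std::cell::Cell;

// Hypothetical struct-of-arrays sketch: region names and their group
// assignments live in parallel arrays. Cell<usize> lets us reassign a
// region's group through &self, so there is no RefCell borrow to panic.
struct Regions {
    names: Vec<String>,
    group: Vec<Cell<usize>>, // index into a separate groups array
}

impl Regions {
    // Re-point every region in group `from` to group `into`.
    // Only Cell::get/Cell::set are needed; nothing can double-borrow.
    fn merge_groups(&self, from: usize, into: usize) {
        for g in &self.group {
            if g.get() == from {
                g.set(into);
            }
        }
    }
}
```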

The data structure that handles "bidirectional transitive closure" is called a "disjoint set union" (also known as union-find), so you can use that. It will give you cleaner code as well as better time complexity.
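For reference, a minimal disjoint-set-union sketch with path compression; each operation runs in nearly constant amortized time, and "are these two regions in the same visibility group?" becomes a pair of find calls.

```rust
// Minimal union-find: parent[x] == x marks a set representative.
struct Dsu {
    parent: Vec<usize>,
}

impl Dsu {
    fn new(n: usize) -> Self {
        Dsu { parent: (0..n).collect() } // every element is its own set
    }

    // Find the representative of x's set, compressing the path.
    fn find(&mut self, x: usize) -> usize {
        if self.parent[x] != x {
            let root = self.find(self.parent[x]);
            self.parent[x] = root; // path compression
        }
        self.parent[x]
    }

    // Merge the sets containing a and b (the VizGroup merge step).
    fn union(&mut self, a: usize, b: usize) {
        let (ra, rb) = (self.find(a), self.find(b));
        if ra != rb {
            self.parent[ra] = rb;
        }
    }
}
```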

Node IDs are essentially raw pointers. Dangling references (indices to a dead item) are possible. Plus you have to allocate and release indices, so you have to write an allocator. The two occasions where I've had to use a debugger on Rust code both involved botched index allocation systems inside graphics crates.

Run-time borrows are guaranteed disjoint if they are disjoint in type, disjoint in scope, or disjoint in instance. All borrows here meet one of those criteria. The trickiest part is at line 104:

        if !Rc::ptr_eq(&self.viz_group, &other.borrow().viz_group) {
            println!(
                "Blocks with different viz groups touch: {} and {}",
                self.region_data,
                other.borrow().region_data
            ); // ***TEMP***
            //  Merge the VizGroup sets.
            self.viz_group
                .borrow_mut()
                .merge(&mut other.borrow().viz_group.borrow_mut());
    ...

Here, ptr_eq is used to test if self and other share the same viz group. If they do, they don't need to be merged, because they already are merged. Trying to merge them would result in a double borrow. So an explicit check for disjoint in instance was required.

All the other borrows are easy to check. Most are very local and not inside the scope of other borrows of the same type.
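A reduced, hypothetical example of that "disjoint in instance" guard (merge_totals is an invented stand-in for the real merge): borrowing the same RefCell mutably and immutably at once would panic at run time, so Rc::ptr_eq is checked before taking both borrows.

```rust
use std::cell::RefCell;
use std::rc::Rc;

// If a and b are different cells, holding a mutable borrow of one and
// a shared borrow of the other at the same time is fine. If they are
// the same cell, they are already "merged", so we do nothing -- which
// also avoids the double-borrow panic.
fn merge_totals(a: &Rc<RefCell<Vec<u32>>>, b: &Rc<RefCell<Vec<u32>>>) {
    if !Rc::ptr_eq(a, b) {
        // Disjoint in instance: both borrows are safe to hold at once.
        a.borrow_mut().extend(b.borrow().iter().copied());
    }
}
```

Calling it with two clones of the same Rc is simply a no-op, mirroring the already-merged case in the original code.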

I agree, but I still think node IDs are a good match for graph problems. It's best to separate the ownership hierarchy from the graph topology. Using the graph topology as the ownership hierarchy creates a lot of unnecessary complexity where things could be much simpler: have the graph own all the vertices, rather than having the vertices "own" each other.

You don't really have to reuse IDs; you can use consecutive numbers as IDs. If releasing memory during the computation matters to you, you can still use consecutive IDs and a HashMap from IDs to nodes.
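A sketch of that scheme (the Store name and API are invented): consecutive u64 IDs are never reused, and a HashMap lets individual nodes be dropped mid-computation. A lookup on a dead ID returns None instead of dangling, so no hand-written allocator is needed.

```rust
use std::collections::HashMap;

// Hypothetical sketch: IDs count up forever and are never recycled,
// so a stale ID is detectable rather than silently aliasing a new item.
struct Store<T> {
    next_id: u64,
    items: HashMap<u64, T>,
}

impl<T> Store<T> {
    fn new() -> Self {
        Store { next_id: 0, items: HashMap::new() }
    }

    fn insert(&mut self, item: T) -> u64 {
        let id = self.next_id;
        self.next_id += 1; // consecutive, never reused
        self.items.insert(id, item);
        id
    }

    // Releases the item's memory during the computation.
    fn remove(&mut self, id: u64) -> Option<T> {
        self.items.remove(&id)
    }

    fn get(&self, id: u64) -> Option<&T> {
        self.items.get(&id)
    }
}
```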

It's not all about releasing memory. Notice how Drop is used: when no LiveBlock points to a VizGroup, that VizGroup is complete and can be reported as a result. Without Rc, I'd need to build another level of tracking data structures.

So you're using Rc ownership reference counts as in-degree counters in your graph algorithm, and threading the logic of the algorithm through destructors. That's convoluted; it's not what those counters are for. It's also fragile: imagine you add some other owner that keeps track of vertices for some other reason, and the whole algorithm stops working.

Your own in-degree counters wouldn't be another layer on top of Rc counters, they would function instead of Rc. You wouldn't need Rc at all.
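One possible shape for such explicit counters (hypothetical, not from the thread's code): a per-group count that records the group as complete when it reaches zero. The trigger is an explicit detach call rather than a Drop impl, so adding extra owners elsewhere cannot break the algorithm.

```rust
// Hypothetical sketch: an explicit per-group counter replaces the
// Rc strong count. When the last block leaves a group, the group is
// recorded as complete -- the same effect as the Drop-based scheme,
// but decoupled from ownership.
struct GroupCounters {
    refs: Vec<usize>,      // blocks currently pointing at each group
    completed: Vec<usize>, // groups whose count reached zero
}

impl GroupCounters {
    fn new(n: usize) -> Self {
        GroupCounters { refs: vec![0; n], completed: Vec::new() }
    }

    fn attach(&mut self, group: usize) {
        self.refs[group] += 1;
    }

    fn detach(&mut self, group: usize) {
        self.refs[group] -= 1;
        if self.refs[group] == 0 {
            self.completed.push(group); // report: group is complete
        }
    }
}
```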

If you're fine with not releasing memory until the end of your code, a good option for trees and graphs is to use arenas or similar:
e.g. Compiler Explorer
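A minimal Vec-backed arena sketch along those lines (simplified; real arena crates hand out references rather than indices): every node is allocated into one growing Vec, and nothing is freed until the arena itself drops.

```rust
// Hypothetical sketch of the arena idea: one allocation pool owns all
// nodes for the lifetime of the computation; an "allocation" is just
// a push, and the returned index serves as the node handle.
struct Arena<T> {
    items: Vec<T>,
}

impl<T> Arena<T> {
    fn new() -> Self {
        Arena { items: Vec::new() }
    }

    fn alloc(&mut self, item: T) -> usize {
        self.items.push(item);
        self.items.len() - 1
    }

    fn get(&self, id: usize) -> &T {
        &self.items[id]
    }
}
```

Since nothing is ever removed, there is no index-reuse hazard and no allocator to get wrong; the whole pool is released at once when the arena goes out of scope.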