Safe alternative to pointers in a heavy self-referencial data structure

Hi! I'm trying to implement the Algorithm X from Donald Knuth using a data structure known as "Dancing links matrix" in Rust. The structure is basically a sparse matrix where each cell is connected to the 4 adjacent cells above, below, left and right in a circular fashion.

The solving algorithm requires the pointers between cells to be switched (as they were dancing, hence the name) while iterating over columns and then rows, so I need a data structure that uses interior mutability, else I would not be able to mutate the structure while iterating over it (or at least I couldn't find another way to do it). This is safe since the cells being iterated are not the ones modified.

My design for the structure is:

  • a Header struct, holding the header informations for each column;
  • a MatrixCell struct, holding the pointers to the other 4 cells;
  • a pair of indexes holding all the instances of Header and MatrixCell respectively

This is the repo where the code is hosted (sorry if it's not documented, but I would like to have a definitive architecture before doing it), there are some branches where I tried different approaches:

  • in the main branch, there is a safe implementation using usizes as pointers, indexing in the indexes to recover the instances of the cells.This has the advantage of being completely safe, while incurring in the runtime cost of a double read when iterating, which I found slow after some profiling;
  • in the ptr branch, I resorted to having the instances in two boxed slices (stored in the indexes) and having the cells storing *mut T pointers to the memory around. The indexes actually store a UnsafeCell<Box<[T]>>, so that I have the interior mutability for free. This should be safe (correct me if I am wrong) since the heap memory pointed by the slice should not be moved. This is the fastest implementation up to now, but it's heavily clutttered with unsafe, as I have to constantly dereference and update via pointers;
  • in the rc branch, I tried replacing the pointers with Rc<UnsafeCell<T>> to have interior mutability and shared references. This is slower (as expected) due to the clone/drop stuff done by the Rc every time a link is modified.

I didn't even try using Rc<RefCell<T>> since I expect it to be even slower than the last approach.

I don't think there is a completely safe way to implement this structure without incurring in runtime penalties of some kind, but I may be mistaken.

To profile the code, I use samply, building using a custom profile that keeps debug info (called profiling).

Usually I use the provided implementation of the n-queens problem and something like

$ cargo build --profile=profiling
$ samply record target/profiling/nqueens 66  # or 80

Feedbacks and code reviews are of course accepted :smiley:

Assuming that you can preallocate all of the cells, you could try using std::cell::Cell<Option<&’a MatrixCell<‘a>>> as your pointer type, with ’a being the lifetime of the arena that the cells are allocated in. This should have the same memory representation as *mut T.

I haven’t tried it myself, though, so you might find there’s some kind of variance trouble with this approach.

2 Likes

Thanks for your response! I definitely can allocate all the cells upfront, it's the whole point of using an index and then converting it to a slice. I'm not getting your solution, though. What's the need for the option? Pointers in the cells are always defined, they are not only while initializing, but that is a temporary condition. I also cannot understand why you are using shared references, I am supposed to do stuff like cell.right.left = cell, so I have the need to mutate through the Cell content.

I also find it hard to declare 'a, since the index is inside the structure holding the cells, so I would not know how to express that lifetime, since it is tied to the lifetime of the structure itself.

I agree with the suggestion of 2e71828, you should probably try with a type of matrixcell that looks like

struct MatrixCell<'a> {
    up: Cell<Option<&'a MatrixCell<'a>>>,
    …
}

The Option is there because at the beginning, you have no MatrixCell to point to. Maybe there is a way to remove it, I’m not sure.
Mutability is entirely handled by the Cell, so we only need a shared reference.

To make this work lifetime-wise, cells need to be preallocated. You can do this by either knowing the required number of cells, and allocating a vector full of them, e.g. let cells: Vec<Cell<Option<MatrixCell>>> = vec![None ; size]. Or by using an arena, like bumpalo.

Finally, if you want mutable shareable pointers with no overhead, I recommand you take a look at GhostCell. It has some restrictions, but I think it will fit dancing lists very well. In MatrixCell, just replace Cell with GhostCell.

1 Like

the Option is technically not required for the algorithm, but you cannot safely create a MatrixCell value without it, just as you stated, there are no valid pointers to put into them during initialization, even if it is a temporary condition.

in rust type system, each type can hold certain invariant/contract that must not be violated, otherwise UB could happen. reference is one such example. if the type system cannot be satisfied (even temporarily), you either need to write some form of unsafe code, or you tweak the type definition.

by definition, exclusive reference simply doesn't work: the data structure is all interconnected (cell references are shared among all its neighbors); and what's more, exclusive reference is invariant to the referent type, which will cause other problems.

interior mutability is not all about RefCells (which has runtime overheads): since shared references are Copy, Cell can be used to make them mutable.

2 Likes

Shared mutability types also require invariance.

Two lifetimes may avoid some problems.

Borrow checking is about more than preventing dangling pointers, so not necessarily.

At a minimum when writing unsafe code, you should test it using Miri.

Here's the ptr queens in the playground; you can see the Miri results under Tools, top-right.

1 Like

I already did it, and it reported some errors, that's one of the reasons behind my question of how I can avoid unsafe code.

1 Like

The only way to remove it is to drop into unsafe during the construction phase. Due to pointer provenance / stacked borrows rules, though, this is extremely non-trivial to do. I managed to write this version¹. The general idea is that each node gets initialized circularly linking to itself, and then only safe operations are used to manipulate the data structure from then on.

This took a lot of false starts, though, and I'm still not happy with the solution— It feels like it should be possible to use MaybeUninit, but I couldn't get around the issue of creating a real & to a not-yet-fully initialized structure. So I ended up transmuting between different #[repr(C)] structs instead, which is always a last resort.


¹ Not based on @GlennHK's code at all, but instead translated directly from pp. 5-8 of Knuth's original paper.


Edit: Using Option and limiting unsafe to possibly calling unwrap_unchecked() ends up much cleaner IMO.

1 Like

I think I got the idea. What I couldn't manage to get working was having references as pointers since I had no lifetime to tie them on. Passing an arena from the outside fixed all the problems.

Thanks!

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.