Is it possible to safely build a read-only, thread-safe, bidirectional tree?

For thread safe, bidirectional trees, the standard solution is to use Arc<Mutex<T>> (and the corresponding Weak<Mutex<T>>) for the links.

I was working on a project where the tree, once built, is readonly; after construction, not having the intermediate Mutex considerably speeds up the access (by removing contention).

Currently, in order to achieve this, I use the nightly, unsafe, Arc::get_mut_unchecked() API; the code is (roughly) the following:

    pub fn new(mut children: Vec<Arc<Self>>) -> Arc<Self> {
        let mut parent = Arc::new(Self {
            parent: Weak::<Self>::new(),
            children: vec![],
        });

        for child in children.iter_mut() {
            let child_parent_ref = &mut unsafe { Arc::get_mut_unchecked(child) }.parent;
            *child_parent_ref = Arc::downgrade(&parent);
        }

        let parent_mut = unsafe { Arc::get_mut_unchecked(&mut parent) };
        parent_mut.children = children;

        parent
    }

Playground: Rust Playground

Is there a different (preferrably safer) strategy I can apply, in order to achieve this design?

Note: using Arc::get_mut_unchecked is only safe if there are no other references to the pointee. So your new function should be marked unsafe.

Another approach for threadsafe bidirectional trees is to use indicies instead of Arc

use slab::Slab;
struct Tree {
    nodes: Slab<Node>,
    root: usize,
}

struct Node {
    parent: usize,
    children: Vec<usize>,
}

but this does require a little more work to keep consistent.

edit: reworded concerns

2 Likes

Since you’re using unstable API already, there’s the new_cyclic method. Perhaps you can make that one work for you, e.g.:

#![feature(arc_new_cyclic)]

use std::sync::{Arc, Weak};

#[derive(Debug)]
pub struct Node {
    pub parent: Weak<Self>,
    pub children: Vec<Arc<Self>>,
}

pub struct NodeBuilder {
    build: Box<dyn FnOnce(Weak<Node>) -> Arc<Node>>,
}

impl Node {
    fn new(child_builders: Vec<NodeBuilder>) -> NodeBuilder {
        NodeBuilder {
            build: Box::new(|parent| {
                Arc::new_cyclic(|this| Node {
                    parent,
                    children: child_builders
                        .into_iter()
                        .map(|child_builder| (child_builder.build)(this.clone()))
                        .collect(),
                })
            }),
        }
    }
}
impl NodeBuilder {
    fn build(self) -> Arc<Node> {
        (self.build)(Weak::new())
    }
}

fn main() {
    let tree = Node::new(vec![
        Node::new(vec![Node::new(vec![]), Node::new(vec![])]),
        Node::new(vec![]),
        Node::new(vec![Node::new(vec![])]),
    ])
    .build();
    println!("{:?}", tree);
    assert_eq!(
        Arc::as_ptr(&tree),
        Arc::as_ptr(
            &tree.children[0].children[1]
                .parent
                .upgrade()
                .unwrap()
                .parent
                .upgrade()
                .unwrap()
        )
    );
}

(playground)

3 Likes

He owns children, the problem is if any element of children is beign accessed from another thread.

The solution you proposed may work, however you won't be able to directly share Nodes between threads, you'll need something that wraps an (Arc<Tree>, usize) (if you're ok with bound checks at every access) or something like owning_ref to share Nodes between threads. Also since it is immutable I don't see a point in using a Slab

1 Like

Heh, this is exactly the situation with rust-analyzer's syntax trees. My gut feeling would be:

  • if you want to keep it simple, go with arena and indices, like @RustyYato suggested
  • if this needs to be a vocabulary type with great ergonomics, bite the bullet and implement it using unsafe and raw pointers. rowan crate might be an inspiration here, but the code is rather complicated.
1 Like

I'm wondering if maybe it's also possible to implement an ergonomic wrapper on top of the arena-and-indices approach, e.g. by defining a Node(usize) that Derefs to &arena[index].

You need access to the arena, that's why I previously said you could wrap both an Arc<Tree> and an usize

Arc can be cloned and accessed from another thread (or even the current thread), which means you could be aliasing the unique reference you get from Arc::get_mut_unchecked

1 Like

To clarify, in this specific project, the constructor (new()) is invoked serially (multiple times), but never concurrently; only the resulting tree, after it's been built, is accessed, concurrently and read-only.

Ok, that should be safe. As long as you don't overlap borrows with Arc::get_mut_unchecked you should be fine. I would still mark new unsafe to make to make sure that this invariant doesn't get forgotten :slight_smile: .

1 Like

Thanks everybody!

Besides the glaring omission of the unsafe in the function signature (ouch!), I have a couple of considerations.

First, if new_cyclic allows modeling the problem without unsafe code (it seems so), then, even at the (potential) cost of decreased ergonomy, I think it's a better solution than the one I've implemented.

Second, I've thought about the Slab approach. At first, I didn't connect the dots, but then, I've realized that it's possibly a brilliant solution (if I understand correctly) :star_struck::exploding_head:

The idea is that, by passing the Slab allocator around, the safety problem is solved, because the separation of building (writing) and reading phases, matches the BCK rule (all hail the BCK) of separating the scopes where mutable and immutable borrows happen.

The difference between Slab and reference counting is essentially that the latter doesn't give write control at all, while Slab does (at the condition of obeying to the BCK rules).

I need to think about the ergonomy, but I have the hunch that a clean, ergonomic design may be possible).

This is the direct translation of my original code: Rust Playground

I suppose is essentially @RustyYato's design (it's actually a good idea to have a separate type owning the root and the allocator).

I wouldn't be surprised if this is actually a common pattern in some SWE fields (e.g. game programming) :grimacing:

Not my design, this is a common idiom for removing Rc/Arc in general, I first saw this in petgraph. If you never remove items from the Slab you can replace it with a Vec, and use usize indicies to access your data. If you are removing items, and want some assurances that your indicies aren't going to be used on stale data (similar to dangling pointers), then you can use generational-arena

I haven't tried the same with Slab or gone very far with Arc<RefCell>> type designs, but you might find the below implementation interesting just to consider tradeoffs. Summary:

  • Uses one Vec<Node> in a Document for an entire tree, where NodeData can represent a vacant slot variant (Hole).

  • Memory isn't freed on node removal, but bulk and reasonably efficient Document::compact is supported.

  • Node has prev/next sibling refs and first/last child refs via Option<NonZeroU32> indexes, for space efficiency. (Document keeps an unused Hole at index zero just to support this 4-byte index type).

https://github.com/dekellum/marked/blob/master/marked/src/dom.rs#L64

It seems that this is, besides the node structure (the requirements are different in my case) in essence the slab approach, with the nodes being allocated in an indexed/contiguous memory area.

It's a Vec<Node> with child/sibling double linked lists. My impression was that the Slab, unlike the Vec, keeps track of vacancies in order to fill them on later insert? This doesn't. Of course if in your case, the tree becomes read-only at a certain point, you don't need to fill vacancies either.

1 Like

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.