Possible to implement recursive data structures in safe Rust?

In my learning experience I can not figure out how to implement basic data structures like Single Linked Lists, Binary Trees or Directed Graphs in safe Rust. Especially for element removal operations I have had to resort to unsafe Rust. Either directly with unsafe block or indirectly with standard library auxiliary functions like mem::take or mem::replace. All due to borrow checker prevented have two mutable accesses to the same instance.

I'm curious if there is any implementation of some of above DS which does not require unsafe Rust?

1 Like

mem::take is not an unsafe function, though it uses unsafe code internally.
If you want no unsafe code whatsoever, I think that there's very little you can do with Rust.
Even basic structures, such as Vec use unsafe code internally, as does any allocator or libc entry point.

5 Likes

Please read Learn Rust With Entirely Too Many Linked Lists. You should find answers to all your questions there.

5 Likes

You couldn't do anything, strictly speaking. One would need an entirely different OS ABI and/or different hardware to ensure that no-unsafe-Rust thingie may exist.

It's the same thing with all “safe” languages that exist – only most “safe” languages have that unsafe part written in C while Rust have special unsafe subset in the same language.

2 Likes

I must admit I'm rather confused how you would even use unsafe code to implement these, short of using raw pointers everywhere.

As you mention std::mem::{take,replace} internally use unsafe, but as others have mentioned, so does most code eventually. That's kind of the point of Rust, to wrap up the unsafe code into safe code. It is true that std::mem is a bit too general, but e.g. Option wraps those in member functions, e.g. let next = self.next.take(); which are more idiomatic.

Single linked lists and trees are relatively straightforward, when you use the tools Rust gives you. Here's a very simple List that gives you an idea:

Trees are basically the same or even simpler if you use children: Vec<Tree<T>>.

Graphs are more complex if you want mutation, the real answer here is use a library like petgraph but for learning purposes you have a couple of realistic options: indices or ref-counted with internal mutability.

Indices is the more "rusty" way of doing it, to boil it down you basically replace references with an index into a big list of nodes:

Ref-counting is messier and more work but you might find it more natural:

4 Likes

Indirectly unsafe code is going to be used eventually in Python, Ruby, Lua, JS, and every other memory-safe language.

In all of them the safety is an abstract interface for the programmer, but lower-level code is necessary to talk to the hardware and the operating system.

Even when you just write fn main() {}, Rust like all other programming languages, already runs "unsafe" startup code required by the operating system to initialize the program.

TBF this easily leaks data due to cycles. I would say indices is the only sane way to do graphs without a GC.

1 Like

this is a weird definition for unsafe rust.

as many already said, there's no such thing as "purely safe" rust, after all, all code must be lowerered down to llvm, which doesn't have the rich type system as rust, thus you cannot have the same safety properties. for example, most of the intrinsics in core are unsafe, and these intrinsics are major building blocks of the language.

at some point, all rust code compiles down to a smaller set of primitives, and most of such primitives are unsafe, so you can't determine a function is safe or unsafe by it's internal implementation, you have to draw a line somewhere, and for all intents and purposes, the standard library (particularly, the core crate) IS this line. if an API is not marked unsafe, such as core::mem::replace, then it can be used in safe rust, it doesn't matter what's the internal implementation.

as a counter example, do you consider the dereference operator * (exclusing raw pointers, which are defined to be unsafe) safe or unsafe? it should be safe right? it's an operator, how can that be unsafe?

but, operators are syntax sugars of some lang-item traits, for example, deferencing a Vec (directly or indirectly, e.g. by calling the .iter() method, which is defined for slice types) invokes library functions in the standard library. by your logic, impl Deref for Vec {} uses an "auxiliary" unsafe function, so it must be unsafe!

on the other hand, if we agree operators are safe (except defererencing raw pointers), in theory, we could just introduce new syntax, for example, we could add an operator for core::mem::swap() or core::mem::replace(), so instead of calling swap(&mut x, &mut y), we can write something like x <-> y, but at the end of the day, they must still be lowered to the same IR, and generating the same machine code. so does the surface syntax really matter?

to me, unsafe is more about semantics. as long you use only safe API, it is safe rust. for example, transmute() is unsafe, ptr::copy_nonoverlapping() is unsafe, but mem::take() is NOT unsafe.

don't get me wrong, I'm not saying every data structure can be implemented completely in safe rust: they CANNOT (EDIT: not as cleanly as one may want, after all we have shared ownership, weak pointers and interior mutability anyway, but they are clunky).

I'm just saying, your reasoning about safety regarding (potential unsafe) internal implementations of otherwise totally safe APIs is a bit off.

EDIT:

I want to asnwer the title question in short: as long as you agree owning pointer types like Box and/or allocating containers like Vec are safe rust, then many recursive data structures can be implement in rust without problem. [1]


  1. but it also depends on the API set you want to support, for instance, there's no problem to implement a safe singly linked list, unless you want to support inserting data at head and at tail, both in constant time, and you don't want Rc and Weak, then you probably need Box and raw pointers ↩︎

3 Likes

Recursive structures in Rust are not a problem, and there is no need for unsafe.

You cannot use Option (that would result "infinite size error") but you can use Vector no problem:

#[derive(Debug)]
struct TreeNode {
    value: i32,
    children: Vec<TreeNode>
}
fn main() {
    let leaf = TreeNode {
        value: 3,
        children: vec![],
    };
    let root = TreeNode {
        value: 1,
        children: vec![
            TreeNode {
                value: 2,
                children: vec![],
            },
            leaf,
        ],
    };
    println!("{:#?}", root);
}

Even if it would only be at most one element, this still works. Or, you can also use Box together with Option:

#[derive(Debug)]
struct TreeNode {
    value: i32,
    child: Box<Option<TreeNode>>,
}

fn main() {
    let root = TreeNode {
        value: 1,
        child: Box::new(Some(TreeNode {
            value: 2,
            child: Box::new(None),
        })),
    };
    println!("{:#?}", root);
}

Variant enums should also work.

Hence trees and single linked lists are actually easy.

If you need to create circular references (such as in doubly linked lists), you can do this by using strong references (Rc) in one direction and weak references (Weak) in the other. Using strong references in both directions would create a memory leak. Rust also provides an elegant method, Rc::new_cyclic, which allows you to access the not-yet-fully-constructed object while building it. There is no need to use RefCell as I have seen some are doing.

use std::rc::{Rc, Weak};

#[derive(Debug)]
struct TreeNode {
    pub value: i32,
    parent: Weak<TreeNode>,      // weak reference to parent
    child: Option<Rc<TreeNode>>, // strong reference to child
}

fn main() {
    let parent = Rc::new_cyclic(|emerging_parent: &Weak<TreeNode>| TreeNode {
        value: 1,
        parent: Weak::new(),
        child: Some(Rc::new(TreeNode {
            value: 2,
            parent: emerging_parent.clone(), // child points back to parent
            child: None,
        })),
    });

    println!("Parent value: {:#?}", parent.value);
    // Access child's value through parent
    if let Some(c) = parent.child.as_ref() {
        println!("Child value: {}", c.value);
    }
}
1 Like

I mention specifically that

as the OP mentioned

I initially assumed that removal is not a problem as long as your graph is mutable. In my previous example

let mut parent = Rc::new_cyclic(| ....

// Child removed:
Rc::get_mut(&mut parent).unwrap().child = None;

However, I overlooked that Rc::get_mut also requires all weak references to be discarded before it can succeed, so this idea does not work as needed; we cannot deconstruct simply by overwriting the only strong Rc we have (others being Weak). With mutable structures, RefCell is likely the best choice, or if the graph is not large, it may be newly constructed.

That only works for the roots of directed graphs though. The moment you try to mutate a shared Rc that unwrap fails.

I agree. Rc::get_mut may be very difficult to get as it requires reducing all strong references to one (the one passed as an argument) and all weak references to zero; otherwise, None is returned.

Singly-linked lists:

struct List<T>(Option<Box<ListNode<T>>>);

struct ListNode<T> {
    head: T,
    tail: List<T>,
}

Binary trees:

struct Tree<T>(Option<Box<TreeNode<T>>>);

struct TreeNode<T> {
    value: T,
    left: Tree<T>,
    right: Tree<T>,
}

Element removal should be pretty easy for both of these.

Directed graphs are not so easy to do in safe Rust, though! As others have said, cycles are the problem.

1 Like

All these functions are safe. The fact they use unsafeunder the hood doesn't make them unsafe much like we do not call shell script a system language even though it could be boiled down to system calls.

Recursive structures might be implemented pretty easily, unless you need a reference to self (child with link to a parent, doubly linked list and so on).
In latter case you have several options:

  1. Create all nodes in advance and create a read-only structure.
  2. Use unsafe code.
  3. Use interior mutability.

But again: in most cases you do not need it. And unidirectional structures can often be expressed in idiomatic safe Rust.

Thanks bourumir-wyngs and simonbuchan for inspiring answers.

It caused me to review my code and find out it leaks memory as well :wink:

Indices approach to graph may work but seems like a workaround a natural straightforward implementation. A bit less obvious.

I'm aware "unsafe Rust" can be equivalent to normal "safe" code in other languages. I've just spent a long time coding in Haskell, where those problems can be actually implemented in safe code.

I would argue unsafe is a superset of safe rust, not a subset.

Yes to petgraph with an asterix. If it scales to your problem, a generic graph library like petgraph is a great choice. But often in graph problems there are domain specific circumstances you can exploit for data representation and/or algorithms. And as you scale to very large data sets that can matter, sometimes massively so.

Petgraph does support "bring your own representation" via traits though if I recall correctly, but I have never tried that so I don't know how good/flexible it is.

They can be implemented trivially in any language with garbage collection, since it does all the heavy lifting. You didn't need Haskell for that!

The point of Rust is you can get both memory safety without needing a garbage collector. The trade off is that you need to be explicit about ownership, and graphs are the very definition of "unclear ownership". You get around that by saying the graph itself is a type that owns the nodes, and that means you're accessing the nodes via the graph through a layer of indirection, in other words an index. Using indices to break dependencies is a common approach that works shockingly well in practice, often actually improving performance over using pointers! [1]

There's lots of pitfalls if you're on the implementing side of an index based container like this though, since it eventually becomes just a bad memory allocator. If you can't use a library (for real code, not learning), then stick with the simplest set of features you need, to save yourself headaches.


Out of the box they provide a few fairly different common representations, edge lists, matrix, etc., the ones you'd read out of a book, and the traits are pretty simple looking too?

I'd guess if you have a representation so specific you can't support arbitrary graph operations you probably aren't being helped much by petgraph anyway.


  1. The cost of an index vs a pointer is checking for validity with a less than operation (significantly cheaper than ref-counting), then applying an offset to the pointer before dereferencing it (basically free in comparison). In exchange, you get far better cache locality which just completely swamps any other concerns. ↩︎

1 Like

These are not scary functions at all. They're the core of so many important safe patterns in Rust. Use them liberally.


Note that generally recursive structures are fine on their own. The problem you hit is when you try to add things like "parent pointers", because those are fundamentally weird. (Doubly-linked lists also have this parent pointers problem, compared to singly-linked lists.)

See how even in C# (with a GC and no borrow checker) weird things happen in you put an XElement into two other XElements, because the XObject.Parent Property (System.Xml.Linq) | Microsoft Learn can't point to both parents.

Avoid things like that, though, and you can do recursive structures in Rust just fine.

1 Like

Wild - I've implemented trees in GC languages with parent pointers, and I've always enforced the unique parent constraint one way or the other!