N00b question - what is the Rust'ish way of programming serious event-driven programs?

There are two separate, but indeed related questions here:

  • How to write event-driven programs in Rust?

  • How to deal with back pointers in Rust?

I will try to give a comprehensive answers to both of them.

Let's start with back pointers, which is a more fundamental question.
It is indeed the case that in Rust back pointers and cyclic data
structures are "hard" to implement.
"Hard" here means roughly "traditional approaches lead to borrow checker errors".

Philosophical aside

My personal observation is that cyclic data structures are hard in any language.
You need mutation to implement them (unless you use a purely FP language, where mutations are hidden behind the tying the knot idiom),
and you need to deal with partially constructed invalid state as well.
One of my worst bugs was caused by accidental complexity of cyclic data structures, in Kotlin.
Rust makes you aware of the hard problems from the start: you get compile time error instead of segfaults/data races.
So, you immediately notice when you try to write a cyclic data structure in Rust :slight_smile:

Interior pointers

A problem closely related to back pointers is interior pointers.
In Rust, it's impossible (more on that later) to have structures like this:

struct S {
    a: i32,
    a_ref: &'field a
}

That is, a struct which has a field which stores a reference to
another field. There's a deep reason why Rust does not allow this:

  • In Rust, move is always a memcpy.
  • Everything can be moved.

If you try to memcpy S, the value of a_ref won't change, and it
will dangle.

Backpointers present a similar problem: if you move the root, you have
to adjust every child.

Practical solutions

That said, of course there are solutions to back pointer problem. I
hope to list all principally different approaches here.

Lifetimes

It may be surprising, but you can use Rust lifetimes to express
cyclic data structures
. I think I've learned this from this
post
. Before writing
down the solution, let's derive some aspects of it from the first
principles:

  • Due to cycles, we will have several access paths to the same node.
    That means that we'll have to use shared references.

  • To create a cycle, we will need mutability somewhere.

  • From the previous two points, it follows that we'll need interior
    mutability. Specifically, we'll need to store parent links in
    Cells, initialize them to None and set to Some later.

The result looks like this playground:

use std::cell::Cell;

#[derive(Debug)]
struct Parent<'a> {
    children: Vec<Child<'a>>,
}

#[derive(Debug)]
struct Child<'a> {
    parent: Cell<Option<&'a Parent<'a>>>,
}

fn main() {
    let parent = Parent {
        children: vec![
            Child { parent: Cell::new(None), },
            Child { parent: Cell::new(None), },
        ]
    };
    for child in parent.children.iter() {
        child.parent.set(Some(&parent));
    }

    // Will die with stack overflow when printing an infinite tree.
    println!("{:?}", parent);
}

Philosophical aside:

Note how Child.parent is an option. Naively, one would think that
this shouldn't be an option, because a child always has a parent.
However, there's a brief period during construction when the child
doesn't know its parent, and Rust will not allow us to gloss over this
detail. In a language with constructors, like Kotlin, and, I think,
C++, you can cheat here by leaking a reference to a partially
initialized parent to children from parent's constructor.

Boxing

The solution with lifetimes works, but, as any solution with
lifetimes, will poison every signature with a lifetime parameter, and
will force allocating and freeing memory on the caller. That is, you
won't be able to write something like

fn make_parent<'a>() -> Parent<'a> {
    // Can't conjure an `'a` out of fresh air, it must be provided by the caller. 
}

A usual trick to get rid of lifetimes is to heap allocate something,
and it works in this case as well. If we heap-allocate the parent, it
should never move, so we wouldn't have to worry about invalidating
back links. I know two ways to express this pattern safely.

First, we can use Rc/Weak (and RefCell for interior mutability) playground:

use std::cell::RefCell;
use std::rc::{Rc, Weak};

#[derive(Debug)]
struct Parent {
    children: Vec<Child>,
}

#[derive(Debug)]
struct Child {
    parent: RefCell<Weak<Parent>>,
}

fn main() {
    let parent = Rc::new(Parent {
        children: vec![
            Child { parent: RefCell::new(Weak::new()), },
            Child { parent: RefCell::new(Weak::new()), },
        ]
    });
    for child in parent.children.iter() {
        *child.parent.borrow_mut() = Rc::downgrade(&parent);
    }
}

Note that we kind of need Rc here: when we will destroy the tree, we
can destroy the parent first, so the parent pointer of children will
dangle at least during destruction. We can get rid of Rc if we make
sure to never deallocate the parent by leaking it playground:

use std::cell::Cell;

#[derive(Debug)]
struct Parent {
    children: Vec<Child>,
}

#[derive(Debug)]
struct Child {
    parent: Cell<Option<&'static Parent>>,
}

fn main() {
    let parent = Parent {
        children: vec![
            Child { parent: Cell::new(None), },
            Child { parent: Cell::new(None), },
        ]
    };
    let parent = Box::leak(Box::new(parent));
    for child in parent.children.iter() {
        child.parent.set(Some(parent))
    }
}

Finally, because we can encapsulate the tree structure, such that we
guarantee that the root is always boxed, that we don't access parent
in Drop and that if we have a reference to a child then the parent
is alive, we can replace Rc/Weak with Box/*const Parent and use
unsafe for implementation. Here's an example of such encapsulation
for a toy red-black tree:

Disclaimer: no guarantees that that unsafe code is UB-free, it was a
toy project.

Indexes

Finally, there's a third solution, which I think is the best overall.
That is, this is my go-to solution for dealing with cycles, and it
works every time. The idea is to avoid creating literal cycles, store
the nodes in array and use indices to refer to nodes. Folks in Rust
community tend to name this ECS, though I personally think that this is
a miss application of the term (explanation).

This pattern requires some amount of boilerplate to setup (there are
various crate to simplify it), but the end API could be pretty
ergonomic playground:

// This is what we store internally
#[derive(Debug)]
struct Tree {
    parents: Vec<ParentData>,
    children: Vec<ChildData>
}

#[derive(Debug)]
struct ParentData {
    children: Vec<usize>, // Ideally, define `struct ChildIdx(NonZeroU32)`;
}

#[derive(Debug)]
struct ChildData {
    parent: usize,
}

// This is public API

#[derive(Debug, Clone, Copy)] // Copy is awesome for ergonomics
struct Parent<'a> {
    tree: &'a Tree,
    idx: usize,
}

impl<'a> Parent<'a> {
    pub fn children(self) -> impl Iterator<Item=Child<'a>> + 'a {
        self.data().children.iter().map(move |&idx| Child { tree: &self.tree, idx })
    }

    fn data(self) -> &'a ParentData {
        &self.tree.parents[self.idx]
    }
}

#[derive(Debug, Clone, Copy)]
struct Child<'a> {
    tree: &'a Tree,
    idx: usize
}

impl<'a> Child<'a> {
    fn data(self) -> &'a ChildData {
        &self.tree.children[self.idx]
    }
}



fn main() {
    let tree = Tree {
        parents: vec![ParentData { children: vec![0, 1] }],
        children: vec![
            ChildData { parent: 0, },
            ChildData { parent: 0, },
        ]
    };
    let parent = Parent {
        tree: &tree,
        idx: 0,
    };
    eprintln!("parent = {:?}", parent);
}

Note that with this solution you don't need interior mutability. If
you need to mutate the tree, you can take &mut tree as an argument.
This make this pattern especially valuable in multi threaded
scenarios.

State Machines

The text up to this point should answer the parent pointers question,
so let's circle back to async programming. It happens that async
programming is the biggest source of "interior pointers" problem.

Consider this function (written in pseudo syntax, there's no yield in Rust yet):

fn s() {
    let a = 92;
    let a_ref = &a;
    yield;
    println!("{}", a_ref);
}

Compiler should turn this function into a state machine and, if we
write the state by hand, we get a familiar struct:

struct S {
    a: i32,
    a_ref: &'field a
}

That is, the state should contain a and a_ref, and a_ref should
be a reference to a.

That is, to support asyc functions which can borrow across yields,
Rust has to support state machines with interior references in the
state.

The solution to this problem is being developed at the moment (that
is, it is not stable yet) under the name of "pining API".

The idea is to encode additional invariants, necessary to make usage
of state-machines safe, in the standard library type.

Specifically, a new "smart pointer" type, Pin, is introduced. It is
typically used as Pin<&mut T>, and it has semantics of

  • a unique reference to T,
  • which you can't move out from,
  • and which guarantees that T won't be moved until it is destroyed.

Using Pin, a safe API for S can be expressed:

struct S {
    a: i32,
    a_ref: *const i32, // we have to use a raw pointer here
}

impl S {
    // This is a safe function
    pub fn a_ref<'a>(self: Pin<&'a mut S>) -> &'a mut i32 {
        // the impl has to be unsafe.
        unsafe {
            // This is how it *roughly* should look like, 
            // I don't know details of the pining API.
            &mut *self.a_ref
        }
    }
}

That is, using state machines becomes safe, but defining them is still
unsafe. Luckily, with async/await syntax, it's the compiler who
defines the state machines, so it can make sure that all necessary
invariants are indeed preserved.

I guess that's all? :slight_smile:

12 Likes