N00b question - what is the Rust'ish way of programming serious event-driven programs?


#1

Disclaimer: I am coming from C++ side, so pls don’t hit me too hard if I don’t understand something or everything in Rust

Let’s consider the following scenario:

  • we need to write an event-driven program (a.k.a. Actor, a.k.a. Reactor a.k.a. ad-hoc Finite State Machine)
  • our program has its state - which essentially has an ‘infinite’ lifetime (at least from the point of view of all the event handlers)
  • within the state of the event-driven program, we need to have an object PurchaseOrder, containing (owning) several PurchaseOrderItems
  • within each PurchaseOrderItem, we need a (non-owning) pointer back to its PurchaseOrder

The question is: what is the “Rust” way of programming this kind of stuff? The problem I see with it is that it seems that scope/stack-based analysis (NLL included) cannot possibly ensure correctness of the non-owning back pointer :frowning: within heap-object-with-infinite-lifetime. Of course, there is always an option to make FSM state an unsafe class, encapsulating our invariant of “back pointer always points to the owner” and enforcing correctness there - but the question goes as follows: is there a better way of handling this (very wide BTW) class of problems in Rust? Ideally - this kind of app-level things shouldn’t require unsafe code, but I have my doubts about being able to enforce safety of these things in compile-time without some app-level code being unsafe :frowning: .


#2

For event-driven programs (if you want generic emitter/observer interfaces and dynamically add and remove observers) you’ll most likely end up with Arc<Mutex<Whatever>> handling the events, which is equivalent of regular object references in other languages (+ thread safety. Use Rc<RefCell<Whatever>> in single-threaded programs).

These types of programs create a web of references that are beyond what borrow checker can understand, so forget about using bare references in observers.

The Rusty approach is to rethink whether that is the best pattern. Entity Components Systems (ECS) seem to be a fashionable Rust-friendly alternative recently.


#3

So I don’t know about ‘the’ rust way to do things, but you might be interested in how I’ve been handling this problem in FlowBetween. Instead of events with handler functions as is traditional in most places, I’ve wound up using streams everywhere. Rust’s ‘algebraic’ enums make this work really well, and I’m particularly happy with how streaming events from one place to another reduces the amount of interdependencies between the different parts of the codebase - it’s very testable in a way that user interface code usually isn’t, too.

The problem of ownership is taken care of because an event stream is effectively split into two: a sink where the events are sent and a stream where something receives them.

I needed to implement couple of libraries of functions I couldn’t find elsewhere which you might also find useful if you want to pursue a similar approach: flo_stream provides the means to publish a single event sink to multiple consumers. It’s not too hard to implement a custom stream every time but I found I needed a lot of them and that eventually got tedious. flo_binding is a state management framework: it’s essentially a way to change a stream of events into a state or to change a series of state updates into a stream of events. It’s sort-of knockout.js/react for rust.


#4

The Rust way is to search for a crate that does what you want, and if it doesn’t exist, to create it. Chances are, others have already solved your problem in the past, so there’s no need to reinvent a solution to a solved problem.

It sounds like you’re wanting to use the actix actor framework. If this is a web application, the actix-web web framework. If you’re building something from scratch, you can build it around an ECS framework, such as specs or dces.

within each PurchaseOrderItem, we need a (non-owning) pointer back to its PurchaseOrder.

ECS solves this problem without pointers, but with entity IDs, which can be smaller than pointers, and aren’t reliant on memory addresses being valid. An entity with a PurchaseOrder component could store the entity IDs of their parent orders.

You can think of entities as being similar to objects. Components within a world are optional fields that can be assigned to an entity. Unlike objects, entities and their components are stored in heap within arenas, rather than as nodes; with one arena per component and resource (ie: entity resource).


#5

Thanks, it would work! However, Rc<>/Weak<> pair is a run-time thing incurring run-time costs, so my gut feeling of not being able to enforce safety in compile-time happened to be true . So much for zero-cost memory safety in real-world programs…

N.B.: As event-driven programs are at least by default single-threaded - it is going to be Rc<> (not Arc<>).


#6

My question here is not how to dispatch messages/events/… (this is infrastructure-level code which can be made unsafe, I have no problems with it, it is relatively small and rarely changes), but how to handle messages/events/whatever-else after they’re parsed - which is app-level code where ensuring safety is of paramount importance for million-LoC business-level projects. And with app-level code, streams or no streams, at some point I have (a) current State of the event-driven program (~=FSM state), and (b) event. My question is “how to ensure safety of the State if it has back pointers, and the State has to be preserved between different event handlers?”; kornel has suggested that it should be done via Rc<> - and it would work, but will incur runtime costs :frowning: .


#7

Unfortunately, actix doesn’t answer my question (it is infrastructure code which is nice, but my question is about app-level code which has to be written on top of actix). For further clarifications, see my reply to ahunter.


#8

The futures-preview 0.2.2 crate (*) has futures with combinators, channels, and executors which would seem to be the way forward for event-driven processing with Rust. Certainly in a gtk-rs context, I am doing GTK+ based GUI application, it works very well. Also multiple threads can send messages to the event loop with ease, which enables a very flexible approach.

I ended up using Rc<> for non-owning backpointers in code which only the event loop thread will execute. I do not do shared memory multi-threading so haven’t needed Arc<> as yet. I have also managed to do without Mutex<>. The futures, channels, executors infrastructure enables me to send around application level messages with payloads using Rust enums, which are great.

As you point out Rc<> is a runtime type check not a compile time type check, but I haven’t found this to be a problem. But then I am just doing GUI, all the (soft) real time stuff is handled by GStreamer.

(*) futures in crates.io is 0.1.25 which as far as I am aware doesn’t have the features needed by gtk-rs.


#9

Hi Russel, and thanks!

As I wrote, I am more interested in app-level code, so it is Rc<> which I am really interested in (infrastructure code with all the queues, messages, and sync I can always write myself even if it is unsafe :wink: ). BTW, the real reason I am asking about it - is because we’re working on a comparable thing (guaranteed memory safety) for C++ event-driven programs (you’ll probably see it in proposals for ACCU2019 :wink: ), and have run into an observation that ensuring safety for such scenarios is not possible without some kind of run-time support; it is certainly good to know that Rust is not different in this regard :wink: , which makes us more confident that we’re on the right track :slight_smile: .

Best regards,
Sergey a.k.a. ‘No Bugs’


#10

Ah, I seem to have gotten a hold of the wrong end of the stick. I do however, have some advice which I hope is more relevant.

Firstly, the performance overhead of Rc<> is not worth worrying about: it’s effectively just adding a couple of extra arithmetic operations when making new references. There aren’t many situations where an extra arithmetic operation has a noticeable effect on the performance of an algorithm - creating new references to things is seldom close to being the heaviest thing in an inner loop. Accessing the contents of an Rc<> should be the same as accessing a Box, so once everything is set up there’s no further cost in these terms.

The memory overhead may be a different story: Rc is allocated on the heap, and has two counters for references in it, so there’s both the cost for allocating on the heap and an extra 16-byte cost (on a 64-bit system) to pay: for small data structures with large numbers of individual items this can become significant.

Rc<> has what’s perhaps a more serious disadvantage: it’s possible to create reference loops that will result in memory leaks. Even without that it’s sometimes hard to reason about where the Drop call will finally occur.

However: there’s no particular reason why a relationship between two data items should be expressed in terms of memory pointers. For example: it’s possible to use indexes into a Vec instead, which is completely safe, has no lifetime management problems and provides very compact storage - it does have a major downside in that deleting old items is a problem, but I’m going to use it here to illustrate the point that references don’t have to be pointers. Access can be via a Deref implementation - for example:

struct VecRef<'a, T: 'a>(&'a Vec<T>, usize);

impl<'a, T> Deref for VecRef<'a, T> {
    type Target = T;
    #[inline] fn deref(&self) -> &T { &self.0[self.1] }
}

(Performance is going to be a tricky question here: there are a lot of opportunities for the optimiser to do this ‘for free’, and on most modern chips there’s often a chance to do an extra arithmetic op in parallel anyway so the indexing winds up being effectively free in terms of time… then you get into how everything is more compact in memory which makes for more efficient cache usage… Think if it really mattered I personally would just implement all the candidate algorithms and compare them running on real data rather than try to guess at the effects of micro-optimisations)

An interesting way to think about this is in terms of databases: relational databases don’t have pointers at all but instead use keys to define how one item references another. This is the same idea: it’s possible to describe relationships of any kind using pointers, but it’s also possible to describe exactly the same structure using keys instead, which is quite useful when trying to make a safe structure that will pass Rust’s checks (it’s also pretty convenient to have data represented like this if it ever needs to be serialized).

All that said, Rc<T> is there and is generally going to easily be efficient enough and is more convenient, except perhaps when the data needs to be serialized later on.


#11

There are two separate, but indeed related questions here:

  • How to write event-driven programs in Rust?

  • How to deal with back pointers in Rust?

I will try to give a comprehensive answers to both of them.

Let’s start with back pointers, which is a more fundamental question.
It is indeed the case that in Rust back pointers and cyclic data
structures are “hard” to implement.
“Hard” here means roughly “traditional approaches lead to borrow checker errors”.

Philosophical aside

My personal observation is that cyclic data structures are hard in any language.
You need mutation to implement them (unless you use a purely FP language, where mutations are hidden behind the tying the knot idiom),
and you need to deal with partially constructed invalid state as well.
One of my worst bugs was caused by accidental complexity of cyclic data structures, in Kotlin.
Rust makes you aware of the hard problems from the start: you get compile time error instead of segfaults/data races.
So, you immediately notice when you try to write a cyclic data structure in Rust :slight_smile:

Interior pointers

A problem closely related to back pointers is interior pointers.
In Rust, it’s impossible (more on that later) to have structures like this:

struct S {
    a: i32,
    a_ref: &'field a
}

That is, a struct which has a field which stores a reference to
another field. There’s a deep reason why Rust does not allow this:

  • In Rust, move is always a memcpy.
  • Everything can be moved.

If you try to memcpy S, the value of a_ref won’t change, and it
will dangle.

Backpointers present a similar problem: if you move the root, you have
to adjust every child.

Practical solutions

That said, of course there are solutions to back pointer problem. I
hope to list all principally different approaches here.

Lifetimes

It may be surprising, but you can use Rust lifetimes to express
cyclic data structures
. I think I’ve learned this from this
post
. Before writing
down the solution, let’s derive some aspects of it from the first
principles:

  • Due to cycles, we will have several access paths to the same node.
    That means that we’ll have to use shared references.

  • To create a cycle, we will need mutability somewhere.

  • From the previous two points, it follows that we’ll need interior
    mutability. Specifically, we’ll need to store parent links in
    Cells, initialize them to None and set to Some later.

The result looks like this playground:

use std::cell::Cell;

#[derive(Debug)]
struct Parent<'a> {
    children: Vec<Child<'a>>,
}

#[derive(Debug)]
struct Child<'a> {
    parent: Cell<Option<&'a Parent<'a>>>,
}

fn main() {
    let parent = Parent {
        children: vec![
            Child { parent: Cell::new(None), },
            Child { parent: Cell::new(None), },
        ]
    };
    for child in parent.children.iter() {
        child.parent.set(Some(&parent));
    }

    // Will die with stack overflow when printing an infinite tree.
    println!("{:?}", parent);
}

Philosophical aside:

Note how Child.parent is an option. Naively, one would think that
this shouldn’t be an option, because a child always has a parent.
However, there’s a brief period during construction when the child
doesn’t know its parent, and Rust will not allow us to gloss over this
detail. In a language with constructors, like Kotlin, and, I think,
C++, you can cheat here by leaking a reference to a partially
initialized parent to children from parent’s constructor.

Boxing

The solution with lifetimes works, but, as any solution with
lifetimes, will poison every signature with a lifetime parameter, and
will force allocating and freeing memory on the caller. That is, you
won’t be able to write something like

fn make_parent<'a>() -> Parent<'a> {
    // Can't conjure an `'a` out of fresh air, it must be provided by the caller. 
}

A usual trick to get rid of lifetimes is to heap allocate something,
and it works in this case as well. If we heap-allocate the parent, it
should never move, so we wouldn’t have to worry about invalidating
back links. I know two ways to express this pattern safely.

First, we can use Rc/Weak (and RefCell for interior mutability) playground:

use std::cell::RefCell;
use std::rc::{Rc, Weak};

#[derive(Debug)]
struct Parent {
    children: Vec<Child>,
}

#[derive(Debug)]
struct Child {
    parent: RefCell<Weak<Parent>>,
}

fn main() {
    let parent = Rc::new(Parent {
        children: vec![
            Child { parent: RefCell::new(Weak::new()), },
            Child { parent: RefCell::new(Weak::new()), },
        ]
    });
    for child in parent.children.iter() {
        *child.parent.borrow_mut() = Rc::downgrade(&parent);
    }
}

Note that we kind of need Rc here: when we will destroy the tree, we
can destroy the parent first, so the parent pointer of children will
dangle at least during destruction. We can get rid of Rc if we make
sure to never deallocate the parent by leaking it playground:

use std::cell::Cell;

#[derive(Debug)]
struct Parent {
    children: Vec<Child>,
}

#[derive(Debug)]
struct Child {
    parent: Cell<Option<&'static Parent>>,
}

fn main() {
    let parent = Parent {
        children: vec![
            Child { parent: Cell::new(None), },
            Child { parent: Cell::new(None), },
        ]
    };
    let parent = Box::leak(Box::new(parent));
    for child in parent.children.iter() {
        child.parent.set(Some(parent))
    }
}

Finally, because we can encapsulate the tree structure, such that we
guarantee that the root is always boxed, that we don’t access parent
in Drop and that if we have a reference to a child then the parent
is alive, we can replace Rc/Weak with Box/*const Parent and use
unsafe for implementation. Here’s an example of such encapsulation
for a toy red-black tree:

Disclaimer: no guarantees that that unsafe code is UB-free, it was a
toy project.

Indexes

Finally, there’s a third solution, which I think is the best overall.
That is, this is my go-to solution for dealing with cycles, and it
works every time. The idea is to avoid creating literal cycles, store
the nodes in array and use indices to refer to nodes. Folks in Rust
community tend to name this ECS, though I personally think that this is
a miss application of the term (explanation).

This pattern requires some amount of boilerplate to setup (there are
various crate to simplify it), but the end API could be pretty
ergonomic playground:

// This is what we store internally
#[derive(Debug)]
struct Tree {
    parents: Vec<ParentData>,
    children: Vec<ChildData>
}

#[derive(Debug)]
struct ParentData {
    children: Vec<usize>, // Ideally, define `struct ChildIdx(NonZeroU32)`;
}

#[derive(Debug)]
struct ChildData {
    parent: usize,
}

// This is public API

#[derive(Debug, Clone, Copy)] // Copy is awesome for ergonomics
struct Parent<'a> {
    tree: &'a Tree,
    idx: usize,
}

impl<'a> Parent<'a> {
    pub fn children(self) -> impl Iterator<Item=Child<'a>> + 'a {
        self.data().children.iter().map(move |&idx| Child { tree: &self.tree, idx })
    }

    fn data(self) -> &'a ParentData {
        &self.tree.parents[self.idx]
    }
}

#[derive(Debug, Clone, Copy)]
struct Child<'a> {
    tree: &'a Tree,
    idx: usize
}

impl<'a> Child<'a> {
    fn data(self) -> &'a ChildData {
        &self.tree.children[self.idx]
    }
}



fn main() {
    let tree = Tree {
        parents: vec![ParentData { children: vec![0, 1] }],
        children: vec![
            ChildData { parent: 0, },
            ChildData { parent: 0, },
        ]
    };
    let parent = Parent {
        tree: &tree,
        idx: 0,
    };
    eprintln!("parent = {:?}", parent);
}

Note that with this solution you don’t need interior mutability. If
you need to mutate the tree, you can take &mut tree as an argument.
This make this pattern especially valuable in multi threaded
scenarios.

State Machines

The text up to this point should answer the parent pointers question,
so let’s circle back to async programming. It happens that async
programming is the biggest source of “interior pointers” problem.

Consider this function (written in pseudo syntax, there’s no yield in Rust yet):

fn s() {
    let a = 92;
    let a_ref = &a;
    yield;
    println!("{}", a_ref);
}

Compiler should turn this function into a state machine and, if we
write the state by hand, we get a familiar struct:

struct S {
    a: i32,
    a_ref: &'field a
}

That is, the state should contain a and a_ref, and a_ref should
be a reference to a.

That is, to support asyc functions which can borrow across yields,
Rust has to support state machines with interior references in the
state.

The solution to this problem is being developed at the moment (that
is, it is not stable yet) under the name of “pining API”.

The idea is to encode additional invariants, necessary to make usage
of state-machines safe, in the standard library type.

Specifically, a new “smart pointer” type, Pin, is introduced. It is
typically used as Pin<&mut T>, and it has semantics of

  • a unique reference to T,
  • which you can’t move out from,
  • and which guarantees that T won’t be moved until it is destroyed.

Using Pin, a safe API for S can be expressed:

struct S {
    a: i32,
    a_ref: *const i32, // we have to use a raw pointer here
}

impl S {
    // This is a safe function
    pub fn a_ref<'a>(self: Pin<&'a mut S>) -> &'a mut i32 {
        // the impl has to be unsafe.
        unsafe {
            // This is how it *roughly* should look like, 
            // I don't know details of the pining API.
            &mut *self.a_ref
        }
    }
}

That is, using state machines becomes safe, but defining them is still
unsafe. Luckily, with async/await syntax, it’s the compiler who
defines the state machines, so it can make sure that all necessary
invariants are indeed preserved.

I guess that’s all? :slight_smile:


#13

Wow, thanks a lot - but I still want to say how do I feel about it:

  • solutions with unsafe{} in them are not good for app-level code - so I will have to cross them out (as I already wrote - such things are doable, but they require some parts of frequently-changing app-level code to be unsafe -> which is a Really Bad Thing™ maintenance-wise).
  • indexes are obvious but they won’t fly with more than one level of indirection (which is Damn Common) - or we’ll have to store array of indexes instead of one single pointer <ouch! />
  • explicit lifetimes _are_interesting; I still have a concern that having lifetime as a part of type can happen to be too much for the developer; still, we’ll have to investigate this option to see whether it is feasible in practice… Thanks!

#14

Thanks! A few further comments:

  • what you’re saying about Rc<> is mostly correct (though there are also issues of (a) tombstones living longer than original blocks, taking RAM, and (b) extra indirection in tombstones); however, as I wrote to Russel, my question was whether run-time checks is The Best Possible Thing™ out there…

  • in general case with multi-level data structures (which are extremely common in states of ad-hoc FSMs - think, for example, of Scene Graph in an MOG) indexes won’t really fly (heck, I would need to have a whole array of indexes to get to my data member).


#15

Mostly agree with your analysis, except for the point about double indirection.

I think you can avoid it if you store all objects in the same arena. That is, you can write a generic per-request allocator, which allocates objects of any type and gives your back integer handles instead of pointers. Because all types use the same allocator, double indirection does not arise. This basically achieves memory safety by imposing a bounds check on every access.


#17

Assuming that objects are of the same size - or not? The first approach isn’t practical, the second is probably unsafe :frowning: .

EDIT: even more generally, if objects in the arena aren’t of the same type - most likely it would be unsafe :frowning:


#18

This might be unsafe to implement, but should be safe to use?

A trivial implementation would store a ‘Vec’ per every type, and will have a ‘get(&self, idx: usize) -> &T’ method which looks T in the appropriate Vec (this probably requires listing all types upfront).

If we don’t want to list the types upfront, we can store Vec internally, and return indices branded with types to the user. That is, Idx, which a user can’t create, serves as a proof that the specified index indeed contains Foo.


#19

Then we’ll also add an “incarnation id” to this (index,type) tuple (to account for indexes being reused) - and we’ll essentially re-invent a safe pointer :smiley: .


#20

Note that the overhead is still usually smaller than in other languages, because Rust combines refcounting with borrow checking, so you may borrow an existing reference without touching the refcount.

Borrow checker works with scopes, but event-driven programs almost by definition don’t follow scopes. You can’t expect to have ability to do wild ad-hoc use of memory without any limits in time and space, and yet have it statically understood and managed without any bookkeeping. If you can limit ad-hoc nature of event-driven program, e.g. limit event loop to a scope or memory pool, then you get more options for uses of cheaper references.

Another thing is, if your problem requires so much efficiency that refcounting would be a noticeable cost, are events the most efficient ways to model it in the first place? Maybe you should have queues or some more direct computation?


#21

Note that the overhead is still usually smaller than in other languages, because…

I am not going to argue, but have to note that this is what the fans of all the languages are saying (for example, Java guys will say that “reference counting is inefficient, and copying GC is inherently better, because…”)… And in practice it is (unsafe) C/C++ which usually wins :wink:


#22

If you want C+±like unchecked pointers, Rust supports them too. If you want shared_ptr<T>, that’s Rc<Box<T>>.