Graph architecture: RefCells all the way down

Hello, I'm working on the first steps of a generic graph system for personal projects and the RefCell is being both my savior and my nemesis.

Here's a playground link to the issue isolated as much as possible: playground

Essentially, I want to separate the actual storage of nodes from the graph logic. To do so, I need to keep my storage within a RefCell for ease of modification, and the nodes themselves in RefCells as well to avoid concurrent borrow issues with the storage.

Is there any way to make this work? Or is there perhaps another architecture you'd recommend?

Most possible answers to this question are going to include “use fewer RefCells”.

Can you tell us about what write operations you want to perform on this graph — what their signatures should be, what they should do, and from where they will be performed? This will help inform what possible solutions exist. Right now, your code shows only read operations, which means the RefCells aren’t actually doing anything.

depends on how you want to handle your graph, in my experience some of the easiest and most effective ways to do it are

struct GraphNodeTracksEdges{
 HashSet<NodeId> nodes,
 HashMap<NodeId,Vec<Nodeid>> edges,
}
struct GraphGlobalEdges{
 HashSet<NodeId> nodes,
 HashSet<(NodeId,NodeId)> edges,
}

naturally you can replace the HashSets with maps to add data to your node and edges

if you want to have a linked list like graph then yeah you are going to need RefCells

In your example you are keeping nodes in a vec. This is a pretty good representation for the reason that looking up a node by its index is super cheap.

If you need to modify more than one node at a time, you can move out of nodes (i.e. using mem::replace with a default "invalid" node state), then put the nodes back when you're done your modifications.

Instead of keeping an Rc to somehow reuse nodes, you can just use a free list. Keep a generation counter with each node. When a node at index i gets deleted, increment the generation counter. The node handle is simply a pair of index, generation counter - this lets your handle type distinguish between still-active nodes, deleted and now empty nodes, and node indices which got deleted but were then reused (which would have a different generation count than the original node)

Thank you all for the replies.

@kpreid
Can you tell us about what write operations you want to perform on this graph

Essentially: node addition / deletion, node state edition from outside the graph, node connection creation, and finally node running implying going through the graph depth-first along the dependency edges and accessing the node's state from its function every time.
Given the advice provided, I will be making significant structural changes to my code so I'll get back to you in this thread very soon whether I'm still encountering RefCell-related issues or not.

@giocri
depends on how you want to handle your graph, in my experience some of the easiest and most effective ways to do it are

struct GraphNodeTracksEdges{
  HashSet<NodeId> nodes,
  HashMap<NodeId,Vec<Nodeid>> edges,
}
struct GraphGlobalEdges{
  HashSet<NodeId> nodes,
  HashSet<(NodeId,NodeId)> edges,
}

Probably going to rewrite things to store edges like this; it's much more convenient and the benefits I thought I would get from having "intuitive" connections are actually more troublesome than anything, as nodes have no reason to know which other node they're connected to since in my case the depth-first traversal and their resulting function input are at the Graph level.

yuriy0 (new users can only mention 2 users in a post, dang)
If you need to modify more than one node at a time, you can move out of nodes (i.e. using mem::replace with a default "invalid" node state), then put the nodes back when you're done your modifications.

Instead of keeping an Rc to somehow reuse nodes, you can just use a free list. Keep a generation counter with each node.

I did not think of using mem::replace, but wouldn't the cost of initializing default nodes pile up? Could I perhaps just use an Option to wrap all nodes and take the value, leaving a None? I'm not entirely sure about the costs associated with moving values in/out of Options, which is why I thought of using a RefCell around every node to bypass the issues of single mutable access to the Vec without moving or copying any data.
Big fan of the generation counter idea though, I've even used it in prior projects but it slipped my mind.

Overall, it seems I'll be able to get rid of the RefCell around the NodeStorage, so most of my issues should vanish. The code seems like it's going to get much cleaner once some of the suggested changes are implemented.

It's not pretty nor clear and very far from efficient, but the current code is available here. I implore your mercy regarding the shortcuts taken and extremely WIP state, I mostly work on this on the tram to and from work and change core parts every other day so some things might have slipped through :')

Have you considered using petgraph, it implements various graphs and graph algorithms

mem::replace is extremely cheap for small types, it's just a memcpy. If your type is very large, you could use Option<Box<T>> instead of T, and again mem::replace becomes very cheap (since replacing on a Box is just moving the pointer). Most container types like Vec and HashMap are cheap to move, since doing so requires only to copy a pointer and a few integers. If your node type is a huge struct with 100 fields, it may be costly to move, so using Box would be worthwhile.

Using Option is the simplest way to guarantee that you have a default invalid node value for this process, but it doesn't change the cost of moving out of T, in other words, moving out of T and Option<T> have comparable cost for all types.

If you need to maximize performance, then many nested Rc<RefCell<..>> will not be your friend. Moving out of Option<Box<T>> is probably cheaper. But if you have performance concerns about either approach, profile first.

By the way, maybe you can use Ghost Cell ghost_cell - Rust too, link to the paper: https://plv.mpi-sws.org/rustbelt/ghostcell/paper.pdf . They include example together with arena, alternative may be static-rc 0.7.0 - Docs.rs .

There's also approach with cell families: cell_family - Rust

I ended by using RefCell and Rc. It works well, although many people may say - it isn't right.