Any advice on building safe self-referential structs

Currently building a lazy graph iterator.

type Key = Vec<usize>; 
type Value = (bool, HashMap<Rule,Key> );    // valid rules -> next states


struct LazyDfaa<'a>{
    reference: &'a NFA,
    cache : HashMap<Key, Value>,
    step  : Option <Key>
}

I'm aware of the "indexing method"
But since this is small project (nothing professionnal), i was wondering more efficient ways (safe and unsafe) to do it.

Advice, documentation, guidance is welcomed

1 Like

If you are OK with read-only structs, use Rc + Weak: Rc in std::rc - Rust (this is a classical approach used in many RC-based languages)

Otherwise, try Ouroboros GitHub - someguynamedjosh/ouroboros: Easy self-referential struct generation for Rust.

You might also solve it with Rc+ RefCel effectively moving checks to runtime.

1 Like

thank you for your contribution,
While i was searching I saw Pin, do you think using it would be a viable methode, or am i misunderstanding its utility.

Pin may be useful if you try to roll your own using pointers. But you would still need unsafe for construction; it doesn't address that part. There are no safe ways to build something in Rust that is both self-referential and useful using simple pointers or references.

Actual references are not the right tool for the job, with or without Pin (if you roll your own and don't want smart pointers, use raw pointers).

I would suggest ouroboros or yoke over rolling your own, they're at least somewhat battle tested. (Even those dedicated crates have had and may still have soundness holes. It's not an easy problem.)

The Pin type is useful in much narrower situations than it may seem. Pin is useful when all 3 of these conditions are met:

  1. You have some code that wants to create a self-referential data structure (in most cases, this is an async block which contains variables that borrow other variables in the same block), and not internally perform the memory allocation for it.
  2. You have some code that wants to generically allocate a place to store that data structure and others of its kind (e.g. an async executor’s set of tasks it is executing).
  3. These are not the same code, and there may even be intermediaries (wrapper types, i.e. more async blocks) between (1) and (2).

Pin is a way for (2) to communicate to (1) a statically checked promise, that may be relied on by unsafe code, that the data structure won't be moved. If (1) and (2) are part of the same codebase, then they don’t need Pin.

2 Likes

If you're building graphs because you want to actually use a graph for something else and not because you're trying to learn how to build graph data-structures, I highly recommend using petgraph, it has a bunch of graph types as well as a pile of useful algorithms on graphs such as walking the graph, finding all the cycles, computing a topological ordering, and many more.

That said, if you're trying to figure out how to build your own graph data-structures, one other option to consider is to use arenas, which lets you create &'ctx MyType everytime you need a new MyType in your self-referential datastructure, though they have some downsides such as you can't move the arenas, and you can't deallocate anything in your datastructure until you drop the arenas which deallocates everything at once.
This code on the playground

use std::cell::RefCell;
use typed_arena::Arena;

/// you'll need a context struct that holds all your arenas,
/// all the rest of your graph will borrow from the context struct.
#[derive(Default)]
struct Context<'ctx> {
    graph_arena: Arena<Graph<'ctx>>,
    node_arena: Arena<Node<'ctx>>,
    edge_arena: Arena<Edge<'ctx>>,
}

struct Node<'ctx> {
    outgoing_edges: RefCell<Vec<&'ctx Edge<'ctx>>>,
    incoming_edges: RefCell<Vec<&'ctx Edge<'ctx>>>,
    data: RefCell<String>,
}

struct Edge<'ctx> {
    start_node: &'ctx Node<'ctx>,
    end_node: &'ctx Node<'ctx>,
    data: RefCell<String>,
}

struct Graph<'ctx> {
    nodes: RefCell<Vec<&'ctx Node<'ctx>>>,
    ctx: &'ctx Context<'ctx>,
}

impl<'ctx> Graph<'ctx> {
    fn new(ctx: &'ctx Context<'ctx>) -> &'ctx Graph<'ctx> {
        ctx.graph_arena.alloc(Graph {
            nodes: RefCell::new(vec![]),
            ctx,
        })
    }
    fn add_node(&self, data: String) -> &'ctx Node<'ctx> {
        let node = self.ctx.node_arena.alloc(Node {
            outgoing_edges: RefCell::new(vec![]),
            incoming_edges: RefCell::new(vec![]),
            data: RefCell::new(data),
        });
        self.nodes.borrow_mut().push(node);
        node
    }
    fn add_edge(
        &self,
        start_node: &'ctx Node<'ctx>,
        end_node: &'ctx Node<'ctx>,
        data: String,
    ) -> &'ctx Edge<'ctx> {
        let edge = self.ctx.edge_arena.alloc(Edge {
            start_node,
            end_node,
            data: RefCell::new(data),
        });
        start_node.outgoing_edges.borrow_mut().push(edge);
        end_node.incoming_edges.borrow_mut().push(edge);
        edge
    }
    // we can't just use #[derive(Debug)] since that would be infinite recursion
    fn dump(&self) {
        println!("Graph:");
        for Node {
            incoming_edges: _,
            outgoing_edges,
            data,
        } in self.nodes.borrow().iter()
        {
            println!("Node: {:?}", data.borrow());
            for Edge {
                start_node,
                end_node,
                data: edge_data,
            } in outgoing_edges.borrow().iter()
            {
                println!("    Edge: {:?}", edge_data.borrow());
                println!(
                    "        {:?} -> {:?}",
                    start_node.data.borrow(),
                    end_node.data.borrow(),
                );
            }
        }
    }
}

fn demo<'ctx>(ctx: &'ctx Context<'ctx>) {
    let graph = Graph::new(ctx);
    let foo = graph.add_node("foo".into());
    let bar = graph.add_node("bar".into());
    let baz = graph.add_node("baz".into());
    graph.add_edge(foo, foo, "self-loop".into());
    graph.add_edge(foo, bar, "foo to bar".into());
    graph.add_edge(baz, bar, "baz to bar".into());
    graph.add_edge(bar, baz, "bar to baz".into());
    graph.dump();
}

fn main() {
    let ctx = Context::default();
    demo(&ctx);
}

From looking at the replies i'm considering trying out every possible way for fun and self-learning purposes.

When you say it's not an easy can expand as to why it isn't. If i'm building my own unsafe implementation, what should i be looking out for.

I'll definitely look it out for any other project that requires graph like structures.
In my case the graph is already created.

I use Thompson construction algorithm + Recursive descent parsing to build the graph from a string. (There's highly likely UB in this implementation, feel free to point it out)

The recusive descent parser impl

pub struct Parser<'a, T: Graph> {
    buffer: Option<char>,
    graph: T,
    iter : std::iter::Peekable<std::str::Chars<'a>>,
}

impl<'a, T:Graph> Iterator for Parser<'a, T> {
    type Item = char;

    fn next(&mut self)->Option<Self::Item> {
        self.buffer.take().or_else(||self.iter.next())
    }
}


// abc (bb)+ | def
// abc bb || abcdef
impl<'a, T: Graph> Parser<'a, T> {
//------------------------logic------------------------//
    fn alternation(&mut self)->Vec<Frag> {
        let mut stack = vec![];

        self.concatenation(&mut stack);
        loop {
            let Some(c) = self.peek() else {return stack};
            match c {
                '|' =>{
                    self.next();
                    
                    stack.extend(self.alternation());
         
                    let Some(e2) = stack.pop() else {panic!("error 1")};
                    let Some(e1) = stack.pop() else {panic!("error 2")};
                    stack.push(self.graph.alternation(e1, e2));
                    break;
                    },
                ')' =>{return stack;}
                 _  => break,
            }
        }
        stack
    }

    fn concatenation(&mut self,stack:&mut Vec<Frag>) {           
        self.operande(stack);

        loop { 
            if stack.len()>=2 {
                let Some(e2) = stack.pop() else {panic!("not enough in stack to concat")};
                let Some(e1) = stack.pop() else {panic!("not enough in stack to concat")};
                stack.push(self.graph.concatenation(e1,e2));         
            } else {
                break;
            }
        }
    }

    fn operande(&mut self, stack:&mut Vec<Frag>) {
        loop { 

            let Some(c) = self.peek() else {break};

            match c {
                '|'=>{
                    break;
                }
                ')'=>{
                    return;
                }
                '*'=>{
                    self.next();
                    let Some(e1) = stack.pop() else {panic!("not enough in stack to concat")};
                    stack.push(self.graph.zero_or_more(e1));
                }
                '+'=>{
                    self.next();
                    let Some(e1) = stack.pop() else {panic!("not enough in stack to concat")};

                    stack.push(self.graph.one_or_more(e1));
                }
                '?'=>{
                    self.next();
                    let Some(e1) = stack.pop() else {panic!("not enough in stack to concat")};

                    stack.push(self.graph.one_or_zero(e1));
                }
                _      =>{
                    self.literal(stack);
                }
            }
        }

    }
    fn literal(&mut self, stack: &mut Vec<Frag>) {

        let Some(c)=self.next() else {return};
        match c {
            '\\'=> {
                let p = match self.peek() {
                    Some(c@ ('('|')'|'{'|'}'|'*'))=> *c,
                    _ =>{
                        stack.push(self.graph.literal(c));
                        return
                    }
                };
                self.next();
                stack.push(self.graph.literal(p));
            }

            c @ ('a'..='z'|
                 'A'..='Z'|
                 '1'..='9'|
                 '/' | '.' 
                 )=> {
                stack.push(self.graph.literal(c));
            }

            '('=> {
                stack.extend(self.alternation());
                match self.next() {
                    Some(')') => return,
                    _ => panic!("missing closing parenthesis"),
                }
            }

            c @ _=>{println!("Bad token{:?}",c);panic!()},
        }
    }


//----------------------helper-----------------------//
    pub fn peek(&mut self)-> Option<&char> {
        self.buffer.as_ref().or_else(||self.iter.peek()) 
    }

//-----------------------init------------------------//
    pub fn new(s: &'a str, graph:T)->Self {
        let iter = s.chars().peekable();
        Parser {graph: graph, iter: iter, buffer:None}
    }

    //return finish graph
    pub fn parse(mut self)->T {
        let mut frag = self.alternation();
        let Some(e) = frag.pop() else {panic!("couldnt parse it")};
        self.graph.finish(e)
    }

    //returns unfinish graph, no match will be made if the 
    pub fn get_frag(mut self)->T {
        let _ = self.alternation();
        self.graph
    }
}
1 Like

You can't get UB's in safe Rust, so you should be okay.