Writing a recursive parser

Hi!

I'm currently writing a parser combinators library, and I'm stuck at the "recursive parsers" part.

Basically, let's say I have a parser number defined as follows:

let number = digit().repeated();

I'd like this parser to be able to recurse itself:

let number = recursive(|number| {
    digit().repeated().then(
        just(".").then(number.or_not()
    )
});

See what I mean? How can I achieve a self-reference like this, knowing that a parser is simply a struct that implements a trait (Parser)?

Thanks in advance for your help!

(Note: of course this specific parser could be written without recursion but I tried to give the most simple example I could)

AFAICT it's the easiest to do this implicitly, via a global function. That way you don't have to "support" it – which is a huge win anyway, since most real-life grammars contain a good amount of recursion. To that end, you would have your users write something along the lines of:

fn number() -> impl Parser {
    digit().repeated().then(
        just(".").then(number.or_not())
    )
}

(BTW, this definition is likely incorrect, because it allows things like "1.2.3" to parse as numbers.)

Defining a function is unfortunately not possible as the parsers definition sometimes need to access outer variables.

(And yes the parser is indeed incorrect ^^`)

I think the typical way you'd implement a context-ful parser is with free functions that accept some sort of state argument by reference. That state would then be passed down to any functions that need access to it.

Using functions makes things than closures easier because there's no need to box recursive calls and it's very obvious what state is being used by the parser.

3 Likes

I actually ended up making it work, not the simplest code in the world but here it goes:

use std::{cell::RefCell, marker::PhantomData, rc::Rc};

use crate::base::{PResult, Parser, ParserInput};

pub struct Recursive<T> {
    parser: RecursiveRef<T>,
    _t: PhantomData<T>,
}

// NOTE: This is required because of https://github.com/rust-lang/rust/issues/26925
impl<T> Clone for Recursive<T> {
    fn clone(&self) -> Self {
        Self {
            parser: self.parser.clone(),
            _t: self._t.clone(),
        }
    }
}

impl<T> Recursive<T> {
    pub fn declarative<P: Parser<T> + 'static>(decl: impl FnOnce(RecursiveRef<T>) -> P) -> Self {
        let mut rf = RecursiveRef::new();

        let parser = decl(rf.clone());

        rf.finish(Box::new(parser));

        Self {
            parser: rf,
            _t: PhantomData,
        }
    }
}

impl<T> Parser<T> for Recursive<T> {
    fn parse_inner<'a>(&self, input: &mut ParserInput<'a>) -> PResult<T> {
        self.parser.parse(input)
    }
}

pub struct RecursiveRef<T> {
    parser_ref: Rc<RefCell<Option<Box<dyn Parser<T>>>>>,
}

// NOTE: This is required because of https://github.com/rust-lang/rust/issues/26925
impl<T> Clone for RecursiveRef<T> {
    fn clone(&self) -> Self {
        Self {
            parser_ref: self.parser_ref.clone(),
        }
    }
}

impl<T> RecursiveRef<T> {
    fn new() -> Self {
        Self {
            parser_ref: Rc::new(RefCell::new(None)),
        }
    }

    fn finish(&mut self, inner: Box<dyn Parser<T>>) {
        assert!(
            self.parser_ref.borrow().is_none(),
            "Cannot replace a weak parser reference's inner value twice"
        );

        *self.parser_ref.borrow_mut() = Some(inner);
    }
}

impl<T> Parser<T> for RecursiveRef<T> {
    fn parse_inner<'a>(&self, input: &mut ParserInput<'a>) -> PResult<T> {
        self.parser_ref
            .borrow()
            .as_ref()
            .expect("Weak parser reference was not created yet :(")
            .parse(input)
    }
}

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.