Parsers carrying state in nom 7

joshrule · September 28, 2021, 1:38pm

I'm updating a parser from nom 4 to nom 7. I'm parsing term rewriting systems, which involves parsing a lot of first-order terms. These first-order terms can contain function symbols and constants, collectively called operators, as well as variables. The identity of each operator and variable is maintained by assigning it a numeric id, and these ids are established while parsing. The parser maintains a map between strings and ids.
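As a rough illustration of the id map described above, here is a minimal sketch. The names `IdMap` and `id_for` are hypothetical and not taken from the original parser, and operators and variables share one map here purely for brevity.

```rust
use std::collections::HashMap;

/// Hypothetical id map: each distinct name gets the next unused numeric id.
#[derive(Default)]
struct IdMap {
    ids: HashMap<String, usize>,
}

impl IdMap {
    /// Return the id already assigned to `name`, or assign a fresh one.
    /// Needing `&mut self` here is what makes the map awkward to share
    /// between several parser closures.
    fn id_for(&mut self, name: &str) -> usize {
        let next = self.ids.len();
        *self.ids.entry(name.to_string()).or_insert(next)
    }
}
```

Looking up a name a second time returns the id it was given the first time, so the map only grows when a new name is seen.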

In nom 4, I created a parser struct and used combinations of the method! and call_m! macros to build up the id map. In nom 7, these macros no longer exist, nor do corresponding functions.

My first attempt was to translate something like:

method!(application<Parser<'a>, CompleteStr, Term>, mut self,
        alt!(call_m!(self.standard_application) |
             call_m!(self.binary_application))
);

into

fn application(&mut self, s: &'a str) -> IResult<&'a str, Term> {
    alt((
        |x: &'a str| self.standard_application(x),
        |x: &'a str| self.binary_application(x),
    ))(s)
}

but the compiler helpfully reminds me that this requires simultaneous unique access to self in both closures.

I was hopeful that the topic "Carry state within nom parser" might have an answer, but the linked code carries a fixed map; it doesn't look like anything gets updated during parsing.

What's the idiomatic way to pass mutable state around in nom 7? Do I use some sort of interior mutability trick? Is there something else I'm missing here?

Yandros · September 29, 2021, 1:32pm

Yeah, from skimming the docs, it doesn't look like the parser framework is generic over a State parameter that it could keep threading through while parsing.

So all you have left are the implicit captures, but, as you mentioned, you can't get exclusive references if self is captured by two closures at the same time. In that case, all you have access to are shared-access-based APIs (e.g., &self-based methods), which you can offer by having Self use interior/shared mutability. For simple values, I recommend using finer-grained Cells, since they perform mutation without a runtime cost. For the other cases (such as a map), you'll have to use something like a RefCell and accept the runtime cost of a flag guarding against overlapping mutable borrows; there is not much else you can do when the API is limited in that fashion.
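To make the Cell versus RefCell distinction concrete, here is a small sketch using only the standard library. The `State` struct, its fields, and `demo` are hypothetical names chosen for illustration; nothing here is nom API.

```rust
use std::cell::{Cell, RefCell};
use std::collections::HashMap;

/// Hypothetical parser state that can be updated through `&self`.
#[derive(Default)]
struct State {
    /// A plain counter: `Cell` suffices and carries no runtime borrow flag.
    next_id: Cell<usize>,
    /// Updating a map in place needs a reference to it, hence `RefCell`.
    ids: RefCell<HashMap<String, usize>>,
}

impl State {
    /// Look up or assign an id using only shared access to `self`.
    fn id_for(&self, name: &str) -> usize {
        // The mutable borrow lasts only for this call. Holding it across
        // another call that borrows `ids` again would panic at runtime.
        let mut ids = self.ids.borrow_mut();
        *ids.entry(name.to_string()).or_insert_with(|| {
            let id = self.next_id.get();
            self.next_id.set(id + 1);
            id
        })
    }
}

/// Two closures capture the same `&State`; with `&mut State` the borrow
/// checker would reject the second capture.
fn demo() -> (usize, usize, usize) {
    let state = State::default();
    let operator = |name: &str| state.id_for(name);
    let variable = |name: &str| state.id_for(name);
    (operator("f"), variable("x"), operator("f"))
}
```

Both closures only need `&State`, so they can coexist, which is the property alt's tuple of parsers requires.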

For reference, this is why it's generally good practice for alt-like APIs to take a state parameter (say, of type &mut State) and then give it "back" to the closures through, for instance, a &mut State closure argument. With a hypothetical alt_with_state (not part of nom), that would look like:

fn application(&mut self, s: &'a str) -> IResult<&'a str, Term> {
    alt_with_state(self, (
        |x: &'a str, this /*: &mut Self */| this.standard_application(x),
        |x: &'a str, this| this.binary_application(x),
    ))(s)
}

Regarding the RefCell solution, you could do something like:

fn application(&mut self, s: &'a str) -> IResult<&'a str, Term> {
    let this = RefCell::new(self);
    alt((
        |x: &'a str| this.borrow_mut().standard_application(x),
        |x: &'a str| this.borrow_mut().binary_application(x),
    ))(s)
}

(or push the RefCell further down into Self, so that {standard,binary}_application become &self-based methods)
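A sketch of that second option, with the RefCell living inside the parser so that every parsing method takes `&self`. To stay dependency-free, `PResult` and `alt2` below are simplified stand-ins for nom's `IResult` and `alt`, and the tiny grammar (a name followed by `_` is a variable, a bare name is an operator) is invented for the example.

```rust
use std::cell::RefCell;
use std::collections::HashMap;

// Stand-in for nom's `IResult`, with a unit error type for brevity.
type PResult<'a, T> = Result<(&'a str, T), ()>;

#[derive(Debug, PartialEq)]
enum Term {
    Variable(usize),
    Operator(usize),
}

// Stand-in for `alt` over two parsers: try `p0`, fall back to `p1`.
fn alt2<'a, T>(
    mut p0: impl FnMut(&'a str) -> PResult<'a, T>,
    mut p1: impl FnMut(&'a str) -> PResult<'a, T>,
) -> impl FnMut(&'a str) -> PResult<'a, T> {
    move |input: &'a str| p0(input).or_else(|_| p1(input))
}

// Split `s` into its leading alphabetic run and the remainder.
fn ident(s: &str) -> Option<(&str, &str)> {
    let end = s.find(|c: char| !c.is_alphabetic()).unwrap_or(s.len());
    if end == 0 { None } else { Some(s.split_at(end)) }
}

#[derive(Default)]
struct Parser {
    ids: RefCell<HashMap<String, usize>>,
}

impl Parser {
    fn id_for(&self, name: &str) -> usize {
        let mut ids = self.ids.borrow_mut();
        let next = ids.len();
        *ids.entry(name.to_string()).or_insert(next)
    }

    fn variable<'a>(&self, s: &'a str) -> PResult<'a, Term> {
        let (name, rest) = ident(s).ok_or(())?;
        let rest = rest.strip_prefix('_').ok_or(())?;
        Ok((rest, Term::Variable(self.id_for(name))))
    }

    fn operator<'a>(&self, s: &'a str) -> PResult<'a, Term> {
        let (name, rest) = ident(s).ok_or(())?;
        Ok((rest, Term::Operator(self.id_for(name))))
    }

    // Both closures capture `self` by shared reference, so this compiles.
    fn term<'a>(&self, s: &'a str) -> PResult<'a, Term> {
        alt2(
            |x: &'a str| self.variable(x),
            |x: &'a str| self.operator(x),
        )(s)
    }
}
```

One thing to watch with this layout: each `borrow_mut()` should be released before calling another parsing method, since recursive term parsers would otherwise hit a runtime borrow panic.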

drewkett · September 29, 2021, 1:55pm

Alternatively, you could implement alt by hand for this case, based on the existing alt implementation. Something like this should be equivalent (untested):

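use nom::error::{ErrorKind, ParseError};
use nom::IResult;

// Inside the parser's impl block: try each alternative in turn, like alt does.
fn application(&mut self, s: &'a str) -> IResult<&'a str, Term> {
    // A recoverable Error falls through; anything else is returned at once.
    let first_err = match self.standard_application(s) {
        Err(nom::Err::Error(e)) => e,
        res => return res,
    };
    let second_err = match self.binary_application(s) {
        Err(nom::Err::Error(e)) => e,
        res => return res,
    };
    // Both alternatives failed: combine their errors under ErrorKind::Alt.
    Err(nom::Err::Error(ParseError::append(s, ErrorKind::Alt, first_err.or(second_err))))
}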

Yandros · September 29, 2021, 5:30pm

True! More generally, to go back to what I was saying:

use nom::error::{ErrorKind, ParseError};
use nom::IResult;

fn alt2_with_state<'lt, I: Clone + 'lt, O: 'lt, State>(
    state: &'lt mut State,
    (mut p0, mut p1): (
        impl 'lt + FnMut(I, &mut State) -> IResult<I, O>,
        impl 'lt + FnMut(I, &mut State) -> IResult<I, O>,
    ),
) -> impl 'lt + FnMut(I) -> IResult<I, O> {
    move |input: I| {
        // Lend the state to one parser at a time; each reborrow ends with the call.
        let first_err = match p0(input.clone(), &mut *state) {
            Err(nom::Err::Error(e)) => e,
            res => return res,
        };
        let second_err = match p1(input.clone(), &mut *state) {
            Err(nom::Err::Error(e)) => e,
            res => return res,
        };
        Err(nom::Err::Error(ParseError::append(input, ErrorKind::Alt, first_err.or(second_err))))
    }
}

so that you ought to then be able to do:

fn application(&mut self, s: &'a str) -> IResult<&'a str, Term> {
    alt2_with_state(self, (
        |x: &'a str, this| this.standard_application(x),
        |x: &'a str, this| this.binary_application(x),
    ))(s)
}
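The borrow-checking argument does not depend on nom itself, so it can be checked with a dependency-free stand-in. In the sketch below, `PResult` replaces `IResult` with a unit error type, the error merging is dropped, and the grammar (lowercase runs are variables, uppercase runs are operators) along with the names `run`, `symbol`, and `id_for` is invented for illustration.

```rust
use std::collections::HashMap;

// Stand-in for nom's `IResult`; the unit error type is a simplification.
type PResult<I, O> = Result<(I, O), ()>;

// Try `p0`, then `p1`, lending the same `&mut State` to whichever runs.
fn alt2_with_state<'lt, I: Copy + 'lt, O: 'lt, State>(
    state: &'lt mut State,
    (mut p0, mut p1): (
        impl 'lt + FnMut(I, &mut State) -> PResult<I, O>,
        impl 'lt + FnMut(I, &mut State) -> PResult<I, O>,
    ),
) -> impl 'lt + FnMut(I) -> PResult<I, O> {
    move |input: I| match p0(input, &mut *state) {
        Err(()) => p1(input, &mut *state),
        ok => ok,
    }
}

// Split `s` into the leading run of chars satisfying `pred` and the rest.
fn run<'a>(s: &'a str, pred: fn(char) -> bool) -> Option<(&'a str, &'a str)> {
    let end = s.find(|c: char| !pred(c)).unwrap_or(s.len());
    if end == 0 { None } else { Some(s.split_at(end)) }
}

#[derive(Default)]
struct Parser {
    ids: HashMap<String, usize>,
}

impl Parser {
    fn id_for(&mut self, name: &str) -> usize {
        let next = self.ids.len();
        *self.ids.entry(name.to_string()).or_insert(next)
    }

    fn variable<'a>(&mut self, s: &'a str) -> PResult<&'a str, usize> {
        let (name, rest) = run(s, char::is_lowercase).ok_or(())?;
        Ok((rest, self.id_for(name)))
    }

    fn operator<'a>(&mut self, s: &'a str) -> PResult<&'a str, usize> {
        let (name, rest) = run(s, char::is_uppercase).ok_or(())?;
        Ok((rest, self.id_for(name)))
    }

    // Neither closure captures `self`; each receives it as an argument.
    fn symbol<'a>(&mut self, s: &'a str) -> PResult<&'a str, usize> {
        alt2_with_state(self, (
            |x: &'a str, this: &mut Self| this.variable(x),
            |x: &'a str, this: &mut Self| this.operator(x),
        ))(s)
    }
}
```

Because the exclusive reference is handed to one closure at a time rather than captured by both, the parser methods can keep their `&mut self` signatures and no interior mutability is needed.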

joshrule · September 30, 2021, 3:37am

Thanks for your replies and detailed examples.

I make use of a variety of nom's combinators, so I'll probably thread a Cell or RefCell through my parser rather than reimplement a significant chunk of nom.

I like the idea of expanding the API to allow for a mutable state object and have opened nom issue #1419 to discuss the idea further.
