How to return a nom Parser<_, _, _> from a function?

Hi all,

I have written a very simple parser in nom (for Advent of Code day 2, if anybody is interested).

I want to invoke the parser multiple times, but I can't figure out how to return a Parser from the function. The parser itself actually works: if I put the call to .parse() inside this function and return an IResult<_, _>, it works, but then the Parser is built from scratch on every invocation. I'd rather build the Parser once and call it every time I need it.

fn mk_parser<I, E>() -> Box<dyn Parser<I, (u64, Vec<Vec<(String, u64)>>), E>> {
    // Game 1: 3 blue, 4 red; 1 red, 2 green, 6 blue; 2 green

    let swap = |(a, b)| (b, a);
    let word = alpha1.map(|s: &str| s.to_string());
    let draw = terminated(u64, space1).and(word).map(swap);
    let game = separated_list1(tag(",").and(space0), draw);
    let games = separated_list1(tag(";").and(space0), game);

    Box::new(preceded(tag("Game").and(space1), u64).and(preceded(tag(":").and(space0), games)))
}

The compilation failure is

error[E0277]: the trait bound `And<impl FnMut(&str) -> Result<(&str, u64), nom::Err<_>>, impl FnMut(&str) -> Result<(&str, Vec<Vec<(String, u64)>>), nom::Err<_>>>: Parser<I, (u64, Vec<Vec<(String, u64)>>), E>` is not satisfied

which I think is trying to tell me that the I or E needs some further type bounds. I'd be happy to fix the E type to something concrete, but I have no idea what trait bounds I need to introduce, or what a sensible concrete type for E would be.

How can I figure out which constraints I should be adding? And can anybody recommend a simple error type that I could be using in nom?

Hmm, my use of .to_string() on a &str may be confusing the type checker since that's supposed to be related to the I, but then replacing uses of I with a fixed &str introduces lifetime errors that are far beyond my abilities to fix.

OK, I have this...

fn mk_parser(
) -> Box<dyn Parser<&'static str, (u64, Vec<Vec<(String, u64)>>), nom::error::Error<&'static str>>>
{
    // Game 1: 3 blue, 4 red; 1 red, 2 green, 6 blue; 2 green

    let swap = |(a, b)| (b, a);
    let word = alpha1.map(|s: &str| s.to_string());
    let draw = terminated(u64, space1).and(word).map(swap);
    let game = separated_list1(tag(",").and(space0), draw);
    let games = separated_list1(tag(";").and(space0), game);

    Box::new(preceded(tag("Game").and(space1), u64).and(preceded(tag(":").and(space0), games)))
}

I guess the question now becomes, is this idiomatic in any way? The use of 'static seems really bad to me. When I try to call this with

let (_id, draw) = mk_parser().parse(line).unwrap().1;

I get

18 |             let (_id, draw) = mk_parser().parse(line).unwrap().1;
   |                               ----------------------- argument requires that `input` is borrowed for `'static`

Am I hitting a general constraint of the Rust borrow checker, in that I can't return a borrowed object in contravariant position? Can I perhaps rewrite it so that the lifetime is at least tied to the inputs of a returned function, or something like that?

I guess you may be over-constraining (and in general surely over-complicating) things. You don't need the box, you don't need the 'static, and you can eta-reduce the closure in map(|s| s.to_string()).

Accordingly, this compiles.
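
Something like this (a sketch, assuming nom 7; the body is yours, only the signature changes and the closure is eta-reduced):

use nom::{
    bytes::complete::tag,
    character::complete::{alpha1, space0, space1, u64},
    error::Error,
    multi::separated_list1,
    sequence::{preceded, terminated},
    Parser,
};

fn mk_parser<'a>() -> impl Parser<&'a str, (u64, Vec<Vec<(String, u64)>>), Error<&'a str>> {
    // Game 1: 3 blue, 4 red; 1 red, 2 green, 6 blue; 2 green
    let swap = |(a, b)| (b, a);
    let word = alpha1.map(str::to_string);
    let draw = terminated(u64, space1).and(word).map(swap);
    let game = separated_list1(tag(",").and(space0), draw);
    let games = separated_list1(tag(";").and(space0), game);

    preceded(tag("Game").and(space1), u64).and(preceded(tag(":").and(space0), games))
}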

2 Likes

ah, brilliant! Thank you. I didn't realise that the 'a could be introduced this way; I somehow thought it needed to be bound to a type variable! Does this mean that it couples the &str in the return type?

The compiler kept telling me to add the Box<dyn ...>, so I'm very pleased to be able to remove it, because I didn't understand why I would need it.

Sorry, I don't understand what you mean by "coupling the &str". The lifetime parameter is arbitrary because the parser can work for a string slice of any lifetime. The unconstrained lifetime parameter simply expresses this flexibility. There's nothing to "couple" to – which is exactly the point. The caller can choose whatever lifetime it pleases/needs, independent of any other lifetime, type, or constraint.
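
For example (a sketch; the file name is made up), each call site can pick its own input lifetime:

let owned: String = std::fs::read_to_string("day2.txt").unwrap();

// one instantiation of the parser for the lifetime of the borrow of `owned`...
let _ = mk_parser().parse(owned.as_str());

// ...and a separate instantiation for the 'static lifetime of a literal
let _ = mk_parser().parse("Game 1: 3 blue, 4 red");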

I mean "coupling" in the sense that both the &str variables would have the same lifetime instead of completely independent of each other.

I've never found Nom intuitive in general. More generally, zero-copy parsing involves a lot of lifetime-carrying structs and lifetimes in trait parameters, which is sort of diving into the deep end when you're just starting out in Rust. For example, a signature that can return something parameterized by any caller-chosen lifetime which is not an input lifetime...

pub fn mk_parser<'a>() -> impl Parser<&'a str, ...>

...is pretty rare, but is also what you happened to need.

Below I try to explain part of what's going on. I'm not sure if it will actually be helpful to you or just distracting and more confusing. Feel free to ignore it.


Anyway, nom works by having signatures like this:

pub trait Parser<I, O, E> {
    fn parse(&mut self, input: I) -> IResult<I, O, E>;
    // ...
}

Type parameters like I resolve to a single type, and types that differ only by lifetime are still distinct types. So almost always you end up looking at some implementation like

<_ as Parser<&'a str, O, Error<&'a str>>>::parse(...)

for some specific lifetime 'a. The corresponding trait bound is

<'a, T> ...
where
    T: Parser<&'a str, O, Error<&'a str>>

which matches up with the return type of your function. And yes, giving the lifetimes the same name means they have to be the same, just like how in a Result<Vec<A>, Option<A>> both As must be the same type.

Your function happens to work with any lifetime, so you can let the caller choose the lifetime. That's what this signature does:

// Returns a type that works for one particular lifetime, `'a`
// But the caller can choose any lifetime they need
pub fn mk_parser<'a>() -> impl Parser<&'a str, ...>

If you call this with different lifetimes, you get a different type out (a type that differs by lifetime).


There is actually a different type of bound that says "the single return type can work with any lifetime". It's called a higher-ranked trait bound (HRTB). The bound in this case would look like this:

<T> ...
where
    T: for<'any> Parser<&'any str, O, Error<&'any str>>
    // ^^^^^^^^^ a higher-ranked binder

And your function would look like this:

pub fn mk_parser() -> impl for<'any> Parser<&'any str, ...>

But if you try to change your function signature and nothing else, you'll get some probably-confusing lifetime errors. The root of the problem is that your built-up parser type is made of combinator structs (like the And<_, _> in your error message) which effectively have the input type I as a parameter.

So it's as if they have an implementation like

impl<I, O, E> Parser<I, O, E> for SomeCombinator<I, O, E>
//   ^     ^                                     ^     ^

Where SomeCombinator<&'a str, ..> only implements Parser<&'a str, ...>. Each concrete type of SomeCombinator[1] that differs by lifetime implements Parser for that one lifetime only.

So even though you can construct a Parser for any lifetime in mk_parser, you're not constructing a single type that works for any lifetime. You're constructing a different type per lifetime (some type parameterized by the lifetime, or a type containing the lifetime). That's why you ended up with the somewhat odd signature where you have a caller-chosen lifetime that doesn't correspond to any inputs. The structure of the types and traits involved happens to demand it.


For the higher-ranked version, you would need something like this for your return type:

impl<'a, O> Parser<&'a str, O, Error<&'a str>> for SomethingElse<O>
//                                                 ^^^^^^^^^^^^^^^^
// This one is the same type no matter what the lifetime `'a` is
// because it's not parameterized by `'a` (or by a type containing `'a`)

Nom has this blanket implementation:

impl<'a, I, O, E, F> Parser<I, O, E> for F
where
    F: FnMut(I) -> IResult<I, O, E> + 'a,

I'm not sure why they included that lifetime; I don't think it does anything.

So if we had a function that, for any input &'s str, created a parser for the 's lifetime, parsed the input, and returned the result -- that would be a function that could parse an input of any lifetime.

fn hr(arg: &str) -> IResult<&str, (u64, Vec<Vec<(String, u64)>>)> {
    // Creates a parser for the input lifetime, parses `arg`, and
    // then discards the parser.  You get a different parser type for
    // every lifetime, but that's fine.
    mk_parser().parse(arg)
}

So long as the function item type isn't parameterized by the lifetime or a type containing the lifetime, this would be a single type that implements Parser<&'s str, _, Error<&'s str>> for any lifetime 's.

And it turns out the function item type isn't parameterized by the lifetime, so we can do this:

pub fn mk_parser_any() -> impl for<'any> Parser<&'any str, (u64, Vec<Vec<(String, u64)>>), Error<&'any str>> {
    fn hr(arg: &str) -> IResult<&str, (u64, Vec<Vec<(String, u64)>>)> {
        mk_parser().parse(arg)
    }
    hr
}

And now we have the higher-ranked version.[2]

This is not at all something I would expect a newcomer to figure out. You can even be quite experienced in Rust and not have run into these higher-ranked gymnastics or know how to navigate them.


  1. i.e. with the type parameters "filled in"

  2. It's possible to use a closure instead of a function, but Rust's closure inference really struggles in some higher-ranked scenarios such as this; the function version has no ambiguity.

6 Likes

But wait, there's more! This one will be short and worth knowing, I promise.

Here's a mix of the signature you had and the solution @H2CO3 made.

pub fn mk_parser<'a>() -> Box<
    dyn Parser<
        &'a str, 
        (u64, Vec<Vec<(String, u64)>>), 
        nom::error::Error<&'a str>
    >
>

But it still errors with "returning this value requires that 'a must outlive 'static". So what's the difference?

The Box<dyn Trait> has an invisible + 'static. The non-elided version looks like this:

pub fn mk_parser<'a>() -> Box<
    dyn Parser<
        &'a str, 
        (u64, Vec<Vec<(String, u64)>>), 
        nom::error::Error<&'a str>
    >
    + 'static // :-(
>

Which is often what you want when there are no lifetimes involved with the dyn Trait, but in this case you don't want that. Changing the last 'static to 'a resolves the error.
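
That is, roughly:

pub fn mk_parser<'a>(
) -> Box<dyn Parser<&'a str, (u64, Vec<Vec<(String, u64)>>), nom::error::Error<&'a str>> + 'a>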


You should stick with the -> impl Parser version, but this "dyn Trait + 'static" stumbling point comes up often enough that it's worth being aware of.

2 Likes

Well, yes, but the 'a is a red herring; the same is true if you continue to overconstrain your strings to 'static.

There has to be a relationship between the output and the error, because both borrow from the same input.

"There has to be a relationship between the output and the error, because both borrow from the same input."

What would it look like if there was a much simpler error type that converts everything it needs into String? Would that even be possible with nom?

Thank you both for the detailed explanations. While I have your attention there are a few extra questions I have about it:

  • why is a &mut Parser needed and not just a &? Doesn't this imply that it is internally mutating and therefore cannot be used concurrently and potentially cannot even be reused? I expected the Parser API to be completely immutable
  • is there a way to wrap the returned value in a struct that hides much of the verbosity of the types of Parser? e.g. it could call .map_err(|e| e.to_owned()) to ensure that the error type is owned. If that struct were called Nomer then I could have a very clean API here such as fn mk_parser() -> Nomer<(u64, Vec<Vec<(String, u64)>>)>. I guess I could have a go at writing a struct and a Parser implementation for it but that feels well beyond my level right now.

UPDATE: I tried writing Nomer but it can't implement Parser because the E output is defined by the trait. But it is of course possible to write it as a standalone struct with a function named parse that looks a lot like Parser but has the owned version of E.

Well you would have String instead of &str in the Err variant, I guess?

It's trivial but requires some unfortunate boilerplate to dig through the two layers of error types.
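
For example, something along these lines (a sketch; mk_owned_parser is just an illustrative name wrapping the impl Parser version of mk_parser above, and Error's fields are public, so the conversion is just a to_string on the leftover input):

fn mk_owned_parser<'a>(
) -> impl Parser<&'a str, (u64, Vec<Vec<(String, u64)>>), nom::error::Error<String>> {
    use nom::error::Error;

    let mut inner = mk_parser();

    // The returned lambda is itself a Parser (closures returning an IResult
    // get the blanket impl); `move` transfers ownership of `inner` into it.
    move |input: &'a str| {
        inner.parse(input).map_err(|err| {
            // layer 1: nom::Err<_>, layer 2: nom::error::Error<&str>
            err.map(|Error { input, code }| Error {
                input: input.to_string(),
                code,
            })
        })
    }
}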

1 Like

thank you! I am going to have to read up on what move is doing here, it's a language feature I'm not familiar with.

It moves the captured variables into the closure (as opposed to borrowing them).
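
A tiny illustration:

let s = String::from("hello");
let print_it = move || println!("{}", s); // `s` is moved into the closure
print_it();
// `s` can no longer be used here; the closure owns it now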

1 Like

oh, I see! You're returning a lambda in the last line, which is a Parser, and the move is acting on its variables. It took me a while to read that.

I tried this same approach on a more complex parser and it quickly hit type inference problems, so I'm not sure how often I'd be able to make use of this, see

fn mk_parser<'a>(
) -> impl Parser<&'a str, (Vec<u64>, Vec<(String, Vec<Mapper>)>), nom::error::Error<&'a str>> {
    use nom::{bytes::complete::*, character::complete::*, multi::*, sequence::*};

    let seeds = preceded(tag("seeds:").and(space1), separated_list1(space1, u64));
    let range = terminated(u64, space1)
        .and(terminated(u64, space1))
        .and(u64)
        .map(|((a, b), c)| Mapper {
            destin: a,
            source: b,
            length: c,
        });

    let label = many1(satisfy(|c| c != ':')).map(|s| String::from_iter(s));
    let mapping =
        terminated(label, tag(":").and(newline)).and(separated_list1(many1(newline), range));

    terminated(seeds, many1(newline)).and(separated_list1(many1(newline), mapping))
}

That seems to compile once I add the missing definition of Mapper, so I'm not sure what to look for.

In general, if the compiler can't infer something around a closure, annotate the argument and/or the return type(s).
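
For example, spelling out the closure argument types in the label parser from your snippet (purely illustrative):

let label = many1(satisfy(|c: char| c != ':'))
    .map(|s: Vec<char>| String::from_iter(s));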

sorry I meant to give you the Mapper definition as well, you worked it out correctly.

I mean using your move approach to create a fully owned error type; when I did that, it failed to compile. The code I gave you compiles fine for me as-is, but it produces errors when I try to convert it to return an error with an owned String.

Well then you replace the _ with an explicit type in the Error and it compiles again.

There's never a fundamental reason why you couldn't solve any inference error with enough explicit type annotations.

1 Like

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.