I've never found Nom intuitive. More generally, zero-copy parsing involves a lot of lifetime-carrying structs and lifetimes in trait parameters, which is sort of diving into the deep end when you're just starting out in Rust. For example, a signature that returns something parameterized by a caller-chosen lifetime which is not an input lifetime...
pub fn mk_parser<'a>() -> impl Parser<&'a str, ...>
...is pretty rare, but is also what you happened to need.
Below I try to explain part of what's going on. I'm not sure if it will actually be helpful to you or just distracting and more confusing. Feel free to ignore it.
Anyway, nom works by having signatures like this:
pub trait Parser<I, O, E> {
    fn parse(&mut self, input: I) -> IResult<I, O, E>;
    // ... provided methods omitted
}
Type parameters like `I` resolve to a single type, and types that differ only by lifetime are still distinct types. So almost always you end up looking at some implementation like
<_ as Parser<&'a str, O, Error<&'a str>>>::parse(...)
for some specific lifetime `'a`. The corresponding trait bound is
<'a, T> ...
where
T: Parser<&'a str, O, Error<&'a str>>
which matches up with the return type of your function. And yes, giving the lifetimes the same name means they have to be the same, just as in a type like `Result<Vec<A>, Option<A>>` both `A`s must be the same.
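To make the single-lifetime bound concrete, here is a minimal sketch that does not depend on nom. The `IResult` alias and `Parser` trait below are simplified stand-ins I made up for illustration (no error type parameter), and `Digit` and `run_once` are hypothetical names, not anything from nom or from your code.

```rust
// Simplified stand-ins for nom's `IResult` and `Parser` (assumption: the
// error type parameter is dropped to keep the sketch short).
type IResult<I, O> = Result<(I, O), String>;

trait Parser<I, O> {
    fn parse(&mut self, input: I) -> IResult<I, O>;
}

// A made-up parser that consumes one leading ASCII digit.
struct Digit;

impl<'a> Parser<&'a str, u32> for Digit {
    fn parse(&mut self, input: &'a str) -> IResult<&'a str, u32> {
        match input.chars().next().and_then(|c| c.to_digit(10)) {
            // `to_digit(10)` only accepts '0'..='9', so the digit is one byte.
            Some(d) => Ok((&input[1..], d)),
            None => Err(format!("expected a digit at {:?}", input)),
        }
    }
}

// The same `'a` appears in the bound and in the argument, so `P` only has
// to be a parser for the one lifetime the caller's input actually has.
fn run_once<'a, P>(mut parser: P, input: &'a str) -> IResult<&'a str, u32>
where
    P: Parser<&'a str, u32>,
{
    parser.parse(input)
}
```

Calling `run_once(Digit, "7up")` picks `'a` from the argument, and the bound is checked for that lifetime only.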
Your function happens to work with any lifetime, so you can let the caller choose the lifetime. That's what this signature does:
// Returns a type that works for one particular lifetime, `'a`
// But the caller can choose any lifetime they need
pub fn mk_parser<'a>() -> impl Parser<&'a str, ...>
If you call this with different lifetimes, you get a different type out (a type that differs by lifetime).
There is actually a different type of bound that says "the single return type can work with any lifetime". It's called a higher-ranked trait bound (HRTB). The bound in this case would look like this:
<T> ...
where
T: for<'any> Parser<&'any str, O, Error<&'any str>>
// ^^^^^^^^^ a higher-ranked binder
And your function would look like this:
pub fn mk_parser() -> impl for<'any> Parser<&'any str, ...>
But if you change your function signature and nothing else, you'll probably get some confusing lifetime errors. The root of the problem is that your built-up parser type contains combinator types which effectively have the input type `I` as a parameter.
So it's like they have an implementation like
impl<I, O, E> Parser<I, O, E> for SomeCombinator<I, O, E>
// ^ ^ ^ ^
where `SomeCombinator<&'a str, ..>` only implements `Parser<&'a str, ...>`. Each concrete `SomeCombinator` type that differs by lifetime implements `Parser` for that one lifetime only.
So even though you can construct a `Parser` for any lifetime in `mk_parser`, you're not constructing a single type that works for any lifetime. You're constructing a different type per lifetime (some type parameterized by the lifetime, or a type containing the lifetime). That's why you ended up with the somewhat odd signature where you have a caller-chosen lifetime that doesn't correspond to any inputs. The structure of the types and traits involved happens to demand it.
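Here is a small sketch of that situation without nom. The `Parser` trait and `IResult` alias are simplified stand-ins (no error parameter), and `FirstChar` is a made-up combinator whose only job is to carry the input type as a parameter, the way the real combinators effectively do.

```rust
use std::marker::PhantomData;

// Simplified stand-ins for nom's `IResult` and `Parser` (assumption).
type IResult<I, O> = Result<(I, O), String>;

trait Parser<I, O> {
    fn parse(&mut self, input: I) -> IResult<I, O>;
}

// Hypothetical combinator with the input type `I` baked into its own type.
struct FirstChar<I>(PhantomData<I>);

// `FirstChar<&'a str>` is a parser for `&'a str` and for nothing else.
impl<'a> Parser<&'a str, char> for FirstChar<&'a str> {
    fn parse(&mut self, input: &'a str) -> IResult<&'a str, char> {
        match input.chars().next() {
            Some(c) => Ok((&input[c.len_utf8()..], c)),
            None => Err("empty input".to_string()),
        }
    }
}

// Accepted: the hidden return type is `FirstChar<&'a str>`, one type per `'a`.
fn mk_parser<'a>() -> impl Parser<&'a str, char> {
    FirstChar::<&'a str>(PhantomData)
}

// Rejected if uncommented: there is no single `FirstChar<_>` type that
// implements `Parser<&'any str, char>` for every `'any`.
// fn mk_parser_any() -> impl for<'any> Parser<&'any str, char> {
//     FirstChar(PhantomData)
// }

fn first_char_of_owned() -> Option<char> {
    let owned = String::from("hello");
    // The caller-chosen lifetime here is that of this local borrow.
    let result = mk_parser().parse(owned.as_str());
    result.ok().map(|(_rest, c)| c)
}
```

Each call site of `mk_parser` gets a parser for whatever lifetime its input has, which is all the lifetime-parameterized signature promises.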
For the higher-ranked version, you would need something like this for your return type:
impl<'a, O> Parser<&'a str, O, Error<&'a str>> for SomethingElse<O>
//                                                 ^^^^^^^^^^^^^^^^
// This one is the same type no matter what the lifetime `'a` is
// because it's not parameterized by `'a` (or by a type containing `'a`)
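As a sketch of what such a type looks like in practice, here is a parser struct with no lifetime or input type in it, again using a made-up minimal `Parser` trait rather than nom's. `Word`, `mk_word_parser`, and `parse_two_inputs` are hypothetical names chosen for the example.

```rust
// Simplified stand-ins for nom's `IResult` and `Parser` (assumption).
type IResult<I, O> = Result<(I, O), String>;

trait Parser<I, O> {
    fn parse(&mut self, input: I) -> IResult<I, O>;
}

// No lifetime and no input type in the struct itself.
struct Word;

// One type, implementing the trait for every `'a`.
impl<'a> Parser<&'a str, String> for Word {
    fn parse(&mut self, input: &'a str) -> IResult<&'a str, String> {
        let end = input.find(' ').unwrap_or(input.len());
        if end == 0 {
            return Err("expected a word".to_string());
        }
        Ok((&input[end..], input[..end].to_string()))
    }
}

// Because `Word` is the same type for all lifetimes, it satisfies the
// higher-ranked bound directly.
fn mk_word_parser() -> impl for<'any> Parser<&'any str, String> {
    Word
}

fn parse_two_inputs() -> (String, String) {
    let mut parser = mk_word_parser();
    let first = String::from("alpha beta");
    let (_, a) = parser.parse(first.as_str()).unwrap();
    drop(first); // the first input is gone before the second one exists
    let second = String::from("gamma delta");
    let (_, b) = parser.parse(second.as_str()).unwrap();
    (a, b)
}
```

The same `parser` value is used with two inputs whose lifetimes don't overlap. As far as I can tell, a return type tied to a single caller-chosen lifetime would not allow that reuse.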
Nom has this blanket implementation:
impl<'a, I, O, E, F> Parser<I, O, E> for F
where
F: FnMut(I) -> IResult<I, O, E> + 'a,
I'm not sure why they included that lifetime; I don't think it does anything.
So if we have a function that, for any input `&'s str`, creates a parser that works with the `'s` lifetime, parses the input, and returns the result, that would be a function that can parse input of any lifetime.
fn hr(arg: &str) -> IResult<&str, (u64, Vec<Vec<(String, u64)>>)> {
    // Creates a parser for the input lifetime, parses `arg`, and
    // then discards the parser. You get a different parser type for
    // every lifetime, but that's fine.
    mk_parser().parse(arg)
}
So long as the function item type isn't parameterized by the lifetime or a type containing the lifetime, this would be a single type that implements `Parser<&'s str, _, Error<&'s str>>` for any lifetime `'s`.
And it turns out the function item type isn't parameterized, so we can do this:
pub fn mk_parser_any(
) -> impl for<'any> Parser<&'any str, (u64, Vec<Vec<(String, u64)>>), Error<&'any str>> {
    fn hr(arg: &str) -> IResult<&str, (u64, Vec<Vec<(String, u64)>>)> {
        mk_parser().parse(arg)
    }
    hr
}
And now we have the higher-ranked version.
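If you want to play with the trick without pulling in nom, here is a self-contained sketch of the same shape. Everything in it is a stand-in: the `Parser` trait and `IResult` alias are simplified (no error parameter), the blanket implementation imitates the one quoted above, and `FirstChar` is a made-up combinator that is parameterized by its input type.

```rust
use std::marker::PhantomData;

// Simplified stand-ins for nom's `IResult` and `Parser` (assumption).
type IResult<I, O> = Result<(I, O), String>;

trait Parser<I, O> {
    fn parse(&mut self, input: I) -> IResult<I, O>;
}

// Imitation of the blanket implementation for closures and functions.
impl<I, O, F> Parser<I, O> for F
where
    F: FnMut(I) -> IResult<I, O>,
{
    fn parse(&mut self, input: I) -> IResult<I, O> {
        self(input)
    }
}

// Hypothetical combinator that only parses the one input type it carries.
struct FirstChar<I>(PhantomData<I>);

impl<'a> Parser<&'a str, char> for FirstChar<&'a str> {
    fn parse(&mut self, input: &'a str) -> IResult<&'a str, char> {
        match input.chars().next() {
            Some(c) => Ok((&input[c.len_utf8()..], c)),
            None => Err("empty input".to_string()),
        }
    }
}

// One parser type per caller-chosen lifetime.
fn mk_parser<'a>() -> impl Parser<&'a str, char> {
    FirstChar::<&'a str>(PhantomData)
}

// The function item `hr` is a single type that accepts any input lifetime,
// so it satisfies the higher-ranked bound through the blanket impl.
fn mk_parser_any() -> impl for<'any> Parser<&'any str, char> {
    fn hr(arg: &str) -> IResult<&str, char> {
        // Builds a fresh parser for this call's lifetime, then drops it.
        mk_parser().parse(arg)
    }
    hr
}

fn reuse_across_inputs() -> (char, char) {
    let mut parser = mk_parser_any();
    let first = String::from("abc");
    let (_, a) = parser.parse(first.as_str()).unwrap();
    drop(first);
    let second = String::from("xyz");
    let (_, b) = parser.parse(second.as_str()).unwrap();
    (a, b)
}
```

The cost of the wrapper is that the inner parser is rebuilt on every call. That is cheap in this sketch, but it may matter if constructing your real parser does meaningful work.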
This is not at all something I would expect a newcomer to figure out. You can even be quite experienced in Rust and not have run into these higher-ranked gymnastics or know how to navigate them.