Lost in generic inference and lifetimes with Nom, and it's only a 6-statement function!

In trying to learn Rust a bit more, I'm working through Advent of Code 2020.

On Day 2, there's a simple parsing task for reading in a text file with lines formatted a particular way, and then doing some operations on each line. Instead of hand-writing yet another bad parser, I figured I'd try to learn the ecosystem a bit more and gave Nom a try.

Since the parsing is within the file-reading code, I figured it'd make sense for it to return std::io::Error, like the file-reading code that calls this function.

This code works:

use std::io;

use nom::character::complete::{alpha1, anychar, char, u64, i64, multispace0, multispace1};
use nom::sequence::tuple;

// Parse "<number>-<number> <char>: <string>"
fn parse_password_line(i: &str) -> Result<(u64, u64, char, &str), io::Error> {
    let parse = tuple::<&str, _, (&str, nom::error::ErrorKind), _>((
        u64,
        char('-'),
        u64,
        multispace1,
        anychar,
        char(':'),
        multispace0,
        alpha1,
    ))(i);

    match parse {
        Ok(parse) => {
            let parts = parse.1;
            Ok((parts.0, parts.2, parts.4, parts.7))
        }
        Err(e) => Err(io::Error::new(io::ErrorKind::InvalidData, e.to_string())),
    }
}

But there are two things I can't figure out. First, having to include the generic info for tuple; changing it to just:

let parse = tuple(( ...

gives:

error[E0283]: type annotations needed for `Result<(&str, (u64, char, u64, &str, char, char, &str, &str)), nom::Err<Error>>`
   --> src/day02.rs:55:17
    |
55  |     let parse = tuple((
    |         -----   ^^^^^ cannot infer type for type parameter `E` declared on the function `tuple`
    |         |
    |         consider giving `parse` the explicit type `Result<(&str, (u64, char, u64, &str, char, char, &str, &str)), nom::Err<Error>>`, where the type parameter `E` is specified
    |
    = note: cannot satisfy `_: ParseError<&str>`
note: required by a bound in `tuple`
   --> /Users/quantumet/.cargo/registry/src/github.com-1ecc6299db9ec823/nom-7.1.0/src/sequence/mod.rs:266:23
    |
266 | pub fn tuple<I, O, E: ParseError<I>, List: Tuple<I, O, E>>(
    |                       ^^^^^^^^^^^^^ required by this bound in `tuple`
help: consider specifying the type arguments in the function call
    |
55  |     let parse = tuple::<I, O, E, List>((
    |                      +++++++++++++++++

and it's far from obvious to me from the Nom docs how to tell what E should be. It has to implement ParseError, and the docs for that trait show there's an impl for (I, ErrorKind), so I stuck that in and it seems to work. But why did I need to specify it, and what other choices were there? The Nom docs on tuple don't specify this, I presume because one of the assert_eq! calls enables complete type inference there.

And perhaps the choice there leads to the second problem. I was trying to chain the error causes by passing in e to the io::Error::new method by using just:

        Err(e) => Err(io::Error::new(io::ErrorKind::InvalidData, e)),

but that gives:

error[E0621]: explicit lifetime required in the type of `i`
  --> src/day02.rs:70:23
   |
70 |         Err(e) => Err(io::Error::new(io::ErrorKind::InvalidData, e)),
   |                       ^^^^^^^^^^^^^^ lifetime `'static` required

And it's really not at all clear to me where 'static could even be applied in these 6 lines of code! Which makes me suspect adding 'static isn't actually the right thing to do, but I'm not sure what would be.

Forcing the Nom error to a string fixes that error, but that feels like I'm working around something I should actually understand better, and probably reduces the amount of debug info I could use higher up in the program (if this was a large program, instead of an exercise).

So I guess the general question is, how do you navigate the API of a large crate like Nom to understand what to plug into generic parameters like these, when the types are clearly interrelated, and the compiler errors don't help much?

There are also implementations for nom::error::Error, nom::error::VerboseError and (). You can try replacing (&str, nom::error::ErrorKind) with any of these and they will all work, so the compiler needs help in figuring out which one you want. (It would also need such a hint even if there was only one implementation, because it wouldn't be nice if adding a second implementation broke code relying on there only being one.)

I don't know if nom has some more convenient way of selecting the concrete error type, but one option is to annotate the variable instead of the function:

let parse: Result<_, nom::Err<(_, _)>> = tuple(( // ...
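To see why the compiler needs the hint at all, here's a rough sketch of the same situation without nom. The trait `FromFailure` and the function `leading_digit` are made up for illustration; they only mimic the shape of `ParseError` and `tuple`, where the error type shows up only in the return type and more than one type implements the trait.

```rust
// Hypothetical stand-in for nom's `ParseError` trait; this is not nom's real API.
trait FromFailure {
    fn from_failure(input: &str) -> Self;
}

// Two implementors, much like `(I, ErrorKind)` and `()` both implement `ParseError`.
impl FromFailure for String {
    fn from_failure(input: &str) -> Self {
        format!("failed at {:?}", input)
    }
}

impl FromFailure for () {
    fn from_failure(_input: &str) -> Self {}
}

// `E` appears only in the return type, so nothing in the arguments pins it down.
fn leading_digit<E: FromFailure>(input: &str) -> Result<u32, E> {
    input
        .chars()
        .next()
        .and_then(|c| c.to_digit(10))
        .ok_or_else(|| E::from_failure(input))
}

// Option 1: annotate the variable, as suggested above.
fn describe_failure(input: &str) -> Option<String> {
    let result: Result<u32, String> = leading_digit(input);
    result.err()
}

// Option 2: annotate the call with a turbofish, like `tuple::<&str, _, ..., _>`.
fn is_failure(input: &str) -> bool {
    leading_digit::<()>(input).is_err()
}
```

Writing `let r = leading_digit("x");` with no annotation and never using `r` in a way that fixes `E` should make the compiler ask for a type annotation, just like the `tuple` call did.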

The 'static error is a bit unhelpful, as is often the case. Here it's trying to tell you that instead of a temporary str borrow, you should use a permanent 'static one, meaning changing the function signature to:

fn parse_password_line(i: &'static str) -> Result<(u64, u64, char, &str), io::Error> {

which, as you've correctly guessed, would probably result in unsolvable errors elsewhere.

The reason it tries suggesting this is a bit convoluted, but to simplify it a bit, std::io::Error::new wants something that can be converted into a trait object that does not hold any temporary borrows. nom's errors contain the unparsed input data, which means that this conversion cannot be done when the input data is temporarily borrowed from elsewhere (i: &str). The naive fix is to simply make the borrow permanent (i: &'static str).

Using to_string like you have is fine, I think, but if you want to keep the nom::Err, it offers a function nom::Err::map_input, which we can use to turn the problematic temporarily borrowed input into an owned String.

        Err(e) => Err(io::Error::new(
            io::ErrorKind::InvalidData,
            e.map_input(|e| e.to_string()),
        )),
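The same idea can be tried out without nom. In this sketch `BorrowedParseError` and `OwnedParseError` are hypothetical types that stand in for a nom error before and after its input has been turned into a String; only the `std::io::Error::new` call is real API.

```rust
use std::error::Error;
use std::fmt;
use std::io;

// Hypothetical error that borrows the unparsed input, like nom's errors do.
#[derive(Debug)]
struct BorrowedParseError<'a> {
    rest: &'a str,
}

impl fmt::Display for BorrowedParseError<'_> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "could not parse {:?}", self.rest)
    }
}

impl Error for BorrowedParseError<'_> {}

// The same error with the input copied into an owned String.
#[derive(Debug)]
struct OwnedParseError {
    rest: String,
}

impl fmt::Display for OwnedParseError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "could not parse {:?}", self.rest)
    }
}

impl Error for OwnedParseError {}

impl BorrowedParseError<'_> {
    // Plays the role that `map_input(|i| i.to_string())` plays for a nom error.
    fn into_owned(self) -> OwnedParseError {
        OwnedParseError {
            rest: self.rest.to_owned(),
        }
    }
}

fn to_io_error(e: BorrowedParseError<'_>) -> io::Error {
    // Passing `e` directly is rejected: it borrows from the input, and
    // `io::Error::new` needs a value that holds no temporary borrows.
    io::Error::new(io::ErrorKind::InvalidData, e.into_owned())
}
```

Because the io::Error now owns its data, it can outlive the string that was being parsed.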

APIs that use lots of generics like nom's are a bit hard to wrap your head around and I don't have great advice for that, unfortunately. One good trick is to deliberately annotate something with the wrong type to get a compiler error telling you what it actually is. Examples are often valuable too: here you can see that they've made the parsing function parse_string generic over the error type, and it is then called with the error type annotated.
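A small sketch of that trick, using only std. `split_range` is a made-up helper, not something from nom. The commented-out line is the wrong annotation; as a runtime alternative (a different technique from the compile-error trick), `std::any::type_name` can report the type as a string, though its exact output is not guaranteed to be stable.

```rust
use std::any::type_name;

// Returns the compiler's name for the type of `value`.
fn type_name_of<T>(_value: &T) -> &'static str {
    type_name::<T>()
}

// Hypothetical helper: parse "<number>-<number>" with std only.
fn split_range(s: &str) -> Option<(u64, u64)> {
    let (lo, hi) = s.split_once('-')?;
    Some((lo.parse().ok()?, hi.parse().ok()?))
}

fn inspect() -> &'static str {
    let parsed = split_range("1-3");
    // Compile-time version of the trick: uncommenting the next line produces a
    // "mismatched types" error that spells out the real type of `parsed`.
    // let () = parsed;
    type_name_of(&parsed)
}
```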


Thanks for the detailed response - it's quite helpful!

I think additional confusion here for me comes from the fact that there's a nom::Err enum, which then contains a generic E in the Error case that's the same E that the parsers like tuple take for their generic, which could then specifically be one of the nom::error::* options that implements ParseError. Just takes a few minutes of thinking through 'why are there two error things here'.
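For anyone else puzzling over the two layers, here's a simplified sketch of the shape. `OuterErr` and `InnerErr` are stand-ins I've named for illustration: the outer enum mirrors nom::Err (whose Incomplete variant also carries a value, omitted here), and the inner type plays the role of whichever ParseError implementor you pick for E.

```rust
// Stand-in mirroring the shape of `nom::Err<E>`; simplified, not nom's definition.
#[derive(Debug, PartialEq)]
enum OuterErr<E> {
    Incomplete,
    Error(E),
    Failure(E),
}

// Stand-in for an inner error such as `(I, ErrorKind)`: remaining input plus a label.
type InnerErr<'a> = (&'a str, &'static str);

// Peels off the outer layer and hands back the inner error, if there is one.
fn inner_error<E>(e: OuterErr<E>) -> Option<E> {
    match e {
        OuterErr::Incomplete => None,
        OuterErr::Error(inner) | OuterErr::Failure(inner) => Some(inner),
    }
}
```

The outer layer says how the parse went wrong (recoverable, fatal, or out of input), while the generic E inside says what is recorded about the failure.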

The reason it tries suggesting this is a bit convoluted, but to simplify it a bit, std::io::Error::new wants something that can be converted into a trait object that does not hold any temporary borrows.

Do you happen to have a link to something that might cover the 'bit convoluted' part?

I'd like to dig into this a bit to get a better sense of what indicates the 'wants a trait object that does not hold temporary borrows'. Error::new requires an E: Into<Box<dyn Error + Send + Sync>>. Looking through the dyn docs, it's not obvious to me if object safety might be involved here; it certainly doesn't seem like Send/Sync matter here.

I found that nom has the Finish trait which "unwraps" the nom::Err wrapper. I don't think it's useful here (could be wrong though), so I didn't mention it, but it might be useful in the future.

As for the convoluted bit, dyn without a specified lifetime implies + 'static in bounds (see reference), so

where E: Into<Box<dyn Error + Send + Sync>>

is really

where E: Into<Box<dyn Error + Send + Sync + 'static>>

When we try to use the nom error as the E, first we find that Into has this impl

impl<T, U> Into<U> for T where
    U: From<T>

and From has this impl

impl<'a, E: Error + Send + Sync + 'a> From<E> for Box<dyn Error + Send + Sync + 'a>

In order to get a Box<dyn Error + Send + Sync + 'static> using this impl, we need an E: Error + Send + Sync + 'static, but we have an E: Error + Send + Sync + 'a. So the compiler says, make 'a = 'static to fix the error (&'a str => &'static str).

I believe these are the relevant impls at least.
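Here's a sketch that puts the two bounds side by side, using only std. `Unparsed`, `box_static` and `box_borrowed` are names I made up; `box_static` copies the bound from io::Error::new, and `box_borrowed` spells out a lifetime so a borrowing error can still be boxed.

```rust
use std::error::Error;
use std::fmt;

// Hypothetical error type that borrows the unparsed input.
#[derive(Debug)]
struct Unparsed<'a>(&'a str);

impl fmt::Display for Unparsed<'_> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "unparsed input: {}", self.0)
    }
}

impl Error for Unparsed<'_> {}

// Same bound as `io::Error::new`: with no lifetime written, the boxed `dyn`
// defaults to `+ 'static`.
fn box_static<E>(e: E) -> Box<dyn Error + Send + Sync>
where
    E: Into<Box<dyn Error + Send + Sync>>,
{
    e.into()
}

// With an explicit lifetime the box may hold a borrow, but cannot outlive it.
fn box_borrowed<'a, E>(e: E) -> Box<dyn Error + Send + Sync + 'a>
where
    E: Error + Send + Sync + 'a,
{
    Box::new(e)
}

// Compiles because the input really is `'static`.
fn describe_static(input: &'static str) -> String {
    box_static(Unparsed(input)).to_string()
}

// Calling `box_static(Unparsed(input))` here is rejected, since `input` is
// only borrowed temporarily; `box_borrowed` accepts it.
fn describe_borrowed(input: &str) -> String {
    box_borrowed(Unparsed(input)).to_string()
}
```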


Great, thanks for the extra details.

I had tried using e.clone() instead of e.to_string(), to no success, but that's clearly because it just cloned the reference to the input str instead of deep-cloning that as well. And I need to remember the lifetime elision rules so I can understand quicker that the lifetime of i is assigned to the output of the function generated by tuple(), including the error.
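For the record, a tiny std-only sketch of why clone() didn't help. `Borrowed` is a made-up stand-in for an error that holds a reference to the input: cloning it copies the reference, so the clone still points into the original string, while to_owned() on the str makes an independent String.

```rust
// Hypothetical error-like type holding a reference to the input.
#[derive(Debug, Clone)]
struct Borrowed<'a> {
    rest: &'a str,
}

// True if a clone still points at the same bytes as the original.
fn clone_shares_input(e: &Borrowed<'_>) -> bool {
    let copy = e.clone();
    copy.rest.as_ptr() == e.rest.as_ptr()
}

// Copies the borrowed text into a String that no longer depends on the input.
fn detach(e: &Borrowed<'_>) -> String {
    e.rest.to_owned()
}
```

The elision part follows the same pattern: with a single reference argument, `fn parse_password_line(i: &str) -> Result<(u64, u64, char, &str), io::Error>` is shorthand for giving `i` and the returned `&str` the same lifetime.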

Very much appreciate the explanations!

Ran into a similar situation recently, and only just worked it out. The Rosetta stone for me was the Nom error handling page, specifically this:

If we used a borrowed type as input, like &[u8] or &str, we might want to convert it to an owned type to transmit it somewhere, with the to_owned() method:

Here's the commit where I ripped out all the bogus lifetimes I had used to hack around this and just called .to_owned() on the error response, almost exactly as the docs advise.
