Closures and lifetimes when trying to write a parser


#1

So I’ve picked a pretty difficult Rust project to toy with: parser combinators. Except, I’m not getting too far. I find it difficult to figure out how to solve the lifetime errors, and it seems like the behavior of closures has changed so rapidly in the past year that any information you find will be outdated.

use std::str::Chars;
use std::iter::Peekable;

fn main() {
    let input = "5+5";

    let parser = chr('5');
    let result = parse(&parser, &input);

    println!("{}", result);
}

type InputType<'a> = Peekable<Chars<'a>>;
type Parser<'a> = Box<Fn(&'a InputType) -> ParseResult<'a> + 'static>;

struct ParseResult<'a> {
    rest: &'a InputType<'a>
}

fn parse<'a>(parser: &Parser<'a>, input: &'a str) -> i64 {
    let iter = input.chars().peekable();
    let result = parser(&iter.clone());

    1
}

fn chr<'a>(x: char) -> Parser<'a> {
    Box::new(|input: &InputType<'a>| ParseResult { rest: input })
}

Here’s what I’m going for. I want chr to return a Parser, a closure that is, which accepts a string iterator and produces some ParseResult from that string. rest would be a string iterator that points to the beginning of the unconsumed content of the string.

The code, as it is, breaks in a milion ways. I’m confused about why I need to box my closures, what the + 'static syntax does, how I can reason about when to introduce new lifetime arguments to a function, and how to fix the existing lifetime errors. I think the real issue is that I have a hard time understanding how closures work in Rust, and the documentation out there is either too terse or outdated.

Any help on this is much appreciated!


#2

Hey Martin, I’ve actually been working on implementing a parser combinator library and when I started I ran into many similar issues. I’ve gone through several iterations to deal with these kinds of lifetime issues and other quirks of the type system. It’s been very frustrating at times but it definitely helped me understand rust better. I’d encourage you to take a look at what I’ve done because it may give you some clues.

To answer one of your questions, closures have to be boxed when returned from a function because at runtime, a closure is really a struct containing a function pointer and any captured environment data, so no two closures are alike. Fn is actually just a trait, and since traits are unsized you cannot return them from a function. But when you box a closure, you’re creating a trait object, which is basically a pointer to that struct that implements the Fn trait. Since a box is just a pointer, it is sized so you can return them from functions. I think the docs on closures and trait objects explain this much better, but it definitely took me a while to really get it.

Regarding the lifetime issues, one recommendation I would have is to make Parser a trait instead of a type alias. You want to avoid attaching a lifetime to the parser itself, since it’s the lifetime of the data being parsed you really care about. With a trait you can return a trait object that doesn’t have a lifetime parameter, but its parse method still has one.

My combinators work by basically building a tree of structs, where each struct implements a trait with the parse method. This way I can avoid most lifetime and boxing issues because I’m returning well defined structs, not closures or trait objects. I heavily use generics, which has brought its own set of problems, but as of now I’ve gotten around most of them.


#3

I played around with your example a bit and came up with this:

https://play.rust-lang.org/?gist=170651ded945cdf7d6d1&version=stable

I tried to simplify things by passing by value instead of reference. I also tweaked the result type to be generic in what kind of parsed value it returns and to allow representing parse failures.

I think the most subtle change is the higher-rank lifetime on the closure type:

type Parser<T> = Box<for<'a> Fn(InputType<'a>) -> ParseResult<'a, T>>;

The Rust book doesn’t seem to cover these yet. The simplest explanation I can give off the top of my head is that the original requires you to commit “up front” to the lifetimes of the parser’s input and output, whereas this version only requires you to commit at the point you call it. Let’s look at a simplified example:

type Early<'a> = Box<Fn(&'a str) -> &'a str>;
type Late = Box<for<'a> Fn(&'a str) -> &'a str>;

fn take_early<'a>(f: Early<'a>, s: &'a str) -> &'a str {
    f(s)
}

fn take_late<'a>(f: Late, s: &'a str) -> &'a str {
    f(s)
}

Here, both take_early and take_late seem to work fine, but what if we want to invoke the closure multiple times on strings with different lifetimes?

type Early<'a> = Box<Fn(&'a str) -> &'a str>;
type Late = Box<for<'a> Fn(&'a str) -> &'a str>;

fn take_early<'a>(f: Early<'a>, s: &'a str) -> &'a str {
    let hello: String = "hello".into();
    println!("{}", f(&hello));
    f(s)
}

fn take_late<'a>(f: Late, s: &'a str) -> &'a str {
    let hello: String = "hello".into();
    println!("{}", f(&hello));
    f(s)
}

In this case, take_late compiles fine, but the lifetime checker is upset about take_early:

<anon>:6:23: 6:28 error: `hello` does not live long enough
<anon>:6     println!("{}", f(&hello));
                               ^~~~~

This is because we committed “up front” to the closure taking and returning a string with the same lifetime as the other parameter to take_early, so we can’t call it with some other string that doesn’t live at least as long. The higher-rank lifetime lets us avoid this commitment until we actually use the closure. This gives you more flexibility in parsing and lets you avoid confusing lifetime errors down the road.


#4

Thanks for the detailed answer Dan! I feel like the problem you describe in the third paragraph is a big part of what’s holding me back, so I’ll definitely look into using traits for that. I’ll check out your library and see if I can learn something new. But I’m also starting to think that this is not really what Rust’s language design was optimized for, and that I might be aiming too high.


#5

Thanks for the concrete code bkoropoff! I’ll fiddle some more with your code. The skill of reasoning about lifetimes is becoming clearer to me, so at the very least this has been a helpful exercise.