Trouble with Lifetimes not being Inferrable

Hello, I'm trying to write a basic lexer, but I'm running into trouble with the compiler not being able to infer lifetimes. Based on the error message, it sounds like I should be able to annotate the lifetimes explicitly for the compiler, but I haven't been able to figure out a lifetime annotation that makes it happy. I'm new to Rust, so maybe I'm just missing something obvious.

Error message:

error[E0495]: cannot infer an appropriate lifetime for lifetime parameter in function call due to conflicting requirements
  --> src\main.rs:69:47
   |
69 |                     return TokenType::NumLit(&tokenizer.input[start..tokenizer.position]);
   |                                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   |
note: first, the lifetime cannot outlive the anonymous lifetime #1 defined on the body at 59:42...
  --> src\main.rs:59:42
   |
59 |                   '0'..='9' => tokens.push(|tokenizer: &mut Tokenizer| -> TokenType {
   |  __________________________________________^
60 | |                     let start = tokenizer.position;
61 | |
62 | |                     while tokenizer.position < tokenizer.input.len() {
...  |
69 | |                     return TokenType::NumLit(&tokenizer.input[start..tokenizer.position]);
70 | |                 }(self)),
   | |_________________^
note: ...so that reference does not outlive borrowed content
  --> src\main.rs:69:47
   |
69 |                     return TokenType::NumLit(&tokenizer.input[start..tokenizer.position]);
   |                                               ^^^^^^^^^^^^^^^
note: but, the lifetime must be valid for the anonymous lifetime defined on the method body at 50:21...
  --> src\main.rs:50:21
   |
50 |     pub fn tokenize(&mut self) -> Vec<TokenType> {
   |                     ^^^^^^^^^
note: ...so that the expression is assignable

Relevant code (with irrelevant match branches trimmed out for brevity)

    pub fn tokenize(&mut self) -> Vec<TokenType> {
        let mut tokens = Vec::new();

        while self.position < self.input.len() {
            match self.input[self.position] {
                '0'..='9' => tokens.push(|tokenizer: &mut Tokenizer| -> TokenType {
                    let start = tokenizer.position;

                    while tokenizer.position < tokenizer.input.len() {
                        let c = tokenizer.input[tokenizer.position];
                        if c.is_digit(10) || c == '.' {
                            self.position += 1;
                        }
                        break;
                    }
                    return TokenType::NumLit(&tokenizer.input[start..tokenizer.position]);
                }(self)),
            }
        }

        return tokens;
    }

Tokenizer definition:

struct Tokenizer {
    input: Vec<char>,
    pub position: usize,
}

TokenType definition:

enum TokenType<'a> {
    NumLit(&'a [char]),
    Plus,
}

There's no need to use a closure in this case; you can use a simple block. Working version: Rust Playground

Why are you defining a closure and immediately calling it with self instead of just using self? When I remove that indirection, it compiles.
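For reference, here is a minimal sketch of what the block version might look like. The `'+'` and `_` arms, the `derive` line, and the `else` around `break` are my own assumptions added so the example is complete; they are not from the original post.

```rust
// Assumption: Debug and PartialEq are derived only so the tokens can be compared in tests.
#[derive(Debug, PartialEq)]
enum TokenType<'a> {
    NumLit(&'a [char]),
    Plus,
}

struct Tokenizer {
    input: Vec<char>,
    pub position: usize,
}

impl Tokenizer {
    // The elided lifetime ties the returned tokens to the borrow of `self`.
    pub fn tokenize(&mut self) -> Vec<TokenType<'_>> {
        let mut tokens = Vec::new();

        while self.position < self.input.len() {
            match self.input[self.position] {
                // A block is an expression; its last expression is its value,
                // so no closure is needed here.
                '0'..='9' => tokens.push({
                    let start = self.position;

                    while self.position < self.input.len() {
                        let c = self.input[self.position];
                        if c.is_ascii_digit() || c == '.' {
                            self.position += 1;
                        } else {
                            // Assumption: stop only at the first non-number character.
                            break;
                        }
                    }

                    TokenType::NumLit(&self.input[start..self.position])
                }),
                // Assumption: these two arms stand in for the trimmed branches.
                '+' => {
                    tokens.push(TokenType::Plus);
                    self.position += 1;
                }
                _ => self.position += 1,
            }
        }

        tokens
    }
}
```

This works because `self.input` and `self.position` are separate fields: the tokens hold shared borrows of `input` only, so `position` can still be updated while they are alive.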

However, if you have some methods to do the same work as the closure did, you may indeed be running into interprocedural borrowing conflicts. Here's one way to avoid them -- I have

  • Made a free-standing function to tokenize the number, not a method
    • It takes a slice of input where the first character is part of the number
    • It doesn't need a mutable reference
  • Made a len method for the enum
  • Changed tokenize to call the function and adjust self.position afterwards
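Roughly, the changes listed above could be sketched like this. It is a reconstruction under assumptions, not the exact playground code: the `'+'` and `_` arms, the `derive` line, and the use of `Iterator::position` to find the end of the number are mine.

```rust
// Assumption: Debug and PartialEq are derived only so the tokens can be compared in tests.
#[derive(Debug, PartialEq)]
enum TokenType<'a> {
    NumLit(&'a [char]),
    Plus,
}

impl TokenType<'_> {
    /// Number of input characters this token covers.
    fn len(&self) -> usize {
        match self {
            TokenType::NumLit(chars) => chars.len(),
            TokenType::Plus => 1,
        }
    }
}

struct Tokenizer {
    input: Vec<char>,
    pub position: usize,
}

/// Free-standing function: it borrows only a slice of the input, never the
/// whole Tokenizer, and it needs no mutable access. The caller guarantees
/// that the first character of `input` is part of the number.
fn tokenize_number(input: &[char]) -> TokenType<'_> {
    // Index of the first character that is not part of the number,
    // or the end of the slice if every character belongs to it.
    let end = input
        .iter()
        .position(|c| !(c.is_ascii_digit() || *c == '.'))
        .unwrap_or(input.len());
    TokenType::NumLit(&input[..end])
}

impl Tokenizer {
    pub fn tokenize(&mut self) -> Vec<TokenType<'_>> {
        let mut tokens = Vec::new();

        while self.position < self.input.len() {
            match self.input[self.position] {
                '0'..='9' => {
                    let token = tokenize_number(&self.input[self.position..]);
                    // Adjust the position afterwards, using the token's length.
                    self.position += token.len();
                    tokens.push(token);
                }
                // Assumption: these two arms stand in for the trimmed branches.
                '+' => {
                    tokens.push(TokenType::Plus);
                    self.position += 1;
                }
                _ => self.position += 1,
            }
        }

        tokens
    }
}
```

The point of the design is that `tokenize_number` only ever sees `&self.input[..]`, so the borrow checker can tell that the returned token borrows `input` and leaves `position` free to be mutated.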

However, you may want to investigate even more general approaches, like using a String interner or just storing owned values in your TokenType, etc.

Other things I noticed but didn't pursue myself:

  • Using Vec<char> is pretty uncommon; usually people use String and &str. It can be a little verbose to check things character by character and convert this back into UTF-8 (str) byte lengths, though.
  • Your inner while loop (inside tokenize_number in my link) unconditionally breaks, so it never consumes more than one character; you have some sort of logic error there.
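To illustrate both points, here is a rough sketch of a &str-based variant. The `Token` enum and the free `tokenize` function are hypothetical names for this example only, and skipping unknown characters is an assumption. Using `str::find` with a closure replaces the hand-written inner loop, which also sidesteps the unconditional `break`.

```rust
// Hypothetical &str-based counterpart of TokenType, for illustration only.
#[derive(Debug, PartialEq)]
enum Token<'a> {
    NumLit(&'a str),
    Plus,
}

fn tokenize(input: &str) -> Vec<Token<'_>> {
    let mut tokens = Vec::new();
    // `pos` is a byte offset into `input`, always on a char boundary.
    let mut pos = 0;

    while pos < input.len() {
        let rest = &input[pos..];
        let c = rest.chars().next().expect("rest is non-empty");

        match c {
            '0'..='9' => {
                // Byte index of the first char that is not part of the number,
                // or the end of the remaining input.
                let end = rest
                    .find(|ch: char| !(ch.is_ascii_digit() || ch == '.'))
                    .unwrap_or(rest.len());
                tokens.push(Token::NumLit(&rest[..end]));
                pos += end;
            }
            '+' => {
                tokens.push(Token::Plus);
                pos += 1; // '+' is a single byte in UTF-8
            }
            // Assumption: skip anything else, advancing by its UTF-8 width.
            _ => pos += c.len_utf8(),
        }
    }

    tokens
}
```

The extra bookkeeping compared to Vec<char> is that positions are byte offsets, so non-ASCII characters have to advance `pos` by `len_utf8()` rather than by 1.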

Thanks for the detailed response! I had started with a separate function, but ran into borrowing conflicts as you described. I didn't realize that blocks had values, so I thought that a closure was the only way to resolve this issue.
