How can I make the borrow checker happy?

kdeconinck · April 25, 2021, 6:08am

Hi all,

I'm in the process of learning Rust, and I'm trying to build a (basic) lexer for the .NET / C# programming language. Its goal is to detect whether multiple line terminators occur right after each other.
Here's the solution I came up with:

// Defines the lexical tokens of the .NET / C# programming language.
#[derive(Debug, PartialEq)]
enum Token {
    // Lexical tokens with a "special" meaning.
    Identifier,
    LineTerminator,
    EndOfFile,
}

struct Lexer<'a> {
    chars: std::iter::Peekable<std::str::Chars<'a>>,
}

// Defines the "basic" implementation of the `Lexer` struct.
impl<'a> Lexer<'a> {
    // Constants which define the Unicode code points of specific characters.
    const UNICODE_OF_CARRIAGE_RETURN_CHAR: char = '\u{000D}';
    const UNICODE_OF_LINE_FEED_CHAR: char = '\u{000A}';
    const UNICODE_OF_NEXT_LINE_CHAR: char = '\u{0085}';
    const UNICODE_OF_LINE_SEPARATOR_CHAR: char = '\u{2028}';
    const UNICODE_OF_PARAGRAPH_SEPARATOR_CHAR: char = '\u{2029}';

    // Initializes a new instance of the `Lexer` struct which operates on `input`.
    fn new(input: &'a str) -> Self {
        Self {
            chars: input.chars().peekable(),
        }
    }

    // Processes the input and returns the next token.
    fn next_token(&mut self) -> Token {
        match self.chars.next() {
            Some(ch) => match ch {
                Lexer::UNICODE_OF_CARRIAGE_RETURN_CHAR => match self.chars.peek() {
                    Some(&Lexer::UNICODE_OF_LINE_FEED_CHAR) => {
                        self.chars.next();
                        Token::LineTerminator
                    }
                    _ => Token::LineTerminator,
                },
                Lexer::UNICODE_OF_LINE_FEED_CHAR
                | Lexer::UNICODE_OF_NEXT_LINE_CHAR
                | Lexer::UNICODE_OF_LINE_SEPARATOR_CHAR
                | Lexer::UNICODE_OF_PARAGRAPH_SEPARATOR_CHAR => Token::LineTerminator,
                _ => Token::Identifier,
            },
            None => Token::EndOfFile,
        }
    }
}

Right now, this is already working. The next thing I tried was extending the Token enumeration to contain the position where this token was found.
This has been done by adding a Position struct:

// Defines a position (row, column) in a .NET / C# source code file.
#[derive(Debug, PartialEq)]
struct Position {
    row: u16,
    column: u16,
}
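
The post does not show the extended `Token` enum itself. As a minimal sketch, assuming each variant simply wraps a `Position` (which is what the later `Token::LineTerminator(self.pos)` calls suggest), it might look like this. The helper `line_terminator_at` is hypothetical and only illustrates construction:

```rust
// Same `Position` as above, repeated so the sketch compiles on its own.
#[derive(Debug, PartialEq)]
struct Position {
    row: u16,
    column: u16,
}

// Assumed shape of the extended enum: every variant carries the position
// at which the token was found.
#[allow(dead_code)]
#[derive(Debug, PartialEq)]
enum Token {
    Identifier(Position),
    LineTerminator(Position),
    EndOfFile(Position),
}

// Hypothetical helper, only here to show how such a token is built.
fn line_terminator_at(row: u16, column: u16) -> Token {
    Token::LineTerminator(Position { row, column })
}
```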

Next, I extended the Lexer struct to include this Position.

struct Lexer<'a> {
    chars: std::iter::Peekable<std::str::Chars<'a>>,
    pos: Position,
}

And finally, I updated the Lexer to include the position when fetching the next token:

// Defines the "basic" implementation of the `Lexer` struct.
impl<'a> Lexer<'a> {
    // Constants which define the Unicode code points of specific characters.
    const UNICODE_OF_CARRIAGE_RETURN_CHAR: char = '\u{000D}';
    const UNICODE_OF_LINE_FEED_CHAR: char = '\u{000A}';
    const UNICODE_OF_NEXT_LINE_CHAR: char = '\u{0085}';
    const UNICODE_OF_LINE_SEPARATOR_CHAR: char = '\u{2028}';
    const UNICODE_OF_PARAGRAPH_SEPARATOR_CHAR: char = '\u{2029}';

    // Initializes a new instance of the `Lexer` struct which operates on `input`.
    fn new(input: &'a str) -> Self {
        Self {
            chars: input.chars().peekable(),
            pos: Position { row: 0, column: 0 },
        }
    }

    // Processes the input and returns the next token.
    fn next_token(&mut self) -> Token {
        match self.chars.next() {
            Some(ch) => match ch {
                Lexer::UNICODE_OF_CARRIAGE_RETURN_CHAR => match self.chars.peek() {
                    Some(&Lexer::UNICODE_OF_LINE_FEED_CHAR) => {
                        self.chars.next();
                        Token::LineTerminator(self.pos)
                    }
                    _ => Token::LineTerminator(self.pos),
                },
                Lexer::UNICODE_OF_LINE_FEED_CHAR
                | Lexer::UNICODE_OF_NEXT_LINE_CHAR
                | Lexer::UNICODE_OF_LINE_SEPARATOR_CHAR
                | Lexer::UNICODE_OF_PARAGRAPH_SEPARATOR_CHAR => Token::LineTerminator(self.pos),
                _ => Token::Identifier(self.pos),
            },
            None => Token::EndOfFile(self.pos),
        }
    }
}

And that's where the adventure with the borrow checker begins.
Here's one of the errors returned when running cargo check:

So, cargo advises implementing the Copy trait on the Position struct, which is easy enough:

// Defines a position (row, column) in a .NET / C# source code file.
#[derive(Debug, PartialEq, Clone, Copy)]
struct Position {
    row: u16,
    column: u16,
}
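
To see why the derive makes the difference, here is a minimal sketch. `Cursor` and `snapshot` are hypothetical stand-ins for the lexer and its `next_token` method, reduced to the one field that matters:

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
struct Position {
    row: u16,
    column: u16,
}

// Hypothetical stand-in for the lexer: only the `pos` field is kept.
struct Cursor {
    pos: Position,
}

impl Cursor {
    // Without `Copy`, returning `self.pos` by value would try to move the
    // field out from behind `&mut self`, which the compiler rejects.
    // With `Copy`, the value is duplicated and `self.pos` stays valid.
    fn snapshot(&mut self) -> Position {
        self.pos
    }
}
```

Calling `snapshot` twice in a row returns two equal, independent values, and the field inside `Cursor` is untouched.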

But it feels like it's not the Rust way of doing things.
I'm copying a struct that only contains two u16 fields, so I don't expect an issue here, but it feels like there should be another way that doesn't require copying the Position struct.

Any advice?

2e71828 · April 25, 2021, 6:21am

Copying Position is probably required here: if you tried to return a reference, every token would point to the same Position instance, and it would only be correct until the next token is generated. If you’d prefer, you can implement Clone instead of Copy and use an explicit position.clone() call to make the copy.
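
A rough sketch of the `Clone`-only variant follows. The position bookkeeping (incrementing `row` and `column`) is an assumption added for illustration, since the original code never updates `pos`, and only `'\n'` is treated as a line terminator to keep the example short:

```rust
use std::iter::Peekable;
use std::str::Chars;

// `Clone` only: every copy has to be requested with an explicit `.clone()`.
#[derive(Debug, PartialEq, Clone)]
struct Position {
    row: u16,
    column: u16,
}

#[derive(Debug, PartialEq)]
enum Token {
    Identifier(Position),
    LineTerminator(Position),
    EndOfFile(Position),
}

struct Lexer<'a> {
    chars: Peekable<Chars<'a>>,
    pos: Position,
}

impl<'a> Lexer<'a> {
    fn new(input: &'a str) -> Self {
        Self {
            chars: input.chars().peekable(),
            pos: Position { row: 0, column: 0 },
        }
    }

    fn next_token(&mut self) -> Token {
        // Snapshot the position before it changes, so the token keeps the
        // place where it started even after the lexer moves on.
        let start = self.pos.clone();
        match self.chars.next() {
            Some('\n') => {
                // Assumed bookkeeping, not part of the original post.
                self.pos.row += 1;
                self.pos.column = 0;
                Token::LineTerminator(start)
            }
            Some(_) => {
                self.pos.column += 1;
                Token::Identifier(start)
            }
            None => Token::EndOfFile(start),
        }
    }
}
```

Each token owns its own `Position`, so earlier tokens stay correct while the lexer advances. That is exactly what a shared reference could not provide.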

The alternative is to provide a current_position method separate from next_token, which could return a reference to the internal position. In this case, though, copies are probably cheaper: on a 64-bit platform a reference is 8 bytes, twice the 4 bytes of a Position instance.
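
As a hedged illustration of that alternative, the sketch below uses a hypothetical, stripped-down `Lexer` with only a `pos` field. The `sizes` helper is also made up for this example and just reports the two sizes being compared:

```rust
use std::mem::size_of;

#[derive(Debug, PartialEq, Clone, Copy)]
struct Position {
    row: u16,
    column: u16,
}

// Hypothetical stand-in for the real lexer: only the position is kept.
struct Lexer {
    pos: Position,
}

impl Lexer {
    // Borrows the internal position. The reference is only usable while
    // the lexer is not mutated, so it cannot be stored inside a token.
    fn current_position(&self) -> &Position {
        &self.pos
    }
}

// Returns (size of a `Position`, size of a reference to one), in bytes.
fn sizes() -> (usize, usize) {
    (size_of::<Position>(), size_of::<&Position>())
}
```

`Position` holds two `u16` fields, so it occupies 4 bytes, while a reference is as wide as a `usize` on the target platform.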

kdeconinck · April 25, 2021, 6:31am

Thanks for the quick reply.
Seems like I will have to use a copy.

I marked your answer as the Solution.

system · July 24, 2021, 6:32am

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.
