Use a HashMap to look up a struct's own mutable methods

Hello, I'm writing a Parser for a programming language (this is for practice, I know parser generators exist haha)

Here's a simplified version of my parser:

type ParseFn<'b, T> = dyn Fn(&mut T) -> Expression<'b>;

struct Parser<'a, 'b: 'a> {
    l: &'a mut Lexer<'b>,
    errors: Vec<String>,
    curr_token: Option<Token<'b>>,
    peek_token: Option<Token<'b>>,
    prefix_parse_fns: HashMap<TokenType, Box<ParseFn<'b, Parser<'a, 'b>>>>,
    infix_parse_fns: HashMap<TokenType, Box<ParseFn<'b, Parser<'a, 'b>>>>,
}

In the new() function of my Parser, I'm registering a function which is one of the parser's own methods:

impl<'a, 'b: 'a> Parser<'a, 'b> {
    fn new(l: &'a mut Lexer<'b>) -> Parser<'a, 'b> {
        let curr_token = Some(l.next_token());
        let peek_token = Some(l.next_token());
        let mut p = Parser {
            l,
            errors: Vec::new(),
            curr_token,
            peek_token,
            prefix_parse_fns: Default::default(),
            infix_parse_fns: Default::default(),
        };
        p.register_prefix(IDENT, Box::new(&Parser::parse_identifier));
        p
    }

    fn register_prefix(&mut self, token: TokenType, func: Box<ParseFn<'b, Parser<'a, 'b>>>) {
        self.prefix_parse_fns.insert(token, func);
    }

    fn parse_identifier(&mut self) -> Expression<'b> {
        Identifier(self.curr_token.take().unwrap())
    }
// other code

In one of my functions I decide to parse the next statement based on the current token that I'm reading:

    fn parse_statement(&mut self) -> Option<Statement<'b>> {
        match self.curr_token.as_ref().unwrap().token_type {
            TokenType::LET => self.parse_let_statement(),
            TokenType::RETURN => self.parse_return_statement(),
            _ => self.parse_expression_statement(),
        }
    }

    fn parse_expression_statement(&mut self) -> Option<Statement<'b>> {
        let expression = self.parse_expression(Precedence::Lowest);
        if expression.is_none() {
            return None;
        }
        let expression = expression.unwrap();
        if self.peek_token_is(SEMICOLON) {
            self.next_token();
        }
        return Some(ExpressionStatement { expression });
    }

    fn parse_expression(&mut self, precedence: Precedence) -> Option<Expression<'b>> {
        let curr_type = &self.curr_token.as_ref().unwrap().token_type;
        let prefix_parser = self
            .prefix_parse_fns
            .get(curr_type)
            .unwrap()
            .as_ref();
        Some(prefix_parser(self))
    }

And the parse_identifier function is defined above underneath the new() definition.

I'm currently getting this error:

error[E0502]: cannot borrow `*self` as mutable because it is also borrowed as immutable
  --> parser/src/parser.rs:81:14
   |
76 |           let prefix_parser = self
   |  _____________________________-
77 | |             .prefix_parse_fns
78 | |             .get(curr_type)
   | |___________________________- immutable borrow occurs here
...
81 |           Some(prefix_parser(self))
   |                -------------^^^^^^
   |                |
   |                mutable borrow occurs here
   |                immutable borrow later used by call

How can I write this in a rust-y way? I'm following along with a book that's implemented this in Go, but I'm trying to capture the same behaviour in Rust.

Thanks in advance,
Arshan

is it necessary for you to be using dynamic dispatch in this case? usually with parsers you can define them in code, instead you're making a program that creates the parser at runtime.

are you following any algorithm for your parser or are you just winging it? it would be useful if you could explain the top level architecture of your parser and the motivations behind it's design (this could be an xy problem)

i've written parsers before and didn't have to deal with dynamic dispatching on parser functions. one of my parsers used generics, so it could be similar to your example, i think generics could be better than trait objects since they encode the logic in the type-level

1 Like

If you need to store methods only, you could use function pointers instead of trait objects; those are copiable:

type ParseFn<'b, T> = fn(&mut T) -> Expression<'b>;

and then simply store ParseFn instead of Box<ParseFn> in the maps.

2 Likes

Since we're talking Fn here, you could use an Arc<ParseFn<'b, Self>> (or Rc) instead of Box, if function pointers aren't sufficient.

        self.prefix_parse_fns
            .get(&curr_type)
            .map(|arc_ref| arc_ref.clone())
            .map(|prefix_parser| prefix_parser(self))

If you can't copy or clone your closures, you'll have to remove it from the hash map so that you're not borrowing yourself (to hold on to the closure reference) while simultaneously passing a &mut self.

    pub fn foo(&mut self, curr_type: TokenType) -> Option<Expression<'b>> {
        self.prefix_parse_fns
            .remove(&curr_type)
            .map(|prefix_parser| {
                let res = prefix_parser(self);
                self.prefix_parse_fns.insert(curr_type, prefix_parser);
                res
            })
    }

But this is probably not workable as I imagine you have expressions which may have subexpressions for example (i.e. the prefix_parser needs prefix_parse_fns to be intact).

1 Like

It's supposed to be a pratt parser, however I'm too early in the book so implementation might not be fully fleshed out yet.

You're correct re: dynamic dispatching. I could just use a match statement and statically dispatch to different functions within my struct. I don't really know why the book is using this implementation. Maybe in the future he wants to change the parser logic on the fly? Idk...

I think you're correct in that I wouldn't really need dynamic dispatching though... I personally don't see why my parser would want to accept new behaviour at runtime... The book is written in Go and I've just decided to do this in Rust to learn Rust, and it's been amazing because it has me thinking about all these small decisions where in JS/Python/Java/Etc. I wouldn't even think about them...

But that actually makes me itch a bit... What if I do want to have an object whose behaviour is modifiable at runtime? Then what?

I'd want to still know what to do in that case...

Thank you for raising these questions. I'm learning a lot.

oh interesting, will give this a try, thanks!

But what is exactly going wrong here when I use Box? I want to understand what ownership rules are getting violated here...

You can use let prefix_parser: () = ... to have the compiler tell you the exact type here, but let me try to track it syntactically.

        let prefix_parser = self  // &'x mut Self = &'x mut Parser<'a, 'b>
            .prefix_parse_fns     // (&'y) HashMap<ToTy, Box<dyn Fn -> E<'b>>
            .get(curr_type)       // Option<&'y Box<dyn Fn -> E<'b>>
            .unwrap()             // &'y Box<dyn Fn -> E<'b>
            .as_ref();            // &'y dyn Fn -> E<'b>

As you can see, it's a reference -- a reference to something inside of self.prefix_parse_fns which is inside of self. Trying to use the self: &mut Self while this borrow is still active is going to cause a conflict.

Maybe this is my dyn Fn(Self) -> E<'b> for example:

|parser| { parser.prefix_parse_fns.clear(); }

That would cause a dangling pointer. More fundamentally though, having a &mut observable by another borrow is instance UB in Rust.

So prefix_parser and self can't be usable at the same time, and there's no way to pass self to the prefix_parser as a Fn.

The solutions all involve being able to stop borrowing once you find the closure in some way:

  • Copy it out instead of borrowing in place
  • Clone it out instead of borrowing in place
  • Move it out instead of borrowing in place (and then put it back)
1 Like

The Go book probably uses a list of functions because the most common way to talk about Pratt parsing is with some Map<Token, (Precedence, Parser)>.

For a more Rust-friendly take on structuring Pratt parsing, I always like to reference Matklad's Simple but Powerful Pratt Parsing
Apr 13, 2020
and From Pratt to Dijkstra.

The short version is not to use an actual dynamic map, but to just write out the map as a match. Using a map is only necessary if you're interpreting some set of dynamic rules. Some people say that doing so better teaches the pure algorithm; I say the extra indirection gets in the way.

If you do want to make it more dynamic, the typical way to structure it for Rust would be to separate out the "language" definition (the maps) from the parser context. Then you have Language::parse_xxx(&self, &mut Parser) etc.

2 Likes

It has nothing to do with box.

The problem is that borrowing an element of a container contains the entire container, and in general calling a borrowing (&self or &mut self) method borrows the entire receiver.

Thus, when you try to get the function out of the map and then call a mutating method on self, then borrowing the function borrows the map, but calling the mutating method would mutably borrow all of self, hence no other borrow into self is allowed.

When you use a function pointer, you can just copy it out of the map, without borrowing it for as long as you keep the function pointer itself around. (You couldn't do it with a trait object, or with a regular generic closure type that captured non-cloneable data.)

Likely because Go has no good/sophisticated pattern matching. In this case, Go's simplistic switch statement could be used IIUC, but it's probably not as widely-used as pattern matching in Rust, so the author likely reached for the dynamic dispatch alternative instinctively.

1 Like

Thank you so much for this beautiful explanation, I've read the rust book multiple times already and some concepts don't click until you actually start getting your hands dirty with implementing something. Greatly appreciated :pray:t3:

This is an awesome resource! Thank you so much for sharing. More content for me to nerd out on.

Yeah makes sense, in this case just using fn solved my problem. However I think I'm going to use a match statement anyways.