Use a HashMap to look up a struct's own mutable methods

ArshanKhanifar · January 19, 2023, 9:39pm

Hello, I'm writing a Parser for a programming language (this is for practice, I know parser generators exist haha)

Here's a simplified version of my parser:

type ParseFn<'b, T> = dyn Fn(&mut T) -> Expression<'b>;

struct Parser<'a, 'b: 'a> {
    l: &'a mut Lexer<'b>,
    errors: Vec<String>,
    curr_token: Option<Token<'b>>,
    peek_token: Option<Token<'b>>,
    prefix_parse_fns: HashMap<TokenType, Box<ParseFn<'b, Parser<'a, 'b>>>>,
    infix_parse_fns: HashMap<TokenType, Box<ParseFn<'b, Parser<'a, 'b>>>>,
}

In the new() function of my Parser, I'm registering a function which is one of the parser's own methods:

impl<'a, 'b: 'a> Parser<'a, 'b> {
    fn new(l: &'a mut Lexer<'b>) -> Parser<'a, 'b> {
        let curr_token = Some(l.next_token());
        let peek_token = Some(l.next_token());
        let mut p = Parser {
            l,
            errors: Vec::new(),
            curr_token,
            peek_token,
            prefix_parse_fns: Default::default(),
            infix_parse_fns: Default::default(),
        };
        p.register_prefix(IDENT, Box::new(&Parser::parse_identifier));
        p
    }

    fn register_prefix(&mut self, token: TokenType, func: Box<ParseFn<'b, Parser<'a, 'b>>>) {
        self.prefix_parse_fns.insert(token, func);
    }

    fn parse_identifier(&mut self) -> Expression<'b> {
        Identifier(self.curr_token.take().unwrap())
    }
// other code

In one of my functions I decide to parse the next statement based on the current token that I'm reading:

    fn parse_statement(&mut self) -> Option<Statement<'b>> {
        match self.curr_token.as_ref().unwrap().token_type {
            TokenType::LET => self.parse_let_statement(),
            TokenType::RETURN => self.parse_return_statement(),
            _ => self.parse_expression_statement(),
        }
    }

    fn parse_expression_statement(&mut self) -> Option<Statement<'b>> {
        let expression = self.parse_expression(Precedence::Lowest);
        if expression.is_none() {
            return None;
        }
        let expression = expression.unwrap();
        if self.peek_token_is(SEMICOLON) {
            self.next_token();
        }
        return Some(ExpressionStatement { expression });
    }

    fn parse_expression(&mut self, precedence: Precedence) -> Option<Expression<'b>> {
        let curr_type = &self.curr_token.as_ref().unwrap().token_type;
        let prefix_parser = self
            .prefix_parse_fns
            .get(curr_type)
            .unwrap()
            .as_ref();
        Some(prefix_parser(self))
    }

And the parse_identifier function is defined above underneath the new() definition.

I'm currently getting this error:

error[E0502]: cannot borrow `*self` as mutable because it is also borrowed as immutable
  --> parser/src/parser.rs:81:14
   |
76 |           let prefix_parser = self
   |  _____________________________-
77 | |             .prefix_parse_fns
78 | |             .get(curr_type)
   | |___________________________- immutable borrow occurs here
...
81 |           Some(prefix_parser(self))
   |                -------------^^^^^^
   |                |
   |                mutable borrow occurs here
   |                immutable borrow later used by call

How can I write this in a rust-y way? I'm following along with a book that's implemented this in Go, but I'm trying to capture the same behaviour in Rust.

Thanks in advance,
Arshan

jvcmarcenes · January 19, 2023, 10:34pm

is it necessary for you to be using dynamic dispatch in this case? usually with parsers you can define them in code, instead you're making a program that creates the parser at runtime.

are you following any algorithm for your parser or are you just winging it? it would be useful if you could explain the top level architecture of your parser and the motivations behind it's design (this could be an xy problem)

i've written parsers before and didn't have to deal with dynamic dispatching on parser functions. one of my parsers used generics, so it could be similar to your example, i think generics could be better than trait objects since they encode the logic in the type-level

H2CO3 · January 19, 2023, 10:34pm

If you need to store methods only, you could use function pointers instead of trait objects; those are copiable:

type ParseFn<'b, T> = fn(&mut T) -> Expression<'b>;

and then simply store ParseFn instead of Box<ParseFn> in the maps.

quinedot · January 19, 2023, 10:52pm

Since we're talking Fn here, you could use an Arc<ParseFn<'b, Self>> (or Rc) instead of Box, if function pointers aren't sufficient.

        self.prefix_parse_fns
            .get(&curr_type)
            .map(|arc_ref| arc_ref.clone())
            .map(|prefix_parser| prefix_parser(self))

If you can't copy or clone your closures, you'll have to remove it from the hash map so that you're not borrowing yourself (to hold on to the closure reference) while simultaneously passing a &mut self.

    pub fn foo(&mut self, curr_type: TokenType) -> Option<Expression<'b>> {
        self.prefix_parse_fns
            .remove(&curr_type)
            .map(|prefix_parser| {
                let res = prefix_parser(self);
                self.prefix_parse_fns.insert(curr_type, prefix_parser);
                res
            })
    }

But this is probably not workable as I imagine you have expressions which may have subexpressions for example (i.e. the prefix_parser needs prefix_parse_fns to be intact).

ArshanKhanifar · January 19, 2023, 11:27pm

It's supposed to be a pratt parser, however I'm too early in the book so implementation might not be fully fleshed out yet.

You're correct re: dynamic dispatching. I could just use a match statement and statically dispatch to different functions within my struct. I don't really know why the book is using this implementation. Maybe in the future he wants to change the parser logic on the fly? Idk...

I think you're correct in that I wouldn't really need dynamic dispatching though... I personally don't see why my parser would want to accept new behaviour at runtime... The book is written in Go and I've just decided to do this in Rust to learn Rust, and it's been amazing because it has me thinking about all these small decisions where in JS/Python/Java/Etc. I wouldn't even think about them...

But that actually makes me itch a bit... What if I do want to have an object whose behaviour is modifiable at runtime? Then what?

I'd want to still know what to do in that case...

Thank you for raising these questions. I'm learning a lot.

ArshanKhanifar · January 19, 2023, 11:28pm

oh interesting, will give this a try, thanks!

But what is exactly going wrong here when I use Box? I want to understand what ownership rules are getting violated here...

quinedot · January 20, 2023, 12:18am

You can use let prefix_parser: () = ... to have the compiler tell you the exact type here, but let me try to track it syntactically.

        let prefix_parser = self  // &'x mut Self = &'x mut Parser<'a, 'b>
            .prefix_parse_fns     // (&'y) HashMap<ToTy, Box<dyn Fn -> E<'b>>
            .get(curr_type)       // Option<&'y Box<dyn Fn -> E<'b>>
            .unwrap()             // &'y Box<dyn Fn -> E<'b>
            .as_ref();            // &'y dyn Fn -> E<'b>

As you can see, it's a reference -- a reference to something inside of self.prefix_parse_fns which is inside of self. Trying to use the self: &mut Self while this borrow is still active is going to cause a conflict.

Maybe this is my dyn Fn(Self) -> E<'b> for example:

|parser| { parser.prefix_parse_fns.clear(); }

That would cause a dangling pointer. More fundamentally though, having a &mut observable by another borrow is instance UB in Rust.

So prefix_parser and self can't be usable at the same time, and there's no way to pass self to the prefix_parser as a Fn.

The solutions all involve being able to stop borrowing once you find the closure in some way:

Copy it out instead of borrowing in place
Clone it out instead of borrowing in place
Move it out instead of borrowing in place (and then put it back)

CAD97 · January 20, 2023, 3:53am

The Go book probably uses a list of functions because the most common way to talk about Pratt parsing is with some Map<Token, (Precedence, Parser)>.

For a more Rust-friendly take on structuring Pratt parsing, I always like to reference Matklad's Simple but Powerful Pratt Parsing
Apr 13, 2020 and From Pratt to Dijkstra.

The short version is not to use an actual dynamic map, but to just write out the map as a match. Using a map is only necessary if you're interpreting some set of dynamic rules. Some people say that doing so better teaches the pure algorithm; I say the extra indirection gets in the way.

If you do want to make it more dynamic, the typical way to structure it for Rust would be to separate out the "language" definition (the maps) from the parser context. Then you have Language::parse_xxx(&self, &mut Parser) etc.

H2CO3 · January 20, 2023, 5:57am

It has nothing to do with box.

The problem is that borrowing an element of a container contains the entire container, and in general calling a borrowing (&self or &mut self) method borrows the entire receiver.

Thus, when you try to get the function out of the map and then call a mutating method on self, then borrowing the function borrows the map, but calling the mutating method would mutably borrow all of self, hence no other borrow into self is allowed.

When you use a function pointer, you can just copy it out of the map, without borrowing it for as long as you keep the function pointer itself around. (You couldn't do it with a trait object, or with a regular generic closure type that captured non-cloneable data.)

Likely because Go has no good/sophisticated pattern matching. In this case, Go's simplistic switch statement could be used IIUC, but it's probably not as widely-used as pattern matching in Rust, so the author likely reached for the dynamic dispatch alternative instinctively.

ArshanKhanifar · January 20, 2023, 5:50pm

Thank you so much for this beautiful explanation, I've read the rust book multiple times already and some concepts don't click until you actually start getting your hands dirty with implementing something. Greatly appreciated

ArshanKhanifar · January 20, 2023, 5:51pm

This is an awesome resource! Thank you so much for sharing. More content for me to nerd out on.

ArshanKhanifar · January 20, 2023, 5:53pm

Yeah makes sense, in this case just using fn solved my problem. However I think I'm going to use a match statement anyways.

system · April 20, 2023, 5:54pm

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.

Topic		Replies	Views
How to use a HashMap to store a struct method? help	3	606	September 28, 2021
Hashmap of String -> (Fn () -> ()) help	13	2713	January 12, 2023
Hashmap in a struct help	3	692	October 25, 2022
Integrate HashMap into struct with methods	4	868	October 4, 2021
Add struct methods to HashMap?	16	945	May 27, 2023

Use a HashMap to look up a struct's own mutable methods

Related Topics