How to downcast a pointer without having the underlying struct be static?

I'm writing a parser for a programming language, I have the following definitions:

pub trait Node {
    fn token_literal(&self) -> &str;
}

pub trait Statement: Node {
    fn statement_node(&self);
}

pub trait Expression: Node {
    fn expression_node(&self);
}

pub struct Identifier<'a> {
    pub token: Token<'a>,
    pub value: &'a str,
}

impl<'a> Node for Identifier<'a> {
    fn token_literal(&self) -> &'a str {
        self.token.literal
    }
}

impl<'a> Expression for Identifier<'a> {
    fn expression_node(&self) {}
}

pub struct LetStatement<'a> {
    pub token: Token<'a>,
    pub name: Identifier<'a>,
    // TODO: make it not optional
    pub value: Option<Box<dyn Expression + 'a>>,
}

impl<'a> Node for LetStatement<'a> {
    fn token_literal(&self) -> &str {
        self.token.literal.as_ref()
    }
}

impl Statement for LetStatement<'_> {
    fn statement_node(&self) {}
}

pub struct Program<'a> {
    pub statements: Vec<Box<dyn Statement + 'a>>,
}

The definition of Token is:

pub struct Token<'a> {
    pub token_type: TokenType,
    pub literal: &'a str,
}

Now I have a parser that basically gets an instance of a Lexer and parses statements as it goes:

struct Parser<'a, 'b: 'a> {
    l: &'a mut Lexer<'b>,
    errors: Vec<String>,
    curr_token: Option<Token<'b>>,
    peek_token: Option<Token<'b>>,
}

impl<'a, 'b: 'a> Parser<'a, 'b> {
    fn new(l: &'a mut Lexer<'b>) -> Parser<'a, 'b> {
        let curr_token = Some(l.next_token());
        let peek_token = Some(l.next_token());
        Parser {
            l,
            errors: Vec::new(),
            curr_token,
            peek_token,
        }
    }

    fn next_token(&mut self) {
        self.curr_token = self.peek_token.take();
        self.peek_token = Some(self.l.next_token());
    }

    fn parse_statement(&mut self) -> Option<Box<dyn Statement + 'b>> {
        match self.curr_token.as_ref().unwrap().token_type {
            TokenType::LET => self.parse_let_statement(),
            _ => None,
        }
    }

    pub fn errors(&self) -> &Vec<String> {
        &self.errors
    }

    fn peek_error(&mut self, t: TokenType) {
        let msg = format!(
            "expected next token to be {}, got {} instead",
            t.to_str(),
            self.peek_token.as_ref().unwrap().token_type.to_str()
        );
        self.errors.push(msg);
    }

    fn parse_let_statement(&mut self) -> Option<Box<dyn Statement + 'b>> {
        let statement_token = self.curr_token.take().unwrap();
        if !self.expect_peek(TokenType::IDENT) {
            return None;
        }
        let id_token = self.curr_token.take().unwrap();
        let value = id_token.literal;
        let name = Identifier {
            token: id_token,
            value,
        };
        if !self.expect_peek(TokenType::ASSIGN) {
            return None;
        }
        // TODO: we're skipping expression parsing
        while !self.curr_token_is(TokenType::SEMICOLON) {
            self.next_token();
        }
        return Some(Box::new(LetStatement {
            token: statement_token,
            name: name,
            value: None,
        }));
    }

    fn curr_token_is(&self, t: TokenType) -> bool {
        self.curr_token.as_ref().unwrap().token_type == t
    }

    fn peek_token_is(&self, t: TokenType) -> bool {
        self.peek_token.as_ref().unwrap().token_type == t
    }

    fn expect_peek(&mut self, t: TokenType) -> bool {
        if self.peek_token_is(t) {
            self.next_token();
            true
        } else {
            false
        }
    }

    fn parse_program(&mut self) -> Program<'b> {
        let mut program = Program {
            statements: Vec::new(),
        };
        while !self.curr_token_is(TokenType::EOF) {
            if let Some(statement) = self.parse_statement() {
                program.statements.push(statement);
            }
            self.next_token();
        }
        program
    }
}

And I'm testing it as follows:

#[cfg(test)]
mod tests {
    use std::any::Any;
    use std::mem::transmute;
    use std::os::macos::raw::stat;

    use monkey_ast::ast::{LetStatement, Node};

    use super::*;

    #[test]
    fn test_let_statements() {
        let input = "
let x = 5;
let y = 10;
let foobar = 838383;";
        let mut l = Lexer::new(input);
        let mut p = Parser::new(&mut l);
        let program = p.parse_program();
        check_parser_errors(&p);

        if program.statements.len() != 3 {
            panic!(
                "Program does not contain 3 statements, got: {}",
                program.statements.len()
            );
        }
        let tests = ["x", "y", "foobar"];
        for (i, expected_identifier) in tests.iter().enumerate() {
            let statement = program.statements.get(i);
            assert!(statement.is_some());
            let statement = statement.unwrap();
            let statement = statement.as_ref();
            let statement = (&statement as &dyn Any).downcast_ref::<LetStatement>();
            assert!(statement.is_some());
            let statement = statement.unwrap();
            assert_eq!(statement.name.value, *expected_identifier);
            assert_eq!(statement.name.token_literal(), *expected_identifier);
        }
    }

    fn check_parser_errors(p: &Parser) {
        let errors = p.errors();
        assert_eq!(
            errors.len(),
            0,
            "had parsing errors: {}",
            errors.get(0).unwrap()
        )
    }
}

But I'm getting this error:

error[E0597]: `program.statements` does not live long enough
   --> parser/src/parser.rs:134:29
    |
134 |             let statement = program.statements.get(i);
    |                             ^^^^^^^^^^^^^^^^^^^^^^^^^ borrowed value does not live long enough
...
139 |             let statement = (&statement as &dyn Any).downcast_ref::<LetStatement>();
    |                              ---------- cast requires that `program.statements` is borrowed for `'static`
...
145 |     }
    |     - `program.statements` dropped here while still borrowed

Thinking in OOP terms, program.statements will hold instances of the superclass Statement, and in this specific test I know that the only statements being returned are in fact LetStatements so I want to downcast it and make further assertions.

Is this not possible in rust? I know that the compiler's telling me to make program static, but I don't even know how to do that. Plus isn't that bad practice? Instances of Statements don't have to persist during the entire program...

You are confusing the two different meanings of static. The compiler doesn't tell you that your program variables must live for the static lifetime. That's not what a : 'static bound means. A lifetime bound T: 'a means that the type T must be valid for creating a reference of type &'a T, ie., it must not hold references valid for shorter than 'a. In practice, this most often means that a 'static type must be an owning type that doesn't contain any references.

Thus, the compiler doesn't tell you to replace your locals with static globals. The compiler tells you to supply a type that doesn't have references in it. Why is that? This is because downcasting is practically impossible to implement for types with a non-trivial lifetime annotation. Lifetimes are completely erased at compile time, and they would be too hard/slow/generally impractical to recover at runtime. This means that the compiler must simply ignore them. However, this would instantly make any sort of downcasting trivially and egregiously unsound, as you could use it to completely circumvent lifetime checking. Hence, anything with short-living references inside is not allowed to be up-/downcasted to/from dyn Any.

Here we go. This is your bigger problem. Rust is not an object-oriented language. Rust types aren't classes, "subtraits" aren't subtypes, and there is no inheritance. If you try to use OO patterns in Rust, you'll only suffer.

Don't try to model your AST as an open set of "subclasses". It's a daft thing to do even in an OO language, because a language with a well-defined grammar has a known set of AST nodes, so opening them up to subclassing only causes unexpected corner cases.

Luckily, Rust has exactly the right tool for this job. Just use an enum instead! You can look at any mature and idiomatic parser library (you don't even have to go further than plain old JSON), and you will see that their AST representations are almost exclusively enums.

Also consider not writing a parser by hand.

4 Likes

Thank you so so so so much, your answer simply couldn't be any more complete.

To give a bit of a context, I'm following along with this book: Writing an Interpreter in Go. I wanted to learn about compilers/interpreters and also wanted to learn Rust, so I felt like following along with this book and implementing everything in Rust would be a good way to learn both.

Really appreciate the explanation regarding the low-level implementation of down-casting.

I'll model my AST as enums instead, that's a great suggestion.

Thank you for recommending Parsel, I'll give it a go once I'm done manually writing this parser by hand. I'm doing this as an exercise for myself.

Once again thank you so much, it's 2am here and I'm gonna go sleep in peace.

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.