Managing memory is kicking my butt


#1

I’m having a heck of a time figuring out how to use references properly. Basically my problem is this:

I have a Vec of a custom enum type that I want to pass to another function and use. I don’t need to mutate this vector, only read the contents. However, after I’ve passed it to another function as a borrow, all hell breaks loose, and I can’t fight my way out of it.

Below is my current code:

use super::lexer::Token;

enum ASTNodeType {
    NumberLiteral,
    StringLiteral,
    CallExpression
}

struct Node {
    node_type: ASTNodeType,
    name: String,
}

struct BaseNode {
    node_type: ASTNodeType,
    name: String,
    params: Vec<Node>
}

enum ASTNode {
    Node(Node),
    BaseNode(BaseNode)
}

pub fn parser(tokens: Vec<Token>) {
    let mut current: usize = 0;

    walk(tokens.as_slice(), &mut current);
}

fn walk(tokens: &[Token], current: &mut usize) -> Option<ASTNode> {
    let mut token: &Token = &tokens[*current];

    match token {
        Token::Number(value) => {
            *current += 1;
            Some(ASTNode::Node(Node { node_type: ASTNodeType::NumberLiteral, name: value.clone() }))
        }
        Token::String(value) => {
            *current += 1;
            Some(ASTNode::Node(Node { node_type: ASTNodeType::StringLiteral, name: value }))
        }
        Token::Paren('(') => {
            *current += 1;
            token = &tokens[*current];

            let mut node: BaseNode = BaseNode {
                node_type: ASTNodeType::CallExpression,
                name: token_value(&token),
                params: Vec::new()
            };

            *current += 1;

            token = &tokens[*current];

            while token_value(&token) != ")" {
                match walk(&tokens, current).unwrap() {
                    ASTNode::Node(val) => node.params.push(val),
                    _ => ()
                };

                token = &tokens[*current];
            }

            *current += 1;

            return Some(ASTNode::BaseNode(node));
        }
        Token::Paren(_) => { None }
        Token::Name(_) => { None }
    }
}

fn token_value(token: &Token) -> String {
    match *token {
        Token::Number(ref value) => value.clone(),
        Token::String(ref value) => value.clone(),
        Token::Paren(ref value) => {
            let mut str_val = String::new();
            str_val.push(*value);
            str_val
        }
        Token::Name(ref value) => value.clone()
    }
}

For reference here is the Token enum:

#[derive(Debug)]
pub enum Token {
    Paren(char),
    String(String),
    Number(String),
    Name(String)
}

And here is the output of cargo test:

cargo test
   Compiling lang v0.1.0 (file:///Users/jeramy/dev/rust/lang)
error[E0308]: mismatched types
  --> src/parser.rs:35:9
   |
35 |         Token::Number(value) => {
   |         ^^^^^^^^^^^^^^^^^^^^ expected reference, found enum `lexer::Token`
   |
   = note: expected type `&lexer::Token`
              found type `lexer::Token`

error[E0308]: mismatched types
  --> src/parser.rs:39:9
   |
39 |         Token::String(value) => {
   |         ^^^^^^^^^^^^^^^^^^^^ expected reference, found enum `lexer::Token`
   |
   = note: expected type `&lexer::Token`
              found type `lexer::Token`

error[E0308]: mismatched types
  --> src/parser.rs:43:9
   |
43 |         Token::Paren('(') => {
   |         ^^^^^^^^^^^^^^^^^ expected reference, found enum `lexer::Token`
   |
   = note: expected type `&lexer::Token`
              found type `lexer::Token`

error[E0308]: mismatched types
  --> src/parser.rs:70:9
   |
70 |         Token::Paren(_) => { None }
   |         ^^^^^^^^^^^^^^^ expected reference, found enum `lexer::Token`
   |
   = note: expected type `&lexer::Token`
              found type `lexer::Token`

error[E0308]: mismatched types
  --> src/parser.rs:71:9
   |
71 |         Token::Name(_) => { None }
   |         ^^^^^^^^^^^^^^ expected reference, found enum `lexer::Token`
   |
   = note: expected type `&lexer::Token`
              found type `lexer::Token`

error: aborting due to 5 previous errors

error: Could not compile `lang`.

I’ve tried several different things and have ended with:

fn walk(tokens: &[Token], current: &mut usize) -> Option<ASTNode> {
    let mut token: &Token = &tokens[*current];

    match *token {
        Token::Number(ref value) => {
            *current += 1;
            Some(ASTNode::Node(Node { node_type: ASTNodeType::NumberLiteral, name: value.clone() }))
        }
        Token::String(ref value) => {
            *current += 1;
            Some(ASTNode::Node(Node { node_type: ASTNodeType::StringLiteral, name: value.clone() }))
        }
        Token::Paren('(') => {
            *current += 1;
            token = &tokens[*current];

            let mut node: BaseNode = BaseNode {
                node_type: ASTNodeType::CallExpression,
                name: token_value(&token),
                params: Vec::new()
            };

            *current += 1;

            token = &tokens[*current];

            while token_value(&token) != ")" {
                match walk(&tokens, current).unwrap() {
                    ASTNode::Node(val) => node.params.push(val),
                    _ => ()
                };

                token = &tokens[*current];
            }

            *current += 1;

            return Some(ASTNode::BaseNode(node));
        }
        Token::Paren(_) => { None }
        Token::Name(_) => { None }
    }
}

It compiles, but it looks really dirty. Can anyone tell me what I’m doing wrong here? Is there a better way I could have done this without having to dereference the token and current variables?

Also the Token(ref val) seemed a bit weird as well.


#2

It seems like the basic problem here is that you don’t want to clone Token values in your walk function? If you just derive Clone for Token, then all your problems with ergonomics go away:

#[derive(Debug, Clone)]
pub enum Token {
    Paren(char),
    String(String),
    Number(String),
    Name(String)
}

fn walk(tokens: &[Token], current: &mut usize) -> Option<ASTNode> {
    let mut token: Token = tokens[*current].clone();

    match token {
        Token::Number(value) => {
            *current += 1;
            Some(ASTNode::Node(Node { node_type: ASTNodeType::NumberLiteral, name: value }))
        }
        Token::String(value) => {
            *current += 1;
            Some(ASTNode::Node(Node { node_type: ASTNodeType::StringLiteral, name: value }))
        }
        Token::Paren('(') => {
            *current += 1;
            token = tokens[*current].clone();

            let mut node: BaseNode = BaseNode {
                node_type: ASTNodeType::CallExpression,
                name: token_value(&token),
                params: Vec::new()
            };

            *current += 1;

            token = tokens[*current].clone();

            while token_value(&token) != ")" {
                match walk(&tokens, current).unwrap() {
                    ASTNode::Node(val) => node.params.push(val),
                    _ => ()
                };

                token = tokens[*current].clone();
            }

            *current += 1;

            return Some(ASTNode::BaseNode(node));
        }
        Token::Paren(_) => { None }
        Token::Name(_) => { None }
    }
}

The cost is that you are now cloning each Token as you walk the vector. If that cost is unacceptable, then what you arrived at with ref value is pretty close to what you want for this piece of code. An iterator over the slice will have similar problems, since you cannot move elements out of a borrowed slice during iteration, but you can clone them.

Another option to consider, if it is acceptable for your API, is giving ownership over the vector to walk, and allowing the iteration to consume each element. This could also take the shape of cloning the entire vector, and giving ownership over the clone.
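A rough sketch of what the ownership option could look like. Note that `walk_owned`, `parse` and the flattened `Ast` enum are hypothetical names made up for this illustration; they are not your `ASTNode` types, and unlike your code this version also keeps nested call expressions as parameters. The idea is that `Vec::into_iter` hands out each `Token` by value, so the `String` inside can be moved into the node with no clone and no `ref`:

```rust
use std::iter::Peekable;
use std::vec::IntoIter;

#[derive(Debug, PartialEq)]
enum Token {
    Paren(char),
    String(String),
    Number(String),
    Name(String),
}

// Hypothetical, simplified AST used only for this sketch.
#[derive(Debug, PartialEq)]
enum Ast {
    Number(String),
    Str(String),
    Call { name: String, params: Vec<Ast> },
}

// Consumes tokens from an owning iterator, so each String is moved, not cloned.
// Returns None on unexpected input or if the tokens run out early.
fn walk_owned(tokens: &mut Peekable<IntoIter<Token>>) -> Option<Ast> {
    match tokens.next()? {
        Token::Number(value) => Some(Ast::Number(value)),
        Token::String(value) => Some(Ast::Str(value)),
        Token::Paren('(') => {
            // The token right after '(' is expected to be the function name.
            let name = match tokens.next()? {
                Token::Name(name) => name,
                _ => return None,
            };
            let mut params = Vec::new();
            // Peek (borrow) to check for ')' without consuming the token.
            while !matches!(tokens.peek()?, Token::Paren(')')) {
                params.push(walk_owned(tokens)?);
            }
            tokens.next(); // consume the closing ')'
            Some(Ast::Call { name, params })
        }
        Token::Paren(_) | Token::Name(_) => None,
    }
}

// Takes the Vec by value; the caller gives up ownership of the tokens.
fn parse(tokens: Vec<Token>) -> Option<Ast> {
    walk_owned(&mut tokens.into_iter().peekable())
}
```

The `current` index disappears as well, because the iterator tracks the position. The tradeoff is that the caller can no longer use the token vector after parsing unless it clones it first.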

You have a lot of options at your disposal for handling memory management in Rust. The tradeoff will generally come down to ergonomics vs runtime performance. I’ve found what works best for me is erring on the side of ergonomics, especially early in a project’s lifecycle. I can always benchmark and optimize later, after the code has become useful.


#3

Believe it or not,

match *expr {
    Pattern(ref binding) => { ... }
    ...
}

is wildly common in idiomatic Rust and is—or was—generally the recommended way of writing matches against borrows of non-copy types.

The Rust of the future will look different, though! There is an accepted RFC to add sugary alternatives, which has very recently been implemented! You can try it out on nightly.

#![feature(match_default_bindings)]

fn main() {
    let x = &(vec![2, 3], 6);
    
    match x {
        // Because we're trying to match an expression of type &_
        // with a pattern that isn't &_, this pattern approximately
        // desugars to &(ref a, ref b)
        (a, b) => {
            // just documenting what types a and b have
            let _: &Vec<_> = a;
            let _: &i32 = b;
        }
    }
    
    // what we would have to write today to achieve the same
    match *x {
        (ref a, ref b) => {
            // just documenting what types a and b have
            let _: &Vec<_> = a;
            let _: &i32 = b;
        },
    }
}

It’s only for match though and not let bindings. I forget why…

Update: It will be for all bindings according to the RFC. Also, @tschottdorf deserves quite a hand for putting it all together!
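For anyone reading this later: as far as I know this feature (default binding modes) was stabilized in Rust 1.26, so on a recent stable compiler the feature gate should not be needed. Here is a small sketch of how the original poster's `token_value` could look with it, plus a `let` binding against a reference. The `first_len` function is a hypothetical helper added only to show the `let` case:

```rust
#[derive(Debug)]
enum Token {
    Paren(char),
    String(String),
    Number(String),
    Name(String),
}

// `token` is a `&Token`, but the patterns are written without `*` or `ref`.
// Each `value` binds as `&String` and `c` binds as `&char`.
fn token_value(token: &Token) -> String {
    match token {
        Token::Number(value) | Token::String(value) | Token::Name(value) => value.clone(),
        Token::Paren(c) => c.to_string(),
    }
}

// Hypothetical helper: the same sugar applied to a `let` binding.
fn first_len(pair: &(Vec<i32>, i32)) -> usize {
    let (items, _count) = pair; // items: &Vec<i32>, _count: &i32
    items.len()
}
```

The clone in `token_value` is still there, since the function returns an owned `String`; only the pattern syntax gets lighter.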