Help with matching on regexes

I'm currently following along with a tutorial on writing a tiny compiler from scratch, Super Tiny Compiler, which I found via an old blog post by Jonathan Turner, Programming Language and Compilers Reading List.

Right now I'm pretty much at the beginning, in the lexer portion. Could someone point out a way to discard all the if statements and use a match statement instead? I believe it will require the regex crate, because I'd have to match on things more complex than a single character (for instance a long number, or whitespace). The thing is, that would require matching several different regexes, and I don't know how to do that in a match statement.

So far this is what I have, using only if statements:

fn main() {
    println!("Hello, world!");

    tokenizer("This is a test()");
}

const PAREN: &'static str = "paren";
const NUMBER: &'static str = "number";

struct Token<'a> {
    ttype: &'a str,
    value: String, // a String, so a multi-digit number fits as well as a single paren
}

/// Take a string of code and tokenize it
/// (add 2 (subtract 4 2)) => [ { type: 'paren', value: '('}, ...]
fn tokenizer(input: &str) {
    let mut current = 0; // current location of cursor

    let mut tokens: Vec<Token> = Vec::new(); // array for tokens

    let input_vec: Vec<char> = input.chars().collect();

    // Compare against the char count; input.len() counts bytes, not chars
    while current < input_vec.len() {
        let cur_char: char = input_vec[current];

        if cur_char == '(' {
            tokens.push(Token { ttype: PAREN, value: cur_char.to_string() });
            current = current + 1;
            continue;
        }

        if cur_char == ')' {
            tokens.push(Token { ttype: PAREN, value: cur_char.to_string() });
            current = current + 1;
            continue;
        }

        if cur_char.is_whitespace() {
            current = current + 1;
            continue;
        }

        if cur_char.is_numeric() {
            let mut num_value: String = String::new();

            // Check the bound first so a number at the end of the input
            // cannot index past the vector
            while current < input_vec.len() && input_vec[current].is_numeric() {
                num_value.push(input_vec[current]);
                current = current + 1;
            }

            tokens.push(Token { ttype: NUMBER, value: num_value });
            continue;
        }

        // Nothing handles this character yet; skip it so the loop always advances
        current = current + 1;
    }
}

What I'd like to do is get rid of all the:

if cur_char == whatever

and replace them with:

match cur_char {
  '(' => do something,
  ')' => do something,
  whitespace => do something,
  Number => do something
}

Each match arm can have an if guard:

match cur_char {
   x if x.is_whitespace() => {}
   _ => {} // guards don't count toward exhaustiveness, so keep a catch-all arm
}
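
To make the guard form concrete, here is a minimal sketch. The `classify` function and its labels are hypothetical, made up for illustration; they are not part of the tutorial or of the code above.

```rust
/// Hypothetical helper: names the kind of token a character would start.
fn classify(c: char) -> &'static str {
    match c {
        // Plain patterns are compared directly against the value
        '(' => "open paren",
        ')' => "close paren",
        // `x` binds any char; the arm is taken only when its guard is true,
        // otherwise matching falls through to the next arm
        x if x.is_whitespace() => "whitespace",
        x if x.is_numeric() => "number",
        // Guards do not count toward exhaustiveness, so a catch-all is required
        _ => "other",
    }
}
```

Arms are tried top to bottom, so the literal patterns are checked before the guarded ones.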

There can be multiple patterns too:

match cur_char {
   '(' | ')' if paren_mode_on => {} // the guard applies to both patterns
   _ => {}
}
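
Putting both ideas together, the tokenizer loop from the question could look roughly like the sketch below, with no need for the regex crate. This is one possible shape, not the tutorial's implementation: the name `tokenize`, returning the `Vec<Token>`, storing `value` as a `String`, and skipping characters that have no rule yet are all assumptions made for the example.

```rust
const PAREN: &str = "paren";
const NUMBER: &str = "number";

#[derive(Debug, PartialEq)]
struct Token {
    ttype: &'static str,
    value: String, // assumption: a String, so multi-digit numbers fit
}

/// Sketch of the tokenizer loop with `match` in place of the chain of `if`s.
fn tokenize(input: &str) -> Vec<Token> {
    let chars: Vec<char> = input.chars().collect();
    let mut tokens = Vec::new();
    let mut current = 0; // cursor into `chars`

    while current < chars.len() {
        match chars[current] {
            // Multiple patterns share one arm
            '(' | ')' => {
                tokens.push(Token { ttype: PAREN, value: chars[current].to_string() });
                current += 1;
            }
            // Guarded arms cover the "more than one possible char" cases
            c if c.is_whitespace() => current += 1,
            c if c.is_numeric() => {
                let start = current;
                // Bound check first, so a number at the end of the input is safe
                while current < chars.len() && chars[current].is_numeric() {
                    current += 1;
                }
                let value: String = chars[start..current].iter().collect();
                tokens.push(Token { ttype: NUMBER, value });
            }
            // Assumption: characters with no rule yet are skipped
            _ => current += 1,
        }
    }
    tokens
}
```

The guards only decide which arm handles the current character; consuming a whole run of digits still happens in an ordinary loop inside the arm.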

Awesome, thanks @kornel.