Beginner questions about converting the super tiny compiler from JS to Rust

Hi,

I'm still a beginner at Rust. I'm trying to convert the super tiny compiler from JS to Rust, following the JS example as closely as possible while keeping the Rust code as idiomatic as possible. So far I have implemented the tokenizer and the parser. However, I'm very unsure whether my Rust code is "good". My ported Rust code lives here.

I have several questions and I'd be very happy to get some feedback.

  1. Is c if c.is_whitespace() => () the correct way to call methods inside a match? (source)
  2. I want to peek at the next char and, if it fits my needs, consume it. For this I currently have to call next (here) and match on Some() again, even though I already know through peek (here) that the value is correct. Is there a way to get rid of the second Some()?
  3. In my parser function I try to map a vector of tokens to an AST. However, I have a hard time getting the values out of my tokens, which are modelled as an enum. I use &Token::Number(ref value @ _) => Ok(Ast::NumberLiteral(value.to_string())), (here) to get my value. The &, the ref value @ _ and the value.to_string() seem super hacky. Is there a better way to do this?
  4. I mocked tokens in my test (here), but they are moved here. Can I reuse the same tokens somehow, so I don't need to create them again (here)?

Thank you very much! :heart:

It is correct, but this arm does nothing. The if c.is_whitespace() part is called a guard: c matches any character, but the arm is only taken if calling is_whitespace on that character returns true. The () after the => is not a function call but the empty tuple (unit), signifying that this arm does nothing and produces no value.
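A minimal sketch of a guard arm inside a tokenizer loop, to make the explanation above concrete. The Token enum (with its Paren and Other variants) and the tokenize function are hypothetical simplifications, not the code from the linked repository.

```rust
// Hypothetical, heavily simplified token type (not the poster's real one).
#[derive(Debug, PartialEq)]
enum Token {
    Paren(char),
    Other(char),
}

fn tokenize(input: &str) -> Vec<Token> {
    let mut tokens = Vec::new();
    for c in input.chars() {
        match c {
            '(' | ')' => tokens.push(Token::Paren(c)),
            // Guard: the pattern `c` matches any char, but this arm is only
            // taken when `c.is_whitespace()` is true. `()` is the unit value,
            // so the arm does nothing; `continue` would act the same here
            // because nothing follows the match in the loop body.
            c if c.is_whitespace() => (),
            c => tokens.push(Token::Other(c)),
        }
    }
    tokens
}

fn main() {
    // Whitespace is skipped; everything else becomes a token.
    println!("{:?}", tokenize("( a )"));
}
```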


Hi, and welcome to Rust!

  1. Yes!
  2. This is a case where using unwrap is fine. You can also simplify the match to a while let, like so:
    while let Some(&('0'..='9')) = char_iter.peek() {
        value.push(char_iter.next().unwrap());
    }
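A sketch comparing the loop above with Peekable::next_if (stable since Rust 1.51), which consumes the next item only when a predicate accepts it and so removes the second Some() and the unwrap entirely. The function names read_number_peek and read_number_next_if are hypothetical.

```rust
// Variant from the reply above: peek, then next().unwrap().
fn read_number_peek(input: &str) -> String {
    let mut chars = input.chars().peekable();
    let mut value = String::new();
    // `&('0'..='9')` needs the parentheses: `&'0'..='9'` is rejected as ambiguous.
    while let Some(&('0'..='9')) = chars.peek() {
        // peek() just returned Some, so next() cannot return None here.
        value.push(chars.next().unwrap());
    }
    value
}

// Variant using Peekable::next_if: consumes the next char only if the
// predicate accepts it, so there is no second Some() and no unwrap.
fn read_number_next_if(input: &str) -> String {
    let mut chars = input.chars().peekable();
    let mut value = String::new();
    while let Some(c) = chars.next_if(|ch| ch.is_ascii_digit()) {
        value.push(c);
    }
    value
}

fn main() {
    println!("{}", read_number_peek("123abc"));
    println!("{}", read_number_next_if("123abc"));
}
```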

3. You can just omit the @ _ part. Also, the match is usually written as

match *token {
    Token::Number(...) => ...
}
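A small sketch of the two common spellings of such a match: dereferencing the scrutinee with ref bindings, and matching the reference directly using default binding modes (available since Rust 1.26). The Token and Ast definitions, including the Name and Identifier variants, are hypothetical stand-ins modelled loosely on the thread.

```rust
// Hypothetical token and AST types, modelled loosely on the thread.
#[derive(Debug, PartialEq)]
enum Token {
    Number(String),
    Name(String),
}

#[derive(Debug, PartialEq)]
enum Ast {
    NumberLiteral(String),
    Identifier(String), // hypothetical variant name
}

// Dereference the scrutinee; `ref` borrows the String instead of trying
// to move it out of the borrowed token (which would not compile).
fn to_ast_deref(token: &Token) -> Ast {
    match *token {
        Token::Number(ref value) => Ast::NumberLiteral(value.to_string()),
        Token::Name(ref value) => Ast::Identifier(value.to_string()),
    }
}

// Default binding modes: matching `&Token` against non-reference patterns
// binds `value` as `&String` automatically, so neither `*` nor `ref` is needed.
fn to_ast_ergonomic(token: &Token) -> Ast {
    match token {
        Token::Number(value) => Ast::NumberLiteral(value.to_string()),
        Token::Name(value) => Ast::Identifier(value.to_string()),
    }
}

fn main() {
    let token = Token::Number("42".to_string());
    println!("{:?}", to_ast_deref(&token));
    println!("{:?}", to_ast_ergonomic(&token));
}
```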

You have to use to_string() because you're walking through the tokens as borrowed references. You might be able to switch to iterating over owned tokens with a consuming iterator like into_iter(), but I haven't fully checked. Then you can destructure the tokens in the match and move the inner value out instead of copying it.
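A minimal sketch of the consuming approach, assuming the parser can take ownership of the token Vec. To stay short it maps tokens to a flat Vec<Ast> instead of building a real tree, and the Token/Ast variants other than Number and NumberLiteral are hypothetical.

```rust
#[derive(Debug, PartialEq)]
enum Token {
    Number(String),
    Name(String),
}

#[derive(Debug, PartialEq)]
enum Ast {
    NumberLiteral(String),
    Identifier(String), // hypothetical variant name
}

// Takes the tokens by value. `into_iter()` yields owned `Token`s, so each
// String is moved into the AST node; no `to_string()` copy is needed.
fn parse(tokens: Vec<Token>) -> Vec<Ast> {
    tokens
        .into_iter()
        .map(|token| match token {
            Token::Number(value) => Ast::NumberLiteral(value),
            Token::Name(value) => Ast::Identifier(value),
        })
        .collect()
}

fn main() {
    let tokens = vec![Token::Name("add".to_string()), Token::Number("2".to_string())];
    let ast = parse(tokens);
    // `tokens` has been moved into `parse` and can no longer be used here.
    println!("{:?}", ast);
}
```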

4. Just clone() them. For tests, that's the easiest solution.
In this particular case, you could also write the assert as assert_eq!(Ok(&tokens), tokenizer(input).as_ref()) to avoid moving.
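A sketch of both assertion styles side by side. The tokenizer here is a hypothetical stand-in with an assumed signature of fn(&str) -> Result<Vec<Token>, String>, written only so the assertions have something to call; in a real project these asserts would live in a #[test] function.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Token {
    Paren(char),
    Number(String),
}

// Hypothetical stand-in for the poster's tokenizer, just enough to test against.
fn tokenizer(input: &str) -> Result<Vec<Token>, String> {
    let mut tokens = Vec::new();
    let mut chars = input.chars().peekable();
    while let Some(c) = chars.next() {
        match c {
            '(' | ')' => tokens.push(Token::Paren(c)),
            c if c.is_whitespace() => continue,
            c if c.is_ascii_digit() => {
                let mut value = c.to_string();
                while let Some(d) = chars.next_if(|ch| ch.is_ascii_digit()) {
                    value.push(d);
                }
                tokens.push(Token::Number(value));
            }
            other => return Err(format!("unexpected character: {other}")),
        }
    }
    Ok(tokens)
}

fn main() {
    let input = "(12 3)";
    let tokens = vec![
        Token::Paren('('),
        Token::Number("12".to_string()),
        Token::Number("3".to_string()),
        Token::Paren(')'),
    ];
    // Option 1: compare against a clone; `tokens` itself is not moved.
    assert_eq!(Ok(tokens.clone()), tokenizer(input));
    // Option 2: compare references; `as_ref()` turns
    // Result<Vec<Token>, String> into Result<&Vec<Token>, &String>.
    assert_eq!(Ok(&tokens), tokenizer(input).as_ref());
    // `tokens` is still usable after both assertions.
    println!("{} tokens checked", tokens.len());
}
```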


Thank you for the feedback! The code basically did what it should: ignore all whitespace. I wrote the () as an empty tuple on purpose, but now I understand that my intent wasn't very clear. I've changed this line to c if c.is_whitespace() => continue, which is what I was really trying to express. Thank you.

Thank you, this is very valuable feedback for me.

  1. Thanks.
  2. I still have a hard time figuring out when it is "good" to use unwrap outside of example code. Are there any guidelines on this? Can this case be simplified in a similar way? It is more of a negation of while let Some() = char_iter.peek() and has a second case (None).
  3. Thank you. That's exactly the feedback I was looking for (e.g. whether it is more idiomatic to write match *token { Token::Number(...) => ... } or match token { &Token::Number(...) => ... }). I'll look into into_iter(), too.
  4. Thanks. I changed that now. Token derives Clone and I use it like this in my tests: assert_eq!(Ok(tokens.clone()), tokenizer(input));.

Regarding 2: Is there any way that unwrap could be unsafe here? Or do I just know it is safe by common sense?

unwrap is fine when you "know" it's infallible: either you've verified this in code already or have some other reason to believe it. You just need to be careful when refactoring that this knowledge still holds :slight_smile:.
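One common convention for this situation (a suggestion, not something the thread prescribes) is to use expect instead of unwrap, so the message records the invariant you relied on and a refactor that breaks it produces a self-explaining panic. A minimal sketch; read_name is a hypothetical helper that reads a run of ASCII letters.

```rust
// Hypothetical helper: read a run of ASCII letters from the start of `input`.
fn read_name(input: &str) -> String {
    let mut chars = input.chars().peekable();
    let mut name = String::new();
    while let Some(&c) = chars.peek() {
        if !c.is_ascii_alphabetic() {
            break;
        }
        // peek() just returned Some, so next() returns the same char.
        // `expect` states that invariant and gives a clearer panic message
        // if a later refactor ever breaks it.
        name.push(chars.next().expect("peek() returned Some, so next() must too"));
    }
    name
}

fn main() {
    println!("{}", read_name("add 1 2"));
}
```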

Thanks!

You don't need common sense; since you don't need unsafe to invoke it, you know it's safe.

3. I switched my code to into_iter in this commit. It seems to work fine, and I think it is easier to read now. Are there any "technical" disadvantages (or advantages) of into_iter compared to the old code? Thank you again.

Disadvantage: with the old code, a user could call the function using any slice of data. With the new version, they can only do it with data that is held in a Vec. They can't use it on a subset of a file to parse only a little bit at a time, unless they copy that subset out into a new Vec.
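A sketch of the trade-off described above: a borrowing parser that takes &[Token] can work on part of a larger Vec directly, while an owning parser needs that subset copied into a new Vec first. The Token/Ast types, the extra variants, and both function names are hypothetical, and the output is a flat Vec<Ast> rather than a real tree.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Token {
    Number(String),
    Name(String),
}

#[derive(Debug, PartialEq)]
enum Ast {
    NumberLiteral(String),
    Identifier(String), // hypothetical variant name
}

// Borrowing version: accepts any slice (a whole Vec or just part of one),
// but has to copy each String.
fn parse_slice(tokens: &[Token]) -> Vec<Ast> {
    tokens
        .iter()
        .map(|token| match token {
            Token::Number(value) => Ast::NumberLiteral(value.to_string()),
            Token::Name(value) => Ast::Identifier(value.to_string()),
        })
        .collect()
}

// Owning version: moves the Strings, but only accepts a whole Vec.
fn parse_vec(tokens: Vec<Token>) -> Vec<Ast> {
    tokens
        .into_iter()
        .map(|token| match token {
            Token::Number(value) => Ast::NumberLiteral(value),
            Token::Name(value) => Ast::Identifier(value),
        })
        .collect()
}

fn main() {
    let tokens = vec![
        Token::Name("add".to_string()),
        Token::Number("1".to_string()),
        Token::Number("2".to_string()),
    ];
    // Parse only the last two tokens: the slice version borrows them directly...
    let from_slice = parse_slice(&tokens[1..]);
    // ...while the owning version needs that subset copied into a new Vec.
    let from_vec = parse_vec(tokens[1..].to_vec());
    assert_eq!(from_slice, from_vec);
    println!("{:?}", from_slice);
}
```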

The difference is that with into_iter, the token vector is consumed (which is why you can move the strings out instead of copying them). So if you needed to go over it more than once, that wouldn't work.


There are two questions here: can it be unsafe by Rust's definition? No. Can it panic? Also no, because you've made sure of that before. But panicking is not unsafe in any case.


Cool, thank you. Yeah, I actually meant "Can it panic?" and not "Is it unsafe?". Sorry for the confusion.
