Request for Code Review

I am now on my second day of learning Rust. For practice, I implemented a simple calculator/parser that uses the shunting-yard algorithm.

I would be happy to get some feedback on my code:

Thanks a lot!

I created a playground with a few ideas. Looks really cool; I think parsing is such an interesting problem to solve.

// for &Vec<T>, &String (and one more I'm forgetting), use
// &[T], &str (and the one I forgot); probably never &Box<T>, because it's just &T
fn tokenize(content: &str) -> Result<Vec<Token>, &'static str>
// did you try this
fn tokenize<'a>(content: &'a str) -> Result<Vec<Token>, &'a str>
1 Like

Observations in no particular order of importance:

  • Lexer:
    • Taking &String, or any explicitly owned container by reference, e.g. &Vec (in the tokenize function and others), has no value: you can't mutate it (append to it), and a caller who has e.g. a string literal of type &str would be forced to allocate just to pass a &String. So, just use &str as the incoming parameter type instead. If you need ownership of the data, pass a String (or another owned container) by value.
    • It is not necessary (and non-idiomatic) to explicitly return at the end of a function. Rust is an expression language, so the last expression of a block will be its value, including the case when a block is the body of a function.
    • All those v.push(Token::new(&buffer).unwrap()); lines look strange. Why don't you handle error conditions gracefully when the Token impl, which you wrote yourself, returns a Result?
    • enum variant names are conventionally spelled using CamelCase, not in FULLCAPS.
    • in Token::is_operator(), the huge comparison chain OR'd together could be written more cleanly as a match:
      use TokenType::*;
      
      match self.get_type() {
          Divide | Multiply | Plus | Minus | Power => true,
          _ => false,
      }
      
    • in Token::new(), you are recompiling the tokenizer regular expressions on each call. This is needlessly expensive and probably slows your lexer down by a factor of dozens or even hundreds. Consider using the regex!() macro or lazy_static.
    • get_XXX() functions are unidiomatic in Rust. Explicit get naming is used for collection lookup (e.g. HashMap), not for general "property" "getters". The latter are usually just named after the corresponding property itself, e.g. value(), precedence(), associativity(). For type, since it is a keyword, consider using a different name, e.g. kind or ty.
    • in calculator::evaluate(), this match:
      match tokenize(expression) {
          Ok(v) => match shunting_yard(&v) {
              Ok(s) => calculate(&s),
              Err(x) => Err(x),
          },
          Err(e) => Err(e),
      }
      
      could be vastly simplified to:
      calculate(&shunting_yard(&tokenize(expression)?)?)
      
      or if you prefer to spell it out:
      let v = tokenize(expression)?;
      let s = shunting_yard(&v)?;
      calculate(&s)
      
    • in calculator::calculate(), it is unnecessary to call iter() on tokens. Most reference-to-container types implement IntoIterator<Item=&T> where T would be the (owned) item type of the container itself.
    • in the same function, all those stack.last().unwrap().get_precedence() could use some refactoring by means of pulling the expression out into a variable and reusing it, instead of repeating the same huge expression twice.
  • Main:
    • if args.len() == 1usize looks strange; type inference should be able to compute the type of the literal 1, so just write if args.len() == 1.
    • The usual style for placing block-delimiting braces in Rust is to put the opening brace on the end of the line instead of the beginning of the following line.
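
To illustrate the first lexer point about &String parameters, here is a minimal sketch (the function name and body are invented for the example): a &str parameter accepts both owned Strings (via deref coercion) and string literals, with no extra allocation.

```rust
// Hypothetical stand-in for a lexer entry point: takes &str, not &String.
fn tokenize_flexible(content: &str) -> usize {
    // placeholder body: count the non-whitespace characters
    content.chars().filter(|c| !c.is_whitespace()).count()
}

fn main() {
    let owned = String::from("1 + 2");
    // &String coerces to &str, so owned data still works at the call site...
    assert_eq!(tokenize_flexible(&owned), 3);
    // ...and a plain literal needs no allocation at all.
    assert_eq!(tokenize_flexible("1 + 2"), 3);
}
```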
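
On the regex recompilation point: besides the regex!() macro or lazy_static, the "build once, reuse forever" pattern can also be expressed with std::sync::OnceLock from the standard library. A std-only sketch follows; a Vec<char> stands in for the compiled Regex, since the regex crate is an external dependency, but in real code the static would hold a regex::Regex built with Regex::new(...).

```rust
use std::sync::OnceLock;

// Sketch of the compile-once pattern using only the standard library.
fn operator_chars() -> &'static Vec<char> {
    static CACHE: OnceLock<Vec<char>> = OnceLock::new();
    // get_or_init runs the closure at most once, no matter how many
    // times operator_chars() is called afterwards.
    CACHE.get_or_init(|| vec!['+', '-', '*', '/', '^'])
}

fn main() {
    assert!(operator_chars().contains(&'^'));
    // both calls return a reference to the same cached allocation
    assert!(std::ptr::eq(operator_chars(), operator_chars()));
}
```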
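
And for the iter() point in calculator::calculate(), a small sketch (the helper function is invented) showing that a reference to a container iterates directly:

```rust
// Hypothetical helper: &[i32] implements IntoIterator<Item = &i32>,
// so an explicit .iter() call is unnecessary.
fn sum_all(nums: &[i32]) -> i32 {
    let mut total = 0;
    for n in nums {
        // `n` is a &i32 here; i32 implements AddAssign<&i32>
        total += n;
    }
    total
}

fn main() {
    assert_eq!(sum_all(&[1, 2, 3]), 6);
}
```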
6 Likes

Thanks a lot. I already fixed those points:

  • removed the return statement at the end of the function
  • renamed enum variants to CamelCase
  • refactored is_operator
  • renamed the getters
  • refactored the evaluate method; after looking through the docs it makes sense, because I am just propagating errors upwards
  • yep, my bad, still getting used to this (iterators and such)
  • removed the usize suffix
  • formatted all the code

Things that need to be done but take time:

  • replace &String parameters with slices (&str)

The things I still have questions about:

  • How could I possibly improve on v.push(Token::new(&buffer).unwrap());? Are you suggesting I also use the ? operator?
  • As of now, the regex library seems to have removed this macro: git. I will try lazy_static.
  • I feel like the alternatives to stack.last().unwrap().get_precedence() would make it even bulkier. I'm not sure how to do this nicely; I need to experiment.
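
Regarding the first question above: yes, one option is to propagate the error with ? instead of unwrap(), so tokenize simply hands any failure back to its caller. A hypothetical mini-Token to sketch the idea (the real Token type in the reviewed code differs):

```rust
// Invented stand-in for the reviewed Token type.
struct Token(char);

impl Token {
    fn new(buffer: &str) -> Result<Token, &'static str> {
        // wrap the first char, or fail on an empty buffer
        buffer.chars().next().map(Token).ok_or("empty buffer")
    }
}

fn tokenize(content: &str) -> Result<Vec<Token>, &'static str> {
    let mut v = Vec::new();
    for word in content.split_whitespace() {
        // an Err here bubbles up to tokenize's caller instead of panicking
        v.push(Token::new(word)?);
    }
    Ok(v)
}

fn main() {
    assert_eq!(tokenize("1 + 2").map(|v| v.len()), Ok(3));
    assert!(tokenize("").map(|v| v.len()) == Ok(0));
}
```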

Thanks a lot. What I gained from your proposal is the idea to remove the explicit value property from the token - I will try and do that.

You might be interested in clippy.

2 Likes

Installed it right away; it pointed out some interesting things already, thanks!

Well, I don't know – to be honest I don't really understand the thoughts behind that whole part of the design. The function returning Result suggests that it can fail, yet its return value is always unwrap()ped, suggesting that it can't/shouldn't. Probably what you meant was an assertion? Or the function can really fail under normal circumstances, and you should be handling the error gracefully?

Hm okay, that macro was nightly-only anyway. lazy_static should get you a comparable performance boost indeed.

Instead of checking for is_empty() and then .last().unwrap(), you could just check if .last() returns Some or None. For instance, you could match on it or use if let Some(top) = stack.last().
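
A minimal sketch of that suggestion (the function and the u8 stack are invented for the example):

```rust
// Replacing an `is_empty()` check plus `.last().unwrap()` with `if let`:
// the Option returned by last() already encodes the empty case.
fn top_precedence(stack: &[u8]) -> u8 {
    if let Some(top) = stack.last() {
        *top // `top` is guaranteed valid here, no unwrap() that could panic
    } else {
        0 // the stack is empty
    }
}

fn main() {
    assert_eq!(top_precedence(&[1, 2, 3]), 3);
    assert_eq!(top_precedence(&[]), 0);
}
```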

1 Like

Another suggestion is using the rustfmt tool to help your code converge toward the style guidelines.

1 Like