Code review my first Rust project: A Ruby-to-Javascript Transpiler

A few months ago I was eager to learn Rust. After going through the Rust book, I decided to follow Gary Bernhardt's "A Compiler from Scratch" video and build the Ruby to Javascript compiler using Rust instead of Ruby. This was quite a challenge, it wound up taking me several 8 hours to get through that 45 minute video. I had a few detours as certain technical choices (i.e. mutating arrays) weren't very rusty, so I had to figure out the "right" way to do things. It was a pretty satisfying experience, and I'm looking to dig into Rust again.

For now, I would love it if some more experienced Rustaceans could give my toy transpiler a code review! The repo is located here:

2 Likes

Good job. This is a well structured and easy to review project.

First of all, clippy will help you catch some basic style issues. Install it with rustup component add clippy and use with cargo clippy. It would also be good to use rustfmt on the code to ensure the formatting recommended by the community.

The lack of tests is disturbing. Rust makes writing tests and even practicing test-driven development quite easy, so make sure you're writing tests whenever you can. It's unlikely someone will want to use a crate without any tests.

The main function lacks the ability to run the transpiler on arbitrary code. It would be nice if it just read the input code from stdin and printed the JS output to stdout so that it can easily be piped (think echo 1.rb | transpile > 1.js).

Error handling has a lot of room for improvement. Panicking may be fine in your case, but using Result will allow you to create helpful error messages with more information. The failure crate, as always, is very useful for error handling.

You can get rid of a few vectors in the generator module by using join from itertools. Unlike the std version, it can work on arbitrary iterators, so you don't need to collect a vector before joining the parts.

If you're up to a challenge, you can try to make the generator module allocation free. Instead of returning a String, you can pass an impl Write and write the output directly into the supplied buffer. Instead of populating vectors (e.g. in generate_def), you can write everything directly into the buffer. The tricky part here is join(...): the join from std only works on slices and can't work on arbitrary iterators. Using peekable will allow you to check if you're at the last item but it's not a very nice code. I think something along the lines of intersperse from itertools can be of use but it has its own complications. So I'm not sure it will be worth it. Like I said, it's just a challenge you can try.

The Display implementations in the node module don't seem too idiomatic. Display should be used for user-facing output, and your implementation looks like something Debug would be more suitable for. I'm not sure why you need both the auto derived Debug implementation and a custom one.

parse_def and consume functions don't need to be public.

consume function can be easily made to return Result<Token<'code>, &'code str> by replacing the last line with Err(&next_token.kind.name). It would also be nice to create a wrapper error type for it because now it's unclear what is returned in the Err case. When you return a string error, the reader will usually assume you're using "stringly typed errors" and the string contains the full error message. This is not the case here. An explicit error type would be more understandable:

struct UnexpectedInput<'code>(&'code str);

impl<'code> fmt::Debug for UnexpectedInput<'code> {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        write!(f, "unexpected input: {}", self.0)
    }
}

fn consume<'code>(tokens: &mut Vec<Token<'code>>, kind: &str) -> Result<Token<'code>, UnexpectedInput<'code>>  {
    let next_token = tokens.remove(0);

    if next_token.kind.name == kind {
        return Ok(next_token);
    }

    Err(UnexpectedInput(&next_token.kind.name))
}

Passing kind as a string to consume is really bad. If you make a typo like consume(tokens, "idenfifier"), the compiler won't catch it. It would be much better if a enum was used here instead (e.g. consume(tokens, TokenKind::Identifier)). Checking for enum equality is also much cheaper than string comparison.

index_is is missing a bounds check so it's easy to cause a panic (let code = "def f(x, y, z)";). You may want to make a cleaner error message here.

tokenize function is recursive, so it will overflow the stack if the input is long enough. You should rewrite it as a loop.

get_next_token simply skips a char if no regexp matches the input, so let code = "def ? f(x, y, z) g(name, 2) end" succeeds. It should produce an error instead.

code[1..] will panic if your input is multi-byte (try Ф) because you're slicing not at a char boundary. I don't understand what you're trying to do here. If you're skipping whitespace, you can just make a token for it with a regexp that matches whitespace, then discard these tokens.

In get_next_token, you needlessly match regexp twice with is_match and captures. You can just write if let Some(cap) = kind.re.captures(&code) { instead.

In get_next_token, String::from does unnecessary allocation. It can be removed without changing any code.

The indexing magic in get_next_token can be reduced to this one liner: let (value, code) = code.split_at(cap[0].len());.

Another interesting challenge is to rewrite tokenizer as an iterator. You access the tokens mostly sequentially, so a tokenizer iterator wrapped in a peekable would probably work as an input to parser.

2 Likes

Thank you so much! I'm going to get started on some of these things. I ran fmt and clippy, and might look at add tests and reading from stdin next. :smile: