Good job. This is a well-structured project that is easy to review.
First of all, Clippy will help you catch some basic style issues. Install it with `rustup component add clippy` and run it with `cargo clippy`. It would also be good to run `rustfmt` on the code to ensure the formatting recommended by the community.
The lack of tests is disturbing. Rust makes writing tests and even practicing test-driven development quite easy, so make sure you're writing tests whenever you can. It's unlikely someone will want to use a crate without any tests.
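For example, a minimal test module could look like this (the `transpile` entry point is an assumption; substitute whatever your public API actually is):

```rust
#[cfg(test)]
mod tests {
    use super::*;

    // `transpile` is a hypothetical entry point taking Ruby source and
    // returning the generated JS; adjust to your real API.
    #[test]
    fn transpiles_a_simple_def() {
        let js = transpile("def f(x, y, z) g(name, 2) end");
        assert!(js.contains("function f(x, y, z)"));
    }
}
```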
The main function lacks the ability to run the transpiler on arbitrary code. It would be nice if it just read the input code from stdin and printed the JS output to stdout so that it can easily be piped (think `cat 1.rb | transpile > 1.js`).
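A sketch of such a pipe-friendly `main`, again assuming a `transpile` function that takes the Ruby source and returns the generated JS:

```rust
use std::io::{self, Read, Write};

fn main() -> io::Result<()> {
    // Read all of stdin, transpile it, and write the result to stdout,
    // so the binary composes nicely with shell pipelines.
    let mut code = String::new();
    io::stdin().read_to_string(&mut code)?;
    let js = transpile(&code); // hypothetical entry point
    io::stdout().write_all(js.as_bytes())?;
    Ok(())
}
```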
Error handling has a lot of room for improvement. Panicking may be fine in your case, but using `Result` will allow you to create helpful error messages with more information. The `failure` crate, as always, is very useful for error handling.
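For instance, a dedicated error type built with `failure` could look roughly like this (the variants are made up for illustration):

```rust
use failure::Fail;

// Hypothetical transpiler errors; `#[fail(display = ...)]` generates the
// user-facing message for each variant.
#[derive(Debug, Fail)]
enum TranspileError {
    #[fail(display = "unexpected token: {}", _0)]
    UnexpectedToken(String),
    #[fail(display = "unexpected end of input")]
    UnexpectedEof,
}
```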
You can get rid of a few vectors in the `generator` module by using `join` from itertools. Unlike the std version, it works on arbitrary iterators, so you don't need to collect into a vector before joining the parts.
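A small demonstration of the difference (the `args` array is a stand-in for whatever you currently collect):

```rust
use itertools::Itertools;

fn main() {
    let args = ["x", "y", "z"];
    // `Itertools::join` consumes the iterator directly; no intermediate
    // Vec and no slice required, unlike `[T]::join` from std.
    let list = args.iter().join(", ");
    assert_eq!(list, "x, y, z");
}
```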
If you're up for a challenge, you can try to make the `generator` module allocation-free. Instead of returning a `String`, you can take an `impl Write` and write the output directly into the supplied buffer. Instead of populating vectors (e.g. in `generate_def`), you can write everything directly into the buffer. The tricky part here is `join(...)`: the `join` from `std` only works on slices and can't handle arbitrary iterators. Using `peekable` lets you check whether you're at the last item, but the resulting code isn't very nice (a sketch follows below). Something along the lines of `intersperse` from itertools could help, but it has its own complications, so I'm not sure it would be worth it. Like I said, it's just a challenge you can try.
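For reference, here is roughly what the `peekable` variant looks like; `write_joined` is a made-up helper, not something from your code:

```rust
use std::fmt::{self, Write};

// Writes the items into `out` separated by ", " without allocating any
// intermediate strings or vectors.
fn write_joined<W, I>(out: &mut W, items: I) -> fmt::Result
where
    W: Write,
    I: IntoIterator,
    I::Item: fmt::Display,
{
    let mut items = items.into_iter().peekable();
    while let Some(item) = items.next() {
        write!(out, "{}", item)?;
        // Peek ahead: only emit the separator if another item follows.
        if items.peek().is_some() {
            write!(out, ", ")?;
        }
    }
    Ok(())
}
```

It works, but the peek-and-branch dance in the loop is exactly the "not very nice code" I mentioned.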
The `Display` implementations in the `node` module don't seem too idiomatic. `Display` should be used for user-facing output, and your implementation looks like something `Debug` would be more suitable for. I'm not sure why you need both the auto-derived `Debug` implementation and a custom one.
The `parse_def` and `consume` functions don't need to be public.
The `consume` function can easily be made to return `Result<Token<'code>, &'code str>` by replacing the last line with `Err(&next_token.kind.name)`. It would also be nice to create a wrapper error type for it, because right now it's unclear what is returned in the `Err` case. When you return a string error, the reader will usually assume you're using "stringly typed errors" and that the string contains the full error message. That's not the case here. An explicit error type would be more understandable:
```rust
use std::fmt;

// Wrapper error type: makes it obvious that the payload is the offending
// input, not a complete error message.
struct UnexpectedInput<'code>(&'code str);

impl<'code> fmt::Debug for UnexpectedInput<'code> {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        write!(f, "unexpected input: {}", self.0)
    }
}

fn consume<'code>(
    tokens: &mut Vec<Token<'code>>,
    kind: &str,
) -> Result<Token<'code>, UnexpectedInput<'code>> {
    let next_token = tokens.remove(0);
    if next_token.kind.name == kind {
        return Ok(next_token);
    }
    Err(UnexpectedInput(&next_token.kind.name))
}
```
Passing `kind` as a string to `consume` is really bad. If you make a typo like `consume(tokens, "idenfifier")`, the compiler won't catch it. It would be much better to use an enum here instead (e.g. `consume(tokens, TokenKind::Identifier)`). Checking enums for equality is also much cheaper than string comparison.
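A rough sketch of the enum-based version; the variant names and the `Token` layout are assumptions, since I don't know your full set of token kinds:

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TokenKind {
    Def,
    End,
    Identifier,
    Number,
    OpenParen,
    CloseParen,
    Comma,
}

// Hypothetical token layout, just for illustration.
struct Token<'code> {
    kind: TokenKind,
    value: &'code str,
}

fn consume<'code>(
    tokens: &mut Vec<Token<'code>>,
    kind: TokenKind,
) -> Result<Token<'code>, UnexpectedInput<'code>> {
    let next_token = tokens.remove(0);
    if next_token.kind == kind {
        // A typo in a variant name is now a compile error, and the
        // comparison is a cheap integer check instead of a string one.
        Ok(next_token)
    } else {
        Err(UnexpectedInput(next_token.value))
    }
}
```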
`index_is` is missing a bounds check, so it's easy to cause a panic (`let code = "def f(x, y, z)";`). You may want to produce a cleaner error message here.
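A bounds-checked sketch, assuming `index_is` checks whether the token at a given index has the expected kind:

```rust
// Returns false instead of panicking when `index` is past the end.
fn index_is(tokens: &[Token], index: usize, kind: TokenKind) -> bool {
    tokens.get(index).map_or(false, |token| token.kind == kind)
}
```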
The `tokenize` function is recursive, so it will overflow the stack if the input is long enough. You should rewrite it as a loop.
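Something along these lines, assuming `get_next_token` returns the token together with the remaining input (the signature is a guess):

```rust
fn tokenize<'code>(mut code: &'code str) -> Vec<Token<'code>> {
    let mut tokens = Vec::new();
    // A plain loop uses constant stack space regardless of input length.
    while !code.is_empty() {
        let (token, rest) = get_next_token(code);
        tokens.push(token);
        code = rest;
    }
    tokens
}
```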
`get_next_token` simply skips a char if no regexp matches the input, so `let code = "def ? f(x, y, z) g(name, 2) end"` succeeds. It should produce an error instead.
`code[1..]` will panic if your input contains multi-byte characters (try `Ф`), because you're slicing at a position that isn't a char boundary. I don't understand what you're trying to do here. If you're skipping whitespace, you can just make a token kind for it with a regexp that matches whitespace, then discard these tokens.
In `get_next_token`, you needlessly match the regexp twice with `is_match` and `captures`. You can just write `if let Some(cap) = kind.re.captures(&code) {` instead.
In `get_next_token`, `String::from` does an unnecessary allocation. It can be removed without any other changes.
The indexing magic in `get_next_token` can be reduced to this one-liner: `let (value, code) = code.split_at(cap[0].len());`.
Another interesting challenge is to rewrite the tokenizer as an iterator. You access the tokens mostly sequentially, so a tokenizer iterator wrapped in `peekable` would probably work as an input to the parser.