R interpreter written in Rust

Hi everyone,

I'm wondering if there would be any interest in developing an R interpreter in Rust?

I personally don't have the skills to build one from the ground up, and definitely not by myself, but I like R and want to see it improved, considering the competition from the likes of Julia, for example.

The current source is hosted on a Subversion (...?) repository, which IMO limits the pool of contributors significantly, in addition to much of the current C source being unreadable (also IMO). This surely is a major roadblock when it comes to R's chances at remaining competitive.

I think Rust would be a great language to implement an R interpreter in (along the lines of RustPython), or at least include support for re-writing some of the C functions in Rust.

Just curious on people's thoughts :slight_smile:

Cheers!

R has 99 problems and being written in C ain't one. Python was becoming wildly popular even when CPython was its only major implementation.

Both R and Python have serious social problems: neither was designed for large-scale software engineering (dynamic typing, a bad to mediocre package management story, etc.), and most of their users aren't proper software engineers, so they go on to write brittle or outright incorrect code.

However, Python recognized pretty early that discipline is a good thing and started taking steps in that direction. PEPs for coding style, optional type annotations, language and library constructs that help you structure code and solve real-world problems (context managers, generators, process pools, …). It also aims to be a general language, so it provides features that other, normal, general languages do (e.g., namespaces).

R, on the other hand, tries very hard to be a "cute" language for mathematicians. Many of its well-intentioned features (e.g. the ability to bind a bunch of implicit globals) are horrible footguns when you are trying to do anything serious. And people aren't even trying to use it without these footguns, so R code inevitably ends up a mess that you'll need to eventually convert to another language for any sort of production use.

Rewriting the interpreter in Rust wouldn't solve any of these problems.

You might be surprised. There's plenty of information out there on how to build interpreters, and Rust is a really nice language to build one in, especially for a first-timer.

If nothing else, it's a great learning exercise that mostly doesn't need much background. (There's a lot of old theory about parser generators that turns out not to be terribly helpful for real languages, past maybe tokenization.)

The top-level work for a full (if inefficient) version is:

  • lexing a source string into a sequence of tokens
  • parsing a sequence of tokens into an AST
  • interpreting by walking the AST
  • implementing the standard library
  • wrapping it all in standard developer tooling like REPLs and package management

The most confusing is probably the first: lexing requires a lot of care that you're not off by one when matching spaces, etc., so use lots of unit tests! The second is just matching, looping, and peeking at the next token in a way that's pretty much a direct match to the grammar, so it turns out to be very obvious most of the time.

Both of these should use an interface that looks essentially like: fn parse_foo(input: &str) -> Result<(Foo, &str), Error> (where lexing is just parsing a token). You return either the parsed value together with the rest of the unparsed input, or the parse error. This pattern lets you easily compose parsers by just calling them in sequence, feeding each one the remainder left by the previous one, and bubble any parse error out with ?.
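To make that shape concrete, here is a minimal sketch of the pattern for a toy statement like "x <- 42". Everything named here (ParseError, skip_ws, expect, Assign, parse_assign) is made up for illustration, and the identifier and number rules are much simpler than R's real ones.

```rust
/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
pub struct ParseError {
    pub message: String,
}

fn err<T>(message: &str) -> Result<T, ParseError> {
    Err(ParseError { message: message.to_string() })
}

/// Skip leading whitespace; this never fails.
pub fn skip_ws(input: &str) -> &str {
    input.trim_start()
}

/// Simplified identifier rule (an assumption, not R's real rule):
/// a letter, then letters, digits, '.' or '_'.
pub fn parse_ident(input: &str) -> Result<(String, &str), ParseError> {
    let mut end = 0;
    for (i, c) in input.char_indices() {
        let ok = if i == 0 {
            c.is_alphabetic()
        } else {
            c.is_alphanumeric() || c == '.' || c == '_'
        };
        if !ok {
            break;
        }
        end = i + c.len_utf8();
    }
    if end == 0 {
        return err("expected identifier");
    }
    Ok((input[..end].to_string(), &input[end..]))
}

/// Simplified number rule: a run of ASCII digits and dots that parses as f64.
pub fn parse_number(input: &str) -> Result<(f64, &str), ParseError> {
    let end = input
        .char_indices()
        .find(|&(_, c)| !(c.is_ascii_digit() || c == '.'))
        .map(|(i, _)| i)
        .unwrap_or(input.len());
    if end == 0 {
        return err("expected number");
    }
    let value = input[..end]
        .parse::<f64>()
        .map_err(|_| ParseError { message: format!("invalid number: {}", &input[..end]) })?;
    Ok((value, &input[end..]))
}

/// Match a fixed piece of syntax such as "<-".
pub fn expect<'a>(input: &'a str, literal: &str) -> Result<((), &'a str), ParseError> {
    match input.strip_prefix(literal) {
        Some(rest) => Ok(((), rest)),
        None => err(&format!("expected `{}`", literal)),
    }
}

#[derive(Debug, PartialEq)]
pub struct Assign {
    pub name: String,
    pub value: f64,
}

/// Composition: call the smaller parsers in sequence, threading the
/// remaining input through and bubbling errors out with `?`.
pub fn parse_assign(input: &str) -> Result<(Assign, &str), ParseError> {
    let (name, rest) = parse_ident(skip_ws(input))?;
    let ((), rest) = expect(skip_ws(rest), "<-")?;
    let (value, rest) = parse_number(skip_ws(rest))?;
    Ok((Assign { name, value }, rest))
}
```

Calling parse_assign on "x <- 42; y" would hand back the Assign plus the unparsed "; y", which is exactly what the next parser in the chain would start from.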

There's a newish approach called "Parsing Expression Grammars" (PEGs), which is a lot more succinct, but I wouldn't try it my first time out: it's really easy to misunderstand what you're actually writing, and it can be really hard to exactly match existing languages.

Interpreting is mostly incredibly boring, easy code for most languages: fn eval_foo(foo: &Foo, context: &mut Context) -> Result<Value, Error> where Context can have whatever you like but generally will have a chain of namespaces in scope (which are really just a map from name to value). It mostly looks like repeating the matched syntax tree with the equivalent Rust code after unwrapping all the error cases, so this is an exercise in not getting too distracted by giving better errors.
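As a rough sketch of what that looks like, here is a tiny tree-walking evaluator. The Expr variants, Value, EvalError, and the Context methods are all invented for this example; a real R evaluator would need vectors, closures, lazy arguments, and much more.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Number(f64),
    Str(String),
}

/// A toy AST; these variants are illustrative only.
#[derive(Debug)]
pub enum Expr {
    Number(f64),
    Str(String),
    Var(String),
    Assign(String, Box<Expr>),
    Add(Box<Expr>, Box<Expr>),
}

#[derive(Debug, PartialEq)]
pub struct EvalError(pub String);

/// A chain of namespaces, innermost scope last.
pub struct Context {
    scopes: Vec<HashMap<String, Value>>,
}

impl Context {
    pub fn new() -> Self {
        Context { scopes: vec![HashMap::new()] }
    }

    /// Look a name up from the innermost scope outwards.
    pub fn lookup(&self, name: &str) -> Option<&Value> {
        self.scopes.iter().rev().find_map(|scope| scope.get(name))
    }

    /// Bind a name in the innermost scope.
    pub fn define(&mut self, name: &str, value: Value) {
        if let Some(scope) = self.scopes.last_mut() {
            scope.insert(name.to_string(), value);
        }
    }
}

/// One match arm per syntax node, with `?` unwrapping the error cases.
pub fn eval(expr: &Expr, ctx: &mut Context) -> Result<Value, EvalError> {
    match expr {
        Expr::Number(n) => Ok(Value::Number(*n)),
        Expr::Str(s) => Ok(Value::Str(s.clone())),
        Expr::Var(name) => ctx
            .lookup(name)
            .cloned()
            .ok_or_else(|| EvalError(format!("object '{}' not found", name))),
        Expr::Assign(name, rhs) => {
            let value = eval(rhs, ctx)?;
            ctx.define(name, value.clone());
            Ok(value)
        }
        Expr::Add(lhs, rhs) => match (eval(lhs, ctx)?, eval(rhs, ctx)?) {
            (Value::Number(a), Value::Number(b)) => Ok(Value::Number(a + b)),
            _ => Err(EvalError("non-numeric argument to `+`".to_string())),
        },
    }
}
```

Function calls would push a fresh HashMap onto scopes on entry and pop it on exit; that part is left out to keep the sketch short.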

Implementing a standard library is the bulk of getting real code to run. See if you can get a full API list and generate a bunch of stubs, to get things running as quickly as possible. You can even do things like resolve any unknown function calls in your evaluator to a "stub" value that names the unimplemented API but otherwise is ignored by everything, to get code just chugging straight through no matter how much is missing. This is really effective at keeping momentum up!
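One way the stub trick could look is sketched below. The Library type, the Builtin alias, and builtin_sum are hypothetical names, and letting a stub propagate through other operations is just one possible reading of "ignored by everything"; you could equally have builtins skip stubs.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Number(f64),
    /// Placeholder that names the API that has not been implemented yet.
    Stub(String),
}

/// Hypothetical signature for a built-in function in this sketch.
type Builtin = fn(&[Value]) -> Value;

/// A toy `sum`: adds numbers, and passes a stub straight through
/// instead of failing when one shows up among its arguments.
fn builtin_sum(args: &[Value]) -> Value {
    let mut total = 0.0;
    for arg in args {
        match arg {
            Value::Number(n) => total += n,
            Value::Stub(name) => return Value::Stub(name.clone()),
        }
    }
    Value::Number(total)
}

pub struct Library {
    builtins: HashMap<&'static str, Builtin>,
}

impl Library {
    pub fn new() -> Self {
        let mut builtins: HashMap<&'static str, Builtin> = HashMap::new();
        builtins.insert("sum", builtin_sum);
        Library { builtins }
    }

    /// Unknown functions resolve to a stub instead of an error,
    /// so evaluation keeps going no matter how much is missing.
    pub fn call(&self, name: &str, args: &[Value]) -> Value {
        match self.builtins.get(name) {
            Some(f) => f(args),
            None => Value::Stub(name.to_string()),
        }
    }
}
```

Logging every stub that gets created would also give you a running to-do list of which parts of the library the code you care about actually needs.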

To be clear, I'm not saying that R being written in C is a/the problem.

I guess my main gripe is that the pool of contributors is limited substantially by the fact that the code is hosted in a "hard-to-reach" place, and that the source code is hopelessly ugly, which just means there's a barrier to entry for people who might otherwise want to contribute.

But the maintainers won't move the source from their SVN setup to GitHub (or any Git solution for that matter) for various reasons.

So I figured having an interpreter that is easier to contribute to (on GitHub, for example) would be nice. And what's the fun in re-writing C code into more C code? So why not Rust? People also seem to love contributing to Rust projects, since it has such great tooling available: cargo alone, rustfmt, CI is a breeze, testing is probably the best/easiest I've come across, and so on.

And it might open R to the possibility of some solid improvements (type-hinting in R would be a game-changer for me personally for example, although my understanding is that would be a challenge given the underlying types of variables in R, e.g. the concept of a scalar type like f64 isn't really a thing, rather many of these things are really vectors of length 1).
Or improvements to the packaging system: you cannot currently divide a package into subdirectories/submodules, for example, so if you want to make a big production library in a well-structured layout that makes sense, then godspeed to you.

But to be honest, it might just be wishful thinking that R could absorb/inherit some of the great stuff from Rust.

I have had a brief play with interpreters using Robert Nystrom's book "Crafting Interpreters" (everything else aside, probably one of the nicest-to-read CS books I've ever come across), but that's about it.

Getting an interpreter running would be a significant task as it is, then add on top of that R's base libraries, it would take me years lol, and there are people out there much more suited to the task than I am.

Depends on what you're after: if you're targeting bug-for-bug compatibility then yeah, it's years of time. If you just want to get a skeleton up that can run some small but interesting code, to get the ball rolling, then you can get something going in weeks at most. I'm not saying that you should take on that work, but equally it's unlikely anyone else is going to want to do it either.
