R interpreter written in Rust

avhz · April 27, 2024, 8:41am

Hi everyone,

I'm wondering if there would be any interest in developing an R interpreter in Rust?

I personally don't have the skills to build one from the ground up, and definitely not by myself, but I like R and want to see it improved, considering the competition from the likes of Julia, for example.

The current source is hosted on a Subversion (...?) repository, which IMO limits the pool of contributors significantly, in addition to much of the current C source being unreadable (also IMO). This surely is a major roadblock when it comes to R's chances at remaining competitive.

I think Rust would be a great language to implement an R interpreter in (along the lines of RustPython), or at least include support for re-writing some of the C functions in Rust.

Just curious about people's thoughts.

Cheers!

paramagnetic · April 28, 2024, 6:00am

R has 99 problems and being written in C ain't one. Python was becoming wildly popular even when CPython was its only major implementation.

Both R and Python have serious social problems: neither was designed for large-scale software engineering (dynamic typing, a bad-to-mediocre package management story, etc.), and most of their users aren't proper software engineers, so they go on to write brittle or outright incorrect code.

However, Python recognized pretty early that discipline is a good thing and started taking steps in that direction. PEPs for coding style, optional type annotations, language and library constructs that help you structure code and solve real-world problems (context managers, generators, process pools, …). It also aims to be a general language, so it provides features that other, normal, general languages do (e.g., namespaces).

R, on the other hand, tries very hard to be a "cute" language for mathematicians. Many of its well-intentioned features (e.g. the ability to bind a bunch of implicit globals) are horrible footguns when you are trying to do anything serious. And people aren't even trying to use it without these footguns, so R code inevitably ends up a mess that you'll need to eventually convert to another language for any sort of production use.

Rewriting the interpreter in Rust wouldn't solve any of these problems.

simonbuchan · April 28, 2024, 10:13am

You might be surprised. There's plenty of information out there on how to build interpreters, and Rust is a really nice language to build one in, especially for a first-timer.

If nothing else, it's a great learning exercise that mostly doesn't need much background. (There's a lot of old theory about parser generators that turns out not to be terribly helpful for real languages, past maybe tokenization.)

The top-level work for a full (if inefficient) version is:

lexing a source string into a sequence of tokens
parsing a sequence of tokens into an AST
interpreting by walking the AST
implementing the standard library
wrapping it all in standard developer tooling like REPLs and package management

The most confusing is probably the first: lexing requires a lot of care that you're not off by one matching spaces, etc., so use lots of unit tests! The second is just matching and looping and peeking at the next token in a way that's pretty much a direct match to the grammar so it turns out to be very obvious most of the time.
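To make the lexing step concrete, here is a minimal sketch of a single-token lexer for a tiny R-like subset. The Token and LexError types, the lex_token name, and the choice of tokens are all assumptions made for illustration; real R lexing has many more cases (strings, comments, operators like %in%, and so on).

```rust
/// Hypothetical token type covering a tiny R-like subset.
#[derive(Debug, Clone, PartialEq)]
pub enum Token {
    Number(f64),
    Ident(String),
    Assign, // `<-`
    Plus,   // `+`
}

/// Hypothetical error type; a real lexer would also track positions.
#[derive(Debug, PartialEq)]
pub struct LexError {
    pub message: String,
}

/// Lex one token from the front of `input`, returning the token together
/// with the rest of the input that was not consumed.
pub fn lex_token(input: &str) -> Result<(Token, &str), LexError> {
    // Skip leading whitespace so callers never have to think about it.
    let input = input.trim_start();
    let first = input.chars().next().ok_or_else(|| LexError {
        message: "unexpected end of input".to_string(),
    })?;

    if let Some(rest) = input.strip_prefix("<-") {
        return Ok((Token::Assign, rest));
    }
    if let Some(rest) = input.strip_prefix('+') {
        return Ok((Token::Plus, rest));
    }
    if first.is_ascii_digit() {
        // Take digits and dots, then let `parse` reject things like "1.2.3".
        let end = input
            .find(|c: char| !(c.is_ascii_digit() || c == '.'))
            .unwrap_or(input.len());
        let text = &input[..end];
        let value = text.parse::<f64>().map_err(|_| LexError {
            message: format!("invalid number: {}", text),
        })?;
        return Ok((Token::Number(value), &input[end..]));
    }
    if first.is_alphabetic() {
        // R identifiers may contain dots and underscores after the first letter.
        let end = input
            .find(|c: char| !(c.is_alphanumeric() || c == '.' || c == '_'))
            .unwrap_or(input.len());
        return Ok((Token::Ident(input[..end].to_string()), &input[end..]));
    }
    Err(LexError {
        message: format!("unexpected character: {}", first),
    })
}
```

Calling lex_token repeatedly on the returned remainder walks through the source one token at a time, and each branch is small enough to pin down with a unit test that checks both the token and the exact remainder, which is where the off-by-one mistakes tend to show up.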

Both of these should use an interface that looks essentially like fn parse_foo(input: &str) -> Result<(Foo, &str), Error> (where lexing is just parsing a token): you return either the parsed value plus the rest of the unparsed input, or the parse error. This pattern lets you compose parsers easily by calling them in sequence, feeding each one the remaining input from the previous one, and bubbling any parse error out with ?.
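As a rough sketch of how that signature composes, the following parses sums like "1 + 2 + 3" directly from a string. Expr, ParseError, parse_number and parse_sum are hypothetical names chosen for this example, and the grammar is deliberately tiny.

```rust
/// Hypothetical AST for the example: numbers and left-associative addition.
#[derive(Debug, PartialEq)]
pub enum Expr {
    Number(f64),
    Add(Box<Expr>, Box<Expr>),
}

#[derive(Debug, PartialEq)]
pub struct ParseError(pub String);

/// Parse a run of ASCII digits (optionally containing '.') as a number.
pub fn parse_number(input: &str) -> Result<(Expr, &str), ParseError> {
    let input = input.trim_start();
    let end = input
        .find(|c: char| !(c.is_ascii_digit() || c == '.'))
        .unwrap_or(input.len());
    let value = input[..end]
        .parse::<f64>()
        .map_err(|_| ParseError(format!("expected a number at {:?}", input)))?;
    Ok((Expr::Number(value), &input[end..]))
}

/// Parse `number ('+' number)*` by calling `parse_number` in sequence.
pub fn parse_sum(input: &str) -> Result<(Expr, &str), ParseError> {
    // The first call hands back the value and whatever input is left over.
    let (mut left, mut rest) = parse_number(input)?;
    // Peek: keep looping while the next non-space character is '+'.
    while let Some(after_plus) = rest.trim_start().strip_prefix('+') {
        // `?` bubbles a failure of the inner parser straight out.
        let (right, after_right) = parse_number(after_plus)?;
        left = Expr::Add(Box::new(left), Box::new(right));
        rest = after_right;
    }
    Ok((left, rest))
}
```

Because every parser returns the unconsumed remainder, parse_sum never needs to know how many characters parse_number consumed; it just passes the remainder along.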

There's a newish approach called "Parsing Expression Grammars" (PEGs), which are a lot more succinct, but I wouldn't try that my first time out: it's really easy to misunderstand what you're actually writing, and it can be really hard to exactly match existing languages.

Interpreting is mostly incredibly boring, easy code for most languages: fn eval_foo(foo: &Foo, context: &mut Context) -> Result<Value, Error> where Context can have whatever you like but generally will have a chain of namespaces in scope (which are really just a map from name to value). It mostly looks like repeating the matched syntax tree with the equivalent Rust code after unwrapping all the error cases, so this is an exercise in not getting too distracted by giving better errors.
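A minimal sketch of that shape, assuming a hypothetical four-node Expr and a Context that is just a stack of name-to-value maps. None of these names come from an existing R implementation, and real R scoping (environments, lazy evaluation) is considerably richer than this.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Number(f64),
}

/// Hypothetical AST: literals, variables, assignment and addition.
pub enum Expr {
    Number(f64),
    Var(String),
    Assign(String, Box<Expr>),
    Add(Box<Expr>, Box<Expr>),
}

#[derive(Debug, PartialEq)]
pub struct EvalError(pub String);

/// A chain of namespaces; the last map is the innermost scope.
pub struct Context {
    scopes: Vec<HashMap<String, Value>>,
}

impl Context {
    pub fn new() -> Self {
        Context { scopes: vec![HashMap::new()] }
    }
    pub fn push_scope(&mut self) {
        self.scopes.push(HashMap::new());
    }
    pub fn pop_scope(&mut self) {
        self.scopes.pop();
    }
    /// Look a name up from the innermost scope outwards.
    pub fn lookup(&self, name: &str) -> Option<&Value> {
        self.scopes.iter().rev().find_map(|scope| scope.get(name))
    }
    /// Bind a name in the innermost scope.
    pub fn define(&mut self, name: &str, value: Value) {
        if let Some(scope) = self.scopes.last_mut() {
            scope.insert(name.to_string(), value);
        }
    }
}

pub fn eval_expr(expr: &Expr, context: &mut Context) -> Result<Value, EvalError> {
    match expr {
        Expr::Number(n) => Ok(Value::Number(*n)),
        Expr::Var(name) => context
            .lookup(name)
            .cloned()
            .ok_or_else(|| EvalError(format!("object '{}' not found", name))),
        Expr::Assign(name, rhs) => {
            let value = eval_expr(rhs, context)?;
            context.define(name, value.clone());
            Ok(value)
        }
        Expr::Add(lhs, rhs) => {
            // `Value` has a single variant here, so these patterns cannot fail.
            let Value::Number(a) = eval_expr(lhs, context)?;
            let Value::Number(b) = eval_expr(rhs, context)?;
            Ok(Value::Number(a + b))
        }
    }
}
```

Each match arm mirrors one syntax node, which is why this part tends to be mechanical once the AST is settled.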

Implementing a standard library is the bulk of getting real code to run. See if you can get a full API list and generate a bunch of stubs, to get things running as quickly as possible. You can even do things like resolve any unknown function calls in your evaluator to a "stub" value that names the unimplemented API but otherwise is ignored by everything, to get code just chugging straight through no matter how much is missing. This is really effective at keeping momentum up!
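One way the "stub value" trick could look, as a sketch: unknown function names resolve to a Value::Stub that records which API is missing, and implemented builtins simply pass stubs through. The Library type, its missing list, and the sum builtin are all hypothetical names for this example.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Number(f64),
    /// Placeholder for the result of an API that is not implemented yet;
    /// it carries the name of the missing function.
    Stub(String),
}

type Builtin = fn(&[Value]) -> Value;

/// Hypothetical registry of implemented builtins.
pub struct Library {
    builtins: HashMap<&'static str, Builtin>,
    /// Names of unimplemented functions that were called, in call order.
    pub missing: Vec<String>,
}

fn builtin_sum(args: &[Value]) -> Value {
    let mut total = 0.0;
    for arg in args {
        match arg {
            Value::Number(n) => total += *n,
            // A stub argument propagates instead of raising an error.
            Value::Stub(name) => return Value::Stub(name.clone()),
        }
    }
    Value::Number(total)
}

impl Library {
    pub fn new() -> Self {
        let mut builtins: HashMap<&'static str, Builtin> = HashMap::new();
        builtins.insert("sum", builtin_sum as Builtin);
        Library { builtins, missing: Vec::new() }
    }

    /// Call a function by name; unknown names yield a stub rather than failing.
    pub fn call(&mut self, name: &str, args: &[Value]) -> Value {
        // Copy the function pointer out so `self` is free to be mutated below.
        let found = self.builtins.get(name).copied();
        match found {
            Some(function) => function(args),
            None => {
                self.missing.push(name.to_string());
                Value::Stub(name.to_string())
            }
        }
    }
}
```

After running a script, the missing list doubles as a to-do list of which parts of the standard library real code actually needs first.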

avhz · April 28, 2024, 11:04am

To be clear, I'm not saying that R being written in C is a/the problem.

I guess my main gripe is that the pool of contributors is limited substantially by the fact that the code is hosted in a "hard-to-reach" place, and that the source code is hopelessly ugly, which just means there's a barrier to entry for people who might otherwise want to contribute.

But the maintainers won't move the source from their SVN setup to GitHub (or any Git solution for that matter) for various reasons.

So I figured having an interpreter that is easier to contribute to (on GitHub, for example) would be nice. And what's the fun in re-writing C code into more C code? So why not Rust? People also seem to love contributing to Rust projects, since it has such great tooling available: cargo alone, rustfmt, CI is a breeze, testing is probably the best/easiest I've come across, and so on.

And it might open R to the possibility of some solid improvements. Type hinting in R would be a game-changer for me personally, for example, although my understanding is that it would be a challenge given the underlying types of variables in R: the concept of a scalar type like f64 isn't really a thing; rather, many of these values are really vectors of length 1.
Another would be improvements to the packaging system. You cannot currently divide a package into subdirectories/submodules, for example, so if you want to make a big production library in a well-structured layout that makes sense, then godspeed to you.

But to be honest, it might just be wishful thinking that R could absorb/inherit some of the great stuff from Rust.

avhz · April 28, 2024, 11:17am

I have had a brief play with interpreters using Robert Nystrom's book "Crafting Interpreters" (everything else aside, probably one of the nicest-to-read CS books I've ever come across), but that's about it.

Getting an interpreter running would be a significant task as it is, then add on top of that R's base libraries, it would take me years lol, and there are people out there much more suited to the task than I am.

simonbuchan · April 28, 2024, 1:25pm

Depends on what you're after: if you're targeting bug-for-bug then yeah, it's years of time. If you just want to get a skeleton up that can run some small but interesting code, to get the ball rolling, then you can get something going in weeks at most. I'm not saying that you should take on that work, but equally it's unlikely anyone else is going to want to do that either.

system · July 27, 2024, 1:26pm

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.
