Feedback Request: Idiomatic Rust basic language interpreter; comparing Haskell & Rust

Hello,

I am trying to learn Rust. I wanted to do so by writing a simple program in my current main language, Haskell, and then write the same program in Rust. I have finished the program and I'd like advice on whether the code I have written feels "idiomatic" or whether it could be simpler.

The Haskell original is here --- rust-vs-hs/lang.hs at master · jeapostrophe/rust-vs-hs · GitHub --- it is a simple language with four primitives and I use the ReaderT pattern for accessing variable bindings and I use ExceptT for throwing errors.

The Rust code is here --- rust-vs-hs/main.rs at master · jeapostrophe/rust-vs-hs · GitHub

Here are some choices I made when writing it (which I hope to get feedback on):

  • I use Rc for everything because I expect that I will want to have sharing in the AST (i.e. post-optimization) and in values, rather than copying large objects. (This language doesn't have those things, but the language I really write will).
    I believe the alternative is to use Box everywhere and this basically means that the expressions will get consumed by the evaluator or I'd need to copy pieces of them.
  • I use Result to be "Rusty" rather than something like ExceptT
  • I define an Eval trait like the Eval typeclass. The Haskell code's type eval :: a -> EvalApp Val when you expand it is eval :: a -> EvalEnv -> Either String Val, which is essentially what I've chosen for the Rust program eval(&self, ee:&EvalEnv) -> Result<Rc<Val>, String>; but I can only name the return since there's no curried functions in Rust (or rather, returning a closure seems non-idiomatic)
  • The eval_prim function feels really ugly with the double match because I couldn't figure out how to match the contents of the vector in addition to the length.
  • It wasn't intuitive why **x worked on line 71 (the PNumToStr case) but not on line 59 (etc)
    I get the error "cannot move out of an Rc; move occurs because value has type Val, which does not implement the Copy trait`". It seems weird that I get it in one place but not the other.
  • Line 96 feels really ugly to me. The type annotation was necessary and the .into_iter().map().collect() sequence was a lot... I just want to write .map().
  • The creation of the test objects in main is really gross, but I feel like I won't notice when I write the parser for real.

To compare, I wrote another version that does use Box everywhere --- rust-vs-hs/main_b.rs at master · jeapostrophe/rust-vs-hs · GitHub

  • I find this code to be much more pretty (see eval_prim for instance)
  • But, I worry that it hurts to have this prettiness, because I will pay for it in using more memory, because there's more copying.

Thank you for any comments or advice!

I would say your enum case names are not particularly idiomatic. It looks like you used a prefix so you could use all the enum cases directly, which makes sense.

But it'd say it's more idiomatic (and in my opinion at least, typically easier to read) to just use the more verbose Prim::Add.

That's a pretty minor thing though.


Another strategy you may want to explore for shared ownership is using indices into a Vec or similar instead of a Box or Rc. That's a pretty common pattern, and can in some cases end up being easier to work with.

1 Like

This is a great thing to just measure, especially since you've already written it. Try it out, making sure to use a good allocator (maybe MiMalloc — Rust memory management library // Lib.rs). Especially if it's a bunch of same-sized allocations, just copying might not be that bad since the allocator will pool it well.

2 Likes

I see that eval_prim validates that a Prim has been associated with the right number and type of arguments when these data are used. It would be more idiomatic to validate during the construction of some sum type, by attaching the arguments as fields of the variants. (If this sum type is named Op, it can have a function Op::new(opcode: Prim, vs: Vec<Rc<Expr>>) -> Result<Op, _>.) Then uses of (Prim, Vec<Rc<Expr>>) would become Op. Maxims I've seen used are "make illegal states unrepresentable" and "parse, don't validate" (I believe the latter was coined by a Haskeller). I leave providing example code to someone who isn't typing on a phone.

3 Likes

You shouldn't really need to do this. By their very nature, ASTs have a very hierarchical ownership story and there's no shared ownership.

If you are wanting the Rcs to allow shared ownership when using something (e.g. you want to pass a sub-tree to some evaluation function), just use normal & references.

Using Rc<String> to avoid duplicating strings makes sense, but you can drop a level of indirection with Rc<str>. String contains a pointer to your text, so a Rc<String> is a reference-counted pointer to a pointer to some text, whereas Rc<str> stores the text directly in the Rc's heap allocation. If you want to be really fancy, you could also use a string interner, but that's probably overkill unless you are rustc.

Your EvalT type also uses a Rc<Val>, where Val::VStr contains a Rc<String>, so we effectively have 3 levels of indirection going on (string values have a Rc pointing to a Rc pointing to a heap-allocated String which points to some text). I'd drop all the Rcs and rewrite Val to be enum Val { Num(i32), String(String) }.

Just because this is Rust, doesn't mean we need to avoid all the string copies we possibly can.

Your Haskell background shows through with the naming :stuck_out_tongue: Normally, you'd write Val::Num rather than Val::VNum and get rid of the use Val::* import. Also, the name EvalT feels odd for Rust because the T suffix doesn't seem to add any value (kinda reminds me of how C might call a type value_t instead of value). I'd probably drop the type alias altogether so it's more obvious from the signature that functions like eval_prim() can fail and I don't have to mentally inline its definition every time I see it.

There's something about the Prim::eval_prim() method's match statement that smells off to me, but I can't put my finger on it.

I'd also drop the Eval trait altogether. You don't seem to be using it generically, so this sounds like the "Concrete Abstraction" code smell.

3 Likes

For me, it's the dynamic cardinality of the arguments. Addition, multiplication, and division are always binary, and number -> string conversion is always unary, yet this is always being re-checked during evaluation instead of the aforementioned practice of parsing into a guaranteed-correct representation.

At a lower level, whitespace usage is also non-idiomatic in that method, and format!("{:?}", n) should be n.to_string().

1 Like

I'm very grateful for all of these comments. I think I've made lots of improvements based on them.

I want to "defend" myself a tiny bit on the issue of the application. It is very uncommon in language implementations to store only type correct ASTs, because user programs may have errors in them and applicables (like primitives) may appear in value positions.

In other words, it may seem obvious that we should be able to store Add(Box<Expr>, Box<Expr>) rather than App(Expr, Vec<Expr>), but remember... there can be programs like ((if (1 < 3) + number->string) 5 6) i.e. App(If(App(Prim(Lt), [Val(Num(1)), Val(Num(3))]), Prim(Add), Prim(NumToStr)), [ Val(Num(5)), Val(Num(6)) ]). This program cannot be "pre-parsed" into a type-correct representation... we only discover the actual behavior through evaluation. (If the language had a type-system, we could reject it earlier and produce a typed intermediate representation that had things like App2 or curried everything, but we're not going to have App17 in the AST or the IR, so even that is uncommon.)

For example...

Thank you! And please keep the comments coming <3

Yes, but it's also uncommon to directly interpret the AST (not only for this reason).

Yep. A common pattern I've seen is to parse into an untyped concrete syntax tree that you might then lower to a strongly typed high-level intermediate representation that can be evaluated.

I really like the description in Rust Analyzer's developer docs.

Although of course Rust Analyzer implements a production grade parser whereas this is intended for educational purposes with a dynamically typed language, so it's fine to evaluate the AST directly.

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.