OCaml vs Rust for writing compilers

For those familiar with both Rust and OCaml:

If you were writing a compiler, would you go with Rust or OCaml?


Well, the first Rust compiler was written in OCaml, before Rust became self-hosted...


Is this a vote for OCaml, because the first Rust compiler was written in OCaml,

or

Is this a vote for Rust, because as soon as it was possible, the team switched from OCaml to Rust?

🙂


Neither, my point being:

  • OCaml is a suitable language for writing compilers
  • Rust is also a suitable language for writing compilers
  • The switch had nothing to do with the quality of the language: reaching self-hosting is usually a goal languages aim for before they release a "1.0" version.

So, I'd say use whichever one you're more comfortable in.
That said, I think Rust has an advantage in producing more performant code, while being more verbose than OCaml.
Also, if it's a personal project, I think having fun is more important.
If it's a company project, then factors like being able to hire OCaml devs vs. Rust devs, maintainability, etc. come into play.


You are asking on a Rust forum so…

I think even though Rust is not the (somewhat) "pure" ML language that OCaml is, the features it provides are more than enough for writing a well-designed, correct, and fast compiler. There are also several libraries specifically designed for making compilers easier to write, such as:

  • A ton of parsing libraries (nom, Pest, other lesser-known combinator libs, etc.), although parsing is probably the most trivial part of a compiler. The proc-macro2 crate will also come in handy here, since lexing is very boring and tedious to write by hand, and having built-in Span support is great.
  • RustTyC for constraint-based type checking on a lattice.
  • Ariadne for printing beautiful diagnostics.
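
For a sense of what span tracking buys you, here is a minimal hand-written lexer that records a byte span on every token and uses it to underline the offending character in an error. It is only a sketch using the standard library: the token set, lex and render are hypothetical names, and the rendering is a much cruder cousin of what Ariadne does (it assumes a single-line source).

```rust
use std::ops::Range;

/// Token kinds for a toy expression language (hypothetical, for illustration).
#[derive(Debug, Clone, PartialEq)]
enum TokKind {
    Num(i64),
    Ident(String),
    Plus,
    Star,
    LParen,
    RParen,
}

/// A token paired with its byte span in the source text.
#[derive(Debug, Clone, PartialEq)]
struct Token {
    kind: TokKind,
    span: Range<usize>,
}

/// An error that remembers where it happened, so it can be rendered later.
#[derive(Debug, PartialEq)]
struct LexError {
    msg: String,
    span: Range<usize>,
}

fn lex(src: &str) -> Result<Vec<Token>, LexError> {
    let mut toks = Vec::new();
    let mut chars = src.char_indices().peekable();
    while let Some(&(start, c)) = chars.peek() {
        if c.is_whitespace() {
            chars.next();
            continue;
        }
        // Single-character punctuation.
        let punct = match c {
            '+' => Some(TokKind::Plus),
            '*' => Some(TokKind::Star),
            '(' => Some(TokKind::LParen),
            ')' => Some(TokKind::RParen),
            _ => None,
        };
        if let Some(kind) = punct {
            chars.next();
            toks.push(Token { kind, span: start..start + 1 });
            continue;
        }
        if c.is_ascii_digit() || c.is_alphabetic() || c == '_' {
            // Numbers take a run of digits; identifiers a run of alphanumerics/underscores.
            let is_num = c.is_ascii_digit();
            let mut end = start;
            while let Some(&(i, d)) = chars.peek() {
                let ok = if is_num { d.is_ascii_digit() } else { d.is_alphanumeric() || d == '_' };
                if !ok {
                    break;
                }
                end = i + d.len_utf8();
                chars.next();
            }
            let text = &src[start..end];
            let kind = if is_num {
                let value = text.parse::<i64>().map_err(|_| LexError {
                    msg: format!("integer literal `{text}` is too large"),
                    span: start..end,
                })?;
                TokKind::Num(value)
            } else {
                TokKind::Ident(text.to_string())
            };
            toks.push(Token { kind, span: start..end });
            continue;
        }
        return Err(LexError {
            msg: format!("unexpected character `{c}`"),
            span: start..start + c.len_utf8(),
        });
    }
    Ok(toks)
}

/// Underline the error span with carets. Counts chars rather than bytes so the
/// caret lines up for non-ASCII text (wide characters are not handled).
fn render(src: &str, err: &LexError) -> String {
    let col = src[..err.span.start].chars().count();
    let width = src[err.span.clone()].chars().count().max(1);
    format!("error: {}\n  {}\n  {}{}", err.msg, src, " ".repeat(col), "^".repeat(width))
}

fn main() {
    for tok in lex("foo + 12*(bar)").unwrap() {
        println!("{:?} at {:?}", tok.kind, tok.span);
    }
    let src = "1 + $x";
    if let Err(err) = lex(src) {
        println!("{}", render(src, &err));
    }
}
```

Because every token and error carries its span, later phases (parser, type checker) can report errors against the original text without re-lexing; a crate like Ariadne then takes care of multi-line sources, labels and colors.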

A compiler for what?

Compilers that really need that last 20% of performance are relatively rare. Especially if it's uncertain what you're compiling, what you need is something you're familiar with where you can iterate quickly. Rust can be good there, but if you want to write it in Go or Haskell or C# or something, that'd be totally fine too, at least for a while.

Obviously, you write in whatever language you're implementing, then! Though I haven't found a self-hosted SQL compiler yet...

But seriously, both are far better than what people normally use; I doubt the choice matters much for compilers specifically, compared with all the other factors that go into choosing a language.

Some other factors to consider:

Rust has a really nice package system and community, way ahead of OPAM (you can just list every OPAM package!)

Rust isn't all that great at composing parser functions: nom sometimes forces you into some ugly contortions. Depending on your approach, this may or may not be annoying.

You can just nick bits like Rust's LLVM backend wrapper (complying with the MIT/Apache licensing, of course) to quickly get excellent codegen. The OCaml backend seems a lot more confusing?


nom isn't the only option; there's also combine, which might be especially suitable if one's more comfortable with more functional languages (although I imagine it's closer to nom than to the Haskell library on which it was based).

Relatedly, there are a number of libraries that were spun out of, based on, or intended to be used in rustc itself, such as


combine suffers from the same basic issue: compared with my experience in more functional languages, it's not particularly natural to define your own parsers in terms of other parsers, and it seems that just isn't possible in Rust right now. I think you would need type inference for consts, or impl Trait on statics?

My thoughts are as follows:

I think AST structures are slightly more convenient in OCaml, because of how it handles recursive data structures...

Others have mentioned the libraries for Rust compiler development, so I won't repeat them. Besides those, one of my favorite things is the ability to target wasm (i.e. compile your compiler to wasm and use that as a bootstrap environment). That is nice both for portability and for ensuring your compiler runs in a sandboxed environment, averting some of the supply chain attacks our current generation of languages suffers from.

OCaml's GC is exactly what holds it back from targeting wasm, but it is also exactly what allows for a nicer AST. OCaml's ASTs aren't so much nicer, nor Rust's so much worse, that I would choose OCaml over Rust, but both are fine languages to write compilers in...


Okay, I've never tried to store the parsers in statics. I think the intended way is to store them in nullary functions. (I imagine the Haskell equivalent is not so verbose, but such is Rust.)
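
A minimal sketch of the nullary-function pattern, using hand-rolled combinators rather than combine's or nom's actual API (satisfy, pair, many, map, digit, number and sum are all hypothetical names). Each parser is a closure returned from a function with no arguments; the explicit 'a lifetime that every signature has to thread through is a good part of the verbosity.

```rust
/// Result of running a parser: the parsed value plus the unconsumed input.
type PResult<'a, T> = Option<(T, &'a str)>;

/// Parse one character that satisfies `pred`.
fn satisfy<'a>(pred: impl Fn(char) -> bool) -> impl Fn(&'a str) -> PResult<'a, char> {
    move |input: &'a str| {
        let c = input.chars().next()?;
        if pred(c) {
            Some((c, &input[c.len_utf8()..]))
        } else {
            None
        }
    }
}

/// Run `p`, then `q` on what's left; fail if either fails.
fn pair<'a, A, B>(
    p: impl Fn(&'a str) -> PResult<'a, A>,
    q: impl Fn(&'a str) -> PResult<'a, B>,
) -> impl Fn(&'a str) -> PResult<'a, (A, B)> {
    move |input: &'a str| {
        let (a, rest) = p(input)?;
        let (b, rest) = q(rest)?;
        Some(((a, b), rest))
    }
}

/// Apply `p` zero or more times. `p` must consume input whenever it
/// succeeds, otherwise this loops forever.
fn many<'a, T>(p: impl Fn(&'a str) -> PResult<'a, T>) -> impl Fn(&'a str) -> PResult<'a, Vec<T>> {
    move |mut input: &'a str| {
        let mut out = Vec::new();
        while let Some((value, rest)) = p(input) {
            out.push(value);
            input = rest;
        }
        Some((out, input))
    }
}

/// Transform a parser's output with `f`.
fn map<'a, A, B>(
    p: impl Fn(&'a str) -> PResult<'a, A>,
    f: impl Fn(A) -> B,
) -> impl Fn(&'a str) -> PResult<'a, B> {
    move |input: &'a str| p(input).map(|(a, rest)| (f(a), rest))
}

// "Stored" parsers: since `impl Trait` can't be the type of a `static`,
// each one is rebuilt by calling a nullary function.
fn digit<'a>() -> impl Fn(&'a str) -> PResult<'a, char> {
    satisfy(|c| c.is_ascii_digit())
}

/// One or more digits as a u64 (no overflow handling in this sketch).
fn number<'a>() -> impl Fn(&'a str) -> PResult<'a, u64> {
    map(pair(digit(), many(digit())), |(first, rest): (char, Vec<char>)| {
        std::iter::once(first)
            .chain(rest)
            .fold(0, |acc, d| acc * 10 + u64::from(d.to_digit(10).unwrap()))
    })
}

/// number ('+' number)*, summed.
fn sum<'a>() -> impl Fn(&'a str) -> PResult<'a, u64> {
    let plus_number = pair(satisfy(|c| c == '+'), number());
    map(pair(number(), many(plus_number)), |(first, rest): (u64, Vec<(char, u64)>)| {
        rest.into_iter().fold(first, |acc, (_, n)| acc + n)
    })
}

fn main() {
    println!("{:?}", sum()("1+22+300 rest")); // Some((323, " rest"))
}
```

Since the input is an immutable &str slice, a failed branch such as a trailing "+x" leaves nothing to undo: many simply stops and returns the slice it had before the failed attempt.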

Yeah, that was basically my point: Rust makes a few use cases a bit uglier than they strictly need to be right now, and parsing with combinators is one of them.

Are Box<Node>s that much worse? My parsing work never got that ugly, and I seem to remember pattern matching, for example, not being much of an issue in practice: just tack on an as_ref or whatever.
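
A minimal sketch of what a Box-based AST looks like in practice, with a hypothetical Expr type. Plain recursion needs nothing special thanks to deref coercion; the friction appears when a pattern has to look inside a Box, where stable Rust needs a nested match or an as_ref() call instead of one deep pattern as in OCaml.

```rust
/// A small recursive AST. Recursion goes through `Box`, which gives `Expr`
/// a finite size; in OCaml the GC'd heap handles this implicitly.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Num(i64),
    Neg(Box<Expr>),
    Add(Box<Expr>, Box<Expr>),
    Mul(Box<Expr>, Box<Expr>),
}

use Expr::*;

fn eval(e: &Expr) -> i64 {
    match e {
        Num(n) => *n,
        // `inner`, `l`, `r` are `&Box<Expr>`; deref coercion turns them into `&Expr`.
        Neg(inner) => -eval(inner),
        Add(l, r) => eval(l) + eval(r),
        Mul(l, r) => eval(l) * eval(r),
    }
}

/// Is this `x * 0` or `0 * x`? In OCaml this is one pattern,
/// `Mul (_, Num 0) | Mul (Num 0, _)`; here we peek through each Box with `as_ref()`.
fn is_times_zero(e: &Expr) -> bool {
    match e {
        Mul(l, r) => matches!(l.as_ref(), Num(0)) || matches!(r.as_ref(), Num(0)),
        _ => false,
    }
}

/// Rewrite `-(-x)` to `x` everywhere, taking the tree by value.
fn simplify(e: Expr) -> Expr {
    match e {
        // `*inner` moves the `Expr` out of its `Box` so we can match on it by value.
        Neg(inner) => match *inner {
            Neg(x) => simplify(*x),
            other => Neg(Box::new(simplify(other))),
        },
        Add(l, r) => Add(Box::new(simplify(*l)), Box::new(simplify(*r))),
        Mul(l, r) => Mul(Box::new(simplify(*l)), Box::new(simplify(*r))),
        Num(n) => Num(n),
    }
}

fn main() {
    // 2 + 3 * -(-4)
    let neg_neg_four = Neg(Box::new(Neg(Box::new(Num(4)))));
    let e = Add(Box::new(Num(2)), Box::new(Mul(Box::new(Num(3)), Box::new(neg_neg_four))));
    println!("{}", eval(&e)); // 14
    println!("{:?}", simplify(e)); // Add(Num(2), Mul(Num(3), Num(4)))
}
```

The Box::new noise on construction and the extra match level in simplify are roughly the cost being discussed; a feature like deref patterns would let the nested match collapse into a single pattern.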


This was really insightful. I hadn't realized until now that all languages with a "nicer" AST than Rust are also those with GCs (and often mixing whether something is stored by value or by pointer to the heap).

Presumably we will eventually have something like deref_patterns (formerly box_patterns), which would even things up in this regard; but not differentiating between heap and stack allocations makes implementing that much easier.


I suppose I should note the following advice from Salsa's issue tracker:

@matklad, what would you suggest if someone would like to write an incremental compiler without being able to fix Salsa if necessary? I see there's also Adapton, but it seems too low-level for "not think[ing] about it".[1]

I'm guessing the practical answer is "wait five years; there are no good solutions yet".


  1. Edited to add: Between reading that GitHub ticket and posting here, I looked around Lib.rs at crates tagged as #incremental and looked at Moxie. It's intended for GUI apps rather than compilers, but it made me wonder how well one could draw a correspondence between the input to a compiler and a GUI, with definitions and other identifiable elements of source code corresponding to GUI widgets.

What kind of compiler?

  • "I am writing a hobby language" => use salsa
  • "Google allocated a team of N engineers and a budget of M million dollars to write the next generation of the Dart compiler in Rust" => have one person on the team spend two weeks full-time investigating salsa and related technologies, then either commit to bringing salsa to 1.0, or friendly-fork salsa for the project.
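
To make the idea behind salsa a bit more concrete, here is a toy sketch of a memoized, revision-tracked query. It is not salsa's API (Db, set_source and word_count are made up), and it wires a single dependency by hand, whereas salsa records dependencies between queries automatically and handles much more, such as cutting off recomputation when a result comes out unchanged.

```rust
use std::collections::HashMap;

/// A toy "database" in the spirit of query-based incremental compilers.
/// All names here are hypothetical; this is not how salsa is implemented.
#[derive(Default)]
struct Db {
    /// Global revision counter, bumped on every input change.
    revision: u64,
    /// Input: file name -> (source text, revision at which it last changed).
    sources: HashMap<String, (String, u64)>,
    /// Derived-query cache: file name -> (result, input revision it was computed from).
    word_counts: HashMap<String, (usize, u64)>,
    /// How many times the derived query actually ran (to observe caching).
    executions: usize,
}

impl Db {
    fn set_source(&mut self, file: &str, text: &str) {
        self.revision += 1;
        self.sources.insert(file.to_string(), (text.to_string(), self.revision));
    }

    /// Derived query: depends only on the source of `file`.
    fn word_count(&mut self, file: &str) -> usize {
        // Missing files count as empty (an arbitrary choice for the sketch).
        let Some((text, changed_at)) = self.sources.get(file) else { return 0 };
        let changed_at = *changed_at;
        // Reuse the cached value if this input hasn't changed since it was computed.
        if let Some(&(cached, computed_from)) = self.word_counts.get(file) {
            if computed_from == changed_at {
                return cached;
            }
        }
        let result = text.split_whitespace().count();
        self.executions += 1;
        self.word_counts.insert(file.to_string(), (result, changed_at));
        result
    }
}

fn main() {
    let mut db = Db::default();
    db.set_source("main.src", "let x = 1");
    println!("{}", db.word_count("main.src")); // 4 (computed)
    db.set_source("other.src", "fn f");
    println!("{}", db.word_count("main.src")); // 4 (cached: main.src didn't change)
    println!("executions: {}", db.executions); // executions: 1
}
```

Tracking a revision per input, rather than clearing every cache whenever anything changes, is what lets an edit to one file leave queries about other files untouched.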
