Best way to implement a language inside Rust

Hello,

I'd like to implement a programming language inside Rust. I already know quite a bit about syntax extension, although my knowledge is a bit outdated...

For illustration purposes, imagine you implement a small functional language inside Rust with a function-like procedural macro, e.g.,

// In file1.rs
functional!(
  fun add x y = x + y
);

// In file2.rs
functional!(
  fun add2 x y = 2 * (add x y)
);

(Note that I don't want to implement a functional language inside Rust, that is just for illustration).

I believe it is not really a problem to call add from file2.rs as long as we import the content of file1 in file2.
My question is: in the compiler for my functional language, how can I implement an interprocedural analysis of the code? That is, how can I view all the functional! invocations in the Rust crate as one sub-program that I can analyze as a whole?
Furthermore, what about inter-crate analysis?

I got only two ideas so far:

  1. Rely on plugins and lints to perform the code analysis of the macros, and proc_macro to generate the code.
  2. A rather extreme way: Forget about proc_macro and plugins, and write the compiler on top of Rust (that is, generate the Rust code before calling the Rust compiler).

Any advice? Any new development in procedural macros that I missed?

Many thanks for any help, and merry Christmas!

I'm not going to say it's impossible, but as stated this would be extremely difficult because it's going against the grain of how Cargo and rustc work.

Proc-macro invocations don't receive the text of other proc-macro invocations in the same file, let alone different crates. I'm not sure about plugins; it may be possible there to see other macro invocations in the same crate. They certainly won't give you easy access to macro invocations in other crates, as crates are compiled individually by separate instances of rustc.

Maybe with enough hackery you could do something anyway; proc-macros are normal executable code so you could maybe do something goofy like search for the other code in the current crate on the filesystem and parse it from there, but that would be really complicated and prone to breaking. There have been plans to sandbox proc-macros for security reasons which would completely break such a scheme.

My advice would be to find a different approach. For instance, it's not that uncommon for proc-macros to want to do some extra type-checking. They can't do that directly - macros run before type names are resolved so you can't really know what type any identifier refers to - but you can generate hidden code that will fail the compilation later if the types don't line up. Perhaps you can find a way to express your inter-procedural analysis like that.
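To make that a bit more concrete, here is a minimal sketch of the kind of hidden code a macro could emit next to the real output. The names `Numeric`, `assert_numeric` and `add` are made up for illustration and are not part of any library; the assumption is that the DSL only wants arithmetic on a fixed set of types.

```rust
// Hypothetical marker trait: the only types the DSL accepts in arithmetic.
trait Numeric {}
impl Numeric for i32 {}
impl Numeric for f64 {}

// Hidden check the macro would generate alongside the user's function.
// It produces no runtime code; it exists only so that rustc rejects the
// crate if the type used below does not implement `Numeric`.
const _: fn() = || {
    fn assert_numeric<T: Numeric>() {}
    assert_numeric::<i32>();
    // assert_numeric::<String>(); // uncommenting this line fails to compile
};

// Illustrative expansion of `fun add x y = x + y`, assuming i32 arguments.
fn add(x: i32, y: i32) -> i32 {
    x + y
}
```

The error for a failing check points at the generated `assert_numeric` call, so the macro should attach the user's span to that call if it wants the message to land in a useful place.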

I was going to recommend this – in fact it's a pretty standard approach to implementing DSLs. Arrange your types in a way so that you can express the type system of your language in terms of the rules of Rust's type system. This is generally accomplished by creating custom types and traits, then generating code that relies on these types and traits for describing the structure of your language.
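As a rough sketch of what that can look like, assume a tiny expression language with integers, booleans, addition and if/else. Everything here (`Expr`, `Int`, `Bool`, `Add`, `If`) is a hypothetical name chosen for the example; a macro would generate values of these types from the DSL source.

```rust
// Every DSL expression has a Rust type that carries its DSL type in `Ty`.
trait Expr {
    type Ty;
    fn eval(&self) -> Self::Ty;
}

struct Int(i64);
struct Bool(bool);
struct Add<A, B>(A, B);
struct If<C, T, E>(C, T, E);

impl Expr for Int {
    type Ty = i64;
    fn eval(&self) -> i64 {
        self.0
    }
}

impl Expr for Bool {
    type Ty = bool;
    fn eval(&self) -> bool {
        self.0
    }
}

// Typing rule: both operands of `+` must be integers.
impl<A: Expr<Ty = i64>, B: Expr<Ty = i64>> Expr for Add<A, B> {
    type Ty = i64;
    fn eval(&self) -> i64 {
        self.0.eval() + self.1.eval()
    }
}

// Typing rule: the condition is a boolean and both branches agree.
impl<C: Expr<Ty = bool>, T: Expr, E: Expr<Ty = T::Ty>> Expr for If<C, T, E> {
    type Ty = T::Ty;
    fn eval(&self) -> T::Ty {
        if self.0.eval() {
            self.1.eval()
        } else {
            self.2.eval()
        }
    }
}
```

With these definitions, `If(Bool(true), Add(Int(1), Int(2)), Int(0)).eval()` type-checks, while calling `eval` on `Add(Int(1), Bool(true))` is rejected by rustc because the trait bounds on `Add` are not satisfied.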

Hmm, encoding all the information in types might be difficult, but indeed not impossible. However, I'm afraid that would lead to compilation errors comparable to those you get in C++ when writing templates for meta-programming, and those are not very understandable.
I need to think more about it. Thanks for mentioning this technique.

If the type system and compilation model of your language differ significantly from those of Rust, you are not going to get pretty error messages by deferring to Rust's type system, full stop.

If you want good error messages, implement your own compiler by hand. Type checking might be easier to do thanks to crates like rusttyc.

Right. I think that I can reuse most of the tools available in a procedural macro anyway (for error reporting, Rust AST parsing with syn, etc.). I will also use Cargo build scripts to preprocess the files of my language. Hence, I can rely on the Cargo ecosystem to publish crates as well.

Thanks for the discussion, it helps clarify my ideas :)

Instead of transforming the source files, you may be able to find and parse macro invocations in your preprocessing script, and then emit the compiled version in a separate .rs file as a macro_rules! block.

I’ve used this technique to generate a parsing table for grammar rules embedded in the Rust source, for example.
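A very rough sketch of the scanning and emitting steps, using only the standard library. The function names `find_invocations` and `render_macro_rules` and the generated macro name `functional_compiled` are invented for this example. The scanner is deliberately naive: it assumes parenthesis delimiters and ignores comments and string literals, so a real script would more likely parse the files properly.

```rust
/// Returns the body of every `name!( ... )` invocation found in `src`.
/// Naive on purpose: parentheses inside comments or string literals
/// are counted like any others.
fn find_invocations(src: &str, name: &str) -> Vec<String> {
    let opener = format!("{}!(", name);
    let mut bodies = Vec::new();
    let mut rest = src;
    while let Some(start) = rest.find(&opener) {
        let body_start = start + opener.len();
        let mut depth = 1usize;
        let mut end = None;
        // Walk forward until the parenthesis that closes the invocation.
        for (i, c) in rest[body_start..].char_indices() {
            match c {
                '(' => depth += 1,
                ')' => {
                    depth -= 1;
                    if depth == 0 {
                        end = Some(body_start + i);
                        break;
                    }
                }
                _ => {}
            }
        }
        match end {
            Some(e) => {
                bodies.push(rest[body_start..e].trim().to_string());
                rest = &rest[e + 1..];
            }
            None => break, // unbalanced input: stop scanning
        }
    }
    bodies
}

/// Wraps already-compiled Rust code in a `macro_rules!` block
/// (hypothetical macro name) ready to be written to a generated .rs file.
fn render_macro_rules(compiled: &str) -> String {
    format!(
        "macro_rules! functional_compiled {{\n    () => {{ {} }};\n}}\n",
        compiled
    )
}
```

In a build script you would read the crate's sources with `std::fs`, run the whole-program analysis over the collected bodies, write the rendered text to a file under the directory named by the `OUT_DIR` environment variable, and pull it in with `include!(concat!(env!("OUT_DIR"), "/generated.rs"));` (the file name is up to you).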
