Single-Pass Languages and Build Speed


#1

D’s grammar is designed so that D is a so-called single-pass language. This means that lexical analysis can be strictly separated from parsing and semantic analysis. In fact, in some cases D code doesn’t even have to be lexically analyzed; it suffices to search for a matching brace (while skipping comments) when skipping unittest definitions in non-unittest builds. The same goes for unused template definitions.
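To illustrate that claim, here is a minimal, purely hypothetical sketch (written in Rust; this is not DMD’s actual code) of skipping a braced definition by brace matching alone, assuming only line comments need to be handled:

```rust
// Hypothetical sketch: skip a braced block without full lexing,
// tracking only brace depth and `//` line comments (so a `}` inside
// a comment does not count). Returns the index just past the
// matching closing brace, or None if the block is unterminated.
fn skip_braced_block(src: &str) -> Option<usize> {
    let bytes = src.as_bytes();
    let mut i = src.find('{')?; // start at the opening brace
    let mut depth = 0usize;
    while i < bytes.len() {
        match bytes[i] {
            b'{' => depth += 1,
            b'}' => {
                depth -= 1;
                if depth == 0 {
                    return Some(i + 1);
                }
            }
            b'/' if i + 1 < bytes.len() && bytes[i + 1] == b'/' => {
                // Skip the rest of the line comment.
                while i < bytes.len() && bytes[i] != b'\n' {
                    i += 1;
                }
            }
            _ => {}
        }
        i += 1;
    }
    None
}

fn main() {
    let src = "unittest { // comment with }\n    assert(1 + 1 == 2);\n} rest";
    let end = skip_braced_block(src).unwrap();
    assert_eq!(&src[end..], " rest");
    println!("skipped {} bytes", end);
}
```

A real implementation would also have to handle block comments and string literals, but the point stands: no token stream is needed to find the end of the definition.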

This is one of the reasons why the reference D compiler (DMD) compiles itself (in debug mode) in 2.5 seconds, the runtime in 2.6 seconds and the standard library Phobos in 6.5 seconds, totalling 11.5 seconds.

Another noteworthy fact is that a D main file that imports all the modules in druntime and Phobos adds an overhead of only around 0.5 seconds to the total build time compared to an empty main file.

In light of this, why has Rust chosen to inherit C++'s template syntax? AFAICT, this makes it impossible for Rust to be a single-pass language.

Further I’ve noticed that the Rust distribution “cheats” in this regard by only including some kind of binary (pre-compiled) representations of the standard modules. IMHO, this makes compilation benchmarks between Rust and other languages (not utilizing such caching) unfair.


#2

I don’t think parsing is much of a factor in the compile time comparison. For example, when compiling regex, a middle-sized crate, in debug mode (with -Z time-passes), parsing up to building an AST takes 0.098 seconds, while the LLVM backend alone spends 4.471 seconds.

Not sure what exactly you’re referring to here…

If it’s about building the compiler itself: the Rust compiler is written in Rust and uses its stdlib, so it needs its binary form for bootstrapping. But the stdlib is compiled together with the compiler when building a complete rustc.

If you mean that the stdlib is not compiled from source every time you build a crate, I don’t see why that would be necessary - it’s not like a C or C++ compiler builds libc/libstdc++ from scratch for each project.


#3

I wouldn’t call that cheating :slight_smile: Rust was explicitly designed to be extremely modular at the source and at the binary level. The idea that there is no single global namespace of crates and that each crate is anonymous from the crate’s point of view is ingenious! It’s sad that this cheat does not really help with horrendous build times at the moment :frowning:


#4

Ok, thanks for your answer! :slight_smile:

Which use case in iterative Rust development (the build phase) is currently significantly slower than in other languages? In other words, which use case doesn’t benefit from the recently introduced IR caching?


#5

That’s a remarkable fact - at least we get to know about bad code early. But a question I’ve been wishing to ask: clang uses the LLVM backend, and is a pretty fast C++ compiler. What is the difference?

The other point is that there’s room for alternative backends, optimized for correctness and compile speed, not for performance. Certainly for most exploratory work, very efficient output is not needed.


#6

The reason why LLVM takes so much time is (IIRC) that rustc just throws whatever it has at LLVM and leaves most of the hard optimisation work to it.

Now that MIR has landed, rustc will be able to do more optimisation on its own, and thus the LLVM phase should take less time. The work on MIR will also enable incremental compilation (it should already be in nightly?), which will improve compile times even further.

See Niko’s blog post about MIR.


#7

Yes, I must say that nightly feels faster. I understand that it’s early days yet, so look forward to more improvements. clang is definitely an existence proof that LLVM need not be slow.


#8

Could you elaborate on which cases this applies to?


#9

I’m still waiting for a comment on why Rust inherited C++'s template syntax and the potential problems this may entail, for instance when we want to do comparisons inside template parameters (which I don’t know whether Rust’s trait syntax requires).


#10

I’ll try to write a more thorough answer, but there is a wide range of topics to cover, so the following won’t necessarily be very well connected.

First of all, let’s discuss parsing. Parsing certainly is not a bottleneck of rustc, and I would guess that even in DMD parsing by itself takes a small fraction of the time (is this true?). rustc certainly leaves a bit of performance on the table during parsing. For example, it uses a hand-written lexer instead of a giant generated state machine. But it shouldn’t matter that much. I don’t believe syntax can really affect the performance of the compiler (unless you make deliberately weird syntax, which can of course even be undecidable).

However, Rust syntax is pretty easy to write an LL-style parser for (I did one). There are some warts, of course ( https://github.com/rust-lang/rfcs/pull/1685, or, from today’s bug, {1} & 1, which is not a bitwise and), but in general the syntax is really LL-ish and nice. Nothing close to the most vexing parse. In particular, there is no problem with generics syntax, because in expression context generic arguments always start with ::<, and chaining comparison operators as in 1 < 2 > 3 is forbidden. There is no problem at all lexing << and <: you just always lex them as separate tokens (or as a single token) and deal with the consequences in the parser, which is easy (IIRC, there were some problems with << in macros, but they are fixed now). Stuff like optional semicolons after block-like expressions and the restriction that no struct literal can appear in a condition is more annoying, but bearable as well.
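To make those points concrete, here is a small self-contained Rust example (my own illustration, not from the discussion above) of the disambiguation rules mentioned:

```rust
fn main() {
    // In expression position, generic arguments need the `::<` prefix,
    // so `<` is never ambiguous between a comparison and a generic list:
    let n = "42".parse::<i32>().unwrap();
    assert_eq!(n, 42);

    // Chained comparisons like `1 < 2 > 3` are rejected outright;
    // you must parenthesize explicitly, removing another `<`/`>`
    // ambiguity:
    let ok = (1 < 2) && (2 > 3);
    assert!(!ok);

    // In type position, nested generics like `Vec<Vec<i32>>` parse
    // fine: the `>>` at the end is simply split by the parser when
    // it closes two generic argument lists.
    let vv: Vec<Vec<i32>> = vec![vec![1, 2]];
    assert_eq!(vv[0][1], 2);
}
```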

Next, let’s talk about single-pass compilers. I don’t know what the official definition is, but in my mind single-pass means that the compiler doesn’t need to keep the code of the whole program in memory to compile it. This necessarily requires forward declarations. So a C compiler can be implemented in a single pass, because C has forward declarations, despite the fact that lexing and semantic analysis of C are not independent (the infamous lexer hack). So, by my definition, DMD can’t be a single-pass compiler, because D does not require forward declarations. This makes the performance of DMD even more impressive!
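A tiny example of the same property in Rust (my own illustration): items can be used before they are defined in the source, so by the definition above the file cannot be processed strictly top-to-bottom:

```rust
fn main() {
    // `double` is defined *below* this call site; no forward
    // declaration is needed, just like in D.
    assert_eq!(double(21), 42);
    println!("{}", double(21));
}

fn double(x: i32) -> i32 {
    x * 2
}
```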

Now, to the rustc performance. First of all, rustc is horribly slow in my opinion (although it is absolutely great in all other aspects I am familiar with). There are two modes of slowness:

  1. Slowness when compiling everything from scratch (make clean && make build). This is pretty slow, because the run time is dominated by code generation in LLVM. My rough measurements (2 kloc projects in C++ and Rust) show that in this mode clang++ and rustc perform roughly the same. I also wonder how dmd compares with ldc in compile times. IIRC dmd uses a custom backend, specifically tailored for compilation speed (like Go, which is also famous for fast builds).

  2. Slowness when compiling after editing a couple of files. This boils down to the compilation model. Rust, C++ and perhaps D are all based on the notion of compilation units, which can be compiled independently. Generally, only the modified compilation unit and its dependents need to be recompiled. However, in C++ a compilation unit is a single file, while in Rust a compilation unit is a whole library (a crate). So, when you develop a library in Rust, you need to recompile it fully after every change, and this is double plus unfun. Better incremental compilation at the function level is being worked on, though. This is the change I have been waiting for since I wrote my first thousand lines of Rust code :slight_smile:

So C++, Rust and D have pretty similar compilation models. But there is a big difference in how templates are compiled, which can potentially give an edge in compile times to Rust. In C++, templates are type checked at instantiation time, which means that every time you instantiate a template with a different type parameter, you have to do some amount of work again. In Rust, templates are checked at definition time, only once. However, this is only a marginal advantage, because you need to generate different code for different instantiations of templates anyway, and codegen is the actual bottleneck of the compiler.
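A short Rust sketch of definition-time checking (my own example): the trait bound lets the body be verified once, while each instantiation still triggers its own code generation.

```rust
use std::ops::Add;

// This definition is type checked exactly once: the bound
// `T: Add<Output = T>` guarantees that `a + b + c` is well-typed
// for every future instantiation. A C++ template body, by contrast,
// is re-checked against each concrete type argument.
fn sum3<T: Add<Output = T> + Copy>(a: T, b: T, c: T) -> T {
    a + b + c
}

fn main() {
    // Each instantiation below still produces its own machine code
    // (monomorphization) - the codegen cost discussed above.
    assert_eq!(sum3(1, 2, 3), 6);
    assert_eq!(sum3(1.0, 2.0, 3.0), 6.0);
    println!("ok");
}
```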

My understanding is that compiling a library with templates in Rust basically stores the parsed templates in the .rlib, and all the code generation happens when you instantiate those templates in the binary that links to the library. There is an idea that you could do some pre-codegen optimizations once per template, so that the amount of code to generate would be smaller. This is the potential benefit of the Rust compilation model / type system which could be leveraged to improve compilation speed.


#11

Yeah, that’s what I’ve been waiting for in D as well.


#12

Rust did not inherit C++ template syntax. It did choose to use angle-brackets, but use them differently.

The simplest way is to check the two side by side:

// C++
foo<4>(5);

// Rust
foo::<4>(5);

In Rust, the introduction of :: before the template parameter is necessary, and this simple tweak avoids the parsing ambiguity that C++ must face.

For cuteness, ::<> is dubbed the turbofish operator, although it’s not really an operator.
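For completeness, a couple of everyday turbofish uses (type arguments in expression position):

```rust
fn main() {
    // Explicit type argument where inference would otherwise need help:
    let n = "10".parse::<u32>().unwrap();

    // Explicit element type on a constructor:
    let v = Vec::<u8>::new();

    // The most common sighting: choosing the collection for `collect`:
    let squares = (1..4).map(|x| x * x).collect::<Vec<_>>();

    assert_eq!(n, 10);
    assert!(v.is_empty());
    assert_eq!(squares, vec![1, 4, 9]);
}
```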


#13

D:

foo!(4)(5);

Or even just:

foo!4(5);


#14

Just letting you know that the generics syntax is not going to change. That would break the stability promise. A lot.