I remember briefly doing some research on how Rust works and I discovered it actually compiles it self. I was confused because I never knew you could build a coding language that builds itself.
From a noob's perspective on compiler design/topics, how would you explain this process of building the rust compiler? It fascinates be even though I can barely understand the steps in between.
I have no idea to any useful detail but:
A compiler converts the source of programs into executable code. That is, machine instructions.
A compiler is itself a program. It is written as source code. Which gets compiled to executable programs.
So why not feed the source code of the compiler into the executable compiler and have it produce a new executable compiler?
I have long held the idea that any language that does not allow one to write and compile a compiler for the same language is not "a real programming language".
The searchable term you're looking for is "self-hosting". You need to get to that point via something else (and in Rust's case I believe that was via OCaml).
So, I guess the simplest implentation works as follows.
You first implement your programming language in some other lanuage. Then you rewrite the compiler in your own lanuage, compiling with your previous compiler. All futher versions are compiled with the then-previous compiler.
That is how I understand it.
Which raises the question, how did the first ever compiler get written? When there was no previous language to implement it.
At some point a compiler for some language must have been written in assembly language. Which then compiled the compiler.
But then we have the same question about the assembler. At some point an assembler must have been written directly as machine instructions, in binary, by hand!
I have no idea where this first happened.
What you can do to compile a compiler with itself is:
- Compile the compiler with the previous version of the same compiler.
- Compile the compiler again, this time with the current version of the compiler (that we got as the output of step 1). This may lead to a more efficient compiler, since we have now used the latest optimizations in the compiler.
- Compile the compiler again, this time with the properly optimized current version of the compiler (that we got as the output of step 2), and verify that the output is identical.
For rustc specifically the bootstrap chain involves OCaml. The first Rust compiler was written in OCaml (and in fact it is one of the languages that influenced Rust). Only a couple of years later did a Rust compiler get written in rust itself. Both existed in parallel for a bit before the OCaml based compiler got deleted. Rustc back then was a lot buggier than it is right now. In fact I read somewhere that it was buggy enough that the bootstrap compiler would get updated every couple of days or even more frequent (so between the first self-hosting compiler and rustc 1.0 there were hundreds if not thousands of steps), only later on did these updates slow down to bootstrapping from the previous release. This is also a part of why mrustc is a big deal. It is basically impossible to reproduce the original bootstrap chain, so an alternative rust compiler with an independent bootstrap chain (this time from C++) was the only way to know for certain that there was no backdoor introduced at some point.
At some point (when even you were young) people used to write in assembly directly. So the first compiler was written in assembly to compile the minimal version of an early language. That minimal compiler was used to compile a larger version. Etc.
There a few interviews of people doing that.
Most of the practical/popular C and C++ compilers are written in C and C++ (respectively). Self-hosting is one of the early important milestones in the life of several programming languages, as it provides constructive proof that a non-trivial program can be written in the language.
I think I said that. Or at least tried to.
Ha, when I was young I did write in assembly directly. Still do some times.
I even spend some months writing in hexadecimal for a processor for which we had no assembler at the time. We designed code in an ALGOL like pseudo code, then compiled that to assembler with paper and pencil, then assembled it to hex manually, then had the typing girls punch it into paper tape, for loading into an EPROM programmer. Managed to write a debug monitor that way, complete with code loading and saving (to cassette tape), memory and register modification/inspection, break points, the works. Might of gone on to write an assembler that way but a ready made one turned up at that point.
So is there a language that can be "bootstrapped" in stages.
Have a very simple language that one man can write a compiler for, in assembler, in a reasonable time frame.
Use that simple language to compiler a compiler for a less simple language with more features.
Repeat 1) and 2) until the latest and greatest version of the language is built.
That way, given the source code of all those intermediate language versions one only need to put in the effort to write the initial compiler in assembler to rebuild the whole thing.
Here's a RustConf talk about it.