Nice to hear you’ll be joining Cretonne! There are still a few things I want to clarify:
Not very much, I think. LLVM’s current position is due not only to its being great technology but also to timing (being in the right place at the right time), politics (one of its chief distinctions from GCC was and is the license), luck (Apple throwing its weight behind it made a huge difference), and social factors (e.g., it came out of research and was explicitly promoted as a tool for researchers). Replicating LLVM’s rise requires not just improving upon it the way LLVM improved upon earlier projects 15 years ago, but also something akin to those other factors.
The examples of desirable changes I gave concern precisely the LLVM IR, not implementation details, and so do most others I can think of. The LLVM people are rather good at changing implementation details and migrating APIs, but touching the core of the IR is much more involved.
Note that “the parts after LLVM IR” don’t just encompass machine-specific optimizations. They include:
- instruction selection: how do you even map target-independent operations to machine instructions
- register allocation
- scheduling: in what order to arrange the instructions
- legalization: how to replace types and operations the machine doesn’t support at all with ones it does support
- etc.
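To make legalization a little more concrete, here is a minimal sketch of what expanding a 64-bit addition amounts to on a hypothetical target that only has 32-bit registers. It is written as plain Rust arithmetic on values rather than as a real compiler pass (a real legalizer rewrites IR instructions, not values), and the function name `add_i64_legalized` is made up for illustration:

```rust
/// Hypothetical illustration: a 64-bit add expressed using only 32-bit
/// operations, mirroring how a legalizer might split one unsupported i64 add
/// into a low-half add that produces a carry and a high-half add-with-carry.
fn add_i64_legalized(a: u64, b: u64) -> u64 {
    // Split each 64-bit operand into two 32-bit halves (two "registers").
    let (a_lo, a_hi) = (a as u32, (a >> 32) as u32);
    let (b_lo, b_hi) = (b as u32, (b >> 32) as u32);

    // Low halves: add and capture the carry out (like an add that sets a carry flag).
    let (lo, carry) = a_lo.overflowing_add(b_lo);

    // High halves: add plus carry in (like `adc`); wrapping matches
    // the wrap-around semantics of the original 64-bit add.
    let hi = a_hi.wrapping_add(b_hi).wrapping_add(carry as u32);

    // Reassemble the two halves into the 64-bit result.
    ((hi as u64) << 32) | lo as u64
}

fn main() {
    let (a, b) = (0x0000_0001_FFFF_FFFF_u64, 1_u64);
    println!("{:#x} + {:#x} = {:#x}", a, b, add_i64_legalized(a, b));
}
```

The point is that the target-independent operation has no single machine counterpart here, so the backend must choose some expansion, and how cleverly it does so affects code quality.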
All of these steps have to be done in some way to produce machine code at all, but doing them better or worse greatly impacts codegen quality. For some of these tasks there are relatively simple solutions – regalloc can be per-statement, scheduling can be inherited from the input program, etc. – and these produce positively atrocious code (some more so than others: in many cases you may not even notice bad scheduling, but naive regalloc is incredibly bad).
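To illustrate why per-statement register allocation is so costly, here is a toy sketch under simplifying assumptions: a made-up three-address form where every statement is `dst = lhs + rhs`, a naive lowering that keeps every variable in a stack slot and reloads/stores around each statement, and an idealized lowering that assumes one register per variable (no spilling). The types, function names, and pseudo-assembly syntax are all hypothetical, and the idealized version ignores loading the program’s inputs and storing its final results:

```rust
/// Hypothetical three-address statement: dst = lhs + rhs (variable indices).
#[derive(Clone, Copy)]
struct Stmt {
    dst: usize,
    lhs: usize,
    rhs: usize,
}

/// Naive per-statement allocation: every variable lives in a stack slot;
/// each statement loads its operands into scratch registers r0/r1,
/// computes, and immediately stores the result back to memory.
fn lower_per_statement(prog: &[Stmt]) -> Vec<String> {
    let mut out = Vec::new();
    for s in prog {
        out.push(format!("load r0, [slot{}]", s.lhs));
        out.push(format!("load r1, [slot{}]", s.rhs));
        out.push("add r0, r0, r1".to_string());
        out.push(format!("store [slot{}], r0", s.dst));
    }
    out
}

/// Idealized allocation assuming unlimited registers: variable v stays in
/// register v for the whole program, so no loads or stores are emitted.
fn lower_register_resident(prog: &[Stmt]) -> Vec<String> {
    prog.iter()
        .map(|s| format!("add v{}, v{}, v{}", s.dst, s.lhs, s.rhs))
        .collect()
}

/// Count the instructions that touch memory.
fn memory_ops(code: &[String]) -> usize {
    code.iter()
        .filter(|i| i.starts_with("load") || i.starts_with("store"))
        .count()
}

fn main() {
    // t2 = t0 + t1; t3 = t2 + t0; t4 = t3 + t2
    let prog = [
        Stmt { dst: 2, lhs: 0, rhs: 1 },
        Stmt { dst: 3, lhs: 2, rhs: 0 },
        Stmt { dst: 4, lhs: 3, rhs: 2 },
    ];
    let naive = lower_per_statement(&prog);
    let resident = lower_register_resident(&prog);
    println!("naive:    {} instructions, {} memory ops", naive.len(), memory_ops(&naive));
    println!("resident: {} instructions, {} memory ops", resident.len(), memory_ops(&resident));
}
```

Even in this tiny example the naive scheme quadruples the instruction count, and three quarters of what it emits are memory accesses, including reloads of values (`t2`, `t3`) that were computed by the immediately preceding statement.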
But yes, there are also quite a few important optimizations in LLVM backends that you’d absolutely want in any production-quality compiler, and that would be very difficult to do on a target-independent representation.
There are alternative approaches that use one IR longer throughout the process (Cretonne being an example), but LLVM IR is ill-suited for that. There are good reasons why LLVM IR is mapped to other IRs as soon as codegen starts.
My point was that parallelization finer-grained than separate modules (which already works fine) is not very high on the wish list. It would be nice, but only nice, nothing more.
Two things: First, this prioritization seems contrary to getting other LLVM users and developers on board. Second, as I said, a lot of redundant work could be avoided by adopting what LLVM developers already know is “a better way” or “the future” but can’t implement easily due to inertia (or are currently in the process of implementing). For example, instead of porting LLVM’s SelectionDAG infrastructure and then scrapping it when GlobalISel is finally finished in upstream LLVM, it would be much better to directly adopt the GlobalISel design (provided one copies LLVM at all with respect to instruction selection).