@rkruppe - Thank you for the detailed response. This is exactly the kind of information I’m looking for (the lay of the land as seen by those more knowledgeable). That being said, I have a few counter-points (or perhaps caveats) to your points. Please don’t take this as me arguing with you and definitely don’t take it as me thinking I know better than you (as I’m 100% sure that is not the case), but a few things that come to mind are:
> One of the biggest boons of LLVM is that it is a de facto standard in industry and research. It’s hard to overstate how many people are agreeing on using LLVM and how much this consensus helps all involved: there’s mountains of experience, shared code, interoperability, cooperation, etc. in and around LLVM and its community. Any rewrite that does not have the full backing of the LLVM community automatically loses this.
I definitely agree that this is a HUGE issue, probably the biggest one. Insight on exactly this point is what I hoped to solicit before putting significant time into something that is unlikely to ever garner sufficient community involvement. That being said, I’m wondering how much “If you build it, they will come!” plays into this calculus.
> Without even deviating from its “core design philosophy”, there are many things in all parts of LLVM that would have changed for the better if not for inertia. Even the improvements that were decided on sometimes stall (e.g., removal of pointee types).
It may not have been clear (and probably wasn’t) from my tentative attempts at a “Preliminary Plan” that the idea would be to initially follow the implementation of LLVM fairly closely, while being free to internally make different choices in how the IR is represented, how the transformation algorithms interact with the AST, etc. Compatibility, and the ability to gradually replace LLVM, would be preserved by treating the serialized LLIR and the command-line arguments to LLVM as the defined, fixed interface; behind that interface, the internals could change wildly. Only once the existing functionality of LLVM was complete would there be discussion of changing the interface, but the implementation could be optimized in Rust however was befitting.
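To make the “fixed interface” idea concrete, here is a hedged sketch of how it could look in Rust. Everything here (`Backend`, `LlvmShim`, `RustNative`, the placeholder bodies) is an invented illustration, not a real API: the point is only that callers depend on the serialized-LLIR-plus-flags contract, so implementations can be swapped underneath it.

```rust
// Sketch of the "fixed interface" idea: serialized LLIR text plus the
// command-line arguments are the contract, and everything behind the
// trait is free to change. All names here are invented for illustration.
trait Backend {
    /// Takes serialized IR and opt-style flags, returns (pretend) object code.
    fn compile(&self, llir: &str, args: &[&str]) -> Vec<u8>;
}

/// Would shell out to the real LLVM toolchain during the transition.
struct LlvmShim;
/// The gradually growing pure-Rust implementation.
struct RustNative;

impl Backend for LlvmShim {
    fn compile(&self, llir: &str, _args: &[&str]) -> Vec<u8> {
        llir.as_bytes().to_vec() // placeholder for invoking the LLVM tools
    }
}

impl Backend for RustNative {
    fn compile(&self, llir: &str, _args: &[&str]) -> Vec<u8> {
        llir.as_bytes().to_vec() // placeholder for the native pipeline
    }
}

// Callers depend only on the trait, so implementations can be swapped
// component-by-component without changing the external contract.
fn build(backend: &dyn Backend, llir: &str) -> Vec<u8> {
    backend.compile(llir, &["-O2"])
}

fn main() {
    let ir = "define i32 @main() { ret i32 0 }";
    // The fixed interface means both backends must honor the same contract.
    assert_eq!(build(&LlvmShim, ir), build(&RustNative, ir));
    println!("both backends agree on the output");
}
```

The design choice this sketches is that the trait boundary, not any shared internal data structure, is what stays stable while the internals diverge from LLVM’s.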
> the target-dependent parts are usually under-estimated. Code passing through LLVM spends a huge portion of its time in IRs other than LLVM IR
This I was not completely aware of. My thinking was that machine-specific optimization and transformation was limited to a fairly small set of final optimizations applied after all LLIR optimization had completed and the result had been translated to a machine-level ASM AST. I’m wondering how important this ultimately is. If RLLVM didn’t have the same level of machine-specific AST optimization as LLVM, would that be terrible? Could a slightly modified design of how optimization is performed and represented internally in the LLIR obviate most of the need for machine-specific optimizations?
My thought here would be that MSA -> Optimized MSA (for pipelining, cache coherency, CPU bugs, etc.) would happen in the assembler. Now, I understand that optimizations made in LLIR before translation to MSA may result in code that is sub-optimal for the CPU in question after instruction reordering and the like, and that had the LLIR optimization stage made different choices, the final ASM stage could have done better. This then requires some sort of feedback loop that tries different combinations of LLIR optimizations to end up with something the ASM stage can handle best.
To what degree could RLLVM pursue a genetic algorithm that tries some candidates in the search space, invokes the ASM stage, measures the resulting final costs (expected/estimated clock cycles and/or dead cycles due to cache misses, etc.), and feeds the results back into the next generation, with a limit on the number of generations, to arrive at a “most fit” result? Could this be made deterministic? Would non-determinism be OK? These would be some interesting things to explore.
NOTE: This last paragraph, after I’ve looked into the concept a little, seems to be me self-discovering the concept of “Stochastic Super-Optimizer” in my imagination, so, nothing of merit as it is already solidly in the literature. 
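For what it’s worth, the feedback loop can be sketched in a few dozen lines, and the determinism question has an easy partial answer: seed the search. The sketch below is a toy, not a superoptimizer; `cost_of` is an invented stand-in for “invoke the ASM stage and estimate clock cycles”, and the mutation is just a swap of two pass positions.

```rust
// Hedged sketch of the feedback loop: a seeded, deterministic "genetic"
// (mutate-and-select) search over LLIR pass orderings, scored by a toy
// cost function standing in for the ASM stage's cycle estimate.

// Tiny deterministic LCG so the search is reproducible without any crates.
struct Lcg(u64);
impl Lcg {
    fn next(&mut self) -> u64 {
        self.0 = self
            .0
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        self.0
    }
    fn below(&mut self, n: usize) -> usize {
        (self.next() >> 33) as usize % n
    }
}

// Toy cost model: a fixed base cost per pass, plus a penalty when pass 1
// runs before pass 2 (pretend pass 2 enables pass 1's best result).
fn cost_of(order: &[usize]) -> u64 {
    let base: u64 = order.iter().map(|&p| (p as u64 + 1) * 3).sum();
    let p1 = order.iter().position(|&p| p == 1).unwrap();
    let p2 = order.iter().position(|&p| p == 2).unwrap();
    base + if p1 < p2 { 40 } else { 0 }
}

// Mutate-and-select with a fixed generation budget and a fixed seed:
// the same inputs always yield the same "most fit" ordering.
fn evolve(num_passes: usize, generations: usize, seed: u64) -> (Vec<usize>, u64) {
    let mut rng = Lcg(seed);
    let mut best: Vec<usize> = (0..num_passes).collect();
    let mut best_cost = cost_of(&best);
    for _ in 0..generations {
        let mut cand = best.clone();
        let (i, j) = (rng.below(num_passes), rng.below(num_passes));
        cand.swap(i, j); // mutation: swap two pass positions
        let c = cost_of(&cand);
        if c < best_cost {
            best = cand;
            best_cost = c;
        }
    }
    (best, best_cost)
}

fn main() {
    let (order, cost) = evolve(5, 200, 42);
    println!("best order {:?} with estimated cost {}", order, cost);
    // Same seed, same budget: the result is fully deterministic.
    assert_eq!(evolve(5, 200, 42), (order, cost));
}
```

A real version would face the problems the literature documents: the cost model must predict hardware behavior, and each candidate evaluation means re-running the back end, which is where the expense of this approach lives.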
> That is not to say I believe the LLVM code base to be memory safe or very parallelizable or anything, in fact I don’t. But the most frequent and the most serious issues are unrelated to that. Rewriting the same algorithms in a different language does nothing to fix miscompiles, improve compile times, categorically prevent certain missed optimizations, make back-end work less manual and error prone, or help with any of the other issues that keep LLVM developers and users up at night.
This is one of the areas where I was imagining room for improvement beyond simply porting the algorithms from C++ to Rust (which is part of what motivated me to say that it wouldn’t be an in-place, method-by-method replacement of LLVM, but would instead rely on the abstraction of LLIR to provide staged replacement of functionality), so that the opportunity would exist to optimize the internal AST representation for Rust and permit more parallel processing (using things like work-stealing through Rayon, for example).
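As a sketch of what I mean by “more parallel processing”: function-local passes touch disjoint data, which is exactly the shape Rust’s borrow checker can verify. The `Function` struct and `optimize` pass below are invented stand-ins; with Rayon the loop would be `funcs.par_iter_mut().for_each(optimize)` on a work-stealing pool, and `std::thread::scope` is used here only to keep the sketch dependency-free.

```rust
use std::thread;

// Hypothetical per-function IR unit: just a name and an instruction count.
struct Function {
    name: String,
    insts: usize,
}

// Stand-in for a function-local pass (pretend DCE drops a quarter of the
// instructions); a real pass would rewrite the IR.
fn optimize(f: &mut Function) {
    f.insts -= f.insts / 4;
}

// Function-local passes are independent, so each thread can safely take a
// disjoint `&mut Function`; the borrow checker proves there is no sharing.
fn optimize_module(funcs: &mut [Function]) {
    thread::scope(|s| {
        for f in funcs.iter_mut() {
            s.spawn(move || optimize(f));
        }
    });
}

fn main() {
    let mut funcs: Vec<Function> = (0..4)
        .map(|i| Function { name: format!("f{i}"), insts: 100 * (i + 1) })
        .collect();
    optimize_module(&mut funcs);
    for f in &funcs {
        println!("{}: {} instructions", f.name, f.insts);
    }
}
```

The interesting part is not the threads but the guarantee: in C++ this parallelization is a manual audit, while here a pass that secretly touched another function’s data simply would not compile.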
> LLVM is a moving target. It continually receives improvements, bug fixes, new features, refactorings, etc. so if one takes a snapshot of LLVM today and toils to rewrite that 1:1, the result will be a lot worse (on many axes) than LLVM is by the time the rewrite is finished.
This would definitely be a problem that would need to be addressed during ongoing development. The way I would hope this would ultimately play out is:
- first, RLLVM is able to handle all the same LLIR (for the most part), even if all the optimizations aren’t yet implemented
- then, as optimizations are ported, choices are made to optimize differently in RLLVM than in LLVM (not sharing the exact same internal representation, for example)
- new features in LLVM are evaluated while RLLVM is under development, and ported or not based first on their applicability and usefulness to Rust, and second on their overall applicability and usefulness (with perhaps different trade-offs made in the implementation)
- at some point, hopefully, sufficient progress is made to “Tip the Balance” so that more contributions come into RLLVM, some from previous LLVM contributors, but hopefully also from Rustaceans who now feel empowered to contribute to something as Low-Level (pun intended) as RLLVM.
Now, all of what I’ve just said is probably naively optimistic, and I am definitely (through feedback such as yours) coming to believe that is the case. I’d still like to hear from anyone “In the Know” who has a different take on the viability and usefulness, with some, if not concrete, then hopefully hard-dirt, reasons why the idea of RLLVM would make sense given the overall LLVM community situation.