TL;DR: Based on the feedback from @sunfishcode below (a core developer on Cretonne), his clarification of Cretonne's goals with respect to serving as a back end for the Rust compiler, and the general community feedback that a reimplementation of LLVM in Rust would likely NEVER gain sufficient traction, it seems that RLLVM would not be useful to the community, nor would it be the best investment of time for someone looking to work on compilers and Rust. At least, that's the conclusion I draw from the overall feedback. Thanks to the community for your opinions and insight!
I've been considering working on an implementation/port of LLVM in pure, idiomatic Rust (RLLVM). I made a post on r/rust (https://www.reddit.com/r/rust/comments/81iwxg/proposal_rllvm_rust_implementation_of_llvm/) asking the community for input on both the viability and the usefulness of such a project.
So far, the feedback has been as follows:
Reasons NOT to create an RLLVM
- Too Big
- Too Complicated
- Companies (like Google and Apple) are investing heavily in LLVM, and an individual or small team cannot possibly compete with that
- NOT USEFUL TO THE RUST COMMUNITY OR THE COMMUNITY AT LARGE (is this true? opinions?)
- Cretonne (see the Reddit post for links) already exists, is written in Rust, and seems to be a good start on replacing what LLVM does (with different trade-offs)
Reasons to create an RLLVM
- Help evolve the Rust language by using it for something as low-level as LLVM, thereby helping to identify any weaknesses or possible improvements in the language
- Create a more robust LLVM, thanks to Rust's safety guarantees, ergonomics, and verb-focused (algorithm-focused) programming style
- Make Rust truly "self-hosting", with nothing between it and the metal other than assembly (as opposed to having a hard dependency on C/C++ for compiling to machine code)
- SAFELY parallelize parsing, compilation, IR optimization (per function/module), etc., for example by using Rayon for work-stealing parallelism
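The "safe parallelism" point can be illustrated with a minimal sketch that uses only the standard library's scoped threads (`std::thread::scope`, stable since Rust 1.63) instead of Rayon, so it stays self-contained. `Function` and `remove_zeros` are hypothetical stand-ins, not anything from LLVM. Note that this spawns one thread per function rather than doing work-stealing; with Rayon the loop would roughly become `functions.par_iter_mut().for_each(pass)`.

```rust
use std::thread;

/// Hypothetical, heavily simplified stand-in for an IR function:
/// a name plus a list of integer "instructions".
#[derive(Debug, Clone, PartialEq)]
pub struct Function {
    pub name: String,
    pub body: Vec<i64>,
}

/// Hypothetical per-function pass: drops every `0` "instruction".
/// It only receives `&mut Function`, so it cannot touch any other function.
pub fn remove_zeros(f: &mut Function) {
    f.body.retain(|&x| x != 0);
}

/// Runs `pass` on every function in parallel, one scoped thread per function.
/// `iter_mut` hands each thread an exclusive borrow of exactly one function,
/// so the borrow checker rules out data races at compile time without locks.
pub fn run_pass_in_parallel(functions: &mut [Function], pass: fn(&mut Function)) {
    thread::scope(|s| {
        for f in functions.iter_mut() {
            s.spawn(move || pass(f));
        }
    }); // all spawned threads are joined before `scope` returns
}
```

If the pass tried to mutate shared state (say, a global counter) without synchronization, the code would simply not compile; that compile-time guarantee is the core of the argument above.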
Preliminary Plan
- Create a crate that defines the structs and fundamental trait impls for the IR objects (Modules, Functions, Instructions, etc.)
- Create a crate to parse textual LLVM IR (LLIR) into those structs
- Create a crate to serialize those structs to LLVM bitcode and/or back to LLIR
- Ensure that any valid LLIR can pass through the parse-serialize cycle and then on to the current LLVM and have it work correctly
- Create a crate that defines transformation functions on the in-memory IR structs that can be leveraged by specific optimization steps/pipelines
- Implement some trivial transformations/optimizations in a crate built on top of that
- Ensure the transformed result serializes to LLIR/LLBitCode correctly and can be correctly consumed by the current LLVM
- At this point, hopefully the community will step up and begin implementing/porting transformations/optimizations for LLIR
- Begin work, in Rust, on lowering LLIR to actual assembly for hardware architectures
- ...
- Use cbindgen to generate a C/C++ interface to the library (for clients)
- ...
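To make the first few steps of the plan more concrete, here is a minimal sketch of in-memory IR structs, a serializer to LLVM-IR-like text, and one trivial optimization (constant folding). Everything here (`Module`, `Function`, `Instruction`, `Value`, `Opcode`, `const_fold`) is a hypothetical, drastically reduced subset: a single `i64` type, a single basic block, and only `add`/`mul`/`ret`. It is not how LLVM's own data structures are laid out.

```rust
use std::fmt;

/// An operand: an integer constant or the result of an earlier instruction.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Value {
    Const(i64),
    Inst(usize), // index of an earlier instruction in the same function
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Opcode {
    Add,
    Mul,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Instruction {
    Binary { op: Opcode, lhs: Value, rhs: Value },
    Ret(Value),
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Function {
    pub name: String,
    pub body: Vec<Instruction>, // one implicit `entry` block
}

#[derive(Debug, Clone, PartialEq, Eq, Default)]
pub struct Module {
    pub functions: Vec<Function>,
}

impl fmt::Display for Value {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Value::Const(c) => write!(f, "{c}"),
            Value::Inst(i) => write!(f, "%v{i}"), // named values avoid LLVM's numbering rules
        }
    }
}

/// Serializes a function to LLVM-IR-style text.
impl fmt::Display for Function {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        writeln!(f, "define i64 @{}() {{", self.name)?;
        writeln!(f, "entry:")?;
        for (i, inst) in self.body.iter().enumerate() {
            match inst {
                Instruction::Binary { op, lhs, rhs } => {
                    let mnemonic = match op {
                        Opcode::Add => "add",
                        Opcode::Mul => "mul",
                    };
                    writeln!(f, "  %v{i} = {mnemonic} i64 {lhs}, {rhs}")?;
                }
                Instruction::Ret(v) => writeln!(f, "  ret i64 {v}")?,
            }
        }
        write!(f, "}}")
    }
}

/// Replaces a reference to an already-folded instruction with its constant.
fn resolve(v: Value, known: &[Option<i64>]) -> Value {
    match v {
        Value::Inst(i) => known[i].map_or(v, Value::Const),
        Value::Const(_) => v,
    }
}

/// Trivial constant folding: binary ops whose operands are (or become)
/// constants are evaluated, and later uses are rewritten to the constant.
/// Wrapping arithmetic matches `add`/`mul` without `nsw`/`nuw` flags.
/// Folded instructions are left in place for a later dead-code pass.
pub fn const_fold(func: &mut Function) {
    let mut known: Vec<Option<i64>> = vec![None; func.body.len()];
    for (i, inst) in func.body.iter_mut().enumerate() {
        match inst {
            Instruction::Binary { op, lhs, rhs } => {
                *lhs = resolve(*lhs, &known);
                *rhs = resolve(*rhs, &known);
                if let (Value::Const(a), Value::Const(b)) = (*lhs, *rhs) {
                    known[i] = Some(match *op {
                        Opcode::Add => a.wrapping_add(b),
                        Opcode::Mul => a.wrapping_mul(b),
                    });
                }
            }
            Instruction::Ret(v) => *v = resolve(*v, &known),
        }
    }
}
```

For a function computing `(2 + 3) * 6`, folding rewrites the `mul` operands to `5, 6` and the return to `ret i64 30`. A real parser for the "parse-serialize cycle" step would go in the other direction, and feeding the printed text to LLVM's own tools would be the natural round-trip check.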
In this plan, it would be a requirement that ALL code is implemented in safe Rust. Anything requiring unsafe should be wrapped in an appropriate safe abstraction that is ideally pushed into libstd or libcore (or, at the least, into a dedicated crate of low-level unsafe abstractions). The goal would be to have no direct unsafe code anywhere in the RLLVM library.
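A common worry with the "no direct unsafe" rule is that compiler IR is a pointer-heavy graph with cycles (loops, use-def chains). One widely used pattern in safe Rust is an index-based arena, sketched below; combined with the crate-level `#![forbid(unsafe_code)]` lint it needs no unsafe at all. `Arena`, `Id`, and `Block` are hypothetical names for illustration, not an existing crate's API, and LLVM itself uses raw pointers and use-lists instead.

```rust
use std::fmt;
use std::marker::PhantomData;

/// Typed handle into an `Arena<T>`: a plain index, not a pointer.
pub struct Id<T> {
    index: usize,
    // Ties the handle to `T` without owning a `T`; `Id<T>` is Send + Sync for any `T`.
    _marker: PhantomData<fn() -> T>,
}

// Manual impls: `#[derive]` would add unnecessary `T: Clone` / `T: PartialEq` bounds.
impl<T> Clone for Id<T> {
    fn clone(&self) -> Self {
        *self
    }
}
impl<T> Copy for Id<T> {}
impl<T> PartialEq for Id<T> {
    fn eq(&self, other: &Self) -> bool {
        self.index == other.index
    }
}
impl<T> Eq for Id<T> {}
impl<T> fmt::Debug for Id<T> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "Id({})", self.index)
    }
}

/// Owns every node of one kind; nodes refer to each other by `Id`,
/// so cyclic graphs need neither raw pointers nor `unsafe`.
pub struct Arena<T> {
    items: Vec<T>,
}

impl<T> Arena<T> {
    pub fn new() -> Self {
        Arena { items: Vec::new() }
    }

    pub fn alloc(&mut self, value: T) -> Id<T> {
        let index = self.items.len();
        self.items.push(value);
        Id { index, _marker: PhantomData }
    }

    /// Panics (safely) if the index is out of bounds.
    pub fn get(&self, id: Id<T>) -> &T {
        &self.items[id.index]
    }

    pub fn get_mut(&mut self, id: Id<T>) -> &mut T {
        &mut self.items[id.index]
    }
}

/// A basic block whose successor edges may form cycles (e.g. loops).
pub struct Block {
    pub name: String,
    pub successors: Vec<Id<Block>>,
}
```

Usage: allocate an `entry` block and a `loop` block, then push `loop` onto its own successor list to form a cycle, which is awkward to express with plain references or `Box`. The trade-off is that an `Id` from a different arena that happens to be in bounds silently refers to the wrong node; the type system only prevents mixing ids of different node kinds.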
Ideally, the codebase would avoid boilerplate through extensive use of "derive", procedural macros, and/or Macros 2.0. It would also seek to leverage the latest innovations in Rust, such as const generics (once they become available).
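As a small, hedged example of the boilerplate reduction described above: a single declarative macro can generate an opcode enum together with both directions of its textual mapping, so adding an instruction becomes a one-line change. This sketch uses stable `macro_rules!` because Macros 2.0 (the `macro` keyword) is not stable; `define_opcodes!` and `Opcode` are hypothetical names, while the mnemonics are real LLVM IR instruction names.

```rust
/// One table of (variant, mnemonic) pairs generates the enum, a list of all
/// variants, and the mnemonic <-> variant mapping in both directions.
macro_rules! define_opcodes {
    ($($variant:ident => $mnemonic:literal),* $(,)?) => {
        #[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
        pub enum Opcode {
            $($variant),*
        }

        impl Opcode {
            /// Every opcode, in declaration order.
            pub const ALL: &'static [Opcode] = &[$(Opcode::$variant),*];

            /// Textual mnemonic as it appears in LLIR.
            pub fn mnemonic(self) -> &'static str {
                match self {
                    $(Opcode::$variant => $mnemonic),*
                }
            }

            /// Inverse of `mnemonic`; `None` for unknown text.
            pub fn from_mnemonic(s: &str) -> Option<Opcode> {
                match s {
                    $($mnemonic => Some(Opcode::$variant),)*
                    _ => None,
                }
            }
        }
    };
}

define_opcodes! {
    Add => "add",
    Sub => "sub",
    Mul => "mul",
    SDiv => "sdiv",
    ICmp => "icmp",
}
```

Because the enum and both mappings come from the same table, they cannot drift out of sync, which is exactly the kind of invariant that hand-written tables in a large codebase tend to break.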
The codebase would NOT be a gradual, in-place replacement of the C++ code, but rather a complete rewrite in idiomatic, safe, modern Rust that interfaces with the existing LLVM stages through serialized LLIR or LLVM bitcode. After examining LLVM's C++ code (which is really nice and very well organized), I don't think gradually replacing pieces of it with Rust (as is being done for Emacs, which is written in C) would be at all useful or productive. There seems to be just too much friction between how things are properly done in Rust and how they are done in C++.
Benefits of the Plan
- If all the "unsafe" bits live in well-defined safe abstractions that can be fully unit-tested and manually verified for soundness, then perhaps the whole of RLLVM could be "proven" sound using the techniques pioneered by RustBelt: https://www.reddit.com/r/rust/comments/6m46is/rustbelt_securing_the_foundations_of_the_rust/
- All of LLVM's optimizations could be written in Rust without first implementing the rest of the LLVM stack (optimized IR -> machine-level assembly). The optimizer itself could then more readily be shown to be sound (free of undefined behavior)
- Through Macros 2.0, traits, and eventually features like const generics, the amount of code needed to implement IR optimization passes could potentially be reduced significantly, making it all more maintainable, auditable, and robust.
- No changes would be required to the rustc front end, which could continue to generate LLIR and rely on RLLVM's optimizations and downstream infrastructure as if it were LLVM. RLLVM would hand off to LLVM for any stages not yet implemented in RLLVM. (This could make overall compilation slower, though not necessarily, and it would be irrelevant to the ultimate goal of having the whole chain in RLLVM.)
- Features like code-coverage instrumentation could be implemented more easily in Rust using the RLLVM crates
- Writing new optimization passes in Rust instead of C++ could attract a larger pool of optimization implementers, thanks to Rust's safety guarantees and ergonomics (think Ruby/Python/Perl programmers who are not C++ experts but have learned Rust)
- Other benefits???
What about Cretonne?
There was some feedback on the Reddit post referenced above saying that time invested in Cretonne would be better spent than time spent on something like RLLVM. After looking into Cretonne further, I'm not immediately convinced of that, for the following reasons:
- Cretonne is explicitly NOT intended to be the new back-end for Rust static compilation
- Many of its design and architecture choices seem to be motivated by JIT-compiling JS and statically/JIT-compiling WebAssembly, and it doesn't seem intended to be a general-purpose IR -> machine assembly static optimizer and compiler that can be used for any language, as LLVM is
- Significant investments in Rust (such as code coverage) currently rely on LLVM being Rust's back end. Switching to Cretonne (which is not 100% compatible from an IR or goals perspective) would undermine those investments, whereas reimplementing LLVM as RLLVM would bring all the benefits of Rust while preserving the existing investments in LLVM tooling, hooks, extensions, etc.
My understanding of Cretonne, its goals, and the degree to which it can (and wants to) become the ultimate replacement for LLVM could be completely off the mark. If that is the case, I'd be interested in any feedback, particularly from the Rust compiler team, the Cretonne team, and those involved with Rust code coverage, that helps me understand whether time would be better spent on Cretonne than on RLLVM, or vice versa.
Thank you in advance for any input or feedback you can provide. If there is anything you feel I'm completely ignoring, please let me know.
NOTE: Apologies for the links that aren't links; as a new user, I'm not permitted to post more than two.