Compiling Rust and C in the same llvm codegen unit

Hello, we have a lot of C (clang) and a lot of Rust code. This is embedded, so we need a lot of optimizations to fit into flash. Currently, the Rust code is compiled as one codegen unit with LTO, and the C code is compiled and optimized separately, so LTO runs separately for Rust and for C. Then the two are linked (we are not doing cross-language LTO).

We have an idea to put the C and Rust code into the same LLVM codegen unit to enable more optimizations (primarily for size).

Does anyone have resources that could help with that? Maybe someone has already done this before?

(The entry point is in C.)

It is possible, but a bit fiddly to set up, since the LLVM versions used by rustc and clang need to match. Rust calls this cross-language inlining, or cross-language LTO.

https://doc.rust-lang.org/rustc/linker-plugin-lto.html
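A quick way to see whether two toolchains line up is to compare the LLVM version each one reports. The sketch below only compares major versions, which is an assumption about what "matching" needs to mean in practice (the page linked above lists known-compatible rustc/clang pairs). It also won't be meaningful for Apple's clang, whose version numbers don't follow upstream LLVM.

```shell
#!/bin/sh
# Extract the LLVM major version from `rustc -vV` output (read on stdin).
rustc_llvm_major() {
    sed -n 's/^LLVM version: \([0-9][0-9]*\).*/\1/p'
}

# Extract the major version from `clang --version` output (read on stdin).
# Assumes an upstream-style "clang version X.Y.Z" line.
clang_major() {
    sed -n 's/.*clang version \([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Compare the two only when both tools are on PATH; this only prints a message.
if command -v rustc >/dev/null 2>&1 && command -v clang >/dev/null 2>&1; then
    rust_llvm=$(rustc -vV | rustc_llvm_major)
    clang_llvm=$(clang --version | clang_major)
    if [ "$rust_llvm" = "$clang_llvm" ]; then
        echo "LLVM major versions match: $rust_llvm"
    else
        echo "LLVM mismatch: rustc uses $rust_llvm, clang is $clang_llvm" >&2
    fi
fi
```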

It looks like the link describes compiling the Rust and C code separately with LLVM and then running LTO over them. I am asking about emitting LLVM IR and then compiling it in the same LLVM codegen unit as the C code, not only cross-language LTO.

Fat LTO merges all units together, so this should have the same effect regardless of whether you compile Rust and C separately. Even on the Rust side, fat LTO makes the number of codegen units irrelevant, since it performs cross-unit inlining.

If you want to do the merging of bitcode yourself, I think that's uncharted territory. You'd still use the same -Clinker-plugin-lto flag to get the bitcode.
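For completeness, here is roughly what the do-it-yourself route could look like, as an untested sketch. Instead of -Clinker-plugin-lto it asks rustc for a bitcode file directly with `--emit=llvm-bc`, then merges that with clang's bitcode using `llvm-link`. The file names `lib.rs` and `main.c` are hypothetical, all three tools are assumed to come from the same LLVM version, and both compilers should target the same triple (add the same `--target` to each for an embedded target). Note that `rust.bc` only contains the crate itself: dependencies such as `core` are not merged in and still have to be linked as usual.

```shell
#!/bin/sh
# Hypothetical manual pipeline: emit bitcode from both compilers, merge it
# with llvm-link, and optimize + compile the merged module as one unit.
merge_and_compile() {
    # Rust: one codegen unit so rustc writes a single .bc file.
    rustc --crate-type=staticlib -C opt-level=z -C codegen-units=1 \
        --emit=llvm-bc -o rust.bc lib.rs || return 1
    # C: bitcode instead of a native object.
    clang -Oz -emit-llvm -c -o c.bc main.c || return 1
    # Merge both modules into a single LLVM module.
    llvm-link rust.bc c.bc -o merged.bc || return 1
    # Run the optimization pipeline and codegen on the merged module.
    clang -Oz -c merged.bc -o merged.o
}
```

The resulting `merged.o` would then go to the normal link step together with the Rust dependency libraries.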

You probably misunderstood the question. I'm talking about a single LLVM instance compiling both the C and the Rust code.

Yeah, probably. What do you mean by a single instance? You probably don't mean just both compilers linking against the same dynamic LLVM library (that is doable). Do you mean a run-time instance, as in literally both compilers running in the same process and calling the same C++ LLVM API?

Like, for example, rustc emitting LLVM IR and feeding it into clang. Or: both rustc and clang emit LLVM IR and then use LLVM to compile them both, in one codegen unit.

> both rustc and clang emit LLVM IR and then use LLVM to compile them both, in one codegen unit

This is what fat LTO does. Bitcode is the on-disk format of LLVM IR. With fat LTO, rustc doesn't compile the Rust code to machine code and clang doesn't compile the C code to machine code; they just dump the IR to disk. The "linker" with LTO is then not really just a linker: it's a fully-featured LLVM IR compiler, which takes the bitcode, merges all the IR into a single unit, and then compiles it.
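Because the LTO inputs are bitcode rather than machine code, one way to check that a build really produced LTO-ready objects is to look at the file's magic bytes: raw LLVM bitcode starts with `B`, `C`, 0xC0, 0xDE. A minimal sketch; it doesn't look inside `.a` archives and doesn't recognize the bitcode wrapper format used on Darwin.

```shell
#!/bin/sh
# Succeeds if the file starts with the raw LLVM bitcode magic 'B' 'C' 0xC0 0xDE.
is_llvm_bitcode() {
    # Read the first 4 bytes and print them as lowercase hex without spaces.
    magic=$(dd if="$1" bs=4 count=1 2>/dev/null | od -A n -t x1 | tr -d ' \n')
    [ "$magic" = "4243c0de" ]
}
```

For example, `is_llvm_bitcode cmain.o && echo "cmain.o is bitcode"` after compiling with `-flto`.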

Thank you for a great explanation. To clarify: are you referring to this code from the link you provided?

```shell
# Compile the Rust staticlib
RUSTFLAGS="-Clinker-plugin-lto" cargo build --release
# Compile the C code with `-flto=thin`
clang -c -O2 -flto=thin -o cmain.o ./cmain.c
# Link everything, making sure that we use an appropriate linker
clang -flto=thin -fuse-ld=lld -L . -l"name-of-your-rust-lib" -o main -O2 ./cmain.o
```

Almost, but it should be -flto=full. For rustc that's -C lto=fat.
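Applying that correction to the commands above could look like the following sketch. `CARGO_PROFILE_RELEASE_LTO=fat` is Cargo's environment-variable form of `lto = "fat"` in the release profile. The library path assumes a host build (for a cross target, such as an embedded one, Cargo puts the library under `target/<triple>/release`), and `lld` plus matching LLVM versions are assumed.

```shell
#!/bin/sh
# Full (fat) LTO variant of the commands above. "$1" is the Rust library
# name as passed to -l (e.g. "foo" for target/release/libfoo.a).
build_full_lto() {
    lib_name=$1
    # Rust: emit bitcode for the linker plugin and ask rustc for fat LTO
    # across the Rust crates.
    RUSTFLAGS="-Clinker-plugin-lto" CARGO_PROFILE_RELEASE_LTO=fat \
        cargo build --release || return 1
    # C: -flto=full emits bitcode for full LTO instead of ThinLTO.
    clang -c -O2 -flto=full -o cmain.o ./cmain.c || return 1
    # Link: lld merges all bitcode (Rust and C) into one module and
    # optimizes and compiles it as a whole.
    clang -flto=full -fuse-ld=lld -L ./target/release -l"$lib_name" \
        -o main -O2 ./cmain.o
}
```

For a size-constrained target, `-Oz` on the clang side and `opt-level = "z"` on the Rust side are the usual knobs to try instead of `-O2`.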

Can you please advise on how to get clang's and rustc's LLVM versions to match? At the moment rustc nightly is using LLVM 19.1.3, and rustup doesn't seem to ship a matching clang version. Building clang from source does not seem scalable...

There's no nice built-in solution to this, unfortunately. The official Rust distribution uses its own patched LLVM version and doesn't ship any C compiler.

You will have to build clang with Rust's LLVM, or build Rust with your copy of LLVM:
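For the first option, a rough sketch of building clang and lld from the same LLVM fork that a given rustc was built from. It assumes `git`, `cmake`, `ninja` and a C++ toolchain are available, and that `rustc -vV` reports a real commit hash (distro builds may report `unknown`). The `rust-src` and `llvm-build` directory names are arbitrary. This is a large build, which is part of why it doesn't scale well.

```shell
#!/bin/sh
# Build clang + lld from the LLVM fork vendored in rust-lang/rust at the
# commit the installed rustc was built from. The body runs in a subshell,
# so the caller's working directory is not changed.
build_matching_clang() (
    set -e
    # `rustc -vV` prints a line like "commit-hash: <sha>".
    commit=$(rustc -vV | sed -n 's/^commit-hash: //p')
    git clone https://github.com/rust-lang/rust.git rust-src
    cd rust-src
    git checkout "$commit"
    # Rust's LLVM fork lives in the src/llvm-project submodule.
    git submodule update --init src/llvm-project
    cmake -S src/llvm-project/llvm -B ../llvm-build -G Ninja \
        -DCMAKE_BUILD_TYPE=Release \
        -DLLVM_ENABLE_PROJECTS="clang;lld"
    ninja -C ../llvm-build clang lld
)
```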

I think Debian is also "unbundling" LLVM from Rust, so Debian's rustc package should use the same copy of LLVM as clang, but on Debian the Rust version will be outdated to the point of being almost useless.

There's also another out-of-the-box solution: you can convert C source code to Rust source code:

and then all of that will be compiled by the Rust compiler together, without needing to involve clang.

(I don't use the packages and have no idea how usable they actually are, but) the testing and unstable trains stay up-to-date, e.g.

There are prebuilt clang/rustc pairs with matching LLVM available here:

https://mirrors.edge.kernel.org/pub/tools/llvm/rust/

These are built for use with the Linux kernel but work elsewhere too, of course.
