Creating a static library from Rust ready for runtime minimizing LTO using a Fortran linker?

Hello! I have a computationally intensive, legacy Fortran-based program. I have replaced one of the most computationally intensive components of this program with a modernized Rust component. I currently use Cargo to spit out a static library .a file, then compile the Fortran code with the legacy Intel Fortran compiler and linker, ifort, linking in the Rust-based static library. This all works fine, but now I would like to get the best optimization I can from the setup. I'm not an expert on optimization, so please correct my thinking if I'm wrong at any point here. I suspect the way to get the best runtime optimization would be to leave the Rust static library unoptimized during creation of the library, then leave it up to the linker to do the cross-language optimization after all the unoptimized object files have been produced. I should note that I have Rust dependencies, so using Cargo is probably necessary.

How do I go about doing this with Rust+Cargo? This is where I'm uncertain about many of the parts. My first attempt would be the following.

I'm producing a static library, so I need

[lib]
crate-type = ["staticlib"]

I believe using

[profile.release]
strip = "symbols"
panic = "abort"
codegen-units = 1

will generally produce better runtime speeds. Though, I suspect codegen-units = 1 might not make a difference if the optimization is not occurring during the compile step, but only during the linker step. I assume the symbols do not help optimization and can be stripped.

Next up, to get the best LTO should I be entirely disabling optimization during compilation and enabling LTO?

[profile.release]
lto = "fat"
opt-level = 0

Is there some form of optimization I should be doing at the compilation level even with LTO? I'm also guessing the lto = "fat" might be unnecessary since it won't be Rust/Cargo doing the linking? Or does enabling this cause optimization at the static library level, and I should actually disable this as well to make sure nothing is optimized before handing off to the Fortran linker?

Then I need to run Cargo with:

RUSTFLAGS="-C linker-plugin-lto -C target-cpu=native" cargo build --release

as this makes sure that all the dependencies are also flagged with LTO compilation and are compiled with CPU native instructions. Does target-cpu=native do anything here since the optimization is happening during linking? Do I need to be passing in other flags to make sure all the above happens to all the dependencies as well?

Finally, this is all assuming that Intel's Fortran compiler and linker, ifort, is able to optimize the static library together with the Fortran code after it receives it. I believe the static library shouldn't contain anything special that prevents this, but I don't know a ton about object files. I believe ifort's -ipo flag should enable LTO from that end (along with other optimization flags).

I suppose it's also worth noting that the entry point to the Rust library is a pub extern "C" function which is then called from Fortran using Fortran's iso_c_binding. I don't expect this is any issue for the LTO, but again, I'm not an expert in this.
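For readers following along, here is a minimal sketch of the kind of entry point described above. The function name sum_of_squares and its signature are hypothetical, not taken from the actual project. It only illustrates an unmangled, C-ABI function that takes a pointer and a length, which is the shape a Fortran interface block with bind(C) would typically target.

```rust
use std::slice;

/// Hypothetical kernel: sum of squares of `n` doubles passed from the caller.
/// Recent toolchains spell the attribute `#[unsafe(no_mangle)]`; older ones
/// use plain `#[no_mangle]`.
///
/// # Safety
/// `data` must point to at least `n` readable `f64` values, or be null.
#[unsafe(no_mangle)]
pub unsafe extern "C" fn sum_of_squares(data: *const f64, n: usize) -> f64 {
    // Guard against a null pointer or empty input coming across the FFI boundary.
    if data.is_null() || n == 0 {
        return 0.0;
    }
    // Build a borrowed slice over the caller's buffer; no copy is made.
    let values = unsafe { slice::from_raw_parts(data, n) };
    values.iter().map(|x| x * x).sum()
}

fn main() {
    // Stand-in for the Fortran caller, exercising the same C ABI from Rust.
    let v = [1.0_f64, 2.0, 3.0];
    let total = unsafe { sum_of_squares(v.as_ptr(), v.len()) };
    println!("sum of squares = {total}");
}
```

On the Fortran side, the matching interface would presumably declare the array as real(c_double) and pass the length by value as integer(c_size_t), but that half is an assumption and worth checking against the actual code.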

If there are things that I'm clearly misunderstanding or clearly missing I would greatly appreciate hearing about it. Thank you so much for your time!

Ignoring most of the questions, because I can't answer them, but two points:

  1. You want opt-level = 3, not opt-level = 0; LTO is not capable of doing all the optimizations that can be done at compile time, and what you want is for the compiler to do in-function optimizations before LTO takes place, and then for LTO to optimize further once it has the complete code in hand. You also need a correct target-cpu, since the LLVM IR can include target-dependent intrinsics and layout of data structures.
  2. I would recommend, given that you're going via a pub extern "C" function, using cbindgen to create an ISO C header file that you can review with a Fortran/C interop expert if anything goes wrong - you know that if cbindgen generates the header you expect, the Rust code will contain matching symbols for the Fortran compiler to pick up.

Thank you! In particular for the note about opt-level = 3. I was thinking the optimizations during the compilation step might optimize the individual components in a way that might hinder the later LTO. But it appears this was a misunderstanding on my part!

For the target-cpu, I believe this should be covered by the RUSTFLAGS I'm passing to cargo, though I might be mistaken.
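One way to check that the RUSTFLAGS setting really reaches the compiler is to look at which target features were enabled at compile time. The sketch below is only an illustration: the helper name compiled_in_features and the short feature list are arbitrary choices, not part of the original setup. It relies on the fact that cfg!(target_feature = "...") is resolved during compilation, so its result reflects whatever -C target-cpu or -C target-feature was in effect for that crate.

```rust
/// Report which of a few hand-picked CPU features were enabled when this
/// crate was compiled. The list is illustrative, not exhaustive.
fn compiled_in_features() -> Vec<&'static str> {
    let mut found = Vec::new();
    // Each cfg! is evaluated at compile time, so it mirrors the flags
    // (for example -C target-cpu=native) that rustc was given.
    if cfg!(target_feature = "sse2") {
        found.push("sse2");
    }
    if cfg!(target_feature = "avx2") {
        found.push("avx2");
    }
    if cfg!(target_feature = "fma") {
        found.push("fma");
    }
    if cfg!(target_feature = "neon") {
        found.push("neon");
    }
    found
}

fn main() {
    println!("compiled-in features: {:?}", compiled_in_features());
}
```

Building this once with and once without the RUSTFLAGS from the question and comparing the printed lists should show whether the flag took effect. RUSTFLAGS is applied to every crate Cargo compiles, so dependencies are built with the same target CPU.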

Cross-language LTO is not going to work with ifort. As far as I know, ifort is not based on LLVM, and LTO requires all code to be compiled by the same compiler framework (e.g. LLVM, GCC, MSVC), ideally the exact same version. Would using Intel ifx or Flang work for your purposes? Flang is part of LLVM, and Intel ifx is based on LLVM too. You will have to make sure that the compiler you use for linking uses an LLVM version at least as new as the one rustc uses, because LLVM refuses to read bitcode produced by a newer LLVM version.
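To make the version constraint concrete, here is a small sketch that pulls the LLVM major version out of the text printed by rustc -vV (which includes an "LLVM version:" line) and applies the rule of thumb stated above. The helper names llvm_major and linker_can_read are made up for illustration, the sample text in main is a shortened stand-in for real output, and comparing only major versions is a simplifying assumption.

```rust
/// Extract the LLVM major version from text shaped like `rustc -vV` output.
fn llvm_major(rustc_verbose: &str) -> Option<u32> {
    rustc_verbose
        .lines()
        // Find the line that starts with the LLVM version label.
        .find_map(|line| line.strip_prefix("LLVM version:"))
        // Keep only the part before the first dot, e.g. "17" from "17.0.6".
        .and_then(|rest| rest.trim().split('.').next())
        .and_then(|major| major.parse().ok())
}

/// Rule of thumb from the post above: the linking toolchain's LLVM must be
/// at least as new as the LLVM that produced the Rust bitcode.
fn linker_can_read(rustc_llvm: u32, linker_llvm: u32) -> bool {
    linker_llvm >= rustc_llvm
}

fn main() {
    // Shortened sample; run `rustc -vV` locally to get the real values.
    let sample = "release: 1.75.0\nLLVM version: 17.0.6\n";
    // Hypothetical LLVM major version of the Fortran toolchain doing the link.
    let linker_llvm = 18;

    match llvm_major(sample) {
        Some(rustc_llvm) => println!(
            "rustc LLVM {rustc_llvm}, linker LLVM {linker_llvm}, readable: {}",
            linker_can_read(rustc_llvm, linker_llvm)
        ),
        None => println!("no LLVM version line found"),
    }
}
```

The LLVM version of ifx or Flang has to be looked up separately; this sketch does not attempt to query it.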

