Hello! I have a computationally intensive, legacy Fortran program, and I have replaced one of its most expensive components with a modernized Rust component. I currently use Cargo to produce a static library (.a file), then compile the Fortran code with the legacy Intel Fortran compiler and linker, ifort, linking in the Rust static library. This all works fine, but now I would like to get the best optimization I can from this setup. I'm not an expert on optimization, so please correct my thinking if I'm wrong at any point. I suspect the way to get the best runtime performance would be to leave the Rust static library unoptimized when it is created, then leave it up to the linker to do the cross-language optimization after all the unoptimized object files have been produced. I should note that I have Rust dependencies, so using Cargo is probably necessary.
How do I go about doing this with Rust+Cargo? This is where I'm uncertain about many of the parts. My first attempt would be the following.
I'm producing a static library, so I need
[lib]
crate-type = ["staticlib"]
I believe using
[profile.release]
strip = "symbols"
panic = "abort"
codegen-units = 1
will generally produce better runtime speeds. Though I suspect codegen-units = 1 might not make a difference if the optimization happens only at the link step rather than at the compile step. I assume the symbols do not help optimization and can be stripped.
Next up, to get the best LTO should I be entirely disabling optimization during compilation and enabling LTO?
[profile.release]
lto = "fat"
opt-level = 0
Is there some form of optimization I should be doing at the compilation level even with LTO? I'm also guessing that lto = "fat" might be unnecessary, since it won't be Rust/Cargo doing the linking? Or does enabling this cause optimization at the static library level, so that I should actually disable it as well to make sure nothing is optimized before handing off to the Fortran linker?
Then I need to run Cargo with:
RUSTFLAGS="-C linker-plugin-lto -C target-cpu=native" cargo build --release
as this makes sure that all the dependencies are also flagged for LTO compilation and are compiled with CPU-native instructions. Does target-cpu=native do anything here, since the optimization is happening during linking? Do I need to pass other flags to make sure all of the above also applies to the dependencies?
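To make the target-cpu=native part of the question concrete, here is a minimal sketch of how I could at least check which CPU features rustc considered enabled at compile time. The function name compiled_in_features and the three features it checks are my own illustrative choices, not anything from my real code, and this only reflects what rustc was told, not what the final link step ends up emitting.

```rust
/// Hypothetical helper (name and feature list are illustrative assumptions):
/// reports which of a few x86 SIMD features were enabled when this crate
/// was compiled, e.g. via `-C target-cpu=native`.
pub fn compiled_in_features() -> Vec<&'static str> {
    let mut enabled = Vec::new();
    // `cfg!` is evaluated at compile time, so each branch reflects the
    // target features rustc was configured with, not a runtime CPU probe.
    if cfg!(target_feature = "sse2") {
        enabled.push("sse2");
    }
    if cfg!(target_feature = "avx2") {
        enabled.push("avx2");
    }
    if cfg!(target_feature = "fma") {
        enabled.push("fma");
    }
    enabled
}
```

Building once with and once without -C target-cpu=native and comparing the returned lists would show whether the flag reaches the crate at all, though I assume it says nothing about what ifort does with the result.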
Finally, this all assumes that Intel's Fortran compiler and linker, ifort, is able to optimize the static library together with the Fortran code once it receives it. I believe the static library shouldn't contain anything special that prevents this, but I don't know a ton about object files. I believe that just using ifort's -ipo flag should enable LTO from that end (along with other optimization flags).
I suppose it's also worth noting that the entry point to the Rust library is a pub extern "C" function, which is then called from Fortran using Fortran's iso_c_binding. I don't expect this to be an issue for LTO, but again, I'm not an expert in this.
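For concreteness, the entry point looks roughly like the sketch below. The name sum_of_squares, its signature, and its body are hypothetical stand-ins for my real routine; only the overall shape (an unmangled pub extern "C" function taking C-compatible arguments) matches what I actually have. I am assuming an edition earlier than 2024 here, since newer editions spell the attribute differently.

```rust
use std::slice;

/// Hypothetical entry point (name, signature, and body are illustrative only).
///
/// # Safety
/// `data` must point to `len` valid, initialized `f64` values.
#[no_mangle] // keep the symbol name unmangled so the Fortran side can bind to it
pub unsafe extern "C" fn sum_of_squares(data: *const f64, len: usize) -> f64 {
    // Guard against a null pointer or an empty array coming from the caller.
    if data.is_null() || len == 0 {
        return 0.0;
    }
    // View the caller's buffer as a Rust slice without copying it.
    let values = unsafe { slice::from_raw_parts(data, len) };
    values.iter().map(|x| x * x).sum()
}
```

On the Fortran side, this would be declared in an interface block with bind(C, name="sum_of_squares"), passing the length by value as integer(c_size_t) and the array as real(c_double).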
If there are things that I'm clearly misunderstanding or missing, I would greatly appreciate hearing about them. Thank you so much for your time!