I have an application that is taking tens of minutes to run (processing hundreds of gigabytes of data).
Without changing or reimplementing the algorithm, or increasing parallelism, are there any flags I can add to "cargo run --release" to make the build more optimized (e.g. the difference between "gcc" and "gcc -O3")?
The first thing I'd suggest is
[target.x86_64-unknown-linux-gnu]
rustflags = ["-Ctarget-cpu=native"]
In ~/.cargo/config.toml (older Cargo versions read ~/.cargo/config; adjust the target triple for your platform). The default is a conservative baseline CPU, so this lets the compiler use the newer, faster instructions your machine supports. Note that the resulting binary may not run on older CPUs.
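If you'd rather not name a target triple, here is a sketch of the same setting using the [build] table, which applies to whatever target you build for. It assumes a reasonably recent Cargo and a per-project .cargo/config.toml next to Cargo.toml (the file location is my assumption, not something from the question):

```toml
# .cargo/config.toml (project-local; assumed location)
[build]
# Let rustc use every instruction set extension the build machine's CPU supports.
rustflags = ["-C", "target-cpu=native"]
```

As far as I know, Cargo takes rustflags from a single source, so a RUSTFLAGS environment variable or a target-specific rustflags entry would be used instead of this one rather than merged with it.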
There are a few options you can set in Cargo.toml profiles which may help. Release builds already default to opt-level = 3, but you could try codegen-units = 1 (monolithic codegen sometimes helps inlining) and/or lto = true. There are also a number of stable options listed in rustc -C help; -C target-cpu=native, for example, might help. You can put rustc options in .cargo/config or just set the RUSTFLAGS environment variable.
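As a rough sketch, the profile settings mentioned above would go in the project's Cargo.toml like this. Whether they help depends on the workload, so it's worth timing a run before and after; both settings also lengthen compile times.

```toml
# Cargo.toml
[profile.release]
opt-level = 3        # already the default for release builds
codegen-units = 1    # single codegen unit: slower compile, sometimes better inlining
lto = true           # "fat" link-time optimization across all crates
```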