Debug profile is very slow. What can I do?

I am using Visual Studio Code + the C/C++ extension to debug my project. I refactored my project to use a workspace so that submodules are libraries. When I run my code with the release profile it takes 0.1 minutes, and if I run it with the debug profile it takes a whopping 5 minutes. I am using rayon with num_threads(1) for both the debug and release profiles (I did this to monitor performance and to make breakpoints easier to use). Also, my code has a lot of par_iter() calls.

I tried changing my profile like this:

[profile.dev]
opt-level = 3

It is fast, but it doesn't stop at breakpoints. I tried level 2, but it optimised out a lot of variables. Then I tried level 1, and that is still slow.

Is there any Cargo option that trades longer compile times for faster execution without optimising out variables?

Also, if there is a way to dynamically change num_threads, that should also be fine, I guess.

Currently I am using this:

rayon::ThreadPoolBuilder::new().num_threads(1).build_global().unwrap();

How about an environment variable?

let num_threads = std::env::var("THREADS")
    .map(|s| s.parse().expect("THREADS is not a number"))
    .unwrap_or(1);

rayon::ThreadPoolBuilder::new().num_threads(num_threads).build_global().unwrap();
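With that in place, launching the program with THREADS=8 in the environment should give an eight-thread pool, while leaving the variable unset keeps the single thread for debugging.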
  • Try debug-assertions = false and overflow-checks = false.
  • If you rely on some methods getting inlined, try #[inline(always)].
  • Try optimizing dependencies only: Profiles - The Cargo Book (see the sketch below).
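Roughly, those settings go into the Cargo.toml at the workspace root. A sketch of what that might look like (nothing here is specific to your project):

[profile.dev]
# keep your own code unoptimized and debuggable,
# but skip the checks that slow debug builds down
debug-assertions = false
overflow-checks = false

# optimize dependencies only; workspace members are not matched by "*",
# so breakpoints in your own crates keep working
[profile.dev.package."*"]
opt-level = 3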

Setting the environment variable RAYON_NUM_THREADS is the built-in way to change the default thread count.


Thanks. And sorry. What I meant is to conditionally change the thread count for a specific par_iter(). Like a function call par_iter().use_threads(1), or something like the below:
rayon::num_thread(8);
a.par_iter().do_something();
// some other par_iter()
rayon::num_thread(1);
m.par_iter().do_something();

This way I can have a configuration file which controls the thread count at different stages of my code.

Oh well… I should have checked the docs first.

Thanks, but install() doesn't change the global thread count, right? Since I am using a lot of par_iter() calls, I would prefer to change the thread count globally. That is, in certain places in my main(), I will change the thread count.

Or if install() can do that, can you tell me how? Sorry, I am not able to understand it from the docs.

Thanks @kornel. I was expecting something like this. Unfortunately I am seeing only minor improvements. Are there any other flags that can improve runtime performance in debug mode? Will changing the parameters below help?

lto 
codegen-units
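I mean setting them in the dev profile, roughly like this (just an illustration, I haven't settled on values):

[profile.dev]
lto = true          # link-time optimization across crates
codegen-units = 1   # fewer, larger codegen units give LLVM more room to optimize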

Debug builds can produce very slow executables. Like 10 to 300 times slower.

See examples discussed in this thread: What can Rust do that Python can't? - #28 by ZiCog

I'm not sure anything much can be done about that.

You can test them yourself. I don't expect them to help.

Your best bet is to move the slowest parts that you don't need to debug precisely into another crate, and use the profile overrides feature to heavily optimize only that crate.
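For example, in the workspace-root Cargo.toml (the crate name here is just a placeholder for whichever workspace member holds the hot code):

# heavily optimize only the slow crate, keep everything else debuggable
[profile.dev.package.heavy_math]
opt-level = 3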

Other than that, there's unfortunately a hard trade-off between precision of debug output and execution speed. Optimizers need to heavily rewrite code to make it fast, and such transformations make code that is executed quite different from the source code, which makes debugging info messy and imprecise.

A desperate option would be to set custom LLVM optimization passes, carefully selecting only the passes that improve code speed without making debugging much worse. rustc appears to have a -C passes option for this, but I couldn't find proper docs for it, so good luck!


Thanks. I am not concerned with debug being slow compared to release, but debug is too slow compared to other programming languages.

My program is quite big and I am trying to port it from Pascal. When Pascal is fast in debug and Rust is dramatically slow, it is difficult to work on this.

Thanks. I will check it. Also, after a bit of googling, I learned that Rust can be quite slow when the code is broken into multiple crates. In my case I am also breaking it into multiple crates by using a workspace. One suggestion is to use inlining. I will have to test it.

Inlining across crates happens either when you mark functions as #[inline] or when you use LTO.
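As a sketch of the #[inline] route (the crate layout and function name are made up for illustration):

// in a workspace library crate, e.g. geometry/src/lib.rs
#[inline] // makes this function available for inlining in other crates without LTO
          // (#[inline(always)] forces inlining even in unoptimized debug builds)
pub fn squared_distance(a: (f64, f64), b: (f64, f64)) -> f64 {
    let dx = a.0 - b.0;
    let dy = a.1 - b.1;
    dx * dx + dy * dy
}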

Any code you run in the closure passed to install will run in the given threadpool. No need for globals.

Instead of

set_threads(4);
foo();
bar();

set_threads(whatever_it_was_before);

you write

let pool = rayon::ThreadPoolBuilder::new().num_threads(4).build().unwrap();
pool.install(|| {
    foo();
    bar();
});
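Tying it back to your case, a minimal sketch where one stage gets eight threads and everything else stays on the single-threaded global pool (the vectors and the sums are just placeholders for your real work):

use rayon::prelude::*;

fn main() {
    // global pool stays single-threaded, as in your main()
    rayon::ThreadPoolBuilder::new().num_threads(1).build_global().unwrap();

    let a: Vec<u64> = (0..1_000_000).collect();
    let m: Vec<u64> = (0..1_000_000).collect();

    // this stage runs its par_iter() on 8 threads
    let pool = rayon::ThreadPoolBuilder::new().num_threads(8).build().unwrap();
    let sum_a: u64 = pool.install(|| a.par_iter().sum());

    // this one still uses the single-threaded global pool
    let sum_m: u64 = m.par_iter().sum();

    println!("{} {}", sum_a, sum_m);
}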

Thanks a lot. It worked 🙂
