Debug profile is very slow. What can I do?

I am using Visual Studio Code + the C/C++ extension to debug my project. I refactored my project to use a workspace so that submodules are libraries. When I run my code with the release profile it takes 0.1 minutes, and if I run it with the debug profile it takes a whopping 5 minutes. I am using rayon with num_threads(1) for both the debug and release profiles (I did this to monitor performance and to make breakpoints easier to use). Also, my code has a lot of par_iter() calls.

I tried changing my profile like this:

[profile.dev]
opt-level = 3

It is fast, but it doesn't stop at breakpoints. I tried level 2, but it optimised out a lot of variables. Then I tried level 1, and that is still slow.

Is there any Cargo option that trades longer compile times for faster execution without optimising out variables?

Also, if there is a way to dynamically change num_threads, that should also be fine, I guess.

Currently I am using this:

rayon::ThreadPoolBuilder::new().num_threads(1).build_global().unwrap();

How about an environment variable?

let num_threads = std::env::var("THREADS")
    .map(|s| s.parse().expect("THREADS is not a number"))
    .unwrap_or(1);

rayon::ThreadPoolBuilder::new().num_threads(num_threads).build_global().unwrap();
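With that in place, launching the program with THREADS=8 in the environment should give an eight-thread pool, while leaving the variable unset keeps the single thread for debugging.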
  • Try debug-assertions = false and overflow-checks = false.
  • If you rely on some methods getting inlined, try #[inline(always)].
  • Try optimizing dependencies only: Profiles - The Cargo Book (see the sketch below).
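Roughly, those settings go into the Cargo.toml at the workspace root. A sketch of what that might look like (nothing here is specific to your project):

[profile.dev]
# keep your own code unoptimized and debuggable,
# but skip the checks that slow debug builds down
debug-assertions = false
overflow-checks = false

# optimize dependencies only; workspace members are not matched by "*",
# so breakpoints in your own crates keep working
[profile.dev.package."*"]
opt-level = 3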

Setting the environment variable RAYON_NUM_THREADS is the built-in way to change the default thread count.


Thanks. And sorry. What I meant is to conditionally change the thread count for a specific par_iter(). Like a function call par_iter().use_threads(1), or something like the below:
rayon::num_thread(8);
a.par_iter().do_something();
// some other par_iter()
rayon::num_thread(1);
m.par_iter().do_something();

This way I can have a configuration file which controls the thread count at different stages of my code.

Oh well… I should have checked the docs first.

Thanks, but install() doesn't change the global thread count, right? Since I am using a lot of par_iter() calls, I would prefer to change the thread count globally. That is, in certain places in my main(), I will change the thread count.

Or if install() can do that, can you tell me how? Sorry, I am not able to understand it from the docs.

Thanks @kornel. I was expecting something like this. Unfortunately I am seeing only minor improvements. Are there any other flags that can improve runtime performance in debug mode? Will changing the parameters below help?

lto 
codegen-units
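I mean setting them in the dev profile, roughly like this (just an illustration, I haven't settled on values):

[profile.dev]
lto = true          # link-time optimization across crates
codegen-units = 1   # fewer, larger codegen units give LLVM more room to optimize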

Debug builds can produce very slow executables. Like 10 to 300 times slower.

See examples discussed in this thread: What can Rust do that Python can't? - #28 by ZiCog

I'm not sure anything much can be done about that.

You can test them yourself. I don't expect them to help.

Your best bet is to move the slowest parts that you don't need to debug precisely into another crate, and use the profile overrides feature to heavily optimize only that crate.
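For example, in the workspace-root Cargo.toml (the crate name here is just a placeholder for whichever workspace member holds the hot code):

# heavily optimize only the slow crate, keep everything else debuggable
[profile.dev.package.heavy_math]
opt-level = 3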

Other than that, there's unfortunately a hard trade-off between precision of debug output and execution speed. Optimizers need to heavily rewrite code to make it fast, and such transformations make code that is executed quite different from the source code, which makes debugging info messy and imprecise.

A desperate option would be to set custom LLVM optimization passes, carefully selecting only the passes that improve code speed without making debugging much worse. rustc appears to have a -C passes option for this, but I couldn't find proper docs for it, so good luck!


Thanks. I am not concerned with debug being slow compared to release, but debug is too slow compared to other programming languages.

My program is quite big and I am trying to port it from Pascal. When Pascal is fast in debug and Rust is dramatically slow, it is difficult to work on this.

Thanks. I will check it. Also, after a bit of googling, I learned that Rust can be quite slow when the code is broken into multiple crates. In my case I am also breaking it into multiple crates by using a workspace. One suggestion is to use inlining. I will have to test it.

Inlining across crates happens either when you mark functions as #[inline] or when you use LTO.
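As a sketch of the #[inline] route (the crate layout and function name are made up for illustration):

// in a workspace library crate, e.g. geometry/src/lib.rs
#[inline] // makes this function available for inlining in other crates without LTO
          // (#[inline(always)] forces inlining even in unoptimized debug builds)
pub fn squared_distance(a: (f64, f64), b: (f64, f64)) -> f64 {
    let dx = a.0 - b.0;
    let dy = a.1 - b.1;
    dx * dx + dy * dy
}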

Any code you run in the closure passed to install will run in the given threadpool. No need for globals.

Instead of

set_threads(4);
foo();
bar();

set_threads(whatever_it_was_before);

you write

let pool = rayon::ThreadPoolBuilder::new().num_threads(4).build().unwrap();
pool.install(|| {
    foo();
    bar();
});
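Tying it back to your case, a minimal sketch where one stage gets eight threads and everything else stays on the single-threaded global pool (the vectors and the sums are just placeholders for your real work):

use rayon::prelude::*;

fn main() {
    // global pool stays single-threaded, as in your main()
    rayon::ThreadPoolBuilder::new().num_threads(1).build_global().unwrap();

    let a: Vec<u64> = (0..1_000_000).collect();
    let m: Vec<u64> = (0..1_000_000).collect();

    // this stage runs its par_iter() on 8 threads
    let pool = rayon::ThreadPoolBuilder::new().num_threads(8).build().unwrap();
    let sum_a: u64 = pool.install(|| a.par_iter().sum());

    // this one still uses the single-threaded global pool
    let sum_m: u64 = m.par_iter().sum();

    println!("{} {}", sum_a, sum_m);
}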

Thanks a lot. It worked 🙂
