Can I replicate the compiler's optimizations in debug mode

#1

I have a GUI program that is fairly snappy when compiled in release mode but basically unusable in debug mode. I’ve done some profiling and narrowed the slowness down to a particular function. I want to know what the compiler did to make that function fast so I can make the program usable in debug mode. That is, I would like some Rust code that, when compiled in debug mode, runs at some reasonable fraction of the speed of the release-mode version.

I understand that this may not be possible in general, but I would at least like to know how close I can get in this particular case. I’m also aware that I could enable optimizations in debug mode, or just live with the slowness when I actually need to use a debugger, but I’m still curious what the compiler did.

The first thing I can think of is comparing the debug- and release-mode assembly on godbolt.org. Are there other options? Is looking at an intermediate representation of the code, like MIR or LLVM IR, likely to be helpful here? Is there some compiler flag that will list the optimizations performed on the code?
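(For reference, the commands I’d use to dump both locally, assuming the standard cargo/rustc flags; the exact output paths can vary by setup:)

```shell
# Debug-mode assembly and LLVM IR (written under target/debug/deps/)
cargo rustc -- --emit=asm,llvm-ir

# Release-mode equivalents (written under target/release/deps/)
cargo rustc --release -- --emit=asm,llvm-ir
```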

#2

As far as I know, optimizations are done mostly at the LLVM IR stage. Perhaps comparing the emitted LLVM IR would work to some extent, but at that point manipulating structures and the like gets very abstract and hard to follow, unless you’re used to reading IR or ASM-like output. Also beware that Rust in debug mode emits a lot of LLVM IR, even when it’s stripped to a bare minimum. From what I can find on the web, there isn’t a compiler flag you can give rustc to make it list its optimizations.
Perhaps asking for some help optimizing it might work? Or, if you want to get your hands dirty, you could examine the asm and directly embed it into the resulting executable, but that seems both sketchy and unwieldy. A simple generic type change can cause a lot of fuss at the asm/IR level :pensive:

#3

What’s the function? Perhaps just posting the code will allow some suggestions.

#4

The function I had tracked the issue down to called into code from a crate on crates.io. In order to understand what the compiler was doing, I pulled my function and enough of the library’s code into godbolt for it to compile. The code paths I was actually using called some functions with constant values and unwrapped Options and Results, so I was able to cut away some unneeded code branches and record keeping.

I then pulled this version of the code into my source tree, complying with the licenses and so forth, and gave it a try. This maybe helped a little, but it doesn’t get anywhere close to release-mode speed. On the plus side, with the source where it is now, I can easily insert some profiling code to further narrow down the performance issue if I wish. I’m partially motivated by curiosity here, and digging into the details has been kinda fun.

What I’d like to be able to do is compile the particular crate containing the slow code in release mode. There’s a cargo feature for doing that, but it’s not quite stabilized yet, so I think I’ll hold off. I don’t expect the code I’ve pulled in to need to change any time soon, so I might give @OptimisticPeach’s suggestion of directly embedding the optimized assembly a try. The most I’ve done before is read a little bit of assembly, so that could be another fun adventure.
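(For the record, my understanding is that the override would look roughly like this in Cargo.toml. This is a sketch of the unstable profile-overrides feature, so it needs a nightly toolchain, and the exact table name may differ between cargo versions; the crate name here is the one from my project.)

```toml
cargo-features = ["profile-overrides"]

# Build just this one dependency with optimizations, even in debug builds.
[profile.dev.overrides.unicode-segmentation]
opt-level = 3
```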

#5

As mentioned above, the code that is taking the time is from a library crate. The particular crate is unicode-segmentation, and the specific feature I need is iterating over graphemes.

Most of the work seems to be using large look-up tables and pattern matching. It’s not immediately clear to me that there’s an easy way to speed that up by hand. And the large look-up tables make it a bit awkward to share here.

#6

I’m not confident anyone cares, but due to not putting profiling annotations in some parts of my code, I had overestimated the perf impact unicode-segmentation was having. It still shows up while profiling, but currently over 80% of the time is spent elsewhere, and this was likely true before.