How to tell the Rust compiler not to optimize a block of code?

The following block is a delay loop. When compiled with opt-level=1, the compiler optimizes it away entirely.
When compiled with opt-level=0, it is kept, but with a lot of extra code added.

Is there a way to prevent compiler optimization for a block of code?

let mut delay: u32 = 0xffffff;
while delay > 0 {
    delay -= 1;
}

You might be able to get the effect you want by calling std::thread::yield_now or std::hint::black_box¹ inside the loop.

¹ Requires nightly compiler


It's built for a RISC-V bare-metal target, so I can't use the std library.

Digging around in the docs, there’s also core::hint::spin_loop(), but I don’t know whether or not it will help.
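For reference, here is a minimal sketch of how the two hints could be combined in a no_std-friendly loop. It assumes a toolchain where core::hint::black_box is stable (Rust 1.66 or later); the function name delay_hinted and the returned iteration count are made up for illustration so the sketch can be checked on a host. Note that the documentation describes black_box as best-effort only, so this is not a guarantee that the loop survives every optimizer.

```rust
use core::hint::{black_box, spin_loop};

/// Busy-waits for `ticks` loop iterations and returns how many ran.
/// `delay_hinted` is a hypothetical name, not an existing API.
pub fn delay_hinted(ticks: u32) -> u32 {
    let mut done: u32 = 0;
    // black_box asks the optimizer to treat `done` as an unknown value
    // on every pass, which discourages folding the loop away.
    // This is best-effort, not a hard guarantee.
    while black_box(done) < ticks {
        // Tells the CPU we are in a busy-wait loop; may compile to nothing.
        spin_loop();
        done += 1;
    }
    done
}
```

Calling delay_hinted(0xffffff) would stand in for the original while loop. How long each iteration takes still depends on the target and the optimization level.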


You can't disable optimizations on a per-block basis. Some optimizations cannot be disabled at all.

If you are running on a bare metal target, where std::thread::sleep is unavailable, you should code your loop directly in inline assembly. That is the only way to enforce specific low-level behaviour of code which doesn't exist at the language level. Asm blocks in Rust are considered entirely opaque to the optimizer, even if they are empty.


volatile is the usual choice in C. I don't know if that can be applied to a simple variable like delay. But, it's easy enough to try.

"volatile" is actually very poorly specified in C. The common recommendation is "volatile is only for memory-mapped I/O", because the only thing that it clearly guarantees is that accesses will not be added or removed. It's very hard to say anything else, particularly in the important case of mixing volatile and non-volatile accesses.

Rust inherits those problems. In particular, volatile isn't a way to disable optimizations, and doesn't guarantee any specific assembly output. Now that we have stable inline assembly, if your use case fits the asm mold, you should just use it. You'll save yourself from the trouble of underdefined semantics, weird behaviour on different architectures, unexpected codegen and changes in future compilers.


For the record, slapping a read_volatile in the while condition does seem to work for me in 1.65.0-nightly. I agree with afetisov that it's not really the semantically correct choice though.
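Roughly what the read_volatile variant could look like, as a sketch only. The name delay_volatile and the returned iteration count are invented here so it can be checked on a host; the caveats about volatile not being the semantically right tool still apply.

```rust
use core::ptr::{read_volatile, write_volatile};

/// Counts `ticks` down to zero using volatile accesses and returns
/// the number of iterations. `delay_volatile` is a hypothetical name.
pub fn delay_volatile(ticks: u32) -> u32 {
    let mut remaining = ticks;
    let counter: *mut u32 = &mut remaining;
    let mut iterations: u32 = 0;
    // SAFETY: `counter` points to a live, aligned local for the whole loop.
    unsafe {
        // Each volatile read and write must be emitted by the compiler,
        // so the loop cannot be collapsed into a single assignment.
        while read_volatile(counter) > 0 {
            write_volatile(counter, read_volatile(counter) - 1);
            iterations += 1;
        }
    }
    iterations
}
```

This only pins down the number of memory accesses, not how long they take.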

If you're only trying to delay for a fixed period, you may want to look into configuring any timer hardware you have access to, so you don't have to busy wait. Bare metal projects tend to end up needing a timer eventually, so it may be worth the effort to get that set up.


std::hint::black_box is actually core::intrinsics::black_box.


The best way to implement a delay loop is something like this:

use core::arch::asm;

fn delay(ticks: u32) {
  for _ in 0..ticks {
    // The asm! block is opaque to the optimizer, so the loop is kept.
    unsafe { asm!("nop"); }
  }
}

That way it's obvious to all readers that you are wanting to do nothing for a number of ticks, and the optimiser won't be tempted to turn while delay > 0 { delay -= 1 } into a let delay = 0 (which is perfectly valid in both C and Rust because -= has no observable side-effects).


It's not clear to me why a smart optimizer, at the instruction level, could not reason that a "nop" is a "do nothing" and doing nothing ticks number of times is the same as not doing the "do nothing" at all. And hence decide it's valid to remove the loop entirely.

According to the specification of asm!(), the compiler is not allowed to inspect or modify the contents of an asm!() block. There is the pure option, which allows the asm!() block as a whole to be optimized away if its outputs aren't used. Without pure, asm!() blocks can't be optimized away, because they are assumed to have arbitrary side effects (within the bounds of what Rust code would be allowed to do).


Yeah, that's a fundamental problem with these sorts of delay loops and, like @bjorn3 mentioned, we "fix" the problem by essentially defining it away.

In practice, the best way to do a delay on embedded devices is still to set up a timer and emit some sort of wfi instruction that will put the processor to sleep until that timer elapses. That's not always possible though[1], which is why you see these delay loops crop up in embedded C fairly regularly.


  1. Sometimes you're so early in the boot process that timers haven't been set up yet, or you might already be using all your device's timers and have nothing left to attach the desired delay to.

    It could also be that the developer is being lazy and a do { delay -= 1; } while (delay > 0) loop seems "good enough"... I used to work at a company where the lead developer for our CNC motion controller software insisted on publishing debug binaries because "the compiler breaks things" :man_facepalming:
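To illustrate the timer-based idea, here is a rough sketch that polls a free-running counter instead of counting loop iterations. It does not sleep with wfi; it only shows the elapsed-time check. The now closure is a stand-in for whatever reads your hardware counter (that part is platform-specific and assumed here), and delay_ticks is a made-up name.

```rust
use core::hint::spin_loop;

/// Waits until at least `ticks` counter ticks have elapsed and returns
/// the elapsed count. `now` is a hypothetical reader for a free-running
/// hardware counter; `delay_ticks` is not an existing API.
pub fn delay_ticks(mut now: impl FnMut() -> u64, ticks: u64) -> u64 {
    let start = now();
    loop {
        // wrapping_sub keeps the comparison correct if the counter wraps.
        let elapsed = now().wrapping_sub(start);
        if elapsed >= ticks {
            return elapsed;
        }
        spin_loop();
    }
}
```

Because the exit condition depends on a value read from outside the loop, the delay no longer depends on how the compiler happens to optimize the loop body. On real hardware the counter read would typically be a volatile or CSR access.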

Ah ha, good to know. Thanks.

I have written my fair share of such delay loops, in all kinds of different languages. It always felt dirty. Often there was no other way.

It guarantees that there will be at least delay mov instructions executed, doesn't it? Which is the intent. There might be additional overhead, of course, but it guarantees that at least the mov instructions will be there.


If volatile memory access is used, the compiler must emit the specified number of accesses (and in the order you wrote on the page). This can be used to cause a delay effect, but that's not really intended. Kinda does work though, I guess.