Release build works but Debug build fails

Hi

I'm new to embedded systems programming and Rust. My first project involves putting together a minimal bootloader for an nRF52840 board.

The code for my bootloader (borrowed from @adamgreig 's ethernet bootloader for stm32s) is as follows:

#![no_std]
#![no_main]

extern crate cortex_m_rt;
extern crate cortex_m;
extern crate cortex_m_semihosting;
extern crate panic_halt;
extern crate nrf52840_hal;

use core;
use cortex_m_rt::{entry, exception};

static mut JUMP: Option<extern "C" fn()> = None;

pub fn boot_from(scb: &mut cortex_m::peripheral::SCB, address: u32){
    unsafe {
        let stack_pointer = *(address as *const u32);
        let reset_vector  = *((address + 4) as *const u32);

        cortex_m::asm::dsb();
        cortex_m::asm::isb();
        JUMP = Some(core::mem::transmute(reset_vector));
        scb.vtor.write(address);
        cortex_m::register::msp::write(stack_pointer);  // the debug build fails here
        (JUMP.unwrap())();
    }
}

#[entry]
fn main() -> ! {
    let mut core_peripherals = nrf52840_hal::target::CorePeripherals::take().unwrap() ;
    //let mut core = cortex_m::Periphials::take().unwrap();
    boot_from(&mut core_peripherals.SCB, 0x4000);
    loop
    {
    }
}

#[exception]
fn HardFault(ef: &cortex_m_rt::ExceptionFrame) -> ! {
    panic!("HardFault at {:#?}", ef);
}

#[exception]
fn DefaultHandler(irqn: i16) {
    panic!("Unhandled exception (IRQn = {})", irqn);
}

Issue:

  1. I have a tiny blinky app located at the 0x4000 address i.e. the JUMP vector used by the bootloader.
  2. When I compile a release build and load it onto the board, it just works. Double checked the optimized assembly code and it all makes sense.
  3. However, when I compile a debug build and load it, it fails. After analyzing the generated assembly, I believe the problem occurs in the call to msp::write(stack_pointer).
    • This function in turn calls an FFI function named __msp_w(). The call graph looks like this:
      boot_from() ---> msp::write(stack_pointer) ----> __msp_w()

When boot_from() calls msp::write(stack_pointer), msp::write() pushes the link register (the return address) onto the stack. It then calls __msp_w(), which updates the stack pointer (to 0x20040000, the start of the stack for blinky) and returns. So the extra overhead of a function call corrupts the stack on exit from msp::write(stack_pointer): its epilogue pops the return address from the new stack rather than the one it was pushed to. Here's the assembly for it.

Dump of assembler code for function cortex_m::register::msp::write:
   0x0000065c <+0>:     push    {r7, lr}        // lr pushed onto the stack
   0x0000065e <+2>:     mov     r7, sp
   0x00000660 <+4>:     sub     sp, #8
   0x00000662 <+6>:     str     r0, [sp, #4]
   0x00000664 <+8>:     bl      0x68a <__msp_w> // MSP updated
   0x00000668 <+12>:    b.n     0x66a <cortex_m::register::msp::write+14>
   0x0000066a <+14>:    add     sp, #8
   0x0000066c <+16>:    pop     {r7, pc}        // saved lr popped into pc from a different stack address
End of assembler dump.
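
To make the failure easier to see, here is a minimal host-side sketch of the same sequence. It is only a toy model, not the real hardware behaviour: the `Cpu` struct, the 16-word memory and the index-based `sp` are all made up for illustration, and the `sub sp`/`add sp` pair from the dump is left out. It pushes a return address, moves the stack pointer the way `__msp_w` does, and then pops.

```rust
/// Toy model of a full-descending stack (hypothetical, for illustration only).
/// `sp` is an index into `mem`; push decrements then stores, pop loads then increments.
struct Cpu {
    mem: [u32; 16],
    sp: usize,
}

impl Cpu {
    fn push(&mut self, value: u32) {
        self.sp -= 1;
        self.mem[self.sp] = value;
    }

    fn pop(&mut self) -> u32 {
        let value = self.mem[self.sp];
        self.sp += 1;
        value
    }
}

/// Models the debug-build `msp::write`: save the return address, let the
/// callee move the stack pointer, then pop the "return address" back.
fn write_msp_via_call(cpu: &mut Cpu, return_address: u32, new_sp: usize) -> u32 {
    cpu.push(return_address); // push {r7, lr}
    cpu.sp = new_sp;          // bl __msp_w   (msr MSP, r0)
    cpu.pop()                 // pop {r7, pc}
}

fn main() {
    let mut cpu = Cpu { mem: [0; 16], sp: 8 };
    let lr = 0x193;
    // The new stack starts elsewhere in memory, so the pop reads a stale word.
    let popped = write_msp_via_call(&mut cpu, lr, 15);
    println!("pushed {:#x}, popped {:#x}", lr, popped);
}
```

In this model the popped value only matches the pushed one if the stack pointer is left where the push put it, which is exactly what the MSP write prevents.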

Note:

  1. When we enable the feature inline-asm and build with rust-nightly, we are able to bypass the above issue in the debug build.
  2. However, oddly enough, it then hangs right after msp::write(stack_pointer) returns, while unwrapping the JUMP vector. Here's the assembly for unwrap():
|   0x31c <core::option::Option<T>::unwrap>:         push {r7, lr}
|   0x31e <core::option::Option<T>::unwrap+2>:       mov r7, sp
|   0x320 <core::option::Option<T>::unwrap+4>:       sub sp, #16 
|   0x322 <core::option::Option<T>::unwrap+6>:       str r0, [sp, #8] 
|   0x324 <core::option::Option<T>::unwrap+8>:       ldr r0, [sp, #8] 
|   0x326 <core::option::Option<T>::unwrap+10>:      cmp r0, #0 

and here are the contents at the stack pointer after the push at 0x31c (below). It's odd that the second word at the stack pointer is 0 when it should be 0x193, i.e. the value of the link register.

stepi


0x0000031e 385 in /rustc/a74d1862d4d87a56244958416fd05976c58ca1a8/src/libcore/option.rs

x/2w $sp


0x2003fffc: 0x2003ffd0 0x00000000

x/w $lr


0x193 <test_boot::boot_from+146>: 0x00e7ff90

Questions:

  1. Why does the debug build behave differently from the release build here? I expect a release build, being optimized for speed and size, to produce different-looking assembly, but why does the entire execution flow get corrupted in the debug build?
  2. Why does the stack get corrupted in the second case, i.e. after enabling the inline-asm feature?
  3. Is there a problem with the version of the compiler I'm using? If it helps, I've included details of the nightly and stable versions of Rust installed on my machine.
PS C:\Users\Nil\devspace\rust\projects\test-boot> rustc  --version
rustc 1.45.0-nightly (a74d1862d 2020-05-14)
PS C:\Users\Nil\devspace\rust\projects\test-boot> rustc  --version     
rustc 1.43.1 (8d69840ab 2020-05-04)

You're more likely to get a satisfying answer by posting this issue on https://github.com/rust-embedded/cortex-m or asking on the rust embedded matrix channel https://matrix.to/#/#rust-embedded:matrix.org

The main developers of rust embedded rarely answer on this forum - but they are very active on the matrix channel.

Still, it would be great if you could post the answer here so that others can benefit from it.


Hi! As @trembel mentioned you might be better off posting in the Embedded Rust matrix chat, but I can briefly answer some of your questions:

The cortex-m crate provides intrinsics like MSR which are not (yet) in core::arch. Since inline assembly is not yet on stable, it ships a precompiled object file which contains those intrinsics as functions and is linked into your final build at link time. In a normal debug build, the linker doesn't try to inline those function calls, and optimisations that might remove stack usage are not enabled. That means the calling function may modify the stack pointer and then try to use the stack, which obviously causes immediate issues. With the inline-asm feature, the intrinsics are emitted directly in your function, so there's no extra function call. However, since you then call unwrap() after changing the stack pointer, which is also a function call, you can run into the same problem.

I recommend making JUMP a non-option local variable so you don't need to unwrap it at all:

        // Get new stack pointer and jump address
        let sp = core::ptr::read_volatile(0 as *const u32);
        let rv = core::ptr::read_volatile(4 as *const u32);
        let bootloader: extern "C" fn() = core::mem::transmute(rv);

        // Write new stack pointer to MSP and call into bootloader
        cortex_m::register::msp::write(sp);
        bootloader();
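
If a toolchain with stable inline assembly is available, another option is to do the MSP write and the branch in a single asm block, so nothing can touch the stack in between. The following is only a rough sketch under that assumption, not something tested on hardware: `read_vector_table` and `jump` are hypothetical names, the example image values are made up, and `jump` is compiled only for bare-metal ARM targets so the rest can be run on a host.

```rust
/// Reads the initial stack pointer and reset vector from a vector table.
/// On Cortex-M, word 0 is the initial MSP and word 1 is the reset handler.
/// (Hypothetical helper, for illustration.)
fn read_vector_table(table: &[u32]) -> Option<(u32, u32)> {
    match table {
        [sp, reset, ..] => Some((*sp, *reset)),
        _ => None,
    }
}

/// Sets MSP and branches to the reset handler inside one asm block, so no
/// compiler-generated push/pop can run between the two instructions.
/// Only built for bare-metal ARM; on a host this function does not exist.
#[cfg(all(target_arch = "arm", target_os = "none"))]
#[allow(dead_code)]
unsafe fn jump(stack_pointer: u32, reset_vector: u32) -> ! {
    unsafe {
        core::arch::asm!(
            "msr MSP, {sp}",
            "bx {rv}",
            sp = in(reg) stack_pointer,
            rv = in(reg) reset_vector,
            options(noreturn, nomem, nostack),
        );
    }
}

fn main() {
    // Made-up image: stack top at 0x2004_0000, reset handler at 0x4101 (Thumb bit set).
    let image = [0x2004_0000u32, 0x0000_4101];
    let (sp, rv) = read_vector_table(&image).expect("vector table too short");
    println!("sp = {:#010x}, reset = {:#010x}", sp, rv);
}
```

On the target, the table would be read from the application's base address (0x4000 in the original question) rather than from an array, and VTOR would still need to be set before the jump.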

You might also try enabling LTO in your debug builds:

[profile.dev]
codegen-units = 1
incremental = false
debug = true
lto = true

But overall, I suggest just using release builds for embedded development, as generally debug builds are very bloated and often won't even fit in flash, or cause timing issues which waste debugging time. It's not ideal as debugging is then much more annoying of course.

You're not the first person to run into trouble trying to jump to a bootloader like this; hopefully the new inline assembly coming soon will make it a bit easier.

There's no issue with your compiler versions.


Thanks - @adamgreig. That clears it up. I'll take your advice and stick to release builds (that way I can keep sane). I suspected the debug build didn't include optimizations to remove stack usage.

For the benefit of everyone, a summary of this case:

  1. Debug build (compiled with stable Rust) selects the second match arm of the write function (below). This calls the precompiled __msp_w() intrinsic as a real function, and without optimizations to remove stack usage, msp::write() still uses the stack after MSP has changed, leading to a corrupted stack.
  2. Debug build (compiled with nightly Rust and the inline-asm feature) selects the first match arm, which uses inline assembly that is emitted directly in the write function. So we don't see the same problem there, although the unwrap() call that follows is still a function call and hits the same issue.
  3. Release build (compiled with stable Rust) is well optimized (i.e. it doesn't touch the stack here). So it works as expected.

Cortex-m crate's write-to-msp (or cortex_m::register::msp::write()) implementation:

pub unsafe fn write(_bits: u32) {
    match () {
        #[cfg(all(cortex_m, feature = "inline-asm"))]
        () => asm!("msr MSP,$0" :: "r"(_bits) :: "volatile"),

        #[cfg(all(cortex_m, not(feature = "inline-asm")))]
        () => {
            extern "C" {
                fn __msp_w(_: u32);
            }

            __msp_w(_bits);
        }

        #[cfg(not(cortex_m))]
        () => unimplemented!(),
    }
}
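
The `match ()` with `#[cfg]` on each arm is what decides, at compile time, which of these bodies ends up in the binary. As a small host-runnable sketch of that pattern (using `target_arch` as a stand-in for the crate's `cortex_m` and `inline-asm` cfgs, and a made-up function name `selected_arm`):

```rust
/// Returns a label for whichever arm was compiled in.
/// Exactly one arm survives cfg-stripping, so the match is always exhaustive.
fn selected_arm() -> &'static str {
    match () {
        #[cfg(target_arch = "arm")]
        () => "arm",

        #[cfg(not(target_arch = "arm"))]
        () => "fallback",
    }
}

fn main() {
    println!("compiled-in arm: {}", selected_arm());
}
```

The other arm is removed before type checking, which is why the real function can use asm! in one arm and an extern "C" call in another without either needing to compile on every configuration.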

Thanks for the tip @trembel. Appreciate it.
