(ARM) Extra branch and exchange instruction (bx lr) in naked function

I am working on porting some code used on an STM32F4-based device from C to Rust. I am puzzled why the Rust equivalent inserts an extra bx lr compared to the C version. The following is an excerpt from the C code:

void startup(void) __attribute__((naked)) __attribute__((section(".start_section")));

void startup(void) {
  asm volatile(" NOP\n"
               " LDR SP,=0x2001C000\n"
               " BL main\n"
               ".L1: B .L1\n"
               );
}

int main(void) {
  while(1)
    ;
  
  return 0;
}

When inspecting the generated ELF file with arm-none-eabi-objdump -d, I get:

20000000 <startup>:
20000000:	bf00      	nop
20000002:	f8df d008 	ldr.w	sp, [pc, #8]	; 2000000c <startup+0xc>
20000006:	f000 f803 	bl	20000010 <main>
2000000a:	e7fe      	b.n	2000000a <startup+0xa>
2000000c:	2001c000 	.word	0x2001c000

20000010 <main>:
20000010:	e7fe      	b.n	20000010 <main>
20000012:	bf00      	nop
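
For readers following the dump: the `LDR SP,=0x2001C000` pseudo-instruction has become a PC-relative load from a literal word placed after the code. As a minimal sketch (the helper name `literal_address` is made up for illustration), the address objdump prints in the comment can be reproduced as follows, assuming the Thumb rule that the PC reads as the instruction address plus 4, rounded down to a multiple of 4:

```rust
/// Address read by a Thumb `ldr Rt, [pc, #imm]` located at `insn_addr`.
/// Hypothetical helper, for illustration only.
fn literal_address(insn_addr: u32, imm: u32) -> u32 {
    // In Thumb state the PC reads as the instruction address + 4,
    // word-aligned for literal loads.
    let pc = (insn_addr + 4) & !3;
    pc + imm
}

fn main() {
    // `ldr.w sp, [pc, #8]` at 0x20000002 in the C build.
    println!("{:#010x}", literal_address(0x2000_0002, 8)); // prints 0x2000000c
}
```

The result matches the `.word 0x2001c000` entry at 2000000c in the listing.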

For the Rust version, this is what I do:

#![feature(asm, naked_functions, lang_items, used)]
#![no_std]
#![no_main]

#[naked]
#[used]
#[no_mangle]
#[link_section = ".start_section"]
pub fn startup() {
    unsafe {
        asm!("NOP
	      LDR SP,=0x2001C000
	      BL  main
	      .L1: B   .L1"
             : : : : "volatile"
        );
    }
}

#[inline(never)] 
#[no_mangle]
pub extern fn main() {
    loop {}
}

#[lang = "eh_personality"] extern fn eh_personality() {}
#[lang = "panic_fmt"] fn panic_fmt() -> ! { loop {} }

The dump of this follows:

20000000 <startup>:
20000000:	bf00      	nop
20000002:	f8df d00c 	ldr.w	sp, [pc, #12]	; 20000010 <startup+0x10>
20000006:	f000 f805 	bl	20000014 <main>
2000000a:	e7fe      	b.n	2000000a <startup+0xa>
2000000c:	4770      	bx	lr
2000000e:	bf00      	nop
20000010:	2001c000 	.word	0x2001c000

20000014 <main>:
20000014:	e7fe      	b.n	20000014 <main>
	...
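
Compared with the C build, the extra `bx lr` and the alignment `nop` occupy four bytes, so both the literal word and `main` sit four bytes higher. That is why the load offset changes from #8 to #12 and the `bl` encoding from `f000 f803` to `f000 f805`. A rough sketch of decoding the `bl` target from its two halfwords, assuming the Thumb-2 T1 encoding of `BL` (the function name `bl_target` is hypothetical):

```rust
/// Branch target of a Thumb-2 `bl` (encoding T1) located at `addr`,
/// given its two halfwords. Hypothetical helper, for illustration only.
fn bl_target(addr: u32, hw1: u16, hw2: u16) -> u32 {
    let s = ((hw1 >> 10) & 1) as u32;
    let imm10 = (hw1 & 0x03ff) as u32;
    let j1 = ((hw2 >> 13) & 1) as u32;
    let j2 = ((hw2 >> 11) & 1) as u32;
    let imm11 = (hw2 & 0x07ff) as u32;
    // I1 = NOT(J1 XOR S), I2 = NOT(J2 XOR S)
    let i1 = (j1 ^ s) ^ 1;
    let i2 = (j2 ^ s) ^ 1;
    // 25-bit signed offset: S:I1:I2:imm10:imm11:'0'
    let raw = (s << 24) | (i1 << 23) | (i2 << 22) | (imm10 << 12) | (imm11 << 1);
    // Sign-extend from bit 24.
    let offset = ((raw << 7) as i32) >> 7;
    // The PC reads as the instruction address + 4.
    addr.wrapping_add(4).wrapping_add(offset as u32)
}

fn main() {
    // C build: `f000 f803` at 0x20000006
    println!("{:#010x}", bl_target(0x2000_0006, 0xf000, 0xf803)); // prints 0x20000010
    // Rust build: `f000 f805` at 0x20000006
    println!("{:#010x}", bl_target(0x2000_0006, 0xf000, 0xf805)); // prints 0x20000014
}
```

Both results agree with the addresses of `main` shown in the two dumps.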

This is not a big deal, as the bx lr at address 2000000c is never reached: main never returns, and even if it did, the infinite loop at 2000000a precedes it. I would still like to understand the reason for this difference.

I suspect the difference isn't the language but rather the compiler. Were you using GCC or Clang for the C code? Regardless, the solution is probably to add mem::unreachable at the end of your function.

That's right @parched, of course I should have mentioned which compiler I was using. It's gcc 7.2.1.

Adding core::mem::unreachable() after the assembler block removes the branch and exchange instruction, but inserts a permanently undefined instruction instead:

2000000a:	e7fe      	b.n	2000000a <startup+0xa>
2000000c:	defe      	udf	#254	; 0xfe
2000000e:	bf00      	nop
20000010:	2001c000 	.word	0x2001c000
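
To make the halfwords in these listings easier to read, here is a minimal sketch of a decoder for just the four 16-bit Thumb patterns that show up in this thread (`nop`, `bx`, `udf`, and the unconditional `b.n`). The type and function names are invented for illustration, and anything else is simply reported as `Other`:

```rust
/// A few 16-bit Thumb encodings that appear in the dumps in this thread.
/// Hypothetical type, for illustration only.
#[derive(Debug, PartialEq)]
enum Thumb16 {
    Nop,
    /// `bx Rm` with the register number (14 is lr)
    Bx(u8),
    /// `udf #imm8`
    Udf(u8),
    /// `b.n`, byte offset relative to the instruction's own address
    Branch(i32),
    Other(u16),
}

fn decode(hw: u16) -> Thumb16 {
    if hw == 0xbf00 {
        Thumb16::Nop
    } else if (hw & 0xff87) == 0x4700 {
        Thumb16::Bx(((hw >> 3) & 0xf) as u8)
    } else if (hw & 0xff00) == 0xde00 {
        Thumb16::Udf((hw & 0xff) as u8)
    } else if (hw & 0xf800) == 0xe000 {
        // Sign-extend imm11:'0', then add 4 because the PC reads as address + 4.
        let imm = (((hw & 0x07ff) as i32) << 21) >> 20;
        Thumb16::Branch(imm + 4)
    } else {
        Thumb16::Other(hw)
    }
}

fn main() {
    for hw in [0xbf00u16, 0xe7fe, 0x4770, 0xdefe] {
        println!("{:04x} -> {:?}", hw, decode(hw));
    }
    // prints:
    // bf00 -> Nop
    // e7fe -> Branch(0)
    // 4770 -> Bx(14)
    // defe -> Udf(254)
}
```

`Branch(0)` is the `.L1: B .L1` loop branching to itself, `Bx(14)` is the `bx lr` from the first Rust build, and `Udf(254)` is what replaces it once unreachable() is added.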

As said, this is no big concern. I'm just interested in understanding why Rust/LLVM adds this extra instruction while C/gcc doesn't. Somehow gcc understands that this function is never called and can optimize away the branch and exchange instruction, right?

What happens if you remove the #[inline(never)]?

This seems to be a real difference between Rust and Clang. Rust always inserts a "return;" at the end of a naked function, whereas Clang does not. Clang always inserts the "unreachable" instruction at the end, whereas Rust does not. I don't think much thought was given to this behavior when naked functions were implemented. If you have a particular opinion one way or another, you should voice it in the naked functions tracking issue.

Removing #[inline(never)] doesn't alter this behaviour. I guess that's reasonable..?

Are you compiling in debug or release mode?

I can imagine that in debug mode Rust tries to trigger a debugger or something; maybe that's the reason for udf #254.

Interesting! It hadn't occurred to me to compile the C file with Clang. I tried it now, with the following result:

20000000 <startup>:
20000000:	bf00      	nop
20000002:	f8df d008 	ldr.w	sp, [pc, #8]	; 2000000c <startup+0xc>
20000006:	f000 f803 	bl	20000010 <main>
2000000a:	e7fe      	b.n	2000000a <startup+0xa>
2000000c:	2001c000 	.word	0x2001c000

It's identical to the output produced by gcc, at least for this excerpt (the main function differs between gcc and Clang, but that's another story). So at least in this case, Clang does not insert the udf instruction that appeared in the Rust build after adding unreachable().

As far as I understand naked functions, this seems very strange. Isn't their purpose to skip all prologue/epilogue code? I would guess that includes returning. I guess I will voice it in the tracking issue; thanks for pointing me there.

I am compiling in release mode. I'm still in an early phase of my Rust coding for bare metal and want to learn what it really compiles down to, as stripped down as possible.