Any way to force the *omission* of frame pointers?

I'm using Rust to write code for an embedded target of mine -- a CPU made on a digital circuit simulator, implementing a subset of Thumb (specifically, thumbv6m-none-eabi).

The CPU is slow (~40 kHz), so every wasted instruction adds up to a noticeable slowdown in whatever I run. I understand this is nowhere near anything Rust was designed to run on.

There is one thing I'm stuck on: all functions, even leaf functions, get a frame pointer, which I don't need.

For example, here's a small function that implements division using a hardware divider mapped to memory:

#[unsafe(export_name = "__aeabi_uidiv")]
pub extern "C" fn __aeabi_uidiv(a: u32, b: u32) -> u32 {
    let mut res;
    unsafe {
        core::arch::asm!("ldr {res}, [{addr}]",
            addr = in(reg) 0xffff_ff20,
            res = lateout(reg) res,
            in("r0") a,
            in("r1") b,
        )
    }
    res
}

Since the ABI dictates that parameters are passed in sequential registers (r0, r1) and the result is returned in r0, I would expect the following code:

__aeabi_uidiv:
    movs r2, #223
    mvns r2, r2 ; simply getting the address of the MMIO port
    ldr r0, [r2] ; r0 and r1 are already populated with the parameters
    bx lr

This is the smallest possible Thumb code for what I'm trying to do.
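
The two-instruction address load above can be checked on a host machine. This is a minimal sketch, assuming `mvns` behaves as a plain 32-bit bitwise NOT; `mmio_port_addr` is a hypothetical helper name used only for illustration and is not part of the original code:

```rust
/// Models `movs r2, #223` followed by `mvns r2, r2`.
/// Hypothetical helper, for illustration only.
pub const fn mmio_port_addr() -> u32 {
    // movs r2, #223: Thumb-1 `movs` only accepts an 8-bit immediate (0..=255).
    let r2: u32 = 223;
    // mvns r2, r2: bitwise NOT of the register.
    !r2
}
```

The result is 0xffff_ff20, the address passed to the inline assembly. The address does not fit in an 8-bit immediate, which is why it is built as the complement of 223.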

But here is what I'm getting with opt-level=3, lto="fat", --release:

__aeabi_uidiv:
	.fnstart
	.save	{r7, lr}
	push	{r7, lr}
	.setfp	r7, sp
	add	r7, sp, #0
	movs	r2, #223
	mvns	r2, r2
	@APP
	ldr	r0, [r2]
	@NO_APP
	pop	{r7, pc}

There is no reason for lr to be saved here, since the function is a leaf. But most importantly, it uses r7 as a frame pointer when I would expect no such thing to be done with opt-level=3. That's 2 additional instructions (and since push/pop take up multiple cycles, it's actually more like 6). In relative terms, 50% to 150% more instructions!
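
The percentages above are relative to the four instructions of the hand-written version (movs, mvns, ldr, bx). A sketch of that arithmetic follows; `overhead_percent` is a hypothetical helper, and the figure of 6 is the rough estimate given above for multi-cycle push/pop, not a measured cycle count:

```rust
/// Extra cost relative to a baseline, as a whole-number percentage.
/// Hypothetical helper, for illustration only.
pub fn overhead_percent(baseline: u32, extra: u32) -> u32 {
    extra * 100 / baseline
}

/// Instruction count of the hand-written version: movs, mvns, ldr, bx.
pub const BASELINE_INSTRUCTIONS: u32 = 4;
/// Extra instructions in the emitted code: `push {r7, lr}` and `add r7, sp, #0`
/// (`pop {r7, pc}` takes the place of `bx lr`).
pub const EXTRA_INSTRUCTIONS: u32 = 2;
/// Rough estimate from the post once push/pop are weighted as multi-cycle.
pub const EXTRA_WEIGHTED_ESTIMATE: u32 = 6;
```

Calling `overhead_percent(BASELINE_INSTRUCTIONS, EXTRA_INSTRUCTIONS)` gives the 50% figure, and using `EXTRA_WEIGHTED_ESTIMATE` gives the 150% figure.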

There is a codegen parameter to force frame pointers (-C force-frame-pointers), but nothing to force their omission, to my knowledge.

Is there anything I can pass to the compiler to force it to generate the optimal code? Otherwise, I'll have to rewrite all those small functions in assembly. For the example I gave here, that's not really an issue, but I have more complex functions around, such as memcpy4, that end up completely botched if I write them in Rust. Of course, this is the kind of stuff global_asm! is for, but it'd be really nice to write this stuff in Rust.

Thanks!

edit: edited example code to use constant to make point clearer

It's not a leaf, though. It calls R0divR1U, which may affect the codegen significantly, depending on what's happening in R0divR1U.address().

There is no call in either assembly code though, so I'm not sure how that makes sense.

My educated guess would be that this is just a const function returning the address of the memory-mapped register that corresponds to this hardware division co-processor (I believe the Raspberry Pi RP2040 is an example of actual hardware that does something similar). Either way, it clearly doesn't end up in the resulting assembly.

It's a const fn that returns a u32. But the frame pointer gets emitted anyway, even for a trivial function:

#[unsafe(no_mangle)]
pub extern "C" fn foo() -> i32 {
    42
}

gives

foo:
	.fnstart
	.save	{r7, lr}
	push	{r7, lr}
	.setfp	r7, sp
	add	r7, sp, #0
	movs	r0, #42
	pop	{r7, pc}

This makes sense because LLVM isn't always able to remove stack allocations for temporary objects, even when the code that needed those allocations is gone. Like in this example: clang 11 puts data on the stack, while clang 21 keeps everything in registers.

Maybe try global asm or naked functions?

That's what I ended up doing for the simple functions, but it doesn't work for more complicated ones with branches and variables. Having them in Rust is nice because the logic stands out better than assembly, but it wastes a lot of cycles on frame pointers.

Maybe try defining a custom target with "frame-pointer": "may-omit"? Unfortunately, it seems Rust uses "frame-pointer": "always" for all thumb targets.

I wonder if using FramePointer::NonLeaf would also work here without regressing debuggability. That option didn't exist yet when frame pointers were force-enabled for this target.

Yeah, it may be worth opening an issue to request changing the value. It's weird to unconditionally force frame pointers on such low-powered targets.
