Determining the address of the calling function (in extern C functions)?

Is it possible (maybe using x86-64 inline asm) to determine the address of the caller of a function? Obviously yes by doing a full stack walk, but that is expensive. I just need the address of the immediate caller. (Needless to say I'm doing incredibly low level things if I'm asking about this.)

In the interest of avoiding XY problem: I'm intercepting malloc/free/etc for a heap allocation profiler, and I'm getting recursion. No problem: just use a thread local to detect that. Except that the first time you allocate a thread local it can trigger an allocation (in glibc).
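For reference, a minimal sketch of the thread-local recursion check described above, assuming a plain `std` thread local with a const initializer. The names `IN_HOOK` and `ReentrancyGuard` are made up for illustration. In a real interposer the first access to the thread local is exactly what can call back into malloc, so this shows only the guard logic, not a fix for that problem:

```rust
use std::cell::Cell;

thread_local! {
    // Hypothetical flag: true while this thread is inside the profiler hook.
    static IN_HOOK: Cell<bool> = const { Cell::new(false) };
}

/// RAII guard: holding one means this thread is the outermost hook invocation.
struct ReentrancyGuard;

impl ReentrancyGuard {
    /// Returns `None` if the current thread is already inside the hook.
    fn enter() -> Option<Self> {
        IN_HOOK.with(|flag| {
            if flag.replace(true) {
                None // flag was already set: recursive call
            } else {
                Some(ReentrancyGuard)
            }
        })
    }
}

impl Drop for ReentrancyGuard {
    fn drop(&mut self) {
        // Only the outermost invocation owns a guard, so it clears the flag.
        IN_HOOK.with(|flag| flag.set(false));
    }
}
```

A hook would call `ReentrancyGuard::enter()` first and forward straight to the real allocator, without recording anything, when it gets `None`.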

Possible ways to deal with this that I have considered:

  • Implement my own thread locals using some sort of allocation-free, fixed-size concurrent hash map with the thread ID as key. It works, but doesn't scale well to high core counts, and I can't do tombstones (without a lock). I have considered hazard pointers (nope, they need per-thread data) and RCU (it gets quite complex, and I haven't even started on the implementation; I would really prefer not to).
  • Array of thread IDs. 32-bit space, so way too large.
  • Detect if I'm getting called by the problematic TLS reallocation function in glibc by storing a pointer to it in a global, then doing a cheap pointer comparison before falling back to the normal thread-local recursion check. This also avoids cache contention.
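To make the first option concrete, here is a rough sketch of an allocation-free, fixed-size table keyed by thread ID, using open addressing with linear probing. Everything here (`SLOTS`, `Slot`, `try_enter`, `leave`, the hash constant) is an assumption for illustration, and how the thread ID is obtained is left out. Slots are never released, which is the tombstone limitation mentioned in the list:

```rust
use std::sync::atomic::{AtomicBool, AtomicU64, Ordering};

const SLOTS: usize = 1024; // must be a power of two
const EMPTY: u64 = 0; // thread IDs are assumed to be nonzero

struct Slot {
    key: AtomicU64,   // owning thread ID, or EMPTY
    busy: AtomicBool, // true while that thread is inside the hook
}

const EMPTY_SLOT: Slot = Slot {
    key: AtomicU64::new(EMPTY),
    busy: AtomicBool::new(false),
};

// Lives in static memory, so using it never allocates.
static TABLE: [Slot; SLOTS] = [EMPTY_SLOT; SLOTS];

/// Finds the slot owned by `tid`, claiming a free one on first use.
/// Returns `None` when the table is full.
fn slot_for(tid: u64) -> Option<&'static Slot> {
    let hash = tid.wrapping_mul(0x9E37_79B9_7F4A_7C15) >> 32;
    let start = (hash as usize) & (SLOTS - 1);
    for i in 0..SLOTS {
        let slot = &TABLE[(start + i) & (SLOTS - 1)];
        let current = slot.key.load(Ordering::Acquire);
        if current == tid {
            return Some(slot);
        }
        if current == EMPTY
            && slot
                .key
                .compare_exchange(EMPTY, tid, Ordering::AcqRel, Ordering::Acquire)
                .is_ok()
        {
            return Some(slot);
        }
        // Slot belongs to another thread (or we lost the race): keep probing.
    }
    None
}

/// Returns true if `tid` was not already inside the hook.
fn try_enter(tid: u64) -> bool {
    match slot_for(tid) {
        Some(slot) => !slot.busy.swap(true, Ordering::Relaxed),
        None => false, // table full: treat as "do not record"
    }
}

fn leave(tid: u64) {
    if let Some(slot) = slot_for(tid) {
        slot.busy.store(false, Ordering::Relaxed);
    }
}
```

Every thread probes the same shared array, which hints at why this approach scales poorly as the core count grows.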

I know it is possible via global asm or naked functions, but I would prefer to write as little assembler as possible. So can it be done in normal Rust and/or via "normal" inline asm?

(If frame pointers are not omitted, it seems it should be possible to get the return address relative to rbp, but that won't work if the user compiles without frame pointers.)

Without the frame pointer, maybe the following can work (with nightly):

#![feature(link_llvm_intrinsics)]

unsafe extern {
    #[link_name = "llvm.returnaddress"]
    fn return_address(level: i32) -> *const u8;
}

You should be able to call return_address(0) to get the address of the caller. It works without asm!, but maybe this is not exactly the kind of solution you were expecting...

It is nightly; I think I would rather take asm over nightly. But thanks, it is good to know that this is possible.

If you prefer ASM, the only way I see is to use the frame pointer.

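use core::arch::naked_asm;

// Naked functions are stable since Rust 1.88; the body must be a single naked_asm! call.
#[unsafe(naked)]
pub unsafe extern "C" fn get_caller() -> *const u8 {
    naked_asm!(
        // No prologue runs here, so rbp still belongs to the function that
        // called get_caller(); [rbp+8] is that function's return address.
        "mov rax, [rbp+8]",
        "ret",
    )
}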

You can force the compiler to keep the frame pointer with -C force-frame-pointers=yes. On stable Cargo, rustflags is not accepted in a profile section (that needs the unstable profile-rustflags feature), so set it in .cargo/config.toml instead:

[build]
rustflags = ["-C", "force-frame-pointers=yes"]

In that case I think I should go for naked functions / global asm, where I can get hold of the return address without depending on frame pointers.

You may find the following interesting:

use core::arch::asm;
use core::ptr;

unsafe extern "C" {
    fn _start();
    static _etext: u8;
}

#[inline(always)]
fn text_range() -> (usize, usize) {
    unsafe { (_start as usize, &_etext as *const u8 as usize) }
}

#[inline(always)]
fn valid_text_addr(addr: usize, text_start: usize, text_end: usize) -> bool {
    addr >= text_start && addr < text_end
}

#[inline(always)]
unsafe fn load8(p: usize) -> u8 {
    unsafe { ptr::read(p as *const u8) }
}

#[inline(always)]
unsafe fn looks_like_retaddr(addr: usize, text_start: usize, text_end: usize) -> Option<usize> {
    unsafe {
        if !valid_text_addr(addr, text_start, text_end) {
            return None;
        }

        if addr >= text_start + 5 && load8(addr - 5) == 0xE8 {
            return Some(addr - 5);
        }

        let check_ff = |ff_pos: usize| -> Option<usize> {
            if ff_pos < text_start + 2 || ff_pos + 1 >= text_end {
                return None;
            }
            if load8(ff_pos) != 0xFF {
                return None;
            }
            let modrm = load8(ff_pos + 1);
            let reg_bits = (modrm & 0x38) >> 3;
            if reg_bits == 0b010 || reg_bits == 0b011 {
                return Some(ff_pos);
            }
            None
        };

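        // Check the REX-prefixed form first; otherwise the plain FF check below
        // always wins and the prefix byte is never included in the call address.
        if addr >= text_start + 3 {
            let b3 = load8(addr - 3);
            if (0x40..=0x4F).contains(&b3) {
                if let Some(p) = check_ff(addr - 2) {
                    return Some(p - 1);
                }
            }
        }

        if addr >= text_start + 2 {
            if let Some(p) = check_ff(addr - 2) {
                return Some(p);
            }
        }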

        None
    }
}

#[inline(never)]
unsafe fn backtrace(max_frames: usize) -> ([*const u8; 64], usize) {
    unsafe {
        let (text_start, text_end) = text_range();

        let mut rsp: *const usize;
        asm!("mov {}, rsp", out(reg) rsp, options(nomem, nostack, preserves_flags));

        let mut out: [*const u8; 64] = [ptr::null(); 64];
        let mut n = 0usize;
        let mut last = 0usize;

        for i in 0..2048 {
            if n >= max_frames.min(out.len()) {
                break;
            }
            let candidate = *rsp.add(i);

            if candidate == last {
                continue;
            }
            if let Some(call_addr) = looks_like_retaddr(candidate, text_start, text_end) {
                out[n] = call_addr as *const u8;
                last = candidate;
                n += 1;
            }
        }

        (out, n)
    }
}

There are serious security improvements still to make, but if you can't find a reliable way to get the last call, you could scan the backtrace for repetitions instead. Because compiler optimizations are unpredictable, I think it may be a more reliable way to detect infinite recursion.
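Scanning for repetitions could look roughly like the sketch below. The helper name `repeated_frame` and the `threshold` parameter are assumptions for illustration; it takes a slice of candidate call addresses, such as the array returned by backtrace, and avoids allocating so that it stays usable inside an allocator hook:

```rust
/// Returns the first non-null address that occurs at least `threshold`
/// times in `frames`, or `None` if no address repeats that often.
/// Quadratic, which is fine for a few dozen frames.
fn repeated_frame(frames: &[*const u8], threshold: usize) -> Option<*const u8> {
    for (i, &addr) in frames.iter().enumerate() {
        if addr.is_null() {
            continue; // unused slots of the fixed-size array
        }
        if frames[..i].contains(&addr) {
            continue; // already counted from its first occurrence
        }
        let count = frames[i..].iter().filter(|&&a| a == addr).count();
        if count >= threshold {
            return Some(addr);
        }
    }
    None
}
```

A plausible call would be `repeated_frame(&out[..n], 2)` on the result of backtrace, with the threshold tuned to how much legitimate recursion is expected.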

That looks quite scary. But this seems to be a full-blown stack walker for a debugger, which needs to do heuristics to deal with corrupt state. While interesting, it is not what I'm looking for. When I get back to this tomorrow my plan is to write a simple trampoline naked function that captures the return address and tail calls my malloc interposer with an additional argument containing that return address.
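A rough sketch of such a trampoline, written with global_asm! and assuming x86-64 Linux with the System V calling convention (first argument in rdi, second in rsi). The names `capture_caller_trampoline`, `interposer_with_caller` and `probe` are hypothetical, and the interposer simply returns the captured address instead of allocating:

```rust
use core::arch::global_asm;

// Hypothetical interposer body: receives the original argument plus the
// return address captured by the trampoline. It returns that address so
// the effect is easy to observe; a real one would allocate.
extern "C" fn interposer_with_caller(size: usize, caller: *const u8) -> usize {
    let _ = size;
    caller as usize
}

global_asm!(
    ".text",
    ".globl capture_caller_trampoline",
    "capture_caller_trampoline:",
    // On entry [rsp] holds the return address into the immediate caller.
    // rdi (the first argument) is left untouched; rsi becomes the second.
    "mov rsi, [rsp]",
    // Tail call: the interposer returns directly to the original caller.
    "jmp {target}",
    target = sym interposer_with_caller,
);

unsafe extern "C" {
    fn capture_caller_trampoline(size: usize) -> usize;
}

// Example caller; black_box keeps the call from becoming a tail call,
// so the captured return address lies inside this function.
#[inline(never)]
fn probe() -> usize {
    let ret = unsafe { capture_caller_trampoline(16) };
    std::hint::black_box(ret)
}
```

Because the return address is read from the stack slot that the call instruction itself wrote, this does not depend on frame pointers being enabled anywhere.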

I can then check if that return address is within the range of where the problematic function is (which I will have already resolved at load time, with the slight complication that it is in ld.so, not in libc.so.)
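The range check itself can be very cheap. A minimal sketch, assuming the start address and size of the problematic function were resolved once at load time by some means not shown here; `set_problem_range` and `called_from_problem_fn` are made-up names:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Half-open address range [start, end) of the problematic function.
// Both stay 0 until the range has been resolved, so nothing matches.
static RANGE_START: AtomicUsize = AtomicUsize::new(0);
static RANGE_END: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical setup step, called once after resolving the symbol.
fn set_problem_range(start: usize, len: usize) {
    RANGE_START.store(start, Ordering::Relaxed);
    RANGE_END.store(start + len, Ordering::Relaxed);
}

/// Two relaxed loads and two comparisons: no thread-local access at all.
fn called_from_problem_fn(ret_addr: usize) -> bool {
    let start = RANGE_START.load(Ordering::Relaxed);
    let end = RANGE_END.load(Ordering::Relaxed);
    ret_addr >= start && ret_addr < end
}
```

The globals are written once and only read afterwards, so the check should not cause cache-line contention between threads.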

EDIT: Also, your example using _start and _etext would only work within a single binary, I think? I'm dealing with dynamic libraries and dynamic symbols here.

Oh yes, this example is indeed out of context, at least if you work with multiple binaries. It could be adapted to handle that constraint, but the checks performed by this code may not be enough if the stack can contain data related to another loaded executable.

If you only want the last call, I'm afraid you'll have to force the frame pointer or use the nightly feature mentioned above. But if you find another way that is reliable even with unpredictable compiler optimizations, I'm interested.
