Ret instruction jumps to wrong address (stack is in invalid state)


#1

I’ve got a setup where I have a thread-local mpsc::Sender that I’m using to communicate between two worker threads, it looks a little like this:

thread_local! {
    // TODO: We don't want this to be completely public, only pub(crate), but `thread_local`
    // doesn't support pub(crate) syntax.
    pub static RENDER_MESSAGE_CHANNEL: UnsafeCell<Sender<RenderResourceMessage>> = unsafe { UnsafeCell::new(mem::uninitialized()) };
}

...

RENDER_MESSAGE_CHANNEL.with(move |channel| {
    let channel = unsafe { &*channel.get() };
    let message = RenderResourceMessage::Material(material_id, material_source);

    let result = channel.send(message); // <--- Returns to wrong address D:

    result.expect("Unable to send material data to renderer");
});

On the highlighted line I’m running into a very strange issue: send() is returning to the wrong address in memory. I stepped through send() and tracked every change to the esp register and found that by the end esp is no longer pointing to the return address, which is why send() jumps into invalid memory on return. What I don’t understand is why this is happening. Below is my transcript of what send() is doing:

call:         0x066bf554 ;; The address of the return address.

;; Top of Sender::send()

push ebp:     0x066bf550
push ebx:     0x066bf54c
push edi:     0x066bf548
push esi:     0x066bf544
sub esp, 75C: 0x066bede8 ;; Allocate 1884 bytes on the stack.

;; A bunch of code that doesn't touch esp

call:         0x066bede4
pop eax:      0x066bede8

;; A bunch of code that doesn't touch esp

add esp, 0C:  0x066bedf4 ;; Deallocate 12 bytes from the stack.
pop ebp:      0x066bedf8

Offset: 0x75C (dun dun duuunnnnnnnn)
Looks like we're missing an instruction `add esp, 75C`.

From what I can tell the function allocates 0x75C bytes of memory on the stack that it never frees, causing ret to pop an invalid return address from the stack.

Can anyone shed some light as to what’s going on here? The only time I’ve ever seen something like this is when I used the wrong calling convention when doing FFI, but in this case I’m calling a Rust function from the standard library so that doesn’t seem applicable.

Thank you!


#2

The plot thickens! I’m trying to call channel.send() from two different places, and one of the calls actually works. So I factored out that call into a helper function to try to isolate the difference:

fn send_render_message(message: RenderResourceMessage) {
    RENDER_MESSAGE_CHANNEL.with(move |channel| {
        let channel = unsafe { &*channel.get() };
        channel
            .send(message)
            .expect("Unable to send render resource message");
    });
}

RenderResourceMessage is declared as follows:

#[derive(Debug)]
pub enum RenderResourceMessage {
    Mesh(MeshId, ::polygon::geometry::mesh::Mesh),
    Material(MaterialId, ::polygon::material::MaterialSource),
}

When do send_render_message(RenderResourceMessage::Mesh(_, _)) it works (I can step through the disassembly for send() and see it take a different path that includes add esp, 75C before the ret instruction), but when I do send_render_message(RenderResourceMessage::Material(_, _)) I run into the error described above. This seems odd to me because it looks like send() is going through different code paths depending on what value I pass in, even though it’s generic and shouldn’t know anything about the type being sent.


#3

pub static RENDER_MESSAGE_CHANNEL: UnsafeCell<Sender<RenderResourceMessage>> = unsafe { UnsafeCell::new(mem::uninitialized()) };

Why are you initializing with mem::uninitialized()?

Are you properly initializing it somewhere else in the code? If not, a crash is unsurprising…

When you do that, make sure to use ptr::write(channel, X) rather than directly setting *channel = X so as not to call the destructor on the old, garbage data.

If I were you, I would avoid both uninitialized() and UnsafeCell - in most cases, the overhead of using RefCell<Option<Sender<...>>> (or RefCell<Sender<...>> with a sensible default value) and avoiding unsafe code should be negligible.


#4

Here’s the code I’m using to initialize RENDER_MESSAGE_CHANNEL:

let sender = sender.clone();
::std::thread::spawn(move || {
    // Initialize thread-local renderer message channel.
    resource::RENDER_MESSAGE_CHANNEL.with(move |channel| unsafe {
        ptr::write(channel.get(), sender);
    });

    // Initialize worker thread to support fibers and wait for work to be available.
    fiber::init();
    scheduler::wait_for_work();
});

I am indeed using ptr::write() to avoid dropping the unused memory. I’ll try a version using RefCell<Option<_>> to see if that’s causing the problem.


#5

Welp! Once I switched to RefCell<Option<Sender<T>>> I started getting a panic because the sender was None. Turns out I was forgetting to set the sender for the thread that was in charge of spawning the other worker threads. Moral of the story: Don’t try to do “clever” optimizations with unsafe code.

Thanks for the help!