JIT/stack protection woes

Hey all,

I've been playing with writing a JIT in Rust, and got some basic stuff working great. But as soon as I started writing code that called back into Rust (or out to funcs from any libs, like printf) I noticed things like the order in which I called functions determined whether they worked or not etc. All of my tests were consistent at runtime, but it seemed the assembler I was dumping myself would either work or not work depending on which functions I was calling, in which order, whether or not those functions would call back into Rust, etc.

Before discussing details, let me point out that I'm producing x86 code and using rust stable-i686-apple-darwin, and most of the tests I'm doing that call back into Rust have been against a dev build, although if none of my code or the Rust code it calls into touches any other libs (or uses println! etc) it works in release as well. I'd also like to point out that I have a decent grasp on calling conventions, but not necessarily stack protection etc, especially for a given compiler :slight_smile:

Anyways, after tons of testing and digging, it seems there's some rule for how much extra space is allocated on the stack if a function will call another function (often times more than enough for the callee's arguments, even if the caller doesn't use any of it for locals). For example, when calling a function with 0, 1, or 2 32-bit args in Rust, the caller always allocates 8 bytes on the stack, and only uses the lower bytes for passing the for arg's (if applicable).

For example, here's the disassembly for a function that takes no arguments and returns nothing, but calls a function (which also takes no args and returns nothing):

             __ZN6jitter14call_the_funcs17hac65e222e5e34276E:        // jitter::call_the_funcs::hac65e222e5e34276
000018f0         push       ebp                                                 ; XREF=__ZN6jitter4main17h87c20a18a7dcd27eE+6
000018f1         mov        ebp, esp
000018f3         sub        esp, 0x8 ; <-- extra space here

000018f6         call       __ZN6jitter10do_nothing17ha60e3459d4ed3769E         ; jitter::do_nothing::ha60e3459d4ed3769 (has no arguments)
000018fb         add        esp, 0x8

000018fe         pop        ebp
000018ff         ret        
                        ; endp

This is consistent, until I call a function with 3 32-bit args, where suddenly it's allocating 24 bytes, but only using the lower 12 for passing arg's (I was expecting 16).

Now, if I simply mimic what the rust compiler does in the code I'm producing, no problems whatsoever. That seems to work perfectly. But if I start changing how much space was reserved (for example, I tried replacing 8 with 4 and 12 in the first example) it goes nutty again, and I get segfaults. Also probably worth mentioning is that this is entirely dependent on whether or not a given function calls another function; functions that just do work internally don't seem to follow any of these rules and only allocate space for their locals (if necessary).

So, my questions are:

  1. Is this stack protection or something similar?
  2. What compiler options/flags would affect this?
  3. Where can I find a set of rules to abide by to make sure I'm always producing correct/compatible code?
  4. Is this rust-specific, or something in llvm?
  5. Anything else I might want to start watching out for as I continue this journey? :slight_smile:

It seems like the kind of thing that will either be a few links someone happens to know about that explain everything, or the details are intertwined in many different layers of compiler internals and nobody will have a straight answer. Let's hope it's the former :slight_smile:

And of course, here's all of my test code thus far (it's just the one file): jitter/main.rs at 04c80735bb2b708227aa771f4e0bf3a6802fbda4 · yupferris/jitter · GitHub

Again, if I call back and forth between my generated code and Rust any number of times it appears to work (with or without any sort of function prologues/epilogues), until I touch something like printf/println!, and then it seems to matter.

Any help would be appreciated!
Thanks!

2 Likes

On i686-apple-darwin, the stack must always be aligned to 16 bytes. "push ebp" + "sub esp, 0x8" + "call" == 16.

5 Likes

It's not "stack protection", it's alignment. If you run your code under gdb, I suspect you'll find it trips a SIGBUS error at a MOVDQA or other SSE/vectory instruction. SSE instructions, used for floating point (sometimes, less often on i686 than x86_64) and memcpy() operations, require all variables to live at addresses divisible by 16, in order to allow SSE variables to live on the stack the stack pointer is kept divisible by 16 (because if the stack pointer could vary mod 16, an aligned variable would not be able to have a fixed offset relative to the stack frame base).

1 Like

Indeed, this sounds exactly like what I'm seeing. I ran it in hopper and yes, it was tripping up on SSE instructons eventually. Thanks!

After a bit more digging, it seems this is an OS X ABI restriction, and the more specific semantics are that the stack must be 16-byte aligned at function call sites. This makes perfect sense, and I didn't think to consider ebp and the return pointer factoring into that size (d'oh)!

Are there any more gotchas like this on os x, or win32 for that matter? My goal is to have this running here on i686 Darwin and win32; it seems like as long as I maintain the 16 byte alignment at function call boundaries that'll play nicely with both. Also, what documents might I find this information in officially? I keep seeing stuff about it in similar SO questions etc but it's a bit hard to find it from official sources (for os x at least, haven't dug for win32 yet)

Again, thanks for the help!

What you're looking for is generally called an ABI (Application Binary Interface) specification, although ARM uses the more specific name of "Procedure Call Standard". A quick search for "i386 Darwin ABI" didn't turn up anything useful, but if you code to the "SVr4 ABI" (used by Linux) you probably won't go too far wrong: https://uclibc.org/docs/psABI-i386.pdf

For instance, §2.2.2:

The end of the input argument area shall be aligned on a 16 (32, if __m256 is passed on stack) byte boundary. In other words, the value (%esp + 4) is always a multiple of 16 (32) when control is transferred to the function entry point. The stack pointer, %esp, always points to the end of the latest allocated stack frame.

1 Like

Awesome, thanks again! I've also done some more digging, and it seems http://www.agner.org/optimize/calling_conventions.pdf is a particularly good resource as well (Section 5 is consistent with the rest of the information here, for example).

1 Like

By the way you might be interested in https://github.com/CensoredUsername/dynasm-rs. It's still a work-in-progress but it should make writing JITs in rust a lot easier.

1 Like

Thanks for the tip! Unfortunately this particular project requires the JIT be extremely small and specific to my use case, so I'll skip it for now, but I'll def keep an eye on it :slight_smile: