How to ensure stack align with 16 bytes, otherwise caused program crashed while enabled sse2?

As my understanding, x86_64-unknown-none target should be ok with sse2 enabled manually through -Ctarget-feature=+sse2. However, while enabled, my program crashed with stack alignment issue.

My program is likely a mini OS, loaded to a KVM machine (created by KVM API), trying to setup GDT/IDT. With sse2 enabled, the generated code as below make use of sse2 registers (0x8f). And I checked the rsp register, it is aligned with 8 bytes, but not 16 bytes.

With only sse enabled, it works just fine. However, I am trying to use hard-float by deleted +soft-float in a self x86_64-unknown-none.json, and this will cause compiler_builtin compiled failed, because it need float number, so I enabled sse2, and got this alignment issue.

By googling, chat-gpt, what all I want is that how to ensure stack alignment is 16 bytes. And +strict-align, preferred-stack-boudnary etc. flags are not working.

Any suggestion?

And more background:
I am try to porting newlib to my little os wirtten in rust, every things works fine, except when invoking printf("%f", 3.14) like statement, it always print out zero. By printed out float number representation in c and rust, they do differs. In c, it is big-endian, in rust it is little endian. I do not find a way to change to little endian in c which it should be default to be little endian.

So I try to enable sse in rust os, and get errors above.

These may be two different issue.

If your program is mini-OS, the it's responsibility of your program to ensure that %rsp would be initially aligned when C or Rust code is used.

Compiler just keeps stack aligned on the assumption that it was initially aligned.

Just use andb $~15, %rsp once and then keep track of what you are doing with stack.

2 Likes

thanks khimru.

%rsp and %rbp are initialized at 0x8000000, which is the top of memory. When crashed, rbp is still at 0x8000000, while rsp is 0x7ffdef8.

same program, when disabled sse2 in target feature, just works.

Well… it looks as if it was initially aligned and then you made it unaligned, somehow. Rust keeps aligned stack aligned like x86-64 says it's supposed to do which, most likely, mean bug in you code. Fix also belongs to your code, for obvious reason.

P.S. Maybe you have misread the requirements? %rsp should be aligned before you call the function, not after. Quote: “the stack needs to be 16 (32 or 64) byte aligned immediately before the call instruction is executed. Once control has been transferred to the function entry point, i.e. immediately after the return address has been pushed, %rsp points to the return address, and the value of (%rsp + 8) is a multiple of 16 (32 or 64)”. [1]


  1. 32 or 64 bytes alignment is for when you plan to put AVX-provided __m256 or AVX512-provided __m512 on stack and is probably not relevant for your case. ↩︎

Thanks khimru for your new update. I am reading the abi ...

Also, let me post more code:

the code run in the vm setup no_std staff, will reverse a 12 char string in memory and shift 8 bytes. The outer program will write a string there and prints out the reverse.

with default x86_64-unknown-none target, it works fine, decompile as below:
pic 2

with custom defined target, by just print the default one, and change is_builtin to false, delete "features" to enable sse2):
pic 3

then vm exits, docompile as below:
pic4

The difference is that, no xmm registers without sse2.

other pictures followed in other replies.

BY THE WAY:
With SSE2 enabled, the vm exits at 0x70, which is a movaps to xmm0, the %rsp is 0x7ffff18, %rbp is 0x7ffff80.

So you are putting Rust code on the _start without any glue code? It wouldn't work and it's easy to see why.

x86-64 API says that the value of (%rsp + 8) is a multiple of 16 (32 or 64) while you say that %rsp and %rbp are initialized at 0x8000000.

0x8000000 + 8 is not a multiple of 16 thus nothing works, as expected.

You have to have a glue code which would handle difference in the requirements somehow.

1 Like

Great thanks, khimru.

All problems solved, just by set stack top to 0x8000000 - 8.

I never truly understand the stack frame.

There are nothing to understand, it's, to some degree, an arbitrary decision. Most modern architectures have branch-with-link instructions which copies address of current instruction into some other register (usually called lr) and then jumps at the instructions start.

On CPUs that are designed like that you have have stack aligned both before call to function and inside of your function, too.

But on x86 that call function, as implemented in hardware, pushes current address (32bit or 64bit depending on the CPU mode) into the stack and then does jump.

That means that you couldn't have aligned stack both before and after that call instruction.

You either have stack aligned before call — or it may be aligned after it, but not in both places.

Designers of x86-64 ABI did [almost] an arbitrary decision[1] to have it aligned before call… but that means that inside of your function it would be shifted by 8 from aligned position.

And if you are not doing call, but start your function in some other way it still would expect that property: the value of (%rsp + 8) is a multiple of 16 (32 or 64)


  1. It's not entirely arbitrary: if your language have support for alloca then it's easier to align the stack before doing call. Rust doesn't support alloca, but it still uses the same ABI as C. ↩︎

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.