Inconsistent behavior when overriding `lto` in Cargo.toml

I came up with a minimal template that runs on a Linux-compatible OS on the x86_64 architecture.

This template consists of nearly the same function in different locations.
The function invokes the write syscall to print a &str to standard output.

fn _print(msg: &str) {
    let bytes = msg.as_bytes();
    unsafe {
        core::arch::asm!(
            "syscall",
            inlateout("rax") 1usize => _,
            in("rdi") 1usize,
            in("rsi") bytes.as_ptr(),
            in("rdx") bytes.len(),
            out("rcx") _,
            out("r11") _,
            options(nostack)
        );
    }
}

There are several flavors (locations) of this same function:

./src/main.rs
./src/amod.rs
./src/lib.rs
./crates/print/src/lib.rs

src/main.rs is bound to a binary described in Cargo.toml, and src/amod.rs implements a module declared in src/main.rs:

[[bin]]
name = "template"
path = "src/main.rs"
test = false

src/lib.rs and crates/print/src/lib.rs implement libraries that are also defined in Cargo.toml:

[workspace]
members = ["crates/print"]

[dependencies]
print = { path = "crates/print" }

[lib]
name = "template"
path = "src/lib.rs"
test = false

Cargo.toml overrides the default lto setting of the release profile, setting it to true:

[profile.release]
lto = true

Then running

git clone git@github.com:ze-gois/rust_template_x86_64
cd rust_template_x86_64
cargo run --release

works fine:

❯ cargo run --release
    Finished `release` profile [optimized] target(s) in 0.00s
     Running `target/x86_64-unknown-none/release/template`
Test 0: src/main.rs
Test 1: src/amod.rs
Test 2: src/lib.rs
Test 3: crates/print/src/lib.rs
Test 4: crates/print/src/lib.rs (static)
0, 1, 2, 3, 4.....

However, a plain

cargo run

does not work:

➜  rust_template_x86_64 git:(main) cargo run
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.00s
     Running `target/x86_64-unknown-none/debug/template`
Test 0: src/main.rs
Test 1: src/amod.rs
[1]    4409 segmentation fault (core dumped)  cargo run

Could you give more information on what kind of result/behavior “doesn’t work” involves?

The repo defines the dev and release profiles as follows

[profile.release]
lto = true
codegen-units = 1
opt-level = "z"
[profile.dev]
debug = true
opt-level = 1

so it's (at least) one of these properties – or one of the remaining (non-overridden) differing properties from the default properties in these profiles – that makes the difference of “works fine” or not, for you :wink:

The program consists of invoking very similar functions, each an x86_64 inline assembly block wrapped in an ordinary Rust function.

Running in release mode prints the messages correctly, while running in debug mode segfaults.

~/tmp/rust_template_x86_64 main
❯ cargo run --release
    Finished `release` profile [optimized] target(s) in 0.00s
     Running `target/x86_64-unknown-none/release/template`
Test 1: Local inline assembly works
Test 2: Lib inline assembly works
Test 3: Crate inline assembly works
Test 4: Static crate printing
Panicking:.....
~/tmp/rust_template_x86_64 main
❯ cargo run
    Finished `dev` profile [optimized + debuginfo] target(s) in 0.00s
     Running `target/x86_64-unknown-none/debug/template`
Test 1: Local inline assembly works
[1]    8111 segmentation fault (core dumped)  cargo run

I see. I'm not experienced in using opt-level = "z" so maybe that mode tends to "optimize away" segfaults into other kinds of UB? Feel free to test this by trying out other optimization levels in the .release configuration [0, 1, 2, 3, "s", "z"] (for experimenting with only one "variable factor" changing). Chances are, if you get segfault in some optimization modes, you might possibly "simply" have undefined behavior in your program.

I probed a few combinations of these.

Setting lto = true alone allows the program to "work fine" for me (?) :upside_down_face:.

However, after removing the Cargo.toml directive (default release profile), execution fails when the external crate writes a static variable to standard output, while the write syscall wrapper in that same external crate works fine.

I haven't used inline asm in rust, but the implementation of template::print looks broken in that the two arguments may be in registers that are overwritten by the mov instructions. The other print functions do not have this problem.

(Why are you calling syscalls like this in the first place?)

I want to understand what is really happening in this template, get it working properly, and document it along the way.

I am humbly attempting to make assembly and Rust interoperate.

By executing syscalls like this we can wrap them and attach our own behavior to them in our programs, for instance.

I want to be able to work on bare metal, which is valuable in this era of networked standalone devices.

that's a clear indication of UB, the lto flag just happened to trigger the UB.

I'm not sure if it is the root cause, but I don't think the flags register is guaranteed to be preserved across a syscall, the kernel is allowed to change it, so you should not use preserves_flags there.

also, I don't think readonly can be used here. besides meaning no writes to (user space) global variables, the compiler also assumes readonly means no synchronization between threads, which is definitely not true for a syscall: there is plenty of synchronization done in the kernel explicitly or implicitly, e.g. by file descriptors, by global resources like the tty driver, etc.
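
A minimal sketch of a write wrapper that follows the advice above, assuming x86_64 Linux. The name sys_write and its signature are hypothetical and not taken from the template. Every argument is bound to an explicit register, so no operand can land in a register that the asm body overwrites, and neither preserves_flags nor readonly is set. Unlike the original _print, it also returns the value the kernel leaves in rax.

```rust
use core::arch::asm;

/// Hypothetical helper (not part of the template): raw `write` syscall
/// on x86_64 Linux. Returns the number of bytes written, or a negative
/// errno value on failure.
pub fn sys_write(fd: usize, bytes: &[u8]) -> isize {
    let ret: isize;
    unsafe {
        asm!(
            "syscall",
            // rax carries the syscall number in (1 = write) and the result out.
            inlateout("rax") 1isize => ret,
            // Arguments are pinned to the registers the kernel reads them from.
            in("rdi") fd,
            in("rsi") bytes.as_ptr(),
            in("rdx") bytes.len(),
            // The `syscall` instruction itself overwrites rcx and r11.
            out("rcx") _,
            out("r11") _,
            // Deliberately conservative: no `preserves_flags`, no `readonly`.
            options(nostack)
        );
    }
    ret
}
```

Calling sys_write(1, "Test\n".as_bytes()) writes to standard output. Checking the returned value makes a failing syscall visible instead of silently ignored.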

Looks like the problem is not the inline assembly, but something about linking:

Breakpoint 2, 0x00007ffff7ffc19d in template::entry () at src/main.rs:30
30	    template::print("Test 2: src/lib.rs\n");
(gdb) disassemble
Dump of assembler code for function template::entry:
   0x00007ffff7ffc160 <+0>:	push   %rbp
   0x00007ffff7ffc161 <+1>:	mov    %rsp,%rbp
   0x00007ffff7ffc164 <+4>:	sub    $0x30,%rsp
   0x00007ffff7ffc168 <+8>:	lea    0xeab(%rip),%rdi        # 0x7ffff7ffd01a
   0x00007ffff7ffc16f <+15>:	mov    $0x14,%esi
   0x00007ffff7ffc174 <+20>:	call   0x7ffff7ffc120 <_ZN8template6_print17hbd8563b84eef9ef1E>
   0x00007ffff7ffc179 <+25>:	lea    0xeae(%rip),%rdi        # 0x7ffff7ffd02e
   0x00007ffff7ffc180 <+32>:	mov    $0x15,%esi
   0x00007ffff7ffc185 <+37>:	call   0x7ffff7ffc0e0 <_ZN8template4amod5print17h81881157fe4355e7E>
   0x00007ffff7ffc18a <+42>:	lea    0xeb2(%rip),%rdi        # 0x7ffff7ffd043
   0x00007ffff7ffc191 <+49>:	mov    0x1fe0(%rip),%rax        # 0x7ffff7ffe178
   0x00007ffff7ffc198 <+56>:	mov    $0x13,%esi
=> 0x00007ffff7ffc19d <+61>:	call   *%rax
   0x00007ffff7ffc19f <+63>:	lea    0xeb0(%rip),%rdi        # 0x7ffff7ffd056
   0x00007ffff7ffc1a6 <+70>:	mov    0x1fd3(%rip),%rax        # 0x7ffff7ffe180
   0x00007ffff7ffc1ad <+77>:	mov    $0x20,%esi
   0x00007ffff7ffc1b2 <+82>:	call   *%rax
   0x00007ffff7ffc1b4 <+84>:	mov    0x1fdd(%rip),%rax        # 0x7ffff7ffe198
   0x00007ffff7ffc1bb <+91>:	call   *%rax
   0x00007ffff7ffc1bd <+93>:	lea    0x1e54(%rip),%rsi        # 0x7ffff7ffe018
   0x00007ffff7ffc1c4 <+100>:	mov    0x1fd5(%rip),%rax        # 0x7ffff7ffe1a0
   0x00007ffff7ffc1cb <+107>:	lea    -0x30(%rbp),%rdi
   0x00007ffff7ffc1cf <+111>:	call   *%rax
   0x00007ffff7ffc1d1 <+113>:	lea    0x1e50(%rip),%rsi        # 0x7ffff7ffe028
   0x00007ffff7ffc1d8 <+120>:	mov    0x1fc9(%rip),%rax        # 0x7ffff7ffe1a8
   0x00007ffff7ffc1df <+127>:	lea    -0x30(%rbp),%rdi
   0x00007ffff7ffc1e3 <+131>:	call   *%rax
End of assembler dump.
(gdb) p/x $rax
$1 = 0x0

So in the debug build, the slot at 0x7ffff7ffe178 (loaded into %rax just before the call at the breakpoint) should hold the address of the external function, but it's zero, so the call segfaults.

With lto it works, because the functions are inlined:

(gdb) disassemble
Dump of assembler code for function entry:
   0x00007ffff7ffd0b0 <+0>:	push   %rbp
   0x00007ffff7ffd0b1 <+1>:	mov    %rsp,%rbp
=> 0x00007ffff7ffd0b4 <+4>:	lea    0xf53(%rip),%rsi        # 0x7ffff7ffe00e
   0x00007ffff7ffd0bb <+11>:	mov    $0x1,%eax
   0x00007ffff7ffd0c0 <+16>:	mov    $0x1,%edi
   0x00007ffff7ffd0c5 <+21>:	mov    $0x14,%edx
   0x00007ffff7ffd0ca <+26>:	syscall
   0x00007ffff7ffd0cc <+28>:	lea    0xf4f(%rip),%rsi        # 0x7ffff7ffe022
   0x00007ffff7ffd0d3 <+35>:	mov    $0x1,%eax
   0x00007ffff7ffd0d8 <+40>:	mov    $0x15,%edx
   0x00007ffff7ffd0dd <+45>:	syscall
   0x00007ffff7ffd0df <+47>:	lea    0xf51(%rip),%rsi        # 0x7ffff7ffe037
   0x00007ffff7ffd0e6 <+54>:	mov    $0x1,%eax
   0x00007ffff7ffd0eb <+59>:	mov    $0x13,%edx
   0x00007ffff7ffd0f0 <+64>:	syscall
   0x00007ffff7ffd0f2 <+66>:	lea    0xf51(%rip),%rsi        # 0x7ffff7ffe04a
   0x00007ffff7ffd0f9 <+73>:	mov    $0x1,%eax
   0x00007ffff7ffd0fe <+78>:	mov    $0x20,%edx
   0x00007ffff7ffd103 <+83>:	syscall
   0x00007ffff7ffd105 <+85>:	call   0x7ffff7ffd130 <_ZN5print12print_static17h1ac0dfc67f049055E>
   0x00007ffff7ffd10a <+90>:	call   0x7ffff7ffd110 <_ZN4core9panicking9panic_fmt17hb1db3a1e9aaa3573E>

good analysis. I believe this may be the panic handler or unwind landing pad.

if my guess was right, I think the culprit is the linker script: it discarded the sections that are required to do unwinding panic.

I think these are actual calls to print functions (outside of current file), not panic or unwind handling.

Everyone, we had an amazing interaction with Claude Sonnet 3.7 through Zed.

The case is solved, and the template is finished. The template will remain untouched and is licensed under the BSD 3-Clause license.

The issue was:

  • Static linking
  • Proper section initialization
  • Explicit memory layout
  • Avoiding dynamic features
  • Inlining critical functions

This is a very nice template, imho. :sparkles:

Have a good day, and thank you all for your attention.