Regarding function calls

Playing with rust.godbolt.org I've seen that this C++ code:

struct Two { int a; int b; };

void func_struct(Two);

void call_struct() {
    func_struct({1, 2});
}

Compiled with GCC gives the asm:

call_struct():
        movabs  rdi, 8589934593
        jmp     func_struct(Two)

And Clang gives the same:

call_struct():                       # @call_struct()
        movabs  rdi, 8589934593
        jmp     func_struct(Two)             # TAILCALL

While this Rust code:

pub struct Two { a: i32, b: i32 }

#[inline(never)]
pub fn func_struct(x: Two) {
    println!("{} {}", x.a, x.b);
}

pub fn call_struct() {
    func_struct(Two { a: 1, b: 2});
}

Gives with optimizations the asm:

example::call_struct:
        mov     edi, 1
        mov     esi, 2
        jmp     qword ptr [rip + example::func_struct@GOTPCREL]

Is this going to cause some slight performance difference?

This can have both a positive and a negative effect on performance. Passing the fields separately increases register pressure, in exchange for needing fewer shift and bitwise-or instructions to merge and extract the fields of the struct. The difference will likely depend on your specific use case.
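To make the "merge and extract" cost concrete, here is a minimal sketch of the work that packing two `i32` fields into one 64-bit register implies. The helper names `pack` and `unpack` are hypothetical, chosen only for illustration; a compiler emits the equivalent shifts and masks inline, and it can fold them away entirely when the values are constants, as in the `movabs` above.

```rust
/// Hypothetical helper: merge two 32-bit fields into one 64-bit value,
/// with `a` in the low half and `b` in the high half.
pub fn pack(a: i32, b: i32) -> u64 {
    // Go through u32 first so a negative `a` is not sign-extended
    // into the bits that belong to `b`.
    (a as u32 as u64) | ((b as u32 as u64) << 32)
}

/// Hypothetical helper: extract the two fields again.
pub fn unpack(packed: u64) -> (i32, i32) {
    let a = packed as u32 as i32; // truncate to the low 32 bits
    let b = (packed >> 32) as u32 as i32; // shift the high 32 bits down
    (a, b)
}
```

With `a = 1` and `b = 2`, `pack` yields 8589934593, the same constant that GCC and Clang load into `rdi`.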

Note: As far as I know, this only happens with the System V ABI for x86_64. Other ABIs are far less aggressive about fitting as much data as possible into as few registers as possible.


You can change the ABI with pub extern "C" fn func_struct, if you want to compare performance with all else equal.


This.


You are comparing whatever the default C++ ABI is with Rust's:

  • an extern "Rust" function here passes each struct field as its own function parameter, hence the use of one 32-bit register, edi, for the "first parameter" and another, esi, for the "second".

  • C++, on x86-64, seems to be doing, for this very function, the same thing that a C function would: take a single 64-bit-wide parameter:

    0x00000002_00000001 = 8589934593
      ^^^^^^^^ ^^^^^^^^
    

To compare both implementations with the same ABI, both the Rust and the C++ function should be marked extern "C", in which case they both use the "single 64-bit-wide parameter" approach, i.e., that movabs 85....
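As a sketch of what that looks like on the Rust side, here is the original example with `extern "C"` added to `func_struct`. The `#[repr(C)]` attribute is my addition and not part of the original post: it gives the struct the C field layout, and without it the compiler typically warns that the type is not FFI-safe. The generated assembly depends on the compiler version and target, so check it on godbolt rather than taking it for granted.

```rust
// Assumption: #[repr(C)] is added so the struct has a defined C layout
// when it crosses an extern "C" boundary.
#[repr(C)]
pub struct Two { a: i32, b: i32 }

// Same body as the original; only the calling convention changes.
#[inline(never)]
pub extern "C" fn func_struct(x: Two) {
    println!("{} {}", x.a, x.b);
}

pub fn call_struct() {
    func_struct(Two { a: 1, b: 2 });
}
```

Compiling this with optimizations for x86-64 Linux should let you compare the call sequence against the C++ output with the ABI held equal.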


Now, if you are trying to compare Rust's current choice of ABI for this particular pattern with C++'s, then that's another story. But know that:

  • nothing is guaranteed about the Rust ABI, so the results of your experiments could very well change from one compiler release to another;

  • in this case, I don't think there is a performance difference between initializing two different 32-bit registers and a single 64-bit one, since I'd expect CPU pipelines to be able to perform the two assignments in parallel. The main difference between the two is then just that the double mov approach "stains" a second register.
