Inherently inefficient calling convention in Rust?

Consider this simple piece of code:

pub struct S {
    a: [i32; 4096]
}

#[inline(never)]
pub fn f(s: S) -> i32 {
    s.a[1023]
}

#[inline(never)]
pub fn g(s: S) -> i32 {
    f(s) 
}

The compiled code for g() contains a memcpy before the call to f(): https://gcc.godbolt.org/z/FGXNFA
Why? S is not Copy, so the source object could not be used after the call anyway. Why not just reuse its memory?


Rust treats a move as just a memcpy that invalidates its source, and relies on LLVM to decide whether the actual copy can be skipped. It might be interesting if rustc elided the copy itself, but that's not always straightforward -- for instance let x = if pred { y } else { z }.
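To make the conditional case concrete, here is a minimal sketch. The type `Big` and the functions `pick` and `demo` are made up for illustration and are not from the original post. It only shows that `x` is initialized from one of two distinct locals chosen at run time, so a simple "rename the binding" scheme has no single source location to reuse.

```rust
// Hypothetical type for illustration; not part of the original example.
pub struct Big {
    pub data: [i32; 1024],
}

// `x` is moved from either `y` or `z` depending on a run-time value.
// `y` and `z` live in different places, so `x` cannot simply be an
// alias for one fixed source location.
pub fn pick(pred: bool, y: Big, z: Big) -> Big {
    let x = if pred { y } else { z };
    x
}

// Small driver so the behavior can be checked.
pub fn demo(pred: bool) -> i32 {
    let y = Big { data: [1; 1024] };
    let z = Big { data: [2; 1024] };
    pick(pred, y, z).data[0]
}
```

Whether the optimizer still manages to avoid a copy here depends on the compiler version and flags; the sketch says nothing about the generated code.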


Yes, this is a pretty well-known fact, but what was the point behind this design decision (move => memcpy)? Isn't move supposed to be the "logical" instrument for expressing the intent of passing ownership? Doesn't Rust go the way of zero-overhead abstractions?


Moving is not always just a change in local names. You could be moving values from one struct into another, or moving out of a vec::IntoIter, or any number of possibilities. Even with a simple let x = y, you might assign a new value to y elsewhere, so x needs to be distinct. It's simpler to model all of these as a memcpy with distinct locations to begin with, and let the optimizer perform copy elision.
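A rough sketch of the three situations mentioned above. All names (`Source`, `Target`, and the three functions) are hypothetical and exist only for this illustration.

```rust
// Hypothetical types for illustration.
pub struct Source {
    pub name: String,
    pub id: u32,
}

pub struct Target {
    pub name: String,
}

// Moving a value from one struct into another: the destination is a
// different object, not just a new name for `src`.
pub fn move_between_structs(src: Source) -> Target {
    Target { name: src.name }
}

// Moving out of a vec::IntoIter: the element leaves the vector's buffer
// and becomes an independent value.
pub fn first_from_iter(v: Vec<String>) -> Option<String> {
    let mut it = v.into_iter();
    it.next()
}

// A plain `let x = y` followed by a new assignment to `y`:
// both bindings are live afterwards, so they need distinct storage.
pub fn reassign_after_move() -> (String, String) {
    let mut y = String::from("first");
    let x = y; // `y` is now moved-from
    y = String::from("second"); // `y` is valid again
    (x, y)
}
```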


From my point of view it still seems that there are not so many cases when moving requires copying data, if a non-Copy type is considered. let y = x invalidates x, so an assignment to x after this statement could be handled as working with a new memory location.

After all, much of the beauty of Rust's type system comes from its move semantics, but even in the simplest cases a move costs a memcpy when inlining is not performed, and you probably won't force inlining for public functions unless you want your projects to take forever to compile :frowning:


The bad news is "optimization is hard", and formally specifying optimization guarantees is even harder (if not impossible), so in general we don't want the language spec to try to guarantee that "in this case the copy must be elided" without some very strong motivation (e.g. https://github.com/rust-lang/rfcs/pull/2884).

The good news is that it's probably very feasible for rustc to do a lot better here in typical cases without any hard guarantees. See #32966 Do move forwarding on MIR and various other A-mir-opt issues.


Optimization is hard, of course, but in the OP example the problem seems to be not a lack of optimization but an inefficient calling convention. Why couldn't the convention be that "moved-into" objects are passed by pointer? Is it just because that would be an exception to the common rules by which rustc handles moves?


This is my understanding.

But I'd argue this would be an optimization. The calling convention is as much part of the language as anything else, so why wouldn't changing it be optimizing Rust? Keep in mind that this could be changed at any time - Rust's ABI is neither stable nor specified.

If a PR were made to Rust that changed the calling convention to automatically pass any struct larger than 512 bytes by reference, I wouldn't know what to call that besides an optimization PR.


I believe there's no theoretical reason why this optimization would be illegal, but considering every Rust function signature explicitly specifies whether each parameter should be passed by value, reference or raw pointer, it doesn't seem like there'd be much point in making the compiler spend precious time second-guessing the programmer (unless of course a lot of other transformations, like inlining, have already radically changed the code).

It's definitely not that. For one, pass by value vs pass by reference is simply not what "calling convention" refers to. That term almost always refers to much lower-level, platform-specific or even hardware-specific conventions for how assembly code arranges function arguments and return values around call instructions.

Second, switching pass by value for pass by reference (which I assume is what you're actually trying to suggest) is not the only possible optimization here. For example, RVO (return value optimization) would also get rid of this memcpy without inserting any pointers/references. That's why the links in my previous post (https://github.com/rust-lang/rfcs/pull/2884 and https://github.com/rust-lang/rust/issues/32966) are to proposals for something arguably equivalent to RVO.


The question is about the common way of generating code corresponding to a function call. Maybe I am misusing the term, but the Wikipedia definition is kind of suitable in this case.

How is RVO related to the example, where the only return type is i32?

I was talking about the optimizations that are performed by the compiler, not the optimizations in the compiler itself.

On typical hardware like x86, the simplest form of RVO applies whenever the return type can fit within a CPU register and the function being called has a single trailing return statement (and there are probably ways to generalize it further than that). A single i32 definitely fits in a register.

That would be interesting if the problem in the example were returning an int rather than the memcpy of a big struct :slight_smile:

let y = x invalidates x, so an assignment to x after this statement could be handled as working with a new memory location.

Consider:

let x = somestruct.x;
let y = x;
drop(somestruct); // Free struct's memory
use(y); // x's location is invalid now

Well, you can't drop a struct after partially moving out of it. But you could overwrite it, and if another binding is now using part of its original space, the write would require moving the whole struct to a new location.
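A small sketch of that situation, assuming a made-up struct `Pair` (not from the thread). Dropping the struct after a partial move is rejected by the compiler, but overwriting the moved-out field is accepted, so the moved value and the struct's field both have to be valid at the same time.

```rust
// Hypothetical struct for illustration.
pub struct Pair {
    pub x: String,
    pub n: u32,
}

pub fn partial_move_then_overwrite() -> (String, String) {
    let mut s = Pair { x: String::from("old"), n: 7 };

    let x = s.x; // partial move: `s.x` is now moved-from
    // drop(s);  // would not compile: `s` is partially moved

    let y = x;
    // Re-initializing the field is allowed. If `y` were still "living"
    // in the field's old storage, this write would clobber it.
    s.x = String::from("new");

    (y, s.x)
}
```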

And there would be problems if, for example, this happened in just one branch of a conditional:

let mut y = something();
if foo {
    let x = y;
    y = something_else();
}
use(y);

Despite these complications, I think it is possible for the compiler to be a lot smarter about eliding moves. For now, though, it's often important for the programmer to avoid passing around large types by value.
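As a sketch of what "avoid passing large types by value" can look like for the `S` from the first post: the variants below take a shared reference or a `Box<S>` instead, so only a pointer is handed from `g` to `f`. The `_ref`/`_box` function names and `demo` are assumptions made up for this example.

```rust
pub struct S {
    pub a: [i32; 4096],
}

// Borrowing variant: only a pointer crosses the call boundary.
#[inline(never)]
pub fn f_ref(s: &S) -> i32 {
    s.a[1023]
}

#[inline(never)]
pub fn g_ref(s: &S) -> i32 {
    f_ref(s)
}

// Owning variant: ownership still moves, but moving a Box moves
// the pointer, not the 16 KiB array behind it.
#[inline(never)]
pub fn f_box(s: Box<S>) -> i32 {
    s.a[1023]
}

#[inline(never)]
pub fn g_box(s: Box<S>) -> i32 {
    f_box(s)
}

pub fn demo() -> (i32, i32) {
    let mut s = S { a: [0; 4096] };
    s.a[1023] = 42;
    let by_ref = g_ref(&s);
    let boxed = Box::new(s);
    (by_ref, g_box(boxed))
}
```

The Box variant trades the copy for a heap allocation, so which one is preferable depends on how the value is created and used.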

The compiler can:

    let y = {
        let foo = Foo {x:String::new()};
        foo.x   
    };

and Box has special powers.
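For reference, a compilable version of the snippet above plus one way the Box case can look. The definition of `Foo` is assumed from how it is used, and the function names are made up.

```rust
// Assumed definition, inferred from the snippet above.
pub struct Foo {
    pub x: String,
}

// Moving a field out of a local struct; the rest of `foo` is dropped
// at the end of the inner block.
pub fn move_field_out() -> String {
    let y = {
        let foo = Foo { x: String::from("plain") };
        foo.x
    };
    y
}

// Box is special in that the value can be moved out through a
// dereference, which ordinary smart pointers do not allow.
pub fn move_out_of_box() -> String {
    let foo = Box::new(Foo { x: String::from("boxed") });
    (*foo).x
}
```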

At the calling-convention level, this large type is already passed by pointer! Look at f() in your godbolt output: the only argument register used is rdi, which it can offset in a single instruction.

The thing you're objecting to is just that on the caller side, you're moving your existing s into the argument slot, and sadly this memcpy isn't getting elided. In a different example, where you construct the argument directly, no copy is performed, only the memset that initializes it.

#[inline(never)]
pub fn h() -> i32 {
    f(S { a: [0; 4096] }) 
}

godbolt


Also, if you remove your forced #[inline(never)], all of these calls do inline and get compiled down to nothing, no memcpy/memset at all.


Rust prefers simple composable rules over special casing. A move is semantically always a single shallow memcpy, no exceptions, even for Copy types. If you want pass-by-reference, there's an explicit way to do so. Why would you want a function that appears to move its argument but actually doesn't?

This is the same as (*foo).x, where you move out of the Box on dereference, so there are no partial moves out of a Box (but that doesn't mean Box isn't special).