Tail recursion and llvm-ir start section

Hi,

Consider the following example Compiler Explorer

We have an infinite recursion which can be optimized to the loop, because it is a tail recursion.
The test functions in Rust and in C are equivalent:

pub fn test(a: i32, b: Foo) -> i32 {
    if a > 0 {
        test(a-1, b) 
    } else { 
        0 
    }
}
int test(int a, struct Foo b) {
    if (a > 0) {
        return test(a, b);
    } else {
        return 0;
    }
}

However if you look into the asm output the Rust code is not optimized to a loop by the LLVM compiler.
After investigating ir, I can see that rust allocates additional Foo structure in the beginning of the function in the start section. So the Rust code is equivalent to the following C code

int testr(int a, struct Foo b) {
    struct Foo b1;
    if (a > 0) {
        b1 = b;
        // __attribute__((musttail)) 
        return testr(a, b1);
    } else {
        return 0;
    }
}

the latter one is not optimized by the clang automatically. (But actually it can be optimized with clang/llvm by adding __attribute__((musttail)), well this another topic)

Finally, the question. This makes me wonder, why rust allocates the additional Foo struct in the begining of the function? Couldn't this break another optimizations as well (in other cases)? Do you think this should be improved by compiler team?

If you look at the MIR you see

        _7 = move _2;                    // scope 0 at /app/example.rs:7:19: 7:20
        _0 = test(move _5, move _7) -> bb2; // scope 0 at /app/example.rs:7:9: 7:21

The _7 = move _2; is this extra copy. It exists because all non-temporary arguments are copied before passing them to a function. In the rust abi arguments are owned by the callee and may thus be modified by the callee if they are passed by-reference. The copy is necessary to prevent such modification to modify the original copy of the argument. For some reason LLVM didn't remove the copy despite the original not being used after the call. When using extern "C" as abi, it will turn in a loop (not even a tail call).

Agree, this is what I tried to mimic in the modified C code when I put b1 = b.
This extra copy is unnecessary and ideally should be eliminated by rust at the MIR level.
You can also look at the following C code

int testr(int a, struct Foo b) {
    // struct Foo b1;
    if (a > 0) {
        struct Foo b1 = b;
        return testr(a, b1);
    } else {
        return 0;
    }
}

Here b1 is created where it needed, and clang transforms recursion to a loop again.
So my question is, why rust put the %_7 = alloca %Foo, align 8 in the beginning of the function?

In this case it is unnecessary, but if the argument would be used after the call, it is necessary. This means that it can't be omitted during MIR building. It would be possible as MIR optimization, but the copy propagation pass was removed as it caused miscompilation (and I believe was also very slow). The destination propagation pass can't handle this specific case I think and is disabled by default because of miscompilations.

alloca is how stackslots are defined in LLVM IR. b is copied to a temporary stackslot and this stackslot is used as argument.

This C code is not quite equivalent. The C equivalent to what rust does is using struct Foo *b as parameter. The C code passes b on the stack as byval instead of as pointer. It seems that LLVM is better able to handle the byval version. If I do

int testr(int a, struct Foo *b) {
    // struct Foo b1;
    if (a > 0) {
        struct Foo b1 = *b;
        return testr(a, &b1);
    } else {
        return 0;
    }
}

LLVM keeps the copy and call. Even when adding restrict like rustc does.

Thanks for your comments, they help to understand better what rust does internally.

if the argument would be used after the call, it is necessary

How can b be used after the call when it is forbidden by rust move semantics?

alloca is how stackslots are defined in LLVM IR

So, you are saying that having this in the beginning is out of control of the rust compiler, and is the way llvm works? I thought it is possible to allocate the stack only when necessary.

This C code is not quite equivalent.

I see, this difference that you pointed out in the ABI clarifies things. Though, it makes me even more confused. Why rustc wants to write this copy explicitly, when it can be done automatically with
byvalue attribute?

If it is a Copy type, it will be copied, not moved.

This is because rustc copies b to a temporary. This temporary is stored on the stack, hence the alloca.

When LLVM is able to optimize away the copy, passing arguments as pointer is faster. byval requires the value to be stored at a specific location at the stack. If this location is used by eg another call, it is necessary to copy the argument to this location. In addition byval requires the size of the argument to be fixed. Using a pointer allows passing unsized types as arguments. While this is currently an unstable feature, it is used internally by the FnOnce() impl for Box<dyn FnOnce()>.

To further illustrate that this ABI difference is at the root of the different tail-call implementations, see this other Godbolt, which adds #[repr(C)] to struct Foo, and defines test as an extern "C" function, for purely identical semantics (on platforms where c_int is i32)

This point I got :wink: My question is more about positioning of this alloca before "if" branching instead of in the corresponding "bbx:" section where recursive call is made.

Maybe this is something special about how llvm-ir works, anyway from your C example we can see that this wouldn't help.

If it is a Copy type, it will be copied, not moved.

Oh, is the information about copy/move erased in the MIR level?

When LLVM is able to optimize away the copy, passing arguments as pointer is faster.

If LLVM is able to optimize the copy generated by rustc, it can also be able to optimize copies related to byval. Practically, I don't know which optimization is more likely to happen. But to me it looks like the same thing, in general.

I also didn't get "location is used by eg another call" part of you message.

passing unsized types as arguments

How does it pass such argument if we need to make the copy before?

In conclusion, it seems like there is a kind of tradeoff between two ABIs.

If the alloca is not in the first block, LLVM will dynamically allocate every time the alloca is reached. This requires a stack pointer adjustment every time, saving the resulting pointer instead of allowing a fixed offset from the stack pointer and every time the alloca is hit, the stack grows a bit.

It isn't erased. I just think it isn't taken into account here.

Say you have two calls with a different byval argument. They will use the same stack location to store the argument. This means that after the first call it is necessary to copy the argument of the second call to this location. This is an unavoidable limitation of the way the ABI is defined. The rust abi however uses a pointer, so it can simply pass a different pointer as argument for both calls without the need for a copy at ABI level. The copy in the rust case happens because the compiler is not smart enough to remove it, not because the ABI requires it in some cases.

I think there is a special case for unsized arguments to avoid the copy at MIR level. I think this is fine as unsized types can never be Copy and thus will always be moved.

In conclusion: this copy exists in rustc's case because it (or LLVM) isn't smart enough to remove it right now.

Makes sense now.

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.