What is the cost of adding TCO to Rust / why doesn't Rust have TCO?

There is an old thread, When will Rust have TCO/TCE?, with many ideas but (AFAIK) no concrete answer.

What exactly is the cost of adding TCO to Rust / why does Rust not have TCO?

This question might be better posed on internals, as that's where the people who actually work on the compiler tend to hang out.


After quickly reading that thread, I don't believe anything has fundamentally changed since that question was asked.


I saw a number of ideas but no concrete answer from that thread. Did you get a concrete answer out of that thread?

Yes. Rust doesn't have TCO because nobody has put in the effort to specify and implement it. There seems to be general support for an RFC along these lines, but nobody has stepped up to drive one forward.

There's no particular reason why that someone couldn't be you, however; there are several good leads there about how you might go about it technically.


In particular, it is not obvious to me what the "cost" of TCO in Rust would be. Suppose:

fn A() {
    // The call to B is the last thing A does: a tail call.
    return B();
}

fn B() {}

fn C() {
    A();
}

then, at the point where A is about to call B, the stack is:

| stack frame C | stack frame A |

without TCO, we do:

| stack frame C | stack frame A | stack frame B |

then when B returns, we pop B's frame, then pop A's frame. With TCO, we just pop A's frame beforehand, and do:

| stack frame C | stack frame B |

instead.

====

At least with the SysV C ABI calling convention, I do not see any additional overhead that TCO creates -- so I am trying to understand: what is the overhead / cost of adding TCO to Rust?

Rust absolutely has TCO -- as an optimization that usually happens fine in optimized builds.

What it doesn't have is guaranteed TCE such that it can be counted on in unoptimized builds as well.
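
To make that distinction concrete, here is a minimal sketch (the function name count_down is illustrative, not from the thread):

fn count_down(n: u64) -> u64 {
    // The recursive call is in tail position, so optimized builds usually
    // turn it into a loop; debug builds are free to keep the real recursion.
    if n == 0 { 0 } else { count_down(n - 1) }
}

fn main() {
    // With optimizations this is effectively a loop; without them it can
    // overflow the stack, precisely because the elimination is not guaranteed.
    println!("{}", count_down(10_000_000));
}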

The core problem, where not much has changed as far as I know, is actually being able to make that guarantee reliably on all targets. It's a fairly simple transform for direct tail recursion (so simple you can just make the combinator yourself; see the sketch below) but not so much for tail calls to things that might be in other crates.
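
A sketch of such a do-it-yourself combinator, covering only direct (self-)tail recursion; the names Step and tail_rec are made up for this example:

enum Step<S, R> {
    Continue(S), // "tail call" carrying the next state
    Done(R),     // final result
}

fn tail_rec<S, R>(mut state: S, mut step: impl FnMut(S) -> Step<S, R>) -> R {
    // The recursion is expressed as a loop inside one function, so no
    // compiler guarantee about tail calls is needed.
    loop {
        match step(state) {
            Step::Continue(next) => state = next,
            Step::Done(result) => return result,
        }
    }
}

fn main() {
    // Factorial via the combinator: constant stack depth regardless of n.
    let fact = tail_rec((20u64, 1u64), |(n, acc)| {
        if n == 0 { Step::Done(acc) } else { Step::Continue((n - 1, acc * n)) }
    });
    println!("{fact}");
}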

If you want to be able to rely on a TCE guarantee, then you'd probably need to do something like a survey showing that the LLVM, GCC, WASM, and Cranelift backends can all reliably provide it on all platforms. And that it can be done with negligible compile- and run-time overhead for code that doesn't use it.

(The syntax and drop-semantic parts are much simpler in comparison.)


Can you expand on this? I think my post above (What is the cost of adding TCO to rust / why does not rust have TCO? - #6 by zeroexcuses) covers the 'direct' case; what is the more complicated case you have in mind?

Well, for example, how do you do that in WASM? AFAICT, call is just a normal call, without the option to make it a tail call.

Or can you do it through a function pointer? What about using things other than the SysV C ABI?

Tailing to yourself is easy. Tailing to a different function -- especially one that might indirectly recurse back to you -- is less so.
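
For instance, here is a sketch of mutual tail recursion (is_even and is_odd are illustrative names); neither function can be rewritten into a loop on its own, because each tail call targets the other function:

fn is_even(n: u64) -> bool {
    if n == 0 { true } else { is_odd(n - 1) } // tail call into a *different* function
}

fn is_odd(n: u64) -> bool {
    if n == 0 { false } else { is_even(n - 1) } // which tail-calls back
}

fn main() {
    // Without a tail-call guarantee, a large n can overflow the stack here,
    // and the two functions could just as well live in different crates.
    println!("{}", is_even(1_000_000));
}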


I believe the description above handles all of these cases.

No, the calling convention needs to support tail calls, and the SysV C ABI does not support them in the general case.

[...]

As far as I understand, the primary issue with calling conventions and tail calls is callee-saved registers. The problem is that there's generally no way of efficiently knowing, in a function's prologue, whether they were already spilled to the stack, and similarly whether we should restore them (and even from where) in the epilogue.

That's one of the reasons why, AFAIK, all the calling conventions in LLVM that support tail calls have no callee-saved registers. That way the tail call can simply deallocate the current frame and jump into the new function, passing all the arguments in registers or on the stack if needed, without keeping anything on the stack that would need to be manipulated when finally returning.

[...]


You're right. I now see why my argument fails.

This (callee-saved registers), by the way, is the type of technical answer I was looking for. Up until now, there were lots of vague ideas about what may or may not be a problem, but this is a concrete issue: if we have C -> A -> B, and A has already spilled callee-saved registers, then after B 'returns' we need to restore those registers, which makes TCO significantly more complicated.
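
A hedged sketch of that C -> A -> B situation in source form; a, b, c, and heavy below are made-up names playing those roles, and the register behaviour described in the comments is what a SysV-style compiler typically does, not something the language guarantees:

fn heavy(x: u64) -> u64 { x * 3 + 1 }

fn b(y: u64) -> u64 { y ^ 1 }

fn a(x: u64) -> u64 {
    // `y` is live across the second call to `heavy`, so the compiler will
    // typically keep it in a callee-saved register that a's prologue spilled.
    let y = heavy(x);
    let z = heavy(y);
    // Ordinary call: a's epilogue runs after b() returns and restores the
    // spilled register before control goes back to c.
    // Guaranteed tail call: a's frame is gone by the time b() returns, so that
    // restore would have to happen *before* the jump to b -- the extra
    // bookkeeping discussed above.
    b(y + z)
}

fn c() -> u64 {
    a(7)
}

fn main() {
    println!("{}", c());
}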


It would be an option to restore those values from the stack back into registers before doing the tail call, wouldn't it?

Under normal circumstances -- with no guaranteed TCE -- the values would have been restored in the epilogue anyway, so this restoration does not seem to be an extra cost.

I think the best compromise would be to do it only at the API level, not in the ABI.
Even then, I would guarantee cycle detection intra-modularly at best; conducting a whole-program call-graph analysis for tail calls is heavy.
And if the user wanted to change that, they could do so with a proc-macro attribute.
