What is the cost of adding TCO to Rust / why doesn't Rust have TCO?

There is an old thread, When will Rust have TCO/TCE?, with many ideas but (AFAIK) no concrete answer.

What exactly is the cost of adding TCO to Rust / why does Rust not have TCO?

This question might be better posed on internals, as that's where the people who actually work on the compiler tend to hang out.

2 Likes

After quickly reading that thread, I don't believe anything has fundamentally changed since that question was asked.

1 Like

I saw a number of ideas but no concrete answer from that thread. Did you get a concrete answer out of that thread?

Yes. Rust doesn't have TCO because nobody has put in the effort to specify and implement it. There seems to be general support for an RFC along these lines, but nobody has stepped up to drive one forward.

There's no particular reason why that someone couldn't be you, however; there are several good leads there about how you might go about it technically.

2 Likes

In particular, it is not obvious to me what the "cost" of TCO in Rust would be. Suppose:

fn B() {} // stub so the example is self-contained

fn A() {
  return B(); // tail call: the last thing A does is call B
}

fn C() {
  A();
}

then what we have here is:

| stack frame C | stack frame A |

without TCO, we do:

| stack frame C | stack frame A | stack frame B |

then when B returns, we pop B's frame, then pop A's frame. With TCO, we pop A's frame before calling B (reusing its slot), and do:

| stack frame C | stack frame B |

instead.

====

At least with the SysV C ABI calling convention, I do not see any additional overhead that TCO creates -- so I am trying to understand: what is the overhead/cost of adding TCO to Rust?

Rust absolutely has TCO -- as an optimization, it usually happens in optimized builds.

What it doesn't have is guaranteed TCE such that it can be counted on in unoptimized builds as well.

The core problem, where not much has changed as far as I know, is in actually being able to make that guarantee reliably on all targets. It's a fairly simple transform for direct tail recursion (so simple you can just write the combinator yourself) but not so much for tail calls to things that might be in other crates.

If you want to be able to rely on a TCE guarantee, then you'd probably need to do something like a survey showing that the LLVM, GCC, WASM, and Cranelift backends can all reliably provide it on all platforms. And that it can be done with negligible compile- and run-time overhead for code that doesn't use it.

(The syntax and drop-semantic parts are much simpler in comparison.)

7 Likes

Can you expand on this? I think my post above (What is the cost of adding TCO to rust / why does not rust have TCO? - #6 by zeroexcuses) covers the 'direct' case; what is the more complicated case you have in mind?

Well, for example, how do you do that in WASM? AFAICT, call is just a normal call, without the option to do a tail call.

Or can you do it through a function pointer? What about using things other than the SysV C ABI?

Tailing to yourself is easy. Tailing to a different function -- especially one that might indirectly recurse back to you -- is less so.

1 Like

I believe the description above handles all of these cases.

No, the calling convention needs to support tail calls. The SysV C ABI does not support them in the general case.

[...]

As far as I understand, the primary issue with calling conventions and tail calls is callee-saved registers. The problem is that there's generally no efficient way of knowing, by the time we're in a function prologue, whether they were already spilled to the stack, and similarly whether (and even from where) they should be restored in the epilogue.

That's one of the reasons why, AFAIK, all the calling conventions in LLVM that support tail calls have no callee-saved registers. That way the tail call can simply deallocate the current frame and jump into the new function, passing all the arguments in registers or on the stack if needed, without keeping anything on the stack that would need to be manipulated when finally returning.

[...]

6 Likes

You're right. I now see why my argument fails.

This (callee-saved registers), by the way, is the type of technical answer I was looking for. Up until now there were lots of vague ideas about what may or may not be a problem, but this is a concrete issue: if we have C -> A -> B, and A has already spilled callee-saved registers, then after B 'returns' we need to restore those registers, which makes TCO significantly more complicated.

2 Likes

It would be an option to restore those values from the stack back into registers before doing the tail call, wouldn't it?

Under normal circumstances - no guaranteed TCE - the values would have been restored in the epilogue anyway, so this restoration seems to be no extra cost.

I think the best compromise would be to do it only at the API level, not at the ABI level.
Even then, I would guarantee cycle detection intra-modularly at best; conducting a whole-graph analysis for tail calls is heavy.
And if users wanted to change that, they could do so with a proc-macro attribute.
