My understanding is that in Rust, coroutine frames (the futures returned by async functions) are stack-allocated by default. In other words, when you call an async function foo without awaiting it yet, you're creating a temporary on the stack that is large enough to store the local variables of foo (at least those that stay alive across an await point) and of any other async functions it awaits, transitively all the way down, as well as the data needed to track which await point it is currently suspended at.
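To make the frame-size point concrete, here is a minimal sketch that only creates futures and measures them with `std::mem::size_of_val`, without polling anything. The functions `tick`, `leaf`, and `outer` are hypothetical names made up for illustration. The exact byte counts are not specified by the language and can differ between compiler versions, so the sketch only checks relative sizes.

```rust
use std::mem::size_of_val;

// Hypothetical async functions, used only to illustrate frame sizes.
async fn tick() {}

async fn leaf(seed: u8) -> u8 {
    // `buf` is used after the await, so it has to be stored in the future.
    let buf = [seed; 1024];
    tick().await;
    buf[1023]
}

async fn outer(seed: u8) -> u8 {
    // The future returned by `leaf` is stored inline in `outer`'s future.
    leaf(seed).await
}

// Returns the sizes in bytes of the futures for `tick`, `leaf`, and `outer`.
fn future_sizes() -> (usize, usize, usize) {
    // Calling an async fn only builds the future; none of the bodies run here.
    let t = tick();
    let l = leaf(1);
    let o = outer(1);
    (size_of_val(&t), size_of_val(&l), size_of_val(&o))
}

fn main() {
    let (t, l, o) = future_sizes();
    println!("tick: {t} bytes, leaf: {l} bytes, outer: {o} bytes");
}
```

On the toolchains I'd expect this to run on, `leaf`'s future is at least 1024 bytes because of `buf`, and `outer`'s future is at least as large as `leaf`'s because it embeds it.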
In order to do this, the compiler needs to know at the call site how much space those local variables take up. This is different from a regular (sync) function, where a call only requires knowing the signature (so the arguments can be passed in the right registers according to the calling convention/ABI) and the address to jump to. This means that in languages like C/C++, the function signature that goes in the header is enough information to compile a call to a sync function, even if the function itself hasn't been compiled yet. So two libraries (two distinct compilation units) can be compiled in parallel even if they depend on and call each other.
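As a rough illustration of the sync case, the sketch below calls two functions through a plain function pointer, so the caller depends only on the signature `fn(u8) -> u8` and an address, not on how much stack the callee uses. The names `tiny`, `bulky`, and `call_via_signature` are hypothetical, and a function pointer is only an approximation of calling across a header boundary in C/C++.

```rust
use std::mem::size_of;

// Two hypothetical functions with the same signature but different bodies.
fn tiny(x: u8) -> u8 {
    x
}

fn bulky(x: u8) -> u8 {
    // A large local array: this affects the callee's stack frame only.
    let scratch = [x; 4096];
    scratch[4095]
}

// The caller needs only the signature and an address to make the call.
fn call_via_signature(f: fn(u8) -> u8, arg: u8) -> u8 {
    f(arg)
}

fn main() {
    println!("tiny:  {}", call_via_signature(tiny, 7));
    println!("bulky: {}", call_via_signature(bulky, 7));
    // The function pointer has the same size whatever the body looks like.
    println!("fn pointer: {} bytes", size_of::<fn(u8) -> u8>());
}
```

The contrast with the async case is that the caller here never has to reserve space for the callee's locals, whereas the value returned by an async function has a size that depends on the function body.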
But in Rust, it seems like the amount of space needed for locals could vary based on optimization passes (e.g. eliminating unused local variables would shrink the frame). This means Rust would need the callee to be fully compiled before compiling the caller. And in Rust the crate is typically the compilation unit boundary (assuming codegen-units=1), so if crate A calls an async function in crate B, crate B must be compiled first, not in parallel with A. Is this correct?