Hi, got a (historical) Rust question: the once-proposed CPS-style Future.
Recently I took a look at how Kotlin implemented its CPS coroutines. At first glance both Kotlin and Rust compile to state machines, but it quickly turned out that Kotlin coroutines have some quirks I'm not familiar with, so I decided to compare them against Rust more carefully. In particular, for Rust async/.await, Boats once gave a talk on this and some of the design process behind the scenes. Basically, Kotlin's CPS coroutines are ultimately callback functions stored somewhere, and the state machine exists so the callback can resume execution, i.e. an approach more akin to internal iteration. OTOH Rust settled on a more external-iteration approach, because that way the interaction with lifetimes is better and various combinators can be implemented in a zero-cost way.
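For contrast, here's the external-iteration shape Rust actually shipped, restated in simplified form (I renamed the trait so it doesn't clash with the CPS trait discussed below), plus a trivial leaf future and a hand-rolled no-op Waker just to drive it once by hand:

```rust
use std::pin::Pin;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// Simplified restatement of std::future::Future: the *caller* repeatedly
// polls the future, instead of handing the future a continuation.
trait ExternalFuture {
    type Output;
    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output>;
}

// A leaf future that is immediately ready.
struct Ready(i32);
impl ExternalFuture for Ready {
    type Output = i32;
    fn poll(self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<i32> {
        Poll::Ready(self.0)
    }
}

// A no-op Waker, just enough to be able to call poll() manually.
fn noop_waker() -> Waker {
    fn clone(_: *const ()) -> RawWaker {
        RawWaker::new(std::ptr::null(), &VTABLE)
    }
    fn noop(_: *const ()) {}
    static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
    unsafe { Waker::from_raw(RawWaker::new(std::ptr::null(), &VTABLE)) }
}

fn main() {
    let waker = noop_waker();
    let mut cx = Context::from_waker(&waker);
    let mut fut = Ready(3);
    assert_eq!(Pin::new(&mut fut).poll(&mut cx), Poll::Ready(3));
}
```

Note that nothing here is stored behind a callback: the executor owns the future and drives it from the outside, which is exactly the "external iteration" framing above.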
Now to the question. Boats mentioned that Rust once experimented with a CPS style but gave up on it in the end; its API was something like this:
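Roughly, reconstructing it from the impl blocks later in this post (the real experiment's exact names and bounds may have differed), with a trivial impl to show the calling shape:

```rust
// Reconstructed from the `impl Future for A` / `impl Future for Cps`
// blocks below; names are my guess, not the actual historical API.
trait Future {
    type Output;
    // Note the by-value receiver: scheduling consumes the future.
    fn schedule(self, callback: impl FnOnce(Self::Output));
}

// A trivial leaf future, just to show how the trait is driven.
struct Ready(i32);
impl Future for Ready {
    type Output = i32;
    fn schedule(self, callback: impl FnOnce(Self::Output)) {
        callback(self.0) // completes immediately, on the current thread
    }
}

fn main() {
    let mut got = None;
    Ready(7).schedule(|v| got = Some(v));
    assert_eq!(got, Some(7));
}
```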
My question is, why is the receiver self, rather than &mut self? In Kotlin terms, I guess this API corresponds to the compiler-generated invokeSuspend for a suspend fun, but Kotlin isn't that good a reference in this case, given its GC nature and the fact that every non-primitive type is a reference: I cannot think of a direct Kotlin counterpart for a function whose receiver requires consuming itself...
My understanding is that when the CPS Future is called (schedule), it is consumed. Since we still need its current internal state later, we need to put it somewhere, s.t. when the FnOnce callback finally gets invoked, it knows how/where to continue (the what part is already in its parameter). In other words, this API actually forces library authors to allocate, which is yet another reason this approach doesn't play well with the zero-cost abstraction requirement. Is this the basic gist of it?
The way I see it, the FnOnce is the continuation, so just having the FnOnce allows you to continue. However, you have to store the FnOnce somewhere, and that's what effectively requires an allocation.
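To sketch what "store the FnOnce somewhere" forces (my own illustration, not code from the actual experiment): the concrete closure type is unnameable, so parking it in any queue or completion table means erasing it behind a Box, i.e. a heap allocation per suspension:

```rust
// The continuation's concrete type is an opaque closure type, so storing
// it (in a task queue, a reactor's completion table, ...) requires type
// erasure: Box<dyn FnOnce(..)> -- and boxing is the allocation in question.
struct PendingRead {
    on_done: Box<dyn FnOnce(i32)>,
}

fn main() {
    let mut queue: Vec<PendingRead> = Vec::new();

    // "schedule": box the continuation and park it somewhere.
    queue.push(PendingRead {
        on_done: Box::new(|n| println!("read {n} bytes")),
    });

    // later, when the (pretend) I/O completes:
    for task in queue {
        (task.on_done)(4);
    }
}
```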
// A: library-provided "primitive";
// in terms of the modern API, a `Future` that stores/wakes the `Waker` manually
struct A { s: String }

impl Future for A {
    type Output = i32;
    fn schedule(self, c: impl FnOnce(Self::Output)) {
        // This probably allocates.
        // In this case, the remaining work would be carried out by some other
        // thread in the pool maintained by the library; a TLS implementation
        // s.t. `c` stays on the current thread should be possible, though.
        TASK_QUEUE.push(move || {
            let mut f = File::open(self.s).unwrap();
            let mut buf = [0_u8; 4];
            f.read_exact(&mut buf).unwrap();
            c(i32::from_le_bytes(buf))
        });
    }
}
async fn cps(s: String) -> String {
    let a = A { s: s.clone() }; // clone, because `s` is used again below
    let i = await a; // legacy deprecated notation
    async fn b(i: i32, s: String) -> u8 {
        // some combination of library-provided "async primitives", just like `cps`
        unimplemented!()
    }
    let u = await b(i, s.clone());
    format!("{}-{}-{}", s, i, u)
}
I guess it would then get transformed into more or less this:
struct Cps { s: String }
struct B { i: i32, s: String }

impl Future for Cps {
    type Output = String;
    fn schedule(self, continuation: impl FnOnce(Self::Output)) {
        let s = self.s;
        let a = A { s: s.clone() };
        a.schedule(move |i: i32| {
            let b = B { i, s: s.clone() };
            b.schedule(move |u: u8| {
                continuation(format!("{}-{}-{}", s, i, u))
            });
        });
    }
}
So the input parameters live in the self receiver, which consumes the value in order to represent the state-machine/FnOnce nature of the calculation process, and the continuation: impl FnOnce's job is to carry out the remaining work once some (presumably I/O) value is ready.
I'm omitting other control-flow structures like loop and join!/select! here; anyhow, is that basically the gist of the CPS style in Rust?
async fn foo() {
    let a = 1;
    let b: &i32 = get_some_reference(&a);
    await bar();
    println!("{b}");
}
It would get desugared to:
struct Foo;

impl Future for Foo {
    type Output = ();
    fn schedule(self, continuation: impl FnOnce(Self::Output) + 'static) {
        let a = 1;
        let b: &i32 = get_some_reference(&a);
        let bar = Bar;
        bar.schedule(move |_| {
            println!("{b}"); // Invalid: this would make the closure non-'static
            continuation(());
        });
    }
}
You thus also have to allocate a and b on the heap, as part of the async desugaring itself (rather than as an implementation detail of whatever runtime you're using). Even worse, this would likely have to apply to every async function invocation, rather than just to the topmost future that's passed to the executor.
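A minimal self-contained sketch of that forced workaround (my own construction, with a Bar that completes synchronously just so it runs): because the continuation must be 'static, the borrowed local can no longer point into the current stack frame; the desugaring has to move owned (here, heap-allocated) data into the closure at the await point instead:

```rust
trait CpsFuture {
    type Output;
    fn schedule(self, continuation: impl FnOnce(Self::Output) + 'static);
}

struct Bar;
impl CpsFuture for Bar {
    type Output = ();
    fn schedule(self, continuation: impl FnOnce(Self::Output) + 'static) {
        continuation(()) // pretend the I/O finished immediately
    }
}

struct Foo;
impl CpsFuture for Foo {
    type Output = ();
    fn schedule(self, continuation: impl FnOnce(Self::Output) + 'static) {
        // Instead of `let b: &i32 = &a;` borrowing from this stack frame,
        // the desugaring must give the closure ownership of the data:
        let a = Box::new(1_i32); // heap allocation forced by the desugaring
        Bar.schedule(move |_| {
            let b: &i32 = &a; // borrows from data the closure owns: fine
            println!("{b}");
            continuation(());
        });
    }
}

fn main() {
    Foo.schedule(|_| println!("done"));
}
```

(The Box is overkill for a single i32, which could simply be moved by value; it stands in for the general case where the state is borrowed through references and can't just be copied into the closure.)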