Comparing coroutines in Rust and C++

In C++, the coroutine model can be simplified to the evaluation of three methods: await_ready, await_suspend, and await_resume. In Rust, by contrast, we only have one: poll. For example, suppose we have a coroutine that suspends (corresponding to Pending) the first time the await expression is evaluated, and continues to evaluate the code following it (corresponding to Ready) the second time. In C++, the code is

#include <iostream>
#include <coroutine>
struct Task{
    struct promise_type;
    using handle = std::coroutine_handle<promise_type>;
    struct promise_type{
        auto initial_suspend(){
            return std::suspend_never{};
        }
        auto final_suspend() noexcept{
            return std::suspend_always{};
        }
        void unhandled_exception(){}
        void return_void(){}
        auto get_return_object(){
           return Task{handle::from_promise(*this)};
        }
    };
    handle coro;
};
struct SecondTimes{
    bool await_ready(){
        return false;
    }
    void await_suspend(std::coroutine_handle<>){

    }
    void await_resume(){}
};

Task fun(){
    std::cout<<"begin\n";
    co_await SecondTimes{};
    std::cout<<"end\n";
}

int main() {
    auto r = fun();
    r.coro.resume();
    r.coro.destroy(); // final_suspend is suspend_always, so the frame must be destroyed manually
}

In Rust, the corresponding code is

use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll};

struct SecondTimes {
    is_first: bool,
}
impl Future for SecondTimes {
    type Output = ();
    fn poll(self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<Self::Output> {
        if self.is_first {
            self.get_mut().is_first = false;
            Poll::Pending
        } else {
            Poll::Ready(())
        }
    }
}
async fn fun() {
    println!("begin");
    SecondTimes { is_first: true }.await;
    println!("end");
}

For simplicity, we just need to look at fun. If we desugar it, we get this pseudocode in C++:

auto fun(short& state, SecondTimes & obj, std::coroutine_handle<> p){
   switch(state){
      case 0:
       {
          std::cout<<"begin\n";
          state = 1;
       }
      break;
      case 1:
       {
          obj.await_ready();
          obj.await_suspend(p);
          state = 2;
       }
      break;
      case 2:
       {
          obj.await_resume();
          std::cout<<"end\n";
          state = 3;
       }
      break;
      case 3:
       {
           // may destroy coroutine's inner data
       }
   }
}

In Rust, we may get the following (pseudo)code for fun:

fn fun(state:& mut u16, obj: std::pin::Pin<&mut SecondTimes>, ctx:&mut std::task::Context<'_>)->(){
    match state{
       0=>{
         println!("begin");
         *state = 1;
       }
       1 =>{
         match obj.poll(ctx){ 
            Pending=>{
               return;
            }
            Ready=>{
                println!("end");
                *state = 2;
            }
         }
       }
       2=>{
             // may destroy coroutine's inner data
       }
     }
}

The difference between them is that C++ maintains the inner state of SecondTimes for us, while Rust requires us to maintain the inner state manually. Specifically, C++ uses await_ready and await_suspend, and once the coroutine is resumed, those two methods are never evaluated again; only await_resume is guaranteed to be evaluated. By contrast, Rust only ever evaluates poll, which means we need SecondTimes::is_first to record that SecondTimes::poll has already been evaluated once (playing the role of await_ready and await_suspend). The next time the control flow evaluates SecondTimes::poll, we first read is_first to tell whether this is the second evaluation, and return Ready if it is.

Honestly, I wonder: which model is designed better, and why?

I’ve tried to get a basic understanding of C++ coroutines, but one hour in, I still understand mostly nothing. They appear to be more flexible than futures in Rust, supporting a notion of yield that we don’t have yet; but also the suspension model appears more flexible, or maybe just entirely different?

What struck me as particularly different from Rust in this regard is the handle passed to await_suspend, which apparently allows you to resume the suspended coroutine yourself, leading to examples like this one on cppreference.com:

#include <coroutine>
#include <iostream>
#include <stdexcept>
#include <thread>
 
auto switch_to_new_thread(std::jthread& out)
{
    struct awaitable
    {
        std::jthread* p_out;
        bool await_ready() { return false; }
        void await_suspend(std::coroutine_handle<> h)
        {
            std::jthread& out = *p_out;
            if (out.joinable())
                throw std::runtime_error("Output jthread parameter not empty");
            out = std::jthread([h] { h.resume(); });
            // Potential undefined behavior: accessing potentially destroyed *this
            // std::cout << "New thread ID: " << p_out->get_id() << '\n';
            std::cout << "New thread ID: " << out.get_id() << '\n'; // this is OK
        }
        void await_resume() {}
    };
    return awaitable{&out};
}
 
struct task
{
    struct promise_type
    {
        task get_return_object() { return {}; }
        std::suspend_never initial_suspend() { return {}; }
        std::suspend_never final_suspend() noexcept { return {}; }
        void return_void() {}
        void unhandled_exception() {}
    };
};
 
task resuming_on_new_thread(std::jthread& out)
{
    std::cout << "Coroutine started on thread: " << std::this_thread::get_id() << '\n';
    co_await switch_to_new_thread(out);
    // awaiter destroyed here
    std::cout << "Coroutine resumed on thread: " << std::this_thread::get_id() << '\n';
}
 
int main()
{
    std::jthread out;
    resuming_on_new_thread(out);
}

Possible output:

Coroutine started on thread: 139972277602112
New thread ID: 139972267284224
Coroutine resumed on thread: 139972267284224

E.g. the insane number of different case distinctions for the possible return types of await_ready, await_suspend, and await_resume doesn’t make the situation straightforward in any way. Good online resources are hard to come by, as always… if you or anyone else has a suggestion for the most approachable resource(s), feel free to leave suggestions.

One other example that I found interesting was the one on std::coroutine_handle, std::noop_coroutine_handle - cppreference.com, which features lots of code defining Generator<T>, and then a use-case

template<std::integral T>
Generator<T> range(T first, const T last) {
    while (first < last) {
        co_yield first++;
    }
}
 
int main() {
    for (const char i : range(65, 91)) {
        std::cout << i << ' ';
    }
    std::cout << '\n';
}

This has me making some observations

  • the coroutines feature in C++ has somewhat of a syntax-extension-ish feel to me; this Generator<T> range(…) { … } item does – in my Rust view – two things at once: create a state-machine translation for the function body, and then also pack it up into that custom Generator<T> type that wraps the thing into an ordinary iterator; but the Generator<T> type apparently also has some say in what the syntax means in the first place; after all, it’s able to rule out usage of co_await in the body:
    (in Generator<T>’s class declaration)
    // Disallow co_await in generator coroutines.
    void await_transform() = delete;
    
  • it looks to me like the state machine gets automatically hidden behind a dynamic abstraction, i.e. there’s no link between the Generator<T> type returned by range (and potentially other functions) and the logic of range in particular, so there must be some virtual functions (or equivalent function pointers) involved.
  • in particular, there are implicit heap allocations involved – presumably in order to allow this dynamism in the first place(?) – and this is explicitly called out in the resources I’ve seen so far; there also appear to be rules for an “optimization” that somehow eliminates this in some cases, but I have not fully understood what the conditions are.

With this out of the way, all I can say is that I know not enough about C++ coroutines to give some actual general comparison. But let me comment on / correct some claims about Rust futures and async fn that you made.

This Future implementation violates the basic rules/conventions of the Future trait – which might be slightly underdocumented in the standard library docs as of this writing. Quoting the explanation in the async book:

If the future is not able to complete yet, it returns Poll::Pending and arranges for the wake() function to be called when the Future is ready to make more progress. When wake() is called, the executor driving the Future will call poll again so that the Future can make more progress.

Ah… now I also found something in the std docs, but not on the page of the Future trait:

Enum std::task::Poll

Indicates whether a value is available or if the current task has been scheduled to receive a wakeup instead.

Variants

Ready(T)

Represents that a value is immediately ready.

Pending

Represents that a value is not ready yet.

When a function returns Pending, the function must also ensure that the current task is scheduled to be awoken when progress can be made.

Note the very last sentence!

In this case, a future that suspends once but doesn’t need the executor to wait for anything – which is, as far as I understood, possibly what this “SecondTimes” future is supposed to do – would need to simply call wake immediately, before poll even returns. This makes the future look a lot like yield_now in tokio::task - Rust, for which you can find the source code here. (Note that the whole thing is merely wrapped in another level of async fn in order to make the API more uniform and not expose the concrete future type YieldNow. Also note that the implementation actually changed recently; I’m linking the old implementation.)


As noted, I won’t comment further on the C++ stuff or try to do a comparison myself, but this desugaring “(pseudo)code” seems a bit off:

  • the signature with the fn fun(state:& mut u16, obj: std::pin::Pin<&mut SecondTimes>, ctx:&mut std::task::Context<'_>) makes no sense. fun desugars to a function returning a Future, and this state-machine code would then be part of the poll implementation of that Future.
  • I don’t know what “may destroy coroutine's inner data” means, but async fn futures will simply panic when polled again once in a finished state. If “destroy” refers to destructors, then: destructor calls happen by the normal mechanisms when dropping the Future object itself, not in its poll method.
  • The poll method needs to do the match in a loop! It cannot just return nothing after finishing the first step. The only reasons it may return are that it has either finished or is pending; the return type of this whole state machine needs to be Poll<()>, and the Pending=> and Ready=> branches must return Pending/Ready themselves accordingly.

Yes, I just wanted to ignore all the details of how the coroutine is scheduled in both Rust and C++. I know we should wake the coroutine via the waker in the Context in Rust, but I didn't write it in the code to avoid the details of constructing the Context. SecondTimes is just meant to illustrate the complete evaluation of the coroutine: the first time the coroutine is evaluated, the state is Pending, and the next time the Future corresponding to fun is polled, SecondTimes will be ready. The desugaring of fun merely illustrates what the code in the function body could look like. Please ignore any technical details.

C++ seems to split Rust's poll into await_ready, await_suspend, and await_resume, which can be used to handle different stages separately, while in Rust we need to handle everything in a single poll. Comparing these aspects, which design is better, or more flexible, or unnecessarily flexible?

Rust is better, because the decisive design difference is in an entirely different place, one you haven't looked at at all.

Look at the description: When a coroutine begins execution, it performs the following: allocates the coroutine state object using operator new (see below).

This one line immediately makes the C++ coroutine system a fragile and unpredictable high-end server thingie.

You cannot use them in an embedded context, you cannot use them where memory allocations are at a premium, etc.

Compared to that, the differences in polling strategy are minor.

I would say that some minor things are better in C++, but considering that they completely botched the most important part… I would say C++ coroutines are in a much worse position.

It's kind of understandable: coroutines were pushed by Microsoft, and Microsoft basically wanted C++ coroutines to match C#… that's how C++ ended up with what it has.

Rust tried to do the most important things right and spent a lot of effort on that. GATs and the design of async traits – all that complexity is not needed in C++, because they simply refused to solve the problems that are central to Rust's design of async/await.

Thus they could make some other things more ergonomic.

That one is very easy: when the moon phase is right and the stars align “just so”, you get inlining and no memory allocations; if you are not lucky, you don't.

This probably works fine for C/C++ embedded folks, who consider upgrading the compiler in a working project a complete and utter blasphemy. But it wouldn't work for Rust, for obvious reasons.


I sat through a one-hour presentation about coroutines in C++ on Youtube. I did not understand a thing. The presenter ended by saying that none of it was intended to be used by regular programmers, and that they should wait for library writers to wrap it all up in something reasonably usable.

I have watched videos and read about async and futures in Rust. I kind of get the idea, but would be at a total loss to do anything with a future from scratch.

Luckily we have things like tokio to make async useful for the likes of myself.

So what about using a crate to make coroutines in Rust: https://crates.io/search?q=coroutine

The situation with coroutines in Rust is extra funny, because Rust's async/await model is itself built on top of coroutines.

Only said coroutines are unstable while async/await is stable; thus people like to build coroutines on top of coroutines (via async/await).

A huge waste of resources if you ask me, but I kind of understand why all that happens.

Yeah… yes. Indeed. Exactly my experience, hence my answer above focusing on the very few bits and parts I did get after giving up on Youtube videos and blog posts and falling back to cppreference.com as a source, even though I would have preferred a friendlier and less technical introduction.


This discussion seems to assume a clear notion of what a “coroutine” is supposed to be, and that such a notion would dictate the need for some kind of yield operation to exist. As far as I’m aware, “coroutine” is a fairly vague term, and Rust’s Futures are a form of coroutine feature. They are (compiler-internally) not built on top of coroutines, but on top of generators – at least that’s the language used in the context of the rust compiler, as far as I’m aware; rust’s (unstable) generators are a different (and more generalized) flavor of coroutine feature, and they do offer a form of yield next to async+await.

The fact that they’re merely more general means that Rust’s async fn and async {} blocks being built on top of generators internally has essentially zero overhead. Going the other way, crates such as this one building generators on top of futures do have a small but non-zero overhead – the overhead being in the workarounds necessary to re-introduce a way for polling to return values while not finished, thus allowing yield again. It’s not really building coroutines on top of coroutines; the coroutines are still the same, it’s only one state machine being generated once, no insane layering or anything… of course it is still suboptimal; someone did an example on a few different generator crates, also comparing to the unstable generator feature of Rust itself, here.


As I found out just this moment. Like so:

#![feature(generators, generator_trait)]

use std::ops::{Generator, GeneratorState};
use std::pin::Pin;

fn main() {
    let mut generator = || {
        for i in 0..=10 {
            yield i;
        }
        return "foo"
    };

    loop {
        match Pin::new(&mut generator).resume(()) {
            GeneratorState::Yielded(i) => {
                println!("{i}");
            }
            GeneratorState::Complete(s) => {
                println!("{s}");
                break; // resuming a completed generator would panic
            }
        }
    }
}

I guess it can make sense to have the high-level async syntax/API stable while the coroutines down in its bowels are subject to change, hence unstable.

To make it clear: does C++ require heap allocation for coroutines because coroutines may:
1 - need to be allocated on the heap anyway, in which case Rust will do the same, as with spawn;
2 - have their body defined in a source file different from the one declaring the signature, so that the compiler cannot know how much space to allocate for the coroutine frame?

Otherwise, within the same source file, the HALO optimization may be applied more easily, and it does not require inlining the entire coroutine body! Only the allocation and deallocation parts are inlined, which is tiny. The function pointers resume and destroy are used to manipulate the coroutine state from outside, without the coroutine body being known at the call site.