Coroutines for Rust


Coio-rs: coroutines with a work-stealing scheduling algorithm and I/O support.



Why do you use GOMAXPROCS=4 in the Go benchmark? If you have a 4-core machine, the benchmark should show roughly the same results without it, right?


Damn… And what do I do with my own pet coroutines? :confused:

BTW, a couple of questions.

  1. Does it work on stable?
  2. Do you store the context on the heap or on the stack? I stumbled upon a nice trick in Boost.Context: they store all needed registers on the stack and then return the stack top as an opaque context handle; then they switch to another handle and restore the state in reverse order. With some other tricks, like placing service structures ‘below’ the stack, this allows alloc-free coroutines, provided a stack pool is maintained. The problem is, their context switching is done in raw machine ASM.
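The Boost.Context trick mentioned in question 2 relies on reusing stacks instead of allocating one per coroutine. Below is a minimal sketch of such a stack pool, assuming fixed-size stacks backed by plain heap memory; StackPool, take and give_back are hypothetical names, and a real implementation would more likely use mmap'd memory with a guard page. It only shows the recycling part, not the service structures placed below the stack.

```rust
/// Hypothetical pool of fixed-size coroutine stacks (names are illustrative).
/// Finished stacks are recycled, so steady-state spawning allocates nothing.
struct StackPool {
    stack_size: usize,
    free: Vec<Box<[u8]>>,
    allocations: usize, // number of stacks ever allocated, for illustration
}

impl StackPool {
    fn new(stack_size: usize) -> Self {
        StackPool { stack_size, free: Vec::new(), allocations: 0 }
    }

    /// Hand out a recycled stack, allocating only when the pool is empty.
    fn take(&mut self) -> Box<[u8]> {
        match self.free.pop() {
            Some(stack) => stack,
            None => {
                self.allocations += 1;
                vec![0u8; self.stack_size].into_boxed_slice()
            }
        }
    }

    /// Return a finished coroutine's stack to the pool.
    fn give_back(&mut self, stack: Box<[u8]>) {
        assert_eq!(stack.len(), self.stack_size, "foreign stack returned to pool");
        self.free.push(stack);
    }
}

fn main() {
    let mut pool = StackPool::new(64 * 1024);
    for _ in 0..1000 {
        let stack = pool.take();
        // ... a coroutine would run on `stack` here ...
        pool.give_back(stack);
    }
    // Only the first iteration had to allocate.
    assert_eq!(pool.allocations, 1);
}
```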


Because I want to be sure, and make it explicit, that Go runs with exactly 4 procs, so that it's a fair competition.

  1. No, it cannot work on stable. Obstacles:

    1. The simd crate (used by the context-rs crate)
    2. The FnBox feature
    3. thread::catch_panic
    4. The deque crate
  2. I store the context on the heap. I haven’t thought about that kind of optimization; if you have an idea about it, let’s discuss it in the issues :smile:

    1. Context switching is also done in raw ASM.
    2. I think you are talking about asymmetric coroutines, where coroutines relate to each other like frames on a stack; please refer to my project rustcc/coroutine-rs.

There is a lot of work to be done. I need help.


I think Go defaults to GOMAXPROCS=1? I ran into this when trying to test for races.


Please check the ASM part and the Rust part; I use precompiled binaries in the latter to stay stable-compatible. Fortunately, they’re tiny.

Long story:
I ripped the ASM from Boost.Context and modified it slightly (to the extent that I can write ASM).

Main ideas (original to Boost.Context, not my invention in any way):

  1. Use the stack itself to store the context
    • jump_fcontext pushes the pieces of context data directly onto the stack
    • after the state is pushed, SP is written to the location at ARG0
    • then SP is set from ARG1
    • then the state is popped in reverse order
    • then RETVAL is set from ARG2 (it’s used in fcontext to pass intptr_t messages between coroutines)
    • then, we jump to IP (which is usually the last element of the context)
  2. make_fcontext constructs an environment suitable for a subsequent jump_fcontext: it allocates a stub state and sets its IP to the function address supplied as ARG2; ARG0 and ARG1 denote the stack base and its size
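To make the jump_fcontext/make_fcontext protocol above concrete, here is a toy model in safe Rust. It is a simulation, not a context switcher: “memory” is a Vec<usize>, stacks grow downward, a context handle is just a saved SP index, and the names Cpu, CALLEE_SAVED and resume_ip are hypothetical. The register count assumes the six x86-64 System V callee-saved general-purpose registers and ignores FPU/SSE control words.

```rust
/// Toy model of the Boost.Context-style switch (a simulation only).
/// "Memory" is a Vec<usize>, stacks grow downward, and a context
/// handle is nothing but a saved SP (an index into `mem`).
const CALLEE_SAVED: usize = 6; // e.g. rbx, rbp, r12-r15 on x86-64 System V

struct Cpu {
    mem: Vec<usize>,
    sp: usize,                   // stack pointer: index into `mem`
    regs: [usize; CALLEE_SAVED], // callee-saved registers
    ip: usize,                   // instruction pointer
    retval: usize,               // RETVAL register
}

impl Cpu {
    fn push(&mut self, v: usize) {
        self.sp -= 1;
        self.mem[self.sp] = v;
    }

    fn pop(&mut self) -> usize {
        let v = self.mem[self.sp];
        self.sp += 1;
        v
    }

    /// jump_fcontext(ARG0 = where to store our SP, ARG1 = target SP, ARG2 = message).
    /// `resume_ip` stands in for the return address a real `call` would push.
    fn jump_fcontext(&mut self, from: &mut usize, to: usize, msg: usize, resume_ip: usize) {
        self.push(resume_ip); // pushed first, so popped last
        for i in 0..CALLEE_SAVED {
            let r = self.regs[i];
            self.push(r);
        }
        *from = self.sp; // SP is written to the location at ARG0
        self.sp = to; // SP is set from ARG1
        for i in (0..CALLEE_SAVED).rev() {
            let r = self.pop(); // state is popped in reverse order
            self.regs[i] = r;
        }
        self.retval = msg; // RETVAL is set from ARG2
        self.ip = self.pop(); // jump to IP, the last element of the context
    }

    /// make_fcontext(ARG0 = stack base, i.e. its highest index, ARG1 = size,
    /// ARG2 = entry): builds a stub so a later jump "returns" into `entry`.
    fn make_fcontext(&mut self, stack_base: usize, size: usize, entry: usize) -> usize {
        assert!(size > CALLEE_SAVED && stack_base <= self.mem.len());
        let mut sp = stack_base - 1;
        self.mem[sp] = entry; // becomes IP
        for _ in 0..CALLEE_SAVED {
            sp -= 1;
            self.mem[sp] = 0; // stub register values
        }
        sp
    }
}

fn main() {
    // Stack A is mem[0..32] (we start on it); stack B is mem[32..64].
    let mut cpu = Cpu { mem: vec![0; 64], sp: 32, regs: [1, 2, 3, 4, 5, 6], ip: 100, retval: 0 };
    let mut ctx_b = cpu.make_fcontext(64, 32, 500); // 500 = B's entry "address"
    let mut ctx_a = 0;

    // A -> B: B starts at its entry function with the message in RETVAL.
    cpu.jump_fcontext(&mut ctx_a, ctx_b, 7, 101);
    assert_eq!((cpu.ip, cpu.retval), (500, 7));
    cpu.regs = [9; CALLEE_SAVED]; // B clobbers the registers

    // B -> A: A resumes after its jump with its registers restored.
    cpu.jump_fcontext(&mut ctx_b, ctx_a, 8, 501);
    assert_eq!((cpu.ip, cpu.retval, cpu.regs), (101, 8, [1, 2, 3, 4, 5, 6]));
}
```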

And here’s my tiny addition.
Rationale: jump_fcontext lets you pass a message to another coroutine, but there’s no “initial message” specific to the coroutine itself.

  1. jump_fcontext copies its ARG2 into RETVAL only, and no longer into ARG0
  2. make_fcontext gets a tiny trampoline and an extra ARG3
    • the function pointer and ARG3 are placed on the stack below the context
    • the context’s IP refers to the trampoline
    • the trampoline moves RETVAL (received from jump_fcontext) into ARG0, then pops make_fcontext’s ARG3 into ARG1
    • then the trampoline pops the jump destination and jumps there

Result: you can pass whatever init data you wish to make_fcontext, and your context proc will be called with two arguments: the first is the ‘message’ from the other coroutine, and the second is the coroutine’s own data pointer. My tiny tasklets prototype shows how this can be used to store an ordinary closure below the coroutine stack and invoke it with little trouble.
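The modified layout can be modeled the same way. In this hypothetical sketch, “below the context” is read as deeper in the stack (higher addresses, since the stack grows down), function addresses are indices into a table, and TRAMPOLINE is a made-up sentinel address; only the first entry into a fresh context is simulated, not real register handling.

```rust
/// Toy model of make_fcontext with the extra ARG3 and a trampoline.
/// "Memory" is a Vec<usize>; function "addresses" are indices into a table.
const CALLEE_SAVED: usize = 6;
const TRAMPOLINE: usize = usize::MAX; // hypothetical address of the trampoline

type Entry = fn(usize, usize) -> usize; // (message from other coroutine, own data)

struct Machine {
    mem: Vec<usize>,
    sp: usize,
    functions: Vec<Entry>,
}

impl Machine {
    fn pop(&mut self) -> usize {
        let v = self.mem[self.sp];
        self.sp += 1;
        v
    }

    /// make_fcontext(ARG0 = stack base, ARG1 = size, ARG2 = entry, ARG3 = data).
    fn make_fcontext(&mut self, stack_base: usize, size: usize, entry: usize, data: usize) -> usize {
        assert!(size >= CALLEE_SAVED + 3);
        let mut sp = stack_base;
        // Entry and data sit below the context proper; the context's IP is the trampoline.
        for v in [entry, data, TRAMPOLINE] {
            sp -= 1;
            self.mem[sp] = v;
        }
        for _ in 0..CALLEE_SAVED {
            sp -= 1;
            self.mem[sp] = 0; // stub register values
        }
        sp
    }

    /// Restore half of jump_fcontext(_, to, msg), followed by the trampoline.
    fn start(&mut self, to: usize, msg: usize) -> usize {
        self.sp = to;
        for _ in 0..CALLEE_SAVED {
            self.pop(); // restore (stub) registers
        }
        let retval = msg; // RETVAL = ARG2; no longer copied into ARG0
        let ip = self.pop();
        assert_eq!(ip, TRAMPOLINE, "the context's IP refers to the trampoline");
        // Trampoline:
        let arg0 = retval; // RETVAL -> ARG0
        let arg1 = self.pop(); // pop make_fcontext's ARG3 -> ARG1
        let dest = self.pop(); // pop the jump destination and "jump" there
        (self.functions[dest])(arg0, arg1)
    }
}

fn add(msg: usize, data: usize) -> usize {
    msg + data
}

fn main() {
    let mut m = Machine { mem: vec![0; 32], sp: 0, functions: vec![add as Entry] };
    let ctx = m.make_fcontext(32, 32, 0, 1000); // entry = functions[0], data = 1000
    assert_eq!(m.start(ctx, 7), 1007); // add(7, 1000)
}
```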


Before Go 1.5, the default was GOMAXPROCS=1; since Go 1.5, it defaults to the number of available CPUs.


Well, that looks brilliant; I should spend some time on your crate. Actually, I think your modification could be merged into the context-rs crate. :smile:


You’re welcome. Though I have a full-time job and other RL things, so I’m not sure I’ll have time to help you. And frankly, merging my code means dropping all of the Rust context-switching code. Are you sure you want this?


Merge them and keep the best implementation.


I’ve checked the context-rs code more thoroughly and have a question.
Is there any need for separate Context::save / Context::load?


I don’t need them, but someone does.

The mioco crate requires Context::load, and generator-rs needs both of them.


I’m curious why. I can’t imagine a save that isn’t followed by a load, except when jumping out of a finished coroutine.


I’m currently using only swap, but I asked for load for jumping out of a finished coroutine. This just reminds me that I’m supposed to use it to save precious data cache!

Also, save and load are useful as a “longjmp”/“goto” replacement. One can save a context state, then run some conditional code that depends on a global variable and jump back using load, taking a different branch this time. I believe this is what generator-rs is doing.


Very interesting, though a bit strange to me. And the approach of storing the context on the stack doesn’t fit well here.


I think it’s perfectly fine to have two implementations of Context within one crate. There could be a stack::Context or something similar. Crate users would be able to pick the one they want, depending on the tradeoffs they need, and some tests could probably be reused.


It just occurred to me that in-stack jumping is an insane idea: simply imagine you save inside a nested call and then load several stack frames up. Guess how loud the kaboom would be. To me, a pure load can be done by supporting a null pointer as the first argument of swap.
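The null-pointer idea maps naturally onto Option in Rust: swap(None, to) skips the save half and performs only the load. A toy sketch with simulated memory rather than real registers; the Cpu type and this swap signature are hypothetical, not context-rs’s API, and a context here holds only an IP.

```rust
/// Toy model: stacks live in a Vec<usize>, and a context is a saved SP whose
/// top slot holds the IP to resume at (registers omitted for brevity).
struct Cpu {
    mem: Vec<usize>,
    sp: usize,
    ip: usize,
}

impl Cpu {
    /// swap(from, to): `Some(slot)` saves the current state into `slot` first;
    /// `None` plays the role of a null first argument, giving a pure load
    /// (`resume_ip` is then ignored).
    fn swap(&mut self, from: Option<&mut usize>, to: usize, resume_ip: usize) {
        if let Some(slot) = from {
            self.sp -= 1;
            self.mem[self.sp] = resume_ip;
            *slot = self.sp;
        }
        self.sp = to;
        self.ip = self.mem[self.sp];
        self.sp += 1;
    }
}

fn main() {
    // We run on mem[0..8]; a coroutine context on mem[8..16] starts at IP 20.
    let mut cpu = Cpu { mem: vec![0; 16], sp: 8, ip: 10 };
    cpu.mem[15] = 20;
    let co = 15;
    let mut main_ctx = 0;

    cpu.swap(Some(&mut main_ctx), co, 11); // save + load: enter the coroutine
    assert_eq!(cpu.ip, 20);

    // The coroutine has finished, so its state is never needed again:
    // jump back without saving anything.
    let before = cpu.mem.clone();
    cpu.swap(None, main_ctx, 0);
    assert_eq!(cpu.ip, 11);
    assert_eq!(cpu.mem, before); // a pure load writes nothing
}
```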


I don’t see how it would “kaboom”.


I’ll try to explain:

First, you have this stack:

fn foo()
fn bar()
fn baz()

Inside bar(), you have some on-stack value, like a HashMap.
You perform a save inside baz() into some context1.
When you return from bar(), your HashMap is dropped, just as expected.
Then, inside foo(), you perform a load from context1, and you find yourself executing inside baz() again. Except the stack now corresponds to foo() after bar() returned, with some garbage on top. Then you return from bar() again and try to drop the HashMap a second time. At best, you’ll attempt a double free. At worst, you’ll attempt to deallocate some garbage. Kaboom.
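The scenario can be replayed with a toy heap that detects bad frees. This is a hypothetical simulation, assuming that a saved context records only SP (as in the on-stack scheme) and that nothing overwrote bar()’s old frame, which is the “at best” double-free case; a later call from foo() reusing that slot would turn it into a free of garbage.

```rust
use std::collections::HashSet;

/// Toy heap that notices frees of pointers it does not consider live.
struct Heap {
    live: HashSet<usize>,
    next: usize,
}

impl Heap {
    fn alloc(&mut self) -> usize {
        let id = self.next;
        self.next += 1;
        self.live.insert(id);
        id
    }

    fn free(&mut self, id: usize) -> Result<(), String> {
        if self.live.remove(&id) {
            Ok(())
        } else {
            Err(format!("double free (or free of garbage) of #{id}"))
        }
    }
}

/// Replays foo/bar/baz with a context that records only SP: the stack
/// contents are not copied. Returns the result of bar()'s second drop.
fn replay() -> Result<(), String> {
    let mut heap = Heap { live: HashSet::new(), next: 0 };
    let mut stack = vec![0usize; 8];
    let mut sp = stack.len(); // inside foo(); the stack grows down

    sp -= 1; // foo() calls bar(); bar()'s frame holds the map's heap pointer
    let map_slot = sp;
    stack[map_slot] = heap.alloc();

    sp -= 1; // bar() calls baz(), which saves context1 (just its SP)
    let context1 = sp;

    sp += 1; // baz() returns
    heap.free(stack[map_slot])?; // bar() returns and drops the map: fine
    sp += 1;
    assert_eq!(sp, stack.len()); // back in foo()

    sp = context1; // foo() loads context1: we are "inside baz()" again
    sp += 1; // baz() returns a second time
    let second_drop = heap.free(stack[map_slot]); // bar() drops the map again
    sp += 1;
    assert_eq!(sp, stack.len());
    second_drop
}

fn main() {
    println!("{:?}", replay()); // prints Err("double free (or free of garbage) of #0")
}
```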