Discussion: Refactoring a Synchronous Crate for Concurrency with the Stackful Coroutine Model in Rust
Hello Rustaceans!
After several months of internal discussion, and with the release of bincode-next v3.0.0-rc.7, bincode-next now experimentally supports asynchronous decoding through the stackful coroutine model (more precisely, through an async fiber framework) behind the optional async-fiber feature gate. With stackful coroutines we get concurrency support without rewriting the main part of the crate or the serde ecosystem, while keeping high performance under concurrency (about 1.2M ops/s on my old Latitude 5400; more precise benchmarks are under way and will be released shortly).
The problem is that this async fiber approach relies heavily on low-level assembly and unsafe code. We try our best to keep it safe, but it has some limitations under the current compiler tooling. Because it is stackful, it also has some performance cost and is not compatible with no-std environments. That cost is very limited thanks to our stack pool implementation and the virtual memory techniques of modern operating systems, and there is an unexpected benefit: performance stays much smoother when a large number of coroutines are running.
Pros and Cons of our Approach
These are basics of asynchronous engineering, but to give readers a clearer picture we repeat the pros and cons here.
Pros
- With the async fiber model we partially solve the function coloring problem of the stackless asynchronous model, bridging the synchronous and asynchronous ecosystems (especially for serde).
- Performance stays mostly smooth under high concurrency (near linear) and is friendlier to the CPU branch predictor (the huge finite state machines of the stackless asynchronous model often drag performance down under high concurrency).
- An easy-to-use API, just like a normal async function.
- ...
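
To make the function coloring point concrete, here is a minimal sketch using only the standard library. It is not the bincode-next implementation: `ChunkReader`, `decode_u32_le`, and `decode_from_chunks` are hypothetical names, and an OS thread stands in for a fiber, since both give the decoder its own stack. The point is only that an unmodified synchronous decoder built on `std::io::Read` can pause in the middle of a value and continue when more bytes arrive. A real fiber would switch stacks in user space at the marked suspension point instead of blocking a thread.

```rust
use std::io::{self, Read};
use std::sync::mpsc::{channel, Receiver};
use std::thread;

/// Hypothetical reader: it looks synchronous to the decoder, but waits for
/// the producer whenever its buffer runs dry.
struct ChunkReader {
    rx: Receiver<Vec<u8>>,
    buf: Vec<u8>,
    pos: usize,
}

impl Read for ChunkReader {
    fn read(&mut self, out: &mut [u8]) -> io::Result<usize> {
        // Suspension point: a fiber would yield to the scheduler here.
        // In this sketch the decoder's own thread simply blocks.
        while self.pos == self.buf.len() {
            match self.rx.recv() {
                Ok(chunk) => {
                    self.buf = chunk;
                    self.pos = 0;
                }
                // Producer dropped its sender: report end of input.
                Err(_) => return Ok(0),
            }
        }
        let n = out.len().min(self.buf.len() - self.pos);
        out[..n].copy_from_slice(&self.buf[self.pos..self.pos + n]);
        self.pos += n;
        Ok(n)
    }
}

/// Plain synchronous decoder. It knows nothing about async or fibers.
fn decode_u32_le<R: Read>(r: &mut R) -> io::Result<u32> {
    let mut bytes = [0u8; 4];
    r.read_exact(&mut bytes)?;
    Ok(u32::from_le_bytes(bytes))
}

/// Runs the synchronous decoder on its own stack and feeds it input piecewise.
fn decode_from_chunks(chunks: Vec<Vec<u8>>) -> io::Result<u32> {
    let (tx, rx) = channel();
    let worker = thread::spawn(move || {
        let mut reader = ChunkReader { rx, buf: Vec::new(), pos: 0 };
        decode_u32_le(&mut reader)
    });
    for chunk in chunks {
        // A send error only means the decoder already finished; ignore it.
        let _ = tx.send(chunk);
    }
    drop(tx); // signal end of input
    worker.join().expect("decoder thread panicked")
}

fn main() {
    // The four bytes of one u32 arrive in two separate chunks.
    let value = decode_from_chunks(vec![vec![0x78, 0x56], vec![0x34, 0x12]]).unwrap();
    println!("{value:#x}");
}
```

With stackless `async`, `decode_u32_le` and everything that calls it would need to become `async` to wait for the second chunk. Here the decoder keeps its synchronous signature, at the price of needing a separate stack.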
Cons
- A safety issue: users must not perform thread-local-dependent actions in manually defined Decode trait implementations (the TLS issue).
- Tools like Miri, LLDB, AddressSanitizer, Tokio tracing, and Kani stop working correctly (which forces us to rely on Loom, Proptest, and fuzzing for testing).
- Because we use low-level assembly instructions, cross-platform testing is a problem.
- The philosophy issue (see the following section for more details).
- ...
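
The TLS issue can be illustrated with a small standard-library sketch. It rests on an assumption: that a fiber may be resumed on a different worker thread than the one that suspended it (whether and when bincode-next does this is not stated above). The names `DEPTH`, `first_half`, and `second_half` are hypothetical, and two plain threads stand in for "before suspension" and "after resumption".

```rust
use std::cell::Cell;
use std::thread;

thread_local! {
    // Hypothetical per-thread state that a hand-written Decode impl might use,
    // for example a nesting-depth counter.
    static DEPTH: Cell<u32> = Cell::new(0);
}

/// Code that runs before the suspension point: writes thread-local state.
fn first_half() {
    DEPTH.with(|d| d.set(d.get() + 1));
}

/// Code that runs after resumption: reads thread-local state back.
fn second_half() -> u32 {
    DEPTH.with(|d| d.get())
}

/// Both halves run on the same OS thread, so the write is visible.
fn resumed_on_same_thread() -> u32 {
    thread::spawn(|| {
        first_half();
        second_half()
    })
    .join()
    .unwrap()
}

/// The second half runs on a different OS thread, as it would after a fiber
/// migrates. It sees a fresh thread-local and the earlier write is lost.
fn resumed_on_other_thread() -> u32 {
    thread::spawn(first_half).join().unwrap();
    thread::spawn(second_half).join().unwrap()
}

fn main() {
    println!("same thread:  {}", resumed_on_same_thread());
    println!("other thread: {}", resumed_on_other_thread());
}
```

Safe code like this merely computes a wrong value. With real stack switching the situation can be worse, because the compiler is free to assume a function stays on one thread for its whole execution, which is why the restriction is listed as a safety issue.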
The Philosophy issue
As we all know, Rust at its core has a philosophy of zero cost, safety by default, and so on. This approach uses low-level assembly and is stackful, which contradicts the Rust way in some sense (we think everyone knows this, and those replying to this post certainly know it better than we do...). But in another sense, maybe, just maybe, everything in Rust to some extent serves engineering needs. So could we define the Rust philosophy as providing flexible, high-performance tools for engineering practice, instead of treating pure safe Rust as its only symbol? Maybe yes, maybe no (I work in theoretical physics myself, so if you find what I said a little confusing, feel free to ignore it). As this is more of a philosophical question, there is no definitive answer. So let's keep an open mind (in line with our community's CoC) and discuss.
Links
Code Repository: Apich-Organization/bincode on GitHub (Bincode-next: The next official rust implementation of bincode)
Discord Server: Apich Organization
Contact E-Mail: info@apich.org
Note: We know that, if asked, the std team, compiler team, lang team, and many other Rust developers and ecosystem maintainers would certainly oppose the idea of using the stackful coroutine model. It is just an engineering approach, though not an idiomatic one. We hope replies to this post will stay reasonable and rational rather than emotional.