C# Tasks-like Behaviour with Futures


#1

I’d like to write some of my async APIs similar to how I would using Tasks in C#. Tasks are the C# equivalent of Rust’s Futures. However, a key way they differ is that (typically) when they’re returned from a function they are already executing and don’t have to be polled by the caller in order to start. The caller then simply attaches a continuation to them which processes the result asynchronously.

This idiom is especially handy in client-side code and will be a good fit for some APIs I’m currently writing. Futures do not lend themselves to such use; you could achieve it, but it wouldn’t be very ergonomic. My question is: with the proposed Tokio reform, wherein executors are being decoupled from reactors, will this idiom become easier to achieve?

I understand that Futures are designed as they are because they’re intended to be zero-cost, whereas Task-like behaviour imposes runtime costs. But I feel there are people out there who are willing to pay such costs in return for simpler APIs. So perhaps both use cases are worth considering.


#2

Rust futures don’t need to be polled to start and are also typically already running (or possibly even completed) when returned. Polling is just a way to query the future’s status. This isn’t something a client of a future does though, typically. Some futures need a reactor to complete, but that’s a particular future’s implementation detail. Clients of a future in Rust also typically attach continuations to dispatch when the future completes, and those continuations can themselves return futures.


#3

You’re right in a way @vitalyd. The function returning the future could spawn it on a CpuPool, for instance, before returning. However, when the resulting CpuFuture is received by the caller and he appends a continuation to it, the continuation will not automatically get spawned. The caller has to spawn it explicitly. This is in contrast with C#, where any continuation you add to a running task automatically gets scheduled on the same thread pool (often on the same thread), resulting in a far cleaner experience for the caller. This is the experience I’m after: you call an async API (say “send_request_async” or whatever), it returns you a running future-like thing, you append a continuation to it to process the result, and you simply move on. This is far more natural than having to remember to spawn your continuation yourself, to say nothing of the fact that you have to decide which thread pool to schedule it on. Should it be the same as the original CpuPool or one of your own? Is there any way it can be the same thread as the original future (’cause you do want to use the data already in the CPU cache)? All of these issues leak the internal scheduling details of the “send_request_async” API to the caller and cause unnecessary friction for him.
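
To make the friction concrete, here is a rough sketch of the situation using futures 0.1 and futures-cpupool (the “request” is faked with future::ok, since the real API doesn’t matter here):

extern crate futures;
extern crate futures_cpupool;

use futures::{future, Future};
use futures_cpupool::CpuPool;

fn process_result(res: String) -> String {
    format!("processed: {}", res)
}

fn main() {
    let pool = CpuPool::new_num_cpus();

    // The API spawns the request before returning, so it is already running...
    let response = pool.spawn(future::ok::<String, ()>("response".to_string()));

    // ...but the continuation attached by the caller is NOT automatically
    // scheduled; the caller must pick an executor and spawn it explicitly.
    let processed = pool.spawn(response.map(process_result));

    println!("{}", processed.wait().unwrap());
}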


#4

The C# case is actually a bit more complicated than that. It’s been too long since I’ve had to know the details, but basically the continuations are scheduled via a SynchronizationContext/TaskScheduler, which is an implicit global context. You can explicitly spawn a Task onto a specific TaskScheduler when needed, and you can avoid the current task scheduler and fall back to the default global thread pool when needed.

I haven’t really been following the low-level details of the reform, but from the discussion I’ve seen I feel like it’s going in a very similar direction: there’s an implicit global Executor (which could be contextual, based on the current Task) where spawned futures will go, and if necessary you can explicitly pass Executors around and spawn futures directly onto them.

Rust’s futures are significantly different from C#'s tasks though, and there’s no easy way to change that without losing a lot of the performance. You’re never really spawning individual continuations anywhere; when you’re awaiting/chaining via .then you’re building up a data structure representing the entire workflow required to process the current operation. Only once you’ve finished building up that data structure do you have something that you spawn as a Task onto an Executor. Maybe some parts of that operation need to be spawned off onto a different Executor for some reason, but that’s entirely encapsulated inside the part of the operation that spawns and then awaits that other Task; once that other Task is complete, you are still processing this Task on whichever Executor it was originally spawned on.
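
To illustrate (a minimal sketch with futures 0.1; wait() is used here only to drive the future inline):

extern crate futures;

use futures::{future, Future};

fn main() {
    // Each combinator wraps the previous future in a new type, so the whole
    // workflow is a single nested type known to the compiler, roughly
    // AndThen<Map<FutureResult<u32, ()>, ...>, ...>.
    let workflow = future::ok::<u32, ()>(1)
        .map(|n| n + 1)
        .and_then(|n| future::ok(n * 2));

    // Nothing has executed yet; only now is the finished chain driven
    // (a real application would spawn it onto an executor instead).
    assert_eq!(workflow.wait(), Ok(4));
}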


#5

Ok I see the distinction you’re making - thanks for clarifying. As @Nemo157 said, the Rust way of building up a type consisting of the whole chain, and then spawning that whole thing onto an executor, is what gives it the performance (as you alluded to as well) - the whole chain is known to the compiler as one (potentially big and nested) type. I actually think @Nemo157 summarized the whole thing quite well. What might help is to walk through some concrete scenario you’d like to support and discuss what that might look like in Rust, either with the current model or after the reform.


#6

A concrete example could be C#'s HttpClient. To send a GET request, for example, you’d typically invoke it like this:

var httpClient = new HttpClient();
var response = httpClient.GetAsync(uri)
            .ContinueWith(t => ProcessResult(t.Result)); // <---- continuation, automatically scheduled.

As you can see, the caller doesn’t have to worry about scheduling. He just calls the method, appends a continuation to the return value, and he’s done.

In contrast, in Hyper today you have to do the following:

let mut core = Core::new()?;
let client = Client::new(&core.handle());

let work = client.get(uri).and_then(|res| {
    process_result(res);
    Ok(()) // the `and_then` closure must return something convertible into a future
});

core.run(work)?; // <----- Caller must schedule the future explicitly.

It is more work than in C#, especially if you have to make such HTTP calls at multiple locations in the code. It puts the burden of scheduling squarely on the caller, who now has to grapple with questions like, “Should I schedule it on a thread pool or on the Core?”, “If I have to pass the client to some other location, will I have to carry the core along with it too?”. And all this while all the caller wanted was to make a call and get the result back :-|.

Note that the whole of this commentary applies more to the client side than to the server side. On the client side the bar for performance is often lower than on servers (not necessarily “low”, but “lower”), so trading some performance for greater ergonomics is a valid decision. As I see it, the way futures are designed, they’re geared more towards the server side, where you need to extract the last bit of performance and need more control. They are not that good a fit for the client side in my view (as perhaps the Hyper example above demonstrates). Overall I strongly feel we need a better client-side story than we have today, and C# Tasks are a great model to look to for inspiration.

Now the question is whether the Tokio reform will enable such simplifications or not. By making executors more implicit, I believe it will take away some of the burden. However, I suspect it won’t change the fact that the caller has to do the scheduling. From my understanding so far, I get the feeling that perhaps we could have a separate library focused on the client side which provides the above behaviour, something like tasks-rs (I know the name conflicts with the use of the term “task” in the futures model, but I can’t think of a better word).


#7

Let’s pause for a second and analyze your example step by step.

let mut core = Core::new()?;
let client = Client::new(&core.handle());

This is basically a more explicit version of C#'s new HttpClient(), which exposes the initialization of the scheduling infrastructure instead of hiding it like C# does. The difference boils down, IMO, to different language design choices:

  • Rust aims to be usable in zero-runtime environments like OS development and embedded, whereas C#, which does not have this goal, does not mind stacking arbitrary complexity in the language runtime.
  • Rust is explicit about error handling, even in initialization of “infrastructure” like Tokio’s Core. Whereas the C# runtime will just blow up if something bad happens at that stage of program startup.
  • Rust is quite opinionated against implicit initialization of global state on application startup, as this creates obscure bugs related to the order in which things are initialized. C# doesn’t care much because it has a stronger runtime/user library distinction.

In the end, I do not think this one matters that much in practice for usability, as in typical usage one will create a Core once and reuse it many times. I personally prefer the robustness and clarity of explicit initialization. But hopefully you can see what kind of pros there might be to initializing and passing around the Core explicitly, instead of making it hidden global or thread-local state.


The second part is pretty much identical to your C# example, formatting aside, so I will not comment much on it:

let work = client.get(uri)
                 .and_then(|res| { process_result(res); });

The third part is where things get interesting:

core.run(work)?

By adding this extra step, we let the Rust async stack know that we’re done stacking continuations on top of a future, and that it can start listening to it. Deferring the scheduling like this is what allows Rust’s futures to achieve optimal performance, by avoiding building the task machinery until the moment we’re sure we have the final async state machine on our hands.

I agree with you that this can add verbosity in practice. But I also need to balance this concern against something else: how frequently do you really expect to call core.run() in a realistic application, as opposed to a toy learning example?

This is what usage of Rust futures in a complex application currently looks like. As you can see, you can describe very complex work in a future long before you need to care about scheduling it. By doing so, you amortize the scheduling overhead and make optimal use of the aforementioned state machine optimization that the design of Rust’s futures enables.


To me, what this example also shows is that infrequently needing to pass futures down to the Core is far from the most pressing usability concern in Rust’s futures implementation. If I were part of the language, compiler or Tokio development team, looking at this code would instead convince me that:

  • There is too much “and_then” boilerplate in there. We need to trim down the syntax for continuation scheduling by introducing async/await. Further down the line, we could consider eliminating await as well; there are pros and cons to that (the usual concise vs explicit trade-off, basically).
  • The compiler bug which causes non-linear compilation time blow-up in long future chains must be fixed, so that we can do with fewer “expert incantations” like the random insertion of boxed() in the continuation chain.
  • We also need impl Trait in order to get rid of that last boxed(). Boxing a future or not should be the client’s choice.

…and it seems that I’m not alone, since at least two of the bullet points in that list (namely impl Trait and async/await) have landed in experimental form in nightly, and are on their way to stabilization.


#8

I agree with you @HadrienG. I agree about Rust’s goal of zero runtime. I agree about the extra cost demanded by C#'s approach. And I agree with and support the Rust team’s current priorities.

I’m not saying at all that Rust should sacrifice its primary goal of being a close-to-the-metal language. What I’m saying instead is that there are additional use cases, such as client-side async, which should also be considered (at some point). If as a programmer I’m willing to sacrifice some performance to gain simplicity, Rust should support me. My only assumption is that there are many more people like me writing client-side code who would also appreciate such support, making it a worthwhile consideration.

At first I was wondering if the proposed Tokio changes would make it easier to provide such support via futures. But as I understand now, futures fundamentally cannot provide such behaviour – they are not designed to. Hence, my conclusion is that perhaps the best way forward for folks like me is to write a tasks library parallel (no pun intended) to futures from the ground up and use that. What are people’s views on that?

how frequently do you really expect to call core.run() in a realistic application, as opposed to a toy learning example?

I presume you’ll have to call core.run() every place where you’re making an HTTP call. Wouldn’t that be the case? Maybe I’m wrong.


#9

I see quite a bit of a trade-off here. Let me try to explain.

On one hand, I agree with you that there is a place for abstractions which lean more on the “convenience” side of the performance vs usability trade-off. The cleverness of the futures-rs model is arguably overkill for less demanding applications, and as I’ve shown, this model can hardly be made much simpler to use without losing some of its desirable characteristics (explicit error handling, no hidden global state, minimal dynamic memory allocation and dynamic method dispatch). This would indeed be an argument for experimenting with alternate APIs for less demanding asynchronous programming scenarios.

On the other hand, there is also the question of efficient use of limited manpower and the consistency of the Rust async ecosystem. If possible, we should stick with only one way of writing asynchronous Rust applications, because it means that every async Rust library works in the same way and composes well with other async libraries, all the while avoiding costly duplication of software development effort and lengthy debates between proponents of various async approaches on common interfaces.

I am also uneasy about the choice of drawing a clear boundary between client code and server code. It may be due to my general inexperience with network programming, but I don’t see what prevents a server from being a client too from time to time (think SSO and other “cloud-ish” infrastructure, where server A outsources some work to server B), in which case one would need to mix two different and subtly incompatible programming models in a single application.

Well, one great thing about futures is that they are composable. You can build an arbitrarily complex chain of futures somewhere inside of a library, then pass it up to another library which attaches more work on top of it, and only later give the final, very complex future object to the task scheduler.

In particular, nothing prevents you from stacking multiple HTTP requests in a single future, using a combinator like and_then. You only need to call the Core at the very end, when you want to kickstart the complex asynchronous machinery that you have iteratively built.
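
For instance, something along these lines (a sketch with hyper 0.11; the URLs and printed output are just placeholders):

extern crate futures;
extern crate hyper;
extern crate tokio_core;

use futures::Future;
use hyper::Client;
use tokio_core::reactor::Core;

fn run() -> Result<(), Box<std::error::Error>> {
    let mut core = Core::new()?;
    let client = Client::new(&core.handle());

    let first: hyper::Uri = "http://example.com/first".parse()?;
    let second: hyper::Uri = "http://example.com/second".parse()?;

    // Two requests stacked into a single composite future; nothing runs yet.
    let work = client.get(first).and_then(|res1| {
        println!("first: {}", res1.status());
        // The second request starts only once the first has completed.
        client.get(second)
    });

    // A single call to the Core at the very end kicks off the whole chain.
    let res2 = core.run(work)?;
    println!("second: {}", res2.status());
    Ok(())
}

fn main() {
    run().unwrap();
}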

In a way, it’s similar to the idea of a command buffer in modern low-level graphics APIs like Vulkan or Direct3D 12. You are encouraged to build sophisticated coarse-grained requests and to submit them to the GPU scheduler as late as possible, as this minimizes API overhead (which was a common performance bottleneck in the OpenGL days). In principle, there is a latency vs throughput trade-off there, but in practice building a request is so fast that bigger requests are almost always better.

(Where this analogy breaks down, though, is that Vulkan allows you to submit a single command batch multiple times without rebuilding it over and over again. It would be nice if Rust’s future asynchronous APIs allowed you to do this, although I’m not sure how much performance it would actually save in practice).


#10

You would call Handle::spawn to schedule a future on an existing event loop (i.e. Core). Core::run is generally where you start the event loop with an “outer” future (on the server, this would be your connection accept loop).

One of the changes in the reform is an implicit/global event loop. That means you don’t have to Core::run anything (if using the global one is ok) or even create the Core. I suspect that is more in the direction you’re looking for.
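
For reference, here is a minimal sketch of that split with tokio-core 0.1 (note that spawned futures must have Item = () and Error = (); the oneshot is only there so the “outer” future waits for the spawned one):

extern crate futures;
extern crate tokio_core;

use futures::sync::oneshot;
use futures::{future, Future};
use tokio_core::reactor::Core;

fn main() {
    let mut core = Core::new().unwrap();
    let handle = core.handle();
    let (tx, rx) = oneshot::channel::<()>();

    // Handle::spawn schedules a background future on the existing event loop.
    handle.spawn(future::ok::<u32, ()>(42).map(move |n| {
        println!("background future got {}", n);
        let _ = tx.send(());
    }));

    // Core::run drives the loop with an "outer" future; the spawned future
    // makes progress while the loop waits for the outer one to resolve.
    core.run(rx).unwrap();
}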


#11

@gurry For a bit of context, in case you’re unaware of the Tokio/futures reform proposal, here’s what it is about. You may also be interested in the code examples in there.

The proposal is being discussed here. In particular, I think the part of the discussion near the end which touches on the tradeoffs of global state is particularly relevant to our ongoing discussion.


#12

I did not mean to suggest forking the ecosystem. The “conveniences” could be built on top of futures. An approximate solution could look something like this: you could have a concrete impl of Future called SpawnedFuture or RunningFuture. It represents a future that’s already running. An API could return this guy instead of returning something abstract like impl Future. SpawnedFuture has a method called, say, then_spawn() which spawns a given closure as a separate item, which makes it very similar to C#.

This is how it could look:

impl Future for SpawnedFuture {
    ...
}

// An API using it
fn send_request() -> SpawnedFuture {
    ...
    // The future is spawned here, which results in the spawned_future object
    ...
    spawned_future
}

and the caller could use it like this:

send_request().then_spawn(|r| {
    process_response(r);
});

The implicit resources suggested in the reform (thanks for sharing the links @HadrienG; I had skimmed them earlier, but need to have a closer read) could help by making it easy to spawn both the SpawnedFuture within send_request() and the caller’s closure outside via then_spawn().

It may throw up some design challenges, but something like this could probably be worked out. These mechanisms could live in their own separate crate.
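
To flesh the idea out a little, here’s a rough sketch of how SpawnedFuture and then_spawn might look on top of futures-cpupool (all names and design choices here are hypothetical):

extern crate futures;
extern crate futures_cpupool;

use futures::{Future, Poll};
use futures_cpupool::{CpuFuture, CpuPool};

// A future that is already running on a thread pool.
pub struct SpawnedFuture<T, E> {
    inner: CpuFuture<T, E>,
    pool: CpuPool,
}

impl<T: Send + 'static, E: Send + 'static> SpawnedFuture<T, E> {
    // Spawn `f` immediately, C#-style.
    pub fn spawn<F>(pool: &CpuPool, f: F) -> Self
    where
        F: Future<Item = T, Error = E> + Send + 'static,
    {
        SpawnedFuture { inner: pool.spawn(f), pool: pool.clone() }
    }

    // Attach a continuation; it is itself spawned on the same pool, so the
    // caller never has to think about scheduling.
    pub fn then_spawn<F, U>(self, f: F) -> SpawnedFuture<U, E>
    where
        F: FnOnce(Result<T, E>) -> Result<U, E> + Send + 'static,
        U: Send + 'static,
    {
        let pool = self.pool.clone();
        let inner = pool.spawn(self.inner.then(f));
        SpawnedFuture { inner, pool }
    }
}

impl<T: Send + 'static, E: Send + 'static> Future for SpawnedFuture<T, E> {
    type Item = T;
    type Error = E;

    fn poll(&mut self) -> Poll<T, E> {
        self.inner.poll()
    }
}

One caveat: if I understand futures-cpupool correctly, dropping a CpuFuture cancels the computation, so a fire-and-forget variant would need something like CpuFuture::forget().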

I did not mean multiple requests in time. I meant multiple locations in the code where you would have to write that code.

Oh yes, you’re right @vitalyd. But in that case you now have to pass the handle around alongside the client instead of the core. So it remains similarly awkward.

Thanks guys for your comments. My original question is pretty much answered, the answer being that yes, the changes in the reform will help with what I’m looking for. I’ll read up more on this topic and wait for more details about the changes to emerge. :slight_smile:


#13

Right, you have to do that currently, although the reform talks about removing the need for that as well, so long as spawning onto the default executor is sufficient. It will be interesting to see how it plays out.


#14

Yup, I agree. This aspect should go away with the proposed changes around the implicit loop etc.


#15

I’m pretty sure the event loop takes ownership of the Future that it is executing, in which case you cannot implement such a thing as a SpawnedFuture because once you have handed the Future to the reactor, you do not have it anymore.

Which makes sense in a way. Being able to modify a future as other event processing code is allowed to interact with it would create tricky race conditions, and resolving these would in turn require use of various mechanisms which have overhead (heap allocation, synchronization if multiple threads are involved…).

Regarding how often something like Core.run would be needed in practice, it will depend on the application, but on the server side I would assume you would need to start the reactor at the end of the application’s main() function (to start accepting connections), and then occasionally need to spawn an additional task inside of the asynchronous code (e.g. once per connection for a web server). Basically every time a synchronous server would need to spawn a thread. Not sure how the picture would change on the client side.
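
To make that concrete, here is roughly what the “spawn per connection” pattern looks like with tokio-core 0.1 (a minimal sketch; the per-connection work is reduced to a single write):

extern crate futures;
extern crate tokio_core;
extern crate tokio_io;

use futures::{Future, Stream};
use tokio_core::net::TcpListener;
use tokio_core::reactor::Core;

fn main() {
    let mut core = Core::new().unwrap();
    let handle = core.handle();

    let addr = "127.0.0.1:8080".parse().unwrap();
    let listener = TcpListener::bind(&addr, &handle).unwrap();

    // One spawned task per accepted connection, much as a synchronous
    // server would spawn one thread per connection.
    let server = listener.incoming().for_each(|(socket, peer)| {
        println!("connection from {}", peer);
        handle.spawn(
            tokio_io::io::write_all(socket, b"hello\n".to_vec())
                .map(|_| ())
                .map_err(|_| ()),
        );
        Ok(())
    });

    // The reactor is started once, at the end of main(), with the accept
    // loop as the "outer" future.
    core.run(server).unwrap();
}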


#16

That largely depends on your application architecture, three cases off the top of my head:

  • Traditional request/response server (e.g. HTTP). You generally have a top-level loop taking in incoming connections; these get passed to your handler, which returns a Future indicating completion of the response. Maybe that handler does an HTTP request as part of its processing; if it does, then that just becomes part of the Future it’s returning. Once the handler has constructed the Future representing the handling of that one request, the top-level loop will spawn it onto a thread pool.

  • CLI application. Probably a bit less standardised than a traditional server, but I would likely design it as a single Future representing the entire run of the application. Unless it’s having to do a ridiculous amount of parallel work, everything would probably be part of that single future and so just spawned onto the main core at the top level.

  • GUI application. Let’s assume there’s some sort of futures-aware GUI framework; like most traditional GUI frameworks it only lets you modify the GUI on the main thread (unlike most traditional GUI frameworks, it has compile-time features to enforce this :smile:). To make this easier it runs all event handler Futures on an event loop it is running on the main thread. In this case, if you perform some sort of processing in the client that takes a bit of CPU time, you’ll want to be doing it off-thread. You would do so by spawning that processing off as a new Task and having the Future returned from the event handler wait on that Task to complete before using the result on the main event loop to update the GUI.

In the first two cases you will generally not need to spawn extra Tasks yourself: either the top level is handling spawning per-request Tasks for you, or your entire application is a single Task. The last one is a bit more interesting, especially if you look at how it could work in both Rust

use futures::sync::oneshot;

#[async]
fn get_text(client: Client) -> io::Result<String> {
    let html = await!(client.get("https://rust-lang.org"))?;
    Ok(process_html(html))
}

#[async]
fn refresh_button_clicked(button: Button, context: Context) -> gui::Result<()> {
    let text = await!(oneshot::spawn(get_text(context.client()), context.thread_pool()))?;
    if let Some(textfield) = button.parent().child::<TextField>("textfield") {
        textfield.set_text(text)?;
    }
    Ok(())
}

and C#:

async Task<String> GetText(Client client) {
    var html = await client.Get("https://rust-lang.org");
    return ProcessHtml(html);
}

async Task RefreshButtonClicked(Button button, Context context) {
    var text = await Task.Run(() => GetText(context.Client()));
    var textfield = button.Parent().Child("textfield") as TextField;
    if (textfield != null) {
        textfield.SetText(text);
    }
}

Because await uses the implicit synchronization context, and the Rust Task is run on a specific executor, these end up looking very similar. In both cases the event handler Task is run in the context of the GUI event loop; to avoid blocking it, the future for getting text from the web page is spawned off onto a separate thread pool. The result of this is then awaited inside the event handler Task via a proxy object; once it completes, the event handler continues on the GUI event loop and is able to set the text of a UI field.

There are two differences here (other than the very different underlying models):

  • In the Rust version the extra task is spawned onto a specific thread pool; in C# it uses the shared global thread pool. Hopefully after the Tokio reform there will be some way to easily use a shared global thread pool in Rust as well.

  • In the Rust version the Future created by get_text is “cold” (not yet started), so it can be cheaply created while still running on the GUI event loop. It’s also “unbound”: it does not carry with it the fact that it was created in the context of the GUI event loop. Only once it is created is it spawned off onto the thread pool, and process_html will then continue running on that thread pool.

    Whereas in the C# version I believe creating the Task in GetText will both:

    • Be a “hot” task, running until the first yield point on the GUI event loop
    • Be “bound” to the current context. Once it hits that first yield point it will look up the current synchronization context and queue the continuation to run on that.

    That’s why Task.Run is passed a function to use to start the Task in the context of the thread pool; if the Task itself was created and then passed to Task.Run, it would still run on the GUI event loop because of the context binding. (I could easily be wrong on this point; it’s been a couple of years since I worked in C#, and tracking down documentation on how this works is not easy.)


#17

Yes @HadrienG. It will be a real obstacle. But I feel with a little more exploration of the problem we could find a way around that. I’ll probably spend some time on it next year.

Interesting line of thought @Nemo157. I especially like the GUI loop as a future idea :slight_smile:. Thanks for sharing it as an alternative to trying to create tasks :+1:


#18

@gurry It is certainly possible to build futures which can be extended as they are in the process of being executed. In fact, most future implementations provide this possibility.

However, the fact that my Boost.Thread-based programs tend to be bottlenecked by dynamic memory allocations when future continuations are short makes me feel that the “keep it on the stack” goal of the futures-rs designers was an important one, and I have an intuition that you cannot build stack-allocated futures which allow for concurrent extension and execution without exposing event loop mechanics to the application.

Here’s how my intuition goes:

  • If a future can be executed and extended concurrently, and if the application thread never explicitly yields control to the event loop (which is one of the things that functions like Core.run do silently for you), then the only place left for an event loop is a separate thread.
  • Such a multi-threaded event loop must, at any point in time, hold a valid memory address to the futures that it is executing. It may also modify said futures during execution.
  • Which leads us to a first problem: if the event loop does not own the futures, guaranteeing to the event loop thread that the memory address of our client-side stack-allocated futures will remain valid is impossible. When the scope of the future is exited, or when the future is moved (for example because a continuation is attached), the original stack address will become invalid.
  • So when this happens, we need to somehow put the future elsewhere and signal the other thread that its address has changed. Here, one issue is that Rust does not allow moving a borrowed object or customizing what happens when an object is moved. So already, at this stage, we’re going beyond what the current Rust language can do.
  • But assuming we could hook on moves just as well as we can hook on drop, the event loop thread may be modifying the future object, and will not get our “I’m going to move it” memo right away. So our only option at this stage would be to freeze the client thread as an unfinished future is moved or goes out of scope. This does not sound exactly glamorous.
  • So I may have missed something, but it seems to me that there is no elegant way around having one heap-allocated object per continuation in the “concurrent scheduling and execution” design.

Do you see an incorrect assumption or oversight in this reasoning?


#19

@HadrienG You’re right. There’s no way this can be done without paying allocation or even extra thread costs. If these costs are not absurd, I’m willing to pay them given the use-case I’m focused on.

Going through your steps of deduction, your reasoning seems valid overall. But honestly speaking, I do not yet understand all aspects of futures well enough to make a non-daft comment. I’m gonna have to sit down, read through the documentation, and write some code to appreciate all the niceties. Then I’ll be able to appreciate the challenge. What you’ve written above will be very helpful in my thinking when I get down to it, so thanks for sharing :slight_smile:


#20

You’re welcome. If I can suggest a possible track for exploration, one idea would be to try to get rid of the “one dynamic allocation per continuation” limitation which plagues most future implementations, and to get as close to the “one dynamic allocation per asynchronous task” behaviour of futures-rs as possible. Perhaps you could get there by allocating a much larger memory block than you actually need, and then stacking your continuations in there without significantly disturbing any struct which has been shared with the event loop thread. Given enough memory unsafety and ugly thread synchronization tricks, it feels like something that could be done.