Async Interviews

+1 for proposal of adding block_on into the stdlib. In my current project, I encountered a case of using block_on: implementing a trait method.

Currently, Rust does not allow async fn for Trait method, so what if a trait method calls another async fn? I ended up using futures::executor::block_on . (Unless there is a better way.)

It might be the case that in future we can define async fn in trait as well, but I think my case showed that block_on can be useful in unexpected (at least for me) ways.

"Async interviews: My take thus far" is available

This is a relatively new thing, but a new concern about AsyncRead and AsyncWrite is that, fundamentally, they were designed around epoll -like interfaces. In these interfaces, you get a callback when data is ready and then you can go and write that data into a buffer.

I don't think that's fair. Completion based APIs are common in systems for multiple decades. Kernels use them internally. DMA transfers and disk IO are completion based. Windows offered completion based APIs via IOCP since Windows NT. libusb uses them, etc.

I brought up the concern of not supporting those during the standardization of Futures - but it felt like most people had simply not been interested in those enough, because the focus for those was "I want to build a fast HTTP server on Linux".

However I still think async/await is widely useful beyond it. The intent is being a zero-cost abstraction over things which previously made use of callbacks, promises and co. As long as we e.g. can't wrap in a zero (or even "lost-cost") fashion around underlying technologies which are completion based - which is everything in the end if we look down at the hardware layer - that goal is not reached.

That said I am not opposed to standardizing some form of the current AsyncRead/Write APIs. They have their use-cases. And most of all they are easy to understand and implement by hand.

3 Likes

Thanks for this great research! I'm much in agreement with most of the conclusions.

If you maintain a library, what are some of the challenges you’ve encountered in making it operate generically across executors? What could help there?

The biggest challenge I found was the absence of common interfaces, or more precisely the absence of implementations of those interfaces.

I have worked around it by creating/implementing those interfaces myself and wrapping existing executors (async_executors and async_nursery). With those in place I feel it's possible to write executor agnostic libraries. I don't agree that having to spawn in libraries is a rare requirement limited to async drop.

Another big issue is the Sink trait which is quite absent from these discussions. The main issue seems to be the requirement of being able to reserve a slot in the sink. This makes implementing it quite awkward, which means not many types implement it even where it would be appropriate. This makes it hard to abstract out over channels for example.

There isn't much background available about the motivations for the current design, and if it outweighs the downsides, or if there are better alternatives. I have tried to summarize what I have found through web searches in this issue. It's not even clear to me if there are any examples of code that currently uses the Sink trait and relies on the possibility of reserving a slot.

Anyway, I think it's an important interface, and I hope we can work to a consensus of what it should look like.

Do you have ideas for useful bits of polish? Are there small changes or stdlib additions that would make everyday life that much easier?

  • Diagnostics is definitely still creating friction today, things like issue or issue are quite though when you run into them.

  • abstracting over Send basically requires doubling all interfaces, eg. Spawn and LocalSpawn. Having to box futures for async trait methods, makes the problem worse.

  • supporting async main and #[test] async fn would be a nice bit of polish, although it's easier to work around than the issues above.

  • I think your propositions for moving things into std are spot on. Personally I don't mind to bump version numbers regularly, but as time goes by the stability will be appreciated by more people, so it's good to anticipate.

  • I know it's a much more ambitious feature, but it's worth noting how often the phrase "you would need GAT's for that" comes up in user forums. It seems it would be a big enabler for plenty of things. It comes up way more often than async drop or even async trait fn, even though those are still very much desired features.

1 Like

A few comments on the proposed list:

Lints for common “gotchas”, like #[must_use] to help identify “not yield safe” types.

This would be great! Currently types not being Send has taken the place of such a lint, but it doesn't work in futures executed by block_on. This also makes me think about the thread::sleep function, which new users regularly call in their asynchronous code.

Extend the stdlib with mutexes, channels, task::block_on , and other small utilities.

Some of these, especially block_on, can cause trouble if we don't first find a better answer to the executor agnostic question. The block_on in futures is already a footgun when used together with Tokio (see here). Regarding channels, keep cooperative preemption in mind.

1 Like

I'm really happy with how this is all progressing. There are few things from the posts that I have concerns with:

  1. Adding an async-aware Mutex to the std library.

Fair async-aware mutexes are impossible in the current async model without tight coupling between the executor and the task using the mutex. The Mutex from the futures crate is extremely unfair to the point that I would really not advise using it: as soon as your executor becomes busy some tasks will never be able to acquire the mutex. On top of that, it is incompatible with certain future combinators which can result in the the mutex being held indefinitely by a future which is not being polled.

The reasons for this are subtle, but at a high level it's because when the mutex wakes a task, there is no guarantee that the mutex itself will be re-polled promptly, even if the task it wakes is re-polled.

IMO, a pre-requisite for an async-aware mutex is an executor-agnostic spawn mechanism: a Mutex in our async model is actually a task. When you create a Mutex which protects some state, you are actually spawning a task which owns that state. When you want to acquire the mutex and mutate its state, you are actually sending a "state modification" future to run within that mutex task. The mutex task guarantees that only one "state modification future" will be polled at once, and it naturally guarantees fairness, whilst ensuring that the mutex is unlocked promptly by directly driving its future to completion.

  1. Adding an async drop mechanism

The motivations for this feature seem entirely inadequate given the increase in complexity to the language and the implicit "invisible yield points" it introduces:

  • It makes control-flow an order-of-magnitude more complicated compared to the addition of the ? feature.
  • You can achieve the same result by explicitly calling a complete().await method.
  • You still need a "normal" destructor in case the the task itself is dropped or the value is used in a non-async context.
  • You now have to consider the case where a normal destructor runs because a task was dropped half-way through running an async destructor.
  • Destructors are already extremely hard to reason about and adding more complexity to this part of the language seems like a terrible idea. For example, the documentation for ManuallyDrop was presiumably written by someone very knowledgeable about the language, and yet nobody noticed the leak-amplification bug in its only example: https://doc.rust-lang.org/std/mem/struct.ManuallyDrop.html#examples.
  • It's an exremely niche feature that should only be used by a small number of types: just enough to catch you out with some horrible to track-down bugs. If we had another "obfuscated rust completition" async destructors seem like a prime candidate for obfuscating what code actually does.
1 Like

In how far do you consider the current Mutex implementations in futures-intrusive and tokio as unfair? They all maintain a fair queue of tasks waiting to acquire the Mutex. They all are not coupled to an executor.

I agree on Async Drop as proposed so far not being ideal, because it doesn't provide any strong guarantees about whether that new poll_drop function will be called after all. Thereby it's not possible to rely on it to avoid any critical leaks. So as you say, it adds complexity without really solving an existing problem.

I had been working on an alternate RFC about adding support for run-to-completion async functions. This approach can provide stronger guarantees of functions being called (incl. any use defined async fn complete()), and thereby also solves some of the problems that poll_drop aimed to solve. It however also won't be without downsides (mostly in terms of new interoperability challenges).

I've only looked at Tokio's Mutex, and while it's fair, it is not technically compatible with Rust's async model: there is currently no guarantee that a future be either polled or dropped promptly, even if the task it belongs to is woken up.

Futures are lazy: if you do not need the result of a future, there is no obligation to poll it. This means that polling or dropping a future should not have certain kinds of side effects, otherwise the behaviour will be unpredictable as it will depend on the way the future is used. In the case of the futures Mutex, only the unlocking side has side-effects, so it suffers from the issue that tasks might inadvertantly fail to release the mutex after they've acquired it.

In the case of the tokio Mutex, both locking and unlocking have side effects, so you have the same problem as above, but also if you ever try to lock the mutex and then later decide you don't need to, you might still prevent other tasks from acquiring it in the first place.

The reason you need tight-coupling with the executor is because you want code accessing a mutex to progress, even if the "result" that code will produce is no longer needed. To achieve that, you need to spawn the future as a task so that it will be eagerly polled.

2 Likes

I'd like to learn more about the Sink trait. I don't quite understand the role it plays yet -- it hasn't come up much in conversations. I mean I get that it's an "async consumer" (whereas a Stream is the "producer"), but I'd like to see some examples of libraries or patterns that would make use of taking a generic Sink.

1 Like

I can give some examples off what I use it for:

  • in the actor lib I'm writing, Addr needs to send the messages to Mailbox, which lives in a different task. So I need to use a channel. However, I don't really want to choose the channel impl. Channels have different performance profiles (eg. good under contention, or without contention, guaranteed FIFO, best effort FIFO, etc), and any day a new implementation of channels can come out that might be preferable over older ones.

    People might also want to use a channel with different properties, like a drop channel that overwrites older messages rather than provide back pressure, bounded vs unbounded etc.

    So I have a builder pattern which let's the user pass in an mpsc channel:

    /// Interface for T: Sink + Clone. Has blanket impl.
    //
    pub trait CloneSink<'a, Item, E>: Sink<Item, Error=E> + Unpin + Send
    
    impl<'a, T, Item, E> CloneSink<'a, Item, E> for T
    
       where T: 'a + Sink<Item, Error=E> + Clone + Unpin + Send + ?Sized
     
    {
        fn clone_sink( &self ) -> Box< dyn CloneSink<'a, Item, E> + 'a >
        {
            Box::new( self.clone() )
        }
    }
    
    /// Type of boxed channel sender for Addr.
    //
    pub type ChanSender<A> = Box< dyn CloneSink< 'static, BoxEnvelope<A>, SinkError> >;
    
    /// Type of boxed channel receiver for Inbox.
    //
    pub type ChanReceiver<A> = Box< dyn Stream<Item=BoxEnvelope<A>> + Send + Unpin >;
    
    // in the builder:
    //
    pub fn channel( &mut self, tx: ChanSender<A>, rx: ChanReceiver<A> ) -> &mut Self
    

    Now I support all of those use cases out of the box. Well, that is if people implement Sink on their channel senders, but in worst case we can wrap them.

  • I implement it often on my public types. Addr has two sending modes, call which will return a future that resolves to the return type of the call, and send which is like throwing a bottle in the sea, no feedback other than the msg has been delivered to the mailbox. For send, I just implement Sink. This enables use like:

    exec.spawn( async move { my_stream_of_msgs.forward( addr ).await } );
    

    It composes nicely with Stream through combinators and let's you chain things together. Here my_stream_of_msgs maybe is an rx of a channel.

One suggestion that I have seen around is: instead of implementing Sink, expose an API that takes a Stream. However it no longer really composes. I now have to spawn a new task for this internally, and everyone that wants to send a single item must always wrap it in a stream, when they might have just done addr.send( item ).await; before, getting some back pressure in the process.

edit: for the striketrough, I suppose it's possible not to spawn this but wrap the operation in a future that is returned to the caller by taking self by value and returning it, so maybe it's possible to make this suggestion work with not to much churn. I'll have to experiment with it a bit.

1 Like

The Sink trait is indeed the "opposite" of Stream. Most apis I see where it could make sense to take a Sink take a stream instead. A classic example is the body provided when using the hyper crate. Sometimes it's nice to be able to write some async/await code that can prepare some chunks of data and then send them off in a Sink, but the api takes a Stream instead. Hyper provides a channel body which essentially turns it into that Sink workflow.

One reason Sink might come up less in conversations (especially those about which apis to stabilize) is that the current Sink trait needs more thought into the design of the trait. You have to first call poll_ready and then start_send, which nothing in the types enforce. Additionally the type must be able to "reserve a slot" in some sense, which often doesn't map nicely onto the types you might want to be a sink. You have to design your channels to first reserve a slot and then later perform the actual send.

This is really interesting! To be honest, I hadn't looked that closely at mutexes in particular. It'd be good to try and spell out in more detail the various strategies one can take towards mutexes and list out their advantages and disadvantages. Is there such a write-up? I'd like to see a clear elaboration on what can go wrong, for example.

(One of the things I want to be doing is organizing a better effort around async to spell out the "skill tree" as well as some of the design constraints around each piece. It's something I had hope to do with the data from the async interviews but it's been hard to find the time.)

Right, this is the approach that Hyper (and actually all of the HTTP crates right now) is/are taking. However using a Stream instead of a Sink here (or more general: Using 2 read interfaces instead of a read and write interface) is not ideal, since it prevents applications to observe the outcome of HTTP requests, as I described here: https://github.com/hyperium/hyper/issues/2181

This seems like a minor API difference and question of taste at first, but it unfortunately makes Rusts HTTP ecosystem currently hardly recommendable in environments which need strong visibility about the outcome of HTTP requests (like CDN and storage services).

2 Likes

I wrote up some more detail: https://github.com/Diggsey/posts/tree/master/async-mutexes (everything from the section on fairness to the end relates to my previous comments)

6 Likes

Thanks @Diggsey! That's a great description of the problem - and I love the visualizations!

The Tokio and futures-intrusive fair mutex types indeed work like you described - which means if an application stops polling the generated Futures before they completed some concurrent tasks might be starved, due to the Mutex having been acquired in the background when another task released it.

We had some discussions around these issues before, with @Nemo157 for example. I think so far the common understanding was not continuing to poll() things before completion is considered an application bug. Even if you have select!, the typical flow will will either continue to poll() the Future (if a loop is used) or the Future will get dropped.

But it wouldn't be surprising if some unexpected bugs sneak into applications this way.
There are a couple of other issues with "intra-task-concurrency" due to join! or select! too - e.g. if one of the child-futures blocks the thread (e.g. with thread::sleep or tokios block_in_place), the other child Futures will be starved. There is not a lot we can do besides adding documentation around the behavior.

However more interesting than just looking a Futures which might not be polled to completion is actually looking at Streams -> For those there is not necessarily a completion state.

Since a Stream could wrap a Future which does not act ideal if not polled to completion (like an async Mutex), then stopping to poll the Stream causes the same issue. Here the only remedy is to close/drop the whole Stream.

We had a bit of discussion in this CR, which builds a Stream based of Futures that require getting polled to completion: When the Stream is polled initially, it will create and store a Future in itself, and from there on the inner Future is polled. If the Stream is not continued to be polled, the Future will just stick around.

For Streams and other poll_x types AsyncRead/Write the contract is likely "you have to continue polling the object until it returned Poll::Ready. If Poll::Pending was returned in the last poll, then stopping to poll might cause [not memory related] undefined behavior.

One API solution that could be added to those types is having some kind of cancel() method which stops and in-progress operation. That one would need to be called if an application does not intend to poll the object anymore. But that new API would raise some new questions. E.g. whether one would need to poll after the cancellation had been initiated, since the cancellation might be async.

Thanks, that is exactly what I was hoping for and then some!

Async interview #8, with @stjepang, is now available.

6 Likes

@nikomatsakis Typo s/Bridging the sync vs sync worlds/Bridging the sync vs async worlds/.

http://smallcultfollowing.com/babysteps/blog/2020/07/09/async-interview-8-stjepan-glavina/#bridging-the-sync-vs-sync-worlds

1 Like

Thanks for the great series.

One thing I don't really understand in regard to smol: why is it that a true async file IO wrapper can't be provided? I agree that in many cases deferring to another thread to handle it would be sufficient, but I can also think of cases where I am not so sure.

A case I am particularly interested in is where data is being read in from a large number (thousands) of network connections and written to files (one per connection). Will the smol way of handling files scale to this level? Or will there be too much overhead? Or maybe the writer threads will choke on the workload?

I admit I have not looked closely enough at smol to know how easy it would be to write an Async wrapper for a file, but I would have thought it should be possible for any type that implements AsRawFd.

1 Like

It's not specific to smol. The APIs that various executors use to make IO async just don't work for files. For example, on Linux the API is called epoll and works by letting you choose a list of connections to receive events from, and then allows you to go to sleep until an event from any of those occurs. You can provide a file to this list, but epoll will claim that it is always ready, even if reading from it would block.

As an exception to this, there does exist an API called io_uring that exists on very new Linux machines, which does provide true file IO, but supporting it in a runtime has proved difficult, and no runtimes currently support it.

4 Likes