Zero-copy async IO in Rust?

In C++, iostreams are designed very poorly; as a result they make many copies of data, are extremely slow, and require 2x-4x the memory they actually need!

The number of copies any IO code performs is the only really important thing.

I thought that in Rust, tokio::io::AsyncWrite / tokio::fs::File are zero-copy.

But it seems they are not:

  • From Alice:

    When given bytes, it immediately returns Ok(len) and copies them into a buffer

  • The code corroborates it:

    impl AsyncWrite for File {
        fn poll_write(
            self: Pin<&mut Self>,
            cx: &mut Context<'_>,
            src: &[u8],
            ...
                        let n = buf.copy_from(src);
    
  • But there is more: I think there is also a design flaw in AsyncWrite: it seems impossible to pass its buffer to an async function: see my question Pass buf: &[u8] from AsyncWrite::poll_write to an async function. If this is true, it means that ANY implementation of tokio::io::AsyncWrite is flawed.

Can someone please tell me that I am wrong, and that there is a way to create a zero-copy impl of AsyncWrite (using the existing interface)?!!

If not, are there plans to replace/supersede tokio::io::AsyncWrite? What are some alternatives?

cc: @alice any ideas / suggestions are appreciated

The AsyncRead/AsyncWrite traits work perfectly well for most IO resources. With tokio::net::TcpStream and other similar types, there are no copies that can be eliminated.

You are right that this isn't the case for tokio::fs::File, but that's a file-specific problem. See from the Tokio tutorial:

When not to use Tokio

[...]

Reading a lot of files. Although it seems like Tokio would be useful for projects that simply need to read a lot of files, Tokio provides no advantage here compared to an ordinary threadpool. This is because operating systems generally do not provide asynchronous file APIs.

source

3 Likes

Then I'm curious, what about io_uring? That's designed to be async.

As an aside, it's quite remarkable that Tokio hasn't rebased to use that natively yet.

2 Likes

This problem is not specific to AsyncWrite: it is fundamentally impossible to do async IO from/to &[u8]/&mut [u8] without copies in Rust.

The reason is that any such operation must be able to read/write from the buffer while the poll/poll_write function is not running, but in that situation it's impossible to guarantee the references will still be valid. It may in fact happen that the Future/AsyncWrite/AsyncRead is forgotten, in which case the reference may become invalid before any code can be run to stop the IO operation.
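
The "forgotten" case can be shown in a few lines of safe Rust. This is only a minimal sketch: `InFlightWrite` and `leak_in_flight_write` are hypothetical names, and no real IO is performed. It shows that `std::mem::forget` skips `Drop`, so a completion-based operation cannot rely on `Drop` to cancel the kernel's use of a borrowed buffer before that buffer is freed.

```rust
use std::cell::Cell;
use std::mem;

/// Hypothetical stand-in for a future whose borrowed buffer is in use by the kernel.
struct InFlightWrite<'a> {
    buf: &'a [u8],
    cancelled: &'a Cell<bool>,
}

impl Drop for InFlightWrite<'_> {
    fn drop(&mut self) {
        // A completion-based API would need to cancel the kernel operation here.
        self.cancelled.set(true);
    }
}

/// Returns the buffer length seen by the operation and whether the cancel hook ran.
fn leak_in_flight_write() -> (usize, bool) {
    let cancelled = Cell::new(false);
    let len;
    {
        let data = vec![1u8, 2, 3];
        let op = InFlightWrite { buf: &data, cancelled: &cancelled };
        len = op.buf.len();
        // Safe code: the value is leaked and `Drop` never runs.
        mem::forget(op);
        // `data` is freed at the end of this block, yet nothing cancelled the "operation".
    }
    (len, cancelled.get())
}
```

With a real completion-based backend, the kernel could still be reading from `data` after it has been freed, which is why a borrowed slice cannot be handed over safely.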

Any alternative needs to give up something, for example the ability to use &[u8]/&mut [u8], as the tokio_uring crate does (it requires you to give it ownership of the buffer, and then gives it back when the operation is done).
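
A rough sketch of that ownership-passing shape, using only std. Everything here is an assumption for illustration: `CountingSink`, `OwnedWrite`, `write_owned` and the tiny `block_on` are hypothetical and are not tokio_uring's actual types. The only point is the signature: the buffer is moved in, and it comes back together with the result.

```rust
use std::future::Future;
use std::io;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical resource; it only counts bytes so the sketch needs no kernel.
struct CountingSink {
    bytes_written: usize,
}

/// Future that owns the buffer for the whole operation.
struct OwnedWrite<'a> {
    sink: &'a mut CountingSink,
    buf: Option<Vec<u8>>,
}

impl Future for OwnedWrite<'_> {
    // The buffer is handed back next to the result.
    type Output = (io::Result<usize>, Vec<u8>);

    fn poll(mut self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<Self::Output> {
        let this = &mut *self; // allowed because OwnedWrite is Unpin
        let buf = this.buf.take().expect("polled after completion");
        let n = buf.len();
        this.sink.bytes_written += n;
        Poll::Ready((Ok(n), buf))
    }
}

/// Takes the buffer by value instead of by reference.
fn write_owned(sink: &mut CountingSink, buf: Vec<u8>) -> OwnedWrite<'_> {
    OwnedWrite { sink, buf: Some(buf) }
}

/// Waker that does nothing; enough for futures that never return Pending for long.
struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Minimal busy-polling executor, only for this sketch.
fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}

/// Returns (bytes reported, bytes counted by the sink, the returned buffer).
fn demo_owned_write() -> (usize, usize, Vec<u8>) {
    let mut sink = CountingSink { bytes_written: 0 };
    let (res, buf) = block_on(write_owned(&mut sink, b"hello".to_vec()));
    (res.unwrap(), sink.bytes_written, buf)
}
```

Because the future owns the `Vec<u8>`, leaking the future leaks the buffer too, instead of leaving a dangling reference behind.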

5 Likes

It looks like tokio::net::TcpStream just delegates the write to a syscall, forwarding the buffer through a number of "regular" non-async functions, and ultimately calls this one:

    pub fn write(&self, buf: &[u8]) -> io::Result<usize> {
        let len = cmp::min(buf.len(), <wrlen_t>::MAX as usize) as wrlen_t;
        let ret = cvt(unsafe {
            c::send(self.inner.as_raw(), buf.as_ptr() as *const c_void, len, MSG_NOSIGNAL)
        })?;
        Ok(ret as usize)
    }

But what if I cannot have an implementation THAT simple?

What I am trying to do is use an async function from within poll_write (and I need this because I want to upload data to s3 using the AWS SDK, which is async).

What would you suggest if I need (1) composable (2) zero-copy (3) async IO?
("composable" in the sense that I use other async functions while implementing poll_XXX)

Rust's async model has fundamental issues with completion-based APIs (this is one of the many reasons why I steer as far as possible from async Rust). There are certain workarounds/hacks like using io-uring's poll mode or registered buffers, but you simply cannot safely use buffers which are part of a future's state for completion-based async IO.

4 Likes

Yeah, it sounds like a good explanation, thank you.

impossible to do async IO from/to &[u8]/&mut [u8] without copies in Rust.

Which means that if we change the signature of poll_write to accept bytes::Bytes, the problem can be solved?..
(as far as I understand, bytes::Bytes is just an implicitly shared vector with a cheap clone)

Yes, that's kinda what tokio_uring/monoio do (they make it generic to allow Vec<u8> and custom types too).
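
To illustrate what "cheap clone" means, here is a minimal std-only sketch. It uses `Arc<[u8]>` as a stand-in for `bytes::Bytes` (an assumption made so the example needs no external crate; `Bytes` offers more, such as slicing). `share_buffer` is a hypothetical helper name.

```rust
use std::sync::Arc;

/// Returns (do both handles point at the same allocation?, number of handles).
fn share_buffer() -> (bool, usize) {
    // One allocation holding the payload.
    let original: Arc<[u8]> = Arc::from(vec![1u8, 2, 3, 4]);
    // Cloning bumps a reference count; the payload bytes are not copied.
    let for_io = Arc::clone(&original);
    (Arc::ptr_eq(&original, &for_io), Arc::strong_count(&original))
}
```

An IO operation that holds such a handle keeps the bytes alive on its own, so it does not depend on the caller's borrow staying valid.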

2 Likes

Nice!

Then why doesn't @alice want to change tokio? :slight_smile:

Again, as far as I understand, there is no way to mix an async function with poll_write (without copying arguments).

tokio_uring's DESIGN.md document explains it:

Because io-uring differs significantly from epoll, Tokio must provide a new set of APIs to take full advantage of the reduced overhead. However, Tokio's stability guarantee means Tokio APIs cannot change until 2024 at the earliest.

2 Likes

It is not ONLY about uring. I came to this issue/problem from a completely different perspective (composing async functions and poll_write).
My initial question: Pass buf: &[u8] from AsyncWrite::poll_write to an async function (and await it)

1 Like

Is there an alternative async model that would not have these issues?

That still requires changing the AsyncWrite and AsyncRead traits in a non-backward compatible way, so the same stability argument applies.

Sounds like this year could be a big one for Rust async then, aside from improvements to the language proper that are sure to come.

1 Like

Yes, 100%, but if we leave it as it is, it will be "std::iostream 2" (which exists in the C++ standard, but no one is using it).

The sooner we embrace it and change it, the better.
Otherwise it will be like in C++: everyone will have their own "AsyncWrite"

The easiest way to implement your own IO types is to implement something based on the Stream/Sink traits and to convert that into an AsyncRead/AsyncWrite using the utilities in the tokio-util crate.

You all are talking so fast, it's hard to keep up. I don't really know what to reply to with so many intertwined topics. :sweat_smile:

2 Likes

TBF C++'s iostream has a lot of other problems too, it's not just the performance implications.

Ideally we would have a single AsyncWrite/AsyncRead in the stdlib, but that needs to be "correct" the first time, otherwise it will be impossible to change later (which is the same problem that std::iostream has: it is impossible to change due to the C++ stdlib's stability guarantees).

Ultimately, I don't think there can be one "perfect" API. In some sense, the one we have today is the most powerful API for the user of the IO resource. The buffer with data to write only needs to be available while you are actively calling poll_write, which allows for cases such as:

  • Writing a buffer on the stack of the poll function calling poll_write.
  • Writing a buffer stored in a field next to the AsyncWrite without making the struct self-referential.
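
Both cases can be sketched with a local trait that has the same shape as poll_write. This is an assumption-laden illustration: `PollWrite`, `Framed` and `demo_framed` are hypothetical names, the trait is not tokio's AsyncWrite, and the `Vec<u8>` writer is always ready, so partial writes and Pending are ignored.

```rust
use std::io;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Local stand-in with the same shape as AsyncWrite::poll_write.
trait PollWrite {
    fn poll_write(self: Pin<&mut Self>, cx: &mut Context<'_>, src: &[u8]) -> Poll<io::Result<usize>>;
}

/// In-memory writer that is always ready.
impl PollWrite for Vec<u8> {
    fn poll_write(mut self: Pin<&mut Self>, _cx: &mut Context<'_>, src: &[u8]) -> Poll<io::Result<usize>> {
        self.extend_from_slice(src);
        Poll::Ready(Ok(src.len()))
    }
}

struct Framed<W> {
    writer: W,
    header: [u8; 4],
}

impl<W: PollWrite + Unpin> Framed<W> {
    /// Buffer stored in a field next to the writer: two disjoint field borrows,
    /// so the struct does not need to be self-referential.
    fn poll_send_header(&mut self, cx: &mut Context<'_>) -> Poll<io::Result<usize>> {
        Pin::new(&mut self.writer).poll_write(cx, &self.header)
    }

    /// Buffer on the stack of the polling function: it only has to live for this call.
    fn poll_send_trailer(&mut self, cx: &mut Context<'_>) -> Poll<io::Result<usize>> {
        let trailer = [0xFF_u8; 2];
        Pin::new(&mut self.writer).poll_write(cx, &trailer)
    }
}

struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Drives both writes once and returns what the writer received.
fn demo_framed() -> Vec<u8> {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut framed = Framed { writer: Vec::<u8>::new(), header: [1, 2, 3, 4] };
    // The Vec-backed writer never returns Pending, so the results are ignored here.
    let _ = framed.poll_send_header(&mut cx);
    let _ = framed.poll_send_trailer(&mut cx);
    framed.writer
}
```

Neither pattern would type-check against an API that keeps the `&[u8]` beyond the call, which is the trade-off described above.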

And the current API also has the strongest possible cancellation safety guarantees.

But it's not the most powerful API for the implementer of the IO resource, as your discussion has revealed.

7 Likes

I see...

I guess the short-term plan looks like this:

  • either perform a copy in poll_write or
  • "implement something based on the Stream/Sink traits and to convert that into an AsyncRead/AsyncWrite using the utilities in the tokio-util crate" (I hope this approach does not involve copies?)
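
The first option (copy in poll_write, then drive an async function) might look roughly like the sketch below. It is std-only and hypothetical throughout: `upload_chunk` stands in for an async SDK call, `UploadWriter::poll_write` is an inherent method with the same shape as AsyncWrite::poll_write rather than the real trait, and it assumes the caller retries with the same bytes after a Pending.

```rust
use std::future::Future;
use std::io;
use std::pin::Pin;
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};

type UploadFuture = Pin<Box<dyn Future<Output = io::Result<usize>>>>;

/// Hypothetical async upload; it appends to shared memory instead of a network.
async fn upload_chunk(store: Arc<Mutex<Vec<u8>>>, chunk: Vec<u8>) -> io::Result<usize> {
    let n = chunk.len();
    store.lock().unwrap().extend_from_slice(&chunk);
    Ok(n)
}

struct UploadWriter {
    store: Arc<Mutex<Vec<u8>>>,
    in_flight: Option<UploadFuture>,
}

impl UploadWriter {
    fn poll_write(mut self: Pin<&mut Self>, cx: &mut Context<'_>, src: &[u8]) -> Poll<io::Result<usize>> {
        let this = &mut *self; // allowed because UploadWriter is Unpin
        if this.in_flight.is_none() {
            // The one copy: `src` is only valid during this call, but the
            // future must own its data across later polls.
            let owned = src.to_vec();
            this.in_flight = Some(Box::pin(upload_chunk(Arc::clone(&this.store), owned)));
        }
        let fut = this.in_flight.as_mut().unwrap();
        match fut.as_mut().poll(cx) {
            Poll::Ready(res) => {
                this.in_flight = None; // ready for the next write
                Poll::Ready(res)
            }
            Poll::Pending => Poll::Pending,
        }
    }
}

struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Polls one write to completion; returns (bytes accepted, bytes stored).
fn demo_poll_write() -> (usize, Vec<u8>) {
    let store = Arc::new(Mutex::new(Vec::new()));
    let mut writer = UploadWriter { store: Arc::clone(&store), in_flight: None };
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let n = loop {
        if let Poll::Ready(res) = Pin::new(&mut writer).poll_write(&mut cx, b"abc") {
            break res.unwrap();
        }
    };
    let stored = store.lock().unwrap().clone();
    (n, stored)
}
```

The `to_vec()` call is exactly the copy this thread is about: it is what lets the boxed future outlive the borrowed `src`.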

But what about long-term?

  • Shall I create my own AsyncWrite that would take something like bytes::Bytes (I guess it should fix the issue?)
  • Or are there plans to address the issue in tokio (in which case I can wait/contribute)?

Can you tell me more about this IO resource that you are trying to implement? What will your writes do? You mentioned something about AWS. And how will it be used? Are you passing it to some API that takes an AsyncWrite?