Hi, looking at Tokio's `AsyncRead` trait, I'm wondering how to use it efficiently at scale.
I had to build a WebSocket server and wondered how efficient it would be. I remembered a Go talk about handling millions of WebSockets and how not having one buffer per connection tremendously reduces the memory footprint of the server.
From what I understand, basic use of something implementing `AsyncRead` would look like this:
```rust
use std::io;
use tokio::io::{AsyncRead, AsyncReadExt};

async fn handle(mut reader: impl AsyncRead + Unpin, buf_size: usize) -> io::Result<()> {
    let mut buf = vec![0; buf_size];
    let n = reader.read(&mut buf).await?;
    // Do something with buf[..n]
    Ok(())
}
```
The issue is that if you have a large number of such tasks (say `n`), you now have `n` buffers allocated, i.e. `n * buf_size` bytes of memory. So either you make `buf_size` really small, which is detrimental to performance when data arrives, or you need a lot of RAM.
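To put rough numbers on it (my own illustrative figures, not measurements): a million mostly-idle connections each pinning a 4 KiB read buffer cost gigabytes, while a pool sized to the executor thread count costs kilobytes:

```rust
fn main() {
    // Hypothetical figures: one read buffer per connection vs. one per
    // executor thread. Idle connections hold no buffer in the pooled scheme.
    let connections: u64 = 1_000_000;
    let buf_size: u64 = 4096;
    let threads: u64 = 16;

    let per_connection = connections * buf_size; // one buffer per connection
    let pooled = threads * buf_size; // one buffer per executor thread

    println!("per-connection: {} bytes (~3.8 GiB)", per_connection);
    println!("pooled:         {} bytes (64 KiB)", pooled);
}
```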
However, given the way `AsyncRead` is designed, it should accept the buffer changing on each `poll_read` call, so we could very well have in `AsyncReadExt` a `read_pooled<'a>(&'a mut self, pool: &'a Pool) -> ReadPooled<'a, Self>`.
`ReadPooled` would implement `Future<Output = io::Result<Vec<u8>>>` (or with one of the `bytes` crate's types) with something like this:
```rust
impl<S: AsyncRead + Unpin> Future for ReadPooled<'_, S> {
    type Output = io::Result<Vec<u8>>;

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        let this = self.get_mut();
        // Borrow a buffer from the pool. Depending on the pool, this gives some
        // wrapper type around a mutable reference to a buffer, maybe a
        // `MutexGuard<Vec<u8>>` or something. On drop it is returned to the
        // pool for reuse in the next `poll` call of this future, or of any
        // other future that uses the same pool.
        let mut buf = this.pool.borrow();
        let mut read_buf = ReadBuf::new(&mut buf);
        match Pin::new(&mut this.stream).poll_read(cx, &mut read_buf) {
            Poll::Pending => Poll::Pending, // The buffer is returned to the pool.
            Poll::Ready(Err(e)) => Poll::Ready(Err(e)),
            // `buf.take` takes ownership of the buffer away from the pool
            // (which will likely just allocate a new buffer to replace it).
            Poll::Ready(Ok(())) => Poll::Ready(Ok(buf.take())),
        }
    }
}
```
There are some subtleties with `Pin` which I've ignored, but I don't think they would be a problem. The idea is that you would use the same `Pool` in many tasks. The pool itself can get by with only a few buffers (maybe as many as you have executor threads), given that very few tasks will be in `poll` at the same time.
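For concreteness, here's a minimal sketch of what such a `Pool` could look like (all names are hypothetical, using a plain `Mutex` around a free list for simplicity; a real implementation would probably cap growth and integrate with `bytes`):

```rust
use std::sync::Mutex;

/// A hypothetical shared buffer pool. `borrow` hands out a buffer, which goes
/// back on the free list when the guard is dropped, unless the caller `take`s
/// ownership, in which case the pool allocates a replacement on a later `borrow`.
struct Pool {
    buf_size: usize,
    free: Mutex<Vec<Vec<u8>>>,
}

struct PoolGuard<'a> {
    pool: &'a Pool,
    buf: Option<Vec<u8>>,
}

impl Pool {
    fn new(buf_size: usize) -> Self {
        Pool { buf_size, free: Mutex::new(Vec::new()) }
    }

    /// Hand out a buffer, allocating one only if the free list is empty.
    fn borrow(&self) -> PoolGuard<'_> {
        let buf = self
            .free
            .lock()
            .unwrap()
            .pop()
            .unwrap_or_else(|| vec![0; self.buf_size]);
        PoolGuard { pool: self, buf: Some(buf) }
    }
}

impl PoolGuard<'_> {
    /// Move the buffer out of the pool for good (e.g. after a successful read).
    fn take(mut self) -> Vec<u8> {
        self.buf.take().unwrap()
    }
}

impl std::ops::Deref for PoolGuard<'_> {
    type Target = Vec<u8>;
    fn deref(&self) -> &Vec<u8> {
        self.buf.as_ref().unwrap()
    }
}

impl std::ops::DerefMut for PoolGuard<'_> {
    fn deref_mut(&mut self) -> &mut Vec<u8> {
        self.buf.as_mut().unwrap()
    }
}

impl Drop for PoolGuard<'_> {
    fn drop(&mut self) {
        // Return the buffer unless `take` already moved it out.
        if let Some(buf) = self.buf.take() {
            self.pool.free.lock().unwrap().push(buf);
        }
    }
}
```

A `MutexGuard`-style wrapper like `PoolGuard` is what `ReadPooled::poll` above would get from `pool.borrow()`.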
In my case with WebSockets, `tokio-tungstenite` doesn't seem to do anything like this (from what I understand it uses a 4096-byte read buffer for each socket), so I expect memory usage to be quite high with a lot of WebSocket connections open.
I can't seem to find anything for this kind of pattern. You'd probably need a type (or trait) for `Pool`. While there have been mentions of pooling for the `bytes` crate, nothing seems to have come out of it. Is there something I'm missing about why such a pattern wouldn't be used for cases where a lot of connections are handled concurrently?