I have a trait (let's call it `BinaryRepr`) that allows serializing a data structure (`&Self`) into a stream of bytes by writing it to an asynchronous writer (`tokio::io::AsyncWrite`). I want this to work asynchronously because the data structures might be big and the receiver might be a remote network peer (and I don't want to buffer the whole binary representation in memory). Let's call the method that performs the serialization `dump`.
I also need to calculate hashes based on this binary representation. Let's call the method that performs the hashing `hash`.
As I dislike redundancy, I don't want to implement the serialization code twice. But I would also like to be able to calculate a hash value from non-async functions. That should not be a problem, as calculating the hash requires no I/O and is fast in my use case.
However, I cannot define a function (`dump`) that is sometimes `async` and sometimes isn't.
I solved this by convention:
```rust
use async_trait::async_trait;
use futures::future::FutureExt;
use std::pin::Pin;
use std::task::{Context, Poll};
use tokio::io::{self, AsyncWrite, AsyncWriteExt};
use tokio::task::unconstrained;

#[async_trait]
trait BinaryRepr {
    // CONVENTION:
    // `dump` must only yield due to calling methods on `writer`
    async fn dump<W: AsyncWrite + Unpin + Send>(
        &self,
        writer: &mut W,
    ) -> io::Result<()>;

    fn hash(&self) -> u8 {
        let mut hasher = Hasher::new();
        unconstrained(self.dump(&mut hasher))
            .now_or_never()
            .unwrap()
            .unwrap();
        hasher.finalize()
    }
}
```
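Notice the `now_or_never` method (from `futures::future::FutureExt`) followed by two `unwrap()`s, which panic at run time unless `dump` immediately returns successfully (which, by convention, it always will when `Hasher` is the writer): the first `unwrap()` fails if the future is still pending (`now_or_never` returned `None`), the second if `dump` returned an I/O error.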
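To illustrate what the two `unwrap()`s guard against, here is a std-only sketch of the idea behind `now_or_never`: poll the future exactly once with a waker that does nothing, and return `Some(output)` only if it is already `Ready`. `poll_once` and `YieldOnce` are hypothetical names, and this is an approximation rather than the `futures` implementation; `YieldOnce` stands in for anything that returns `Poll::Pending` once, such as a writer that is not ready yet or a forced cooperative yield.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// Waker that ignores wake-ups: we poll exactly once and never wait.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

// Hypothetical helper approximating `FutureExt::now_or_never`:
// poll once, `Some(output)` if ready, `None` if pending.
fn poll_once<F: Future>(fut: F) -> Option<F::Output> {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    match fut.as_mut().poll(&mut cx) {
        Poll::Ready(output) => Some(output),
        Poll::Pending => None,
    }
}

// Hypothetical future that is pending on its first poll and ready afterwards.
struct YieldOnce {
    yielded: bool,
}

impl Future for YieldOnce {
    type Output = ();

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.yielded {
            Poll::Ready(())
        } else {
            self.yielded = true;
            cx.waker().wake_by_ref(); // ask to be polled again
            Poll::Pending
        }
    }
}

fn main() {
    // Only awaits things that are ready: finishes on the first poll.
    let ready = poll_once(async { Ok::<u8, std::io::Error>(42) });
    assert!(matches!(ready, Some(Ok(42))));

    // Awaits something pending: the single poll returns `None`,
    // which is where `now_or_never().unwrap()` would panic.
    let pending = poll_once(async {
        YieldOnce { yielded: false }.await;
        Ok::<u8, std::io::Error>(42)
    });
    assert!(pending.is_none());
}
```

In the trait above, the second case corresponds to the first `unwrap()` panicking.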
I used `tokio::task::unconstrained` to avoid unnecessary yields (e.g. if `dump` internally uses Tokio message queues or similar): it opts the future out of Tokio's cooperative scheduling budget, so Tokio never forces it to yield. (See also Async write into Vec, but note that by convention, this might be the responsibility of the implementation of `dump` anyway, so it could be left out here.)
I can then implement `Hasher`, for example, as follows:
```rust
struct Hasher {
    state: u8,
}

impl Hasher {
    fn new() -> Self {
        Hasher { state: 0u8 }
    }

    fn digest<T: AsRef<[u8]>>(&mut self, bytes: T) {
        for byte in bytes.as_ref() {
            self.state = self.state.wrapping_add(*byte);
        }
    }

    fn finalize(self) -> u8 {
        self.state
    }
}
```
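Since `dump` may pass its output to the writer in arbitrarily sized pieces (here, one write per field), the hash must depend only on the byte sequence, not on how it is split; the wrapping byte sum above has that property. A std-only check, copying the `Hasher` and assuming `dump` for `Packet { content: "Hello" }` emits exactly `"Packet len=5\n"` followed by `"Hello"`:

```rust
// Copy of the `Hasher` above: a wrapping byte sum (a toy checksum, not a real hash).
struct Hasher {
    state: u8,
}

impl Hasher {
    fn new() -> Self {
        Hasher { state: 0u8 }
    }

    fn digest<T: AsRef<[u8]>>(&mut self, bytes: T) {
        for byte in bytes.as_ref() {
            self.state = self.state.wrapping_add(*byte);
        }
    }

    fn finalize(self) -> u8 {
        self.state
    }
}

fn main() {
    // The bytes assumed to be written by `dump` for `Packet { content: "Hello" }`.
    let header = "Packet len=5\n";
    let body = "Hello";

    // Two writes, one per field ...
    let mut split = Hasher::new();
    split.digest(header);
    split.digest(body);

    // ... versus one write of the concatenated bytes.
    let mut whole = Hasher::new();
    whole.digest(format!("{header}{body}"));

    let (a, b) = (split.finalize(), whole.finalize());
    assert_eq!(a, b);
    assert_eq!(a, 39); // 1075 + 500 = 1575, and 1575 mod 256 = 39
    println!("Hash: {a}");
}
```

The header bytes sum to 1075 and `Hello` to 500; 1575 mod 256 = 39, so the sketch prints `Hash: 39`.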
Now I must make `Hasher` implement `AsyncWrite` in such a way that writes are never pending:
```rust
impl AsyncWrite for Hasher {
    fn poll_write(
        self: Pin<&mut Self>,
        _: &mut Context<'_>,
        buf: &[u8],
    ) -> Poll<io::Result<usize>> {
        Pin::into_inner(self).digest(buf);
        Poll::Ready(Ok(buf.len()))
    }

    fn poll_flush(
        self: Pin<&mut Self>,
        _: &mut Context<'_>,
    ) -> Poll<io::Result<()>> {
        Poll::Ready(Ok(()))
    }

    fn poll_shutdown(
        self: Pin<&mut Self>,
        _: &mut Context<'_>,
    ) -> Poll<io::Result<()>> {
        Poll::Ready(Ok(()))
    }
}
```
I can then use everything as follows:
```rust
struct Packet {
    content: String,
}

#[async_trait]
impl BinaryRepr for Packet {
    async fn dump<W>(
        &self,
        writer: &mut W,
    ) -> io::Result<()>
    where
        W: AsyncWrite + Unpin + Send,
    {
        // `write_all` rather than `write`: `write` may accept only part of the buffer.
        writer
            .write_all(format!("Packet len={}\n", self.content.len()).as_bytes())
            .await?;
        writer.write_all(self.content.as_bytes()).await?;
        Ok(())
    }
}

// NOTE: `main` isn't async here!
fn main() {
    println!("Hash: {}", (Packet { content: "Hello".to_string() }).hash());
}
```
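Note that `AsyncWriteExt::write`, like `std::io::Write::write`, may accept only part of the buffer and returns the number of bytes it took, so a `dump` that ignores that count can silently drop bytes on a real network writer (it only works with `Hasher` because its `poll_write` always consumes everything). `write_all` keeps writing until the whole buffer is consumed. A std-only sketch of the difference, using a hypothetical `ChunkyWriter` that accepts at most 4 bytes per call; Tokio's async methods follow the same semantics:

```rust
use std::io::{self, Write};

// Hypothetical writer that accepts at most 4 bytes per call,
// like a socket whose send buffer is nearly full.
struct ChunkyWriter {
    received: Vec<u8>,
}

impl Write for ChunkyWriter {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        let n = buf.len().min(4);
        self.received.extend_from_slice(&buf[..n]);
        Ok(n)
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

fn main() -> io::Result<()> {
    // `write` reports how many bytes it took; ignoring that loses data.
    let mut w = ChunkyWriter { received: Vec::new() };
    let n = w.write(b"Hello")?;
    assert_eq!(n, 4);
    assert_eq!(w.received, b"Hell");

    // `write_all` calls `write` repeatedly until the whole buffer is written.
    let mut w = ChunkyWriter { received: Vec::new() };
    w.write_all(b"Hello")?;
    assert_eq!(w.received, b"Hello");
    Ok(())
}
```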
I would consider the `dump` method semi-asynchronous, because it will only yield if method calls on its argument (`writer`) yield. Unfortunately, this isn't reflected in its type in any way: any implementation that violates this convention will lead to a run-time panic.
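To make the convention concrete without Tokio or `async_trait`, here is a std-only model of the whole flow. Everything in it is an assumption for illustration: `PollWrite` is a hypothetical, reduced stand-in for `AsyncWrite` (only `poll_write`, and with `&mut self` instead of `Pin`), `write_all` is a hand-written equivalent of the `AsyncWriteExt` method, and `hash` polls `dump` once with a no-op waker instead of using `now_or_never`. The only place `dump` can return `Poll::Pending` is a `poll_write` call, which is exactly the semi-asynchronous property; with a writer that is never pending, the single poll always completes.

```rust
use std::future::Future;
use std::io;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// Hypothetical stand-in for `tokio::io::AsyncWrite`, reduced to `poll_write`.
trait PollWrite {
    fn poll_write(&mut self, cx: &mut Context<'_>, buf: &[u8]) -> Poll<io::Result<usize>>;
}

// Future returned by `write_all`: calls `poll_write` until `buf` is empty.
struct WriteAll<'a, W> {
    writer: &'a mut W,
    buf: &'a [u8],
}

impl<W: PollWrite> Future for WriteAll<'_, W> {
    type Output = io::Result<()>;

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        let this = &mut *self;
        while !this.buf.is_empty() {
            match this.writer.poll_write(cx, this.buf) {
                Poll::Ready(Ok(0)) => return Poll::Ready(Err(io::ErrorKind::WriteZero.into())),
                Poll::Ready(Ok(n)) => {
                    let buf = this.buf;
                    this.buf = &buf[n..];
                }
                Poll::Ready(Err(e)) => return Poll::Ready(Err(e)),
                // The only point where `dump` can yield.
                Poll::Pending => return Poll::Pending,
            }
        }
        Poll::Ready(Ok(()))
    }
}

fn write_all<'a, W: PollWrite>(writer: &'a mut W, buf: &'a [u8]) -> WriteAll<'a, W> {
    WriteAll { writer, buf }
}

// Same wrapping byte sum as the `Hasher` above; its writes are never pending.
struct Hasher {
    state: u8,
}

impl PollWrite for Hasher {
    fn poll_write(&mut self, _: &mut Context<'_>, buf: &[u8]) -> Poll<io::Result<usize>> {
        for byte in buf {
            self.state = self.state.wrapping_add(*byte);
        }
        Poll::Ready(Ok(buf.len()))
    }
}

// Waker that does nothing, since `hash` polls exactly once.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

struct Packet {
    content: String,
}

impl Packet {
    // Convention: only awaits futures driven by `writer`.
    async fn dump<W: PollWrite>(&self, writer: &mut W) -> io::Result<()> {
        let header = format!("Packet len={}\n", self.content.len());
        write_all(writer, header.as_bytes()).await?;
        write_all(writer, self.content.as_bytes()).await?;
        Ok(())
    }

    // Synchronous: polls `dump` once and panics if it did not finish.
    fn hash(&self) -> u8 {
        let mut hasher = Hasher { state: 0 };
        let waker = Waker::from(Arc::new(NoopWaker));
        let mut cx = Context::from_waker(&waker);
        let mut fut = Box::pin(self.dump(&mut hasher));
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(result) => result.expect("writing to Hasher cannot fail"),
            Poll::Pending => panic!("dump yielded, violating the convention"),
        }
        drop(fut); // release the borrow of `hasher`
        hasher.state
    }
}

fn main() {
    println!("Hash: {}", (Packet { content: "Hello".to_string() }).hash());
}
```

Run as is, this prints `Hash: 39`. The panic in `hash` corresponds to the first `unwrap()` in the original trait.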
While I dislike the panics, I wasn't able to come up with a better solution for my problem. I would appreciate feedback on my idea. Is it a reasonable way to solve this? Am I overcomplicating things?