Avoiding heap allocations in (some) async trait methods

Hello, I plan on using a trait to define an abstract interface to a data store, where several implementations may exist.

Since the interface will involve I/O, and because I want to use async Rust, I have been considering using the async-trait crate. However, some methods might be very cheap and called repeatedly.

I understand that async-trait imposes some overhead at runtime due to type-erasure / dynamic dispatch and heap allocation. To allow optimizations, I'd like to define my trait in such a way that some implementations may work without dynamic futures. To do this, I declared the trait as follows (simplified example):

use std::future::Future;
use std::pin::Pin;

trait GetAnInteger {
    type GetFuture: Future<Output = i32>;
    fn get(&self) -> Self::GetFuture;
}

Using an associated type, I can decide for each implementation whether I want to use a Pin<Box<dyn Future<…>…>> or some other future which doesn't require any allocations on the heap.

Question 1: If the type GetAnInteger::GetFuture is only restricted to be a Future<Output = i32>, wouldn't it be possible to return a future that is !Unpin and thus cannot be polled without using unsafe code, as poll takes self: Pin<&mut Self>? Do I need to restrict GetFuture to be Unpin? (So far, the compiler didn't complain.)

Edit: I assume that I don't need Unpin (and shouldn't demand it here), because I can always pin an owned value/future using tokio::pin! (or similar macros, which internally use unsafe but do not require the caller to deal with unsafe code).
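To illustrate the point about pinning an owned future, here is a minimal sketch that uses only the standard library (std::pin::pin! instead of tokio::pin!). The names PinnedReady, PinnedGetAnInteger, NoopWake, block_on and get_blocking are hypothetical helpers made up for this example, and block_on is a busy-polling toy, not a real executor.

```rust
use std::future::Future;
use std::marker::PhantomPinned;
use std::pin::{pin, Pin};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

trait GetAnInteger {
    type GetFuture: Future<Output = i32>; // deliberately no `Unpin` bound
    fn get(&self) -> Self::GetFuture;
}

/// Hypothetical future that opts out of `Unpin` via `PhantomPinned`.
struct PinnedReady {
    value: i32,
    _pin: PhantomPinned,
}

impl Future for PinnedReady {
    type Output = i32;
    fn poll(self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<i32> {
        // Reading a field through `Pin<&mut Self>` only needs `Deref`.
        Poll::Ready(self.value)
    }
}

struct PinnedGetAnInteger {
    value: i32,
}

impl GetAnInteger for PinnedGetAnInteger {
    type GetFuture = PinnedReady;
    fn get(&self) -> Self::GetFuture {
        PinnedReady { value: self.value, _pin: PhantomPinned }
    }
}

/// Waker that does nothing; sufficient for the busy-polling loop below.
struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Toy executor: pins the owned future on the stack and polls until ready.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut); // safe pinning of an owned value, `Unpin` or not
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}

/// Generic caller that never requires `G::GetFuture: Unpin`.
fn get_blocking<G: GetAnInteger>(getter: &G) -> i32 {
    block_on(getter.get())
}
```

The caller owns the future returned by get, so it can pin it locally; no unsafe appears in user code and no Unpin bound is needed on the associated type.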

Let's take a look at an efficient implementation:

struct EfficientGetAnInteger {
    value: i32,
}

impl GetAnInteger for EfficientGetAnInteger {
    type GetFuture = std::future::Ready<i32>;
    fn get(&self) -> Self::GetFuture {
        std::future::ready(self.value)
    }
}

EfficientGetAnInteger doesn't need to do anything complicated and simply uses Ready to satisfy the asynchronous interface. No heap allocations or dynamic dispatch needed.

But for other implementations of the trait, things might get more complex, and it might not be easy to explicitly name the future that is returned. In that case, I can do the following:

struct DynGetAnInteger {
    value: i32,
}

async fn some_async_stuff() {
    println!("Let's pretend we do some async stuff here.");
}

impl GetAnInteger for DynGetAnInteger {
    type GetFuture = Pin<Box<dyn Future<Output = i32> + Send>>;
    fn get(&self) -> Self::GetFuture {
        let retval: i32 = self.value;
        Box::pin(async move {
            // code might be added here in future
            some_async_stuff().await;
            // code might be added here in future
            retval
        })
    }
}

Here, I don't need to explicitly name the type of the future created by the async move {…} block because I declare the return type of the get method (associated type GetFuture) to be a Pin<Box<dyn Future<…>…>>, which is the same thing async-trait does, I assume.

So far, my program works fine (here with the tokio runtime):

#[tokio::main]
async fn main() {
    let getter1 = EfficientGetAnInteger { value: 17 };
    println!("Got value: {}", getter1.get().await);
    let getter2 = DynGetAnInteger { value: 18 };
    println!("Got value: {}", getter2.get().await);
}

Question 2: Is my approach the way to go with the current support of async in Rust? Or is there an easier way? It seems like I have to declare an associated type for each async method where I want a highly performant implementation to be possible. That seems very verbose, but I guess there is no other way?

Indeed, to get a type other than a boxed future, you can't use async/await to define the future; you must write it fully manually, with a hand-written poll function.
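As a rough sketch of what "fully manually" can look like, the following defines a nameable future by hand and uses it as the associated type. YieldThenValue and ManualGetAnInteger are hypothetical names for this example, and the single artificial Pending step only stands in for real I/O readiness; block_on is again a busy-polling toy helper.

```rust
use std::future::Future;
use std::pin::{pin, Pin};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

trait GetAnInteger {
    type GetFuture: Future<Output = i32>;
    fn get(&self) -> Self::GetFuture;
}

/// Hand-written future: returns `Pending` once, then yields the value.
struct YieldThenValue {
    value: i32,
    yielded: bool,
}

impl Future for YieldThenValue {
    type Output = i32;
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<i32> {
        if self.yielded {
            Poll::Ready(self.value)
        } else {
            // All fields are `Unpin`, so mutating through the pin is allowed.
            self.yielded = true;
            // Ask to be polled again right away (stand-in for an I/O wakeup).
            cx.waker().wake_by_ref();
            Poll::Pending
        }
    }
}

struct ManualGetAnInteger {
    value: i32,
}

impl GetAnInteger for ManualGetAnInteger {
    type GetFuture = YieldThenValue; // a nameable type, no Box required
    fn get(&self) -> Self::GetFuture {
        YieldThenValue { value: self.value, yielded: false }
    }
}

struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Toy executor that polls in a loop until the future completes.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}
```

The state that an async block would keep implicitly (here just the yielded flag) has to be stored in the struct by hand, which is where the extra work comes from.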

Thanks for reassuring me here that a lot of manual work isn't "wrong" (considering the current state of the art).

I also had a look at Tokio's async I/O traits.

Thinking this through further, I can think of four different ways to declare an async method:

  • using async-trait or a similar approach (when I'm lazy or things get too complex, and the extra runtime overhead isn't a big deal),
  • returning Poll,
  • returning a particular type implementing Future (i.e. the type of the returned future will be unchangeable),
  • returning an associated type implementing Future (i.e. each implementor of the trait can either use their own type, or a less efficient trait object such as Pin<Box<dyn Future<…>…>>).
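The "returning Poll" variant from the list above could look roughly like this. It is only a sketch in the spirit of Tokio's I/O traits, not their actual API: PollGetAnInteger, TwoStepSource and the free function get are hypothetical names, and std::future::poll_fn is used to turn the poll-style method back into something that can be awaited.

```rust
use std::future::{poll_fn, Future};
use std::pin::{pin, Pin};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Poll-style interface: the method itself returns `Poll`, no future type.
trait PollGetAnInteger {
    fn poll_get(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<i32>;
}

/// Hypothetical source that is ready on the second poll.
struct TwoStepSource {
    value: i32,
    polls: u32,
}

impl PollGetAnInteger for TwoStepSource {
    fn poll_get(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<i32> {
        self.polls += 1;
        if self.polls < 2 {
            cx.waker().wake_by_ref(); // request another poll
            Poll::Pending
        } else {
            Poll::Ready(self.value)
        }
    }
}

/// Adapter so that callers can still write `get(&mut source).await`.
async fn get<T: PollGetAnInteger + Unpin>(source: &mut T) -> i32 {
    poll_fn(|cx| Pin::new(&mut *source).poll_get(cx)).await
}

struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Toy busy-polling executor, only meant for this example.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}
```

With this style the trait stays object safe and allocation free, at the price of every implementor writing its own state machine.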

All this wasn't clear to me after reading Asynchronous Programming in Rust or the Tokio Tutorial. I have also stumbled upon posts by people who were surprised that there is more than one way to write an async function and that async and await aren't the only way.

Maybe it would be wise to keep this fact in mind when writing / working on tutorials in this matter and give an overview of the different approaches. It may be a matter of course for those who are experienced with Rust, but it can confuse a newcomer easily (such as me :sweat_smile:).

I hope it's okay if I give that sort of feedback. I don't want to sound too critical, and Rust is doing a very good job! I just find many things difficult to get a grasp on, and maybe the feedback of someone with less insight can be helpful.

This is currently not generally doable in stable Rust, but using the unstable min_type_alias_impl_trait feature, one gets to define nameable existential types with which to fill the associated type requirements. See this crate for an automated proof-of-concept of it:

Fascinating! I just tried to manually use it:

#![feature(type_alias_impl_trait)]
impl GetAnInteger for DynGetAnInteger {
    type GetFuture = impl Future<Output = i32> + Send;
    fn get(&self) -> Self::GetFuture {
        let retval: i32 = self.value;
        // no heap allocation needed here!!
        async move {
            some_async_stuff().await;
            retval
        }
    }
}

I will certainly look into real-async-trait to simplify the syntax. As my current work is highly experimental as well, I don't mind using unstable or experimental features :smiley:.

Addendum: Maybe real-async-trait is too experimental for me :sweat_smile:. Its limitations regarding lifetimes seem likely to get me into trouble soon. Also, I quickly ran into an error when trying this:

#![feature(generic_associated_types)]
#![feature(type_alias_impl_trait)]

use real_async_trait::real_async_trait;

#[real_async_trait]
trait X {
    async fn nop<'a>(&'a self) -> ();
}

struct T {}

#[real_async_trait]
impl X for T {
    async fn nop<'a>(&'a self) -> () {
        ()
    }
}

// until here it works fine,
// but let's try again:

struct T2 {}

#[real_async_trait]
impl X for T2 {
    async fn nop<'a>(&'a self) -> () {
        ()
    }
}

Implementing X for T2 raises an error:

error[E0428]: the name `__real_async_trait_impl` is defined multiple times
  --> src/main.rs:22:1
   |
13 | #[real_async_trait]
   | ------------------- previous definition of the module `__real_async_trait_impl` here
...
22 | #[real_async_trait]
   | ^^^^^^^^^^^^^^^^^^^ `__real_async_trait_impl` redefined here
   |
   = note: `__real_async_trait_impl` must be defined only once in the type namespace of this module
   = note: this error originates in the attribute macro `real_async_trait` (in Nightly builds, run with -Z macro-backtrace for more info)

I guess it was never tested with multiple trait implementations per module. I should file a bug report.

Anyway, it is an interesting approach and maybe I'll manually use type_alias_impl_trait and associated types to make some methods return a named existential type instead of a dynamic (boxed) one. I'd still be stuck with the overhead of defining an associated type for each async method though.

Indeed.

In the meantime, you can do:

const _: () = {
    #[real_async_trait]
    impl X for T2 {
        …
    }
};

(which you can suggest in the bug report)
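To show why wrapping each impl in an anonymous const helps, here is a sketch without the macro. The module generated_impl is a hypothetical stand-in for the helper module the attribute generates (its real contents are not shown in this thread); the point is only that each const _ block has its own item scope, so two modules with the same name no longer clash.

```rust
trait X {
    fn id(&self) -> &'static str;
}

struct T;
struct T2;

// Each anonymous constant opens a fresh block scope for the items inside it.
const _: () = {
    // Stand-in for a macro-generated helper module (hypothetical contents).
    mod generated_impl {
        pub fn name() -> &'static str {
            "T"
        }
    }

    // The impl itself is still visible everywhere, as impls always are.
    impl X for T {
        fn id(&self) -> &'static str {
            generated_impl::name()
        }
    }
};

const _: () = {
    // Same module name as above, but in a different scope: no E0428.
    mod generated_impl {
        pub fn name() -> &'static str {
            "T2"
        }
    }

    impl X for T2 {
        fn id(&self) -> &'static str {
            generated_impl::name()
        }
    }
};
```

Placing both generated_impl modules directly at module level would reproduce the "defined multiple times" error shown earlier.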

I filed a bug report and provided your workaround.

Submitted a PR featuring the fix; feel free to try it by adding the following to your (workspace's) Cargo.toml:

[patch.crates-io.real-async-trait]
git = "https://github.com/danielhenrymantilla/real-async-trait-rs"
branch = "patch-1"

I played around a bit more, and came up with this:

#![feature(type_alias_impl_trait)]
#![feature(generic_associated_types)]
use std::future::Future;

trait GetAnInteger {
    type GetFuture<'a>: Future<Output = i32> + 'a; // NOTE: Could add `+ Sync` here
    fn get<'ret, 'a: 'ret, 'b: 'ret>(&'a self, offset: &'b i32) -> Self::GetFuture<'ret>;
}

struct ManualGetAnInteger {
    value: i32,
}

impl GetAnInteger for ManualGetAnInteger {
    type GetFuture<'a> = std::future::Ready<i32>;
    fn get<'ret, 'a: 'ret, 'b: 'ret>(&self, offset: &i32) -> Self::GetFuture<'ret> {
        std::future::ready(self.value + *offset)
    }
}

struct AutomaticGetAnInteger {
    value: i32,
}

async fn some_async_stuff() {
    println!("Let's pretend we do some async stuff here.");
}

impl GetAnInteger for AutomaticGetAnInteger {
    type GetFuture<'a> = impl Future<Output = i32> + 'a + Send;
    fn get<'ret, 'a: 'ret, 'b: 'ret>(&'a self, offset: &'b i32) -> Self::GetFuture<'ret> {
        async {
            // code might be added here in future
            some_async_stuff().await;
            // code might be added here in future
            self.value + *offset
        }
    }
}

#[tokio::main]
async fn main() {
    let getter1 = ManualGetAnInteger { value: 17 };
    println!("Got value: {}", getter1.get(&100).await);
    let getter2 = AutomaticGetAnInteger { value: 18 };
    println!("Got value: {}", getter2.get(&200).await);
}

Output is:

Got value: 117
Let's pretend we do some async stuff here.
Got value: 218

Unlike real-async-trait, notice that I use different lifetimes for &self and the extra argument to the get function. That is not supported by real-async-trait, as explained in its docs:

there can only be a single lifetime in use simultaneously. I have no idea why, but it could be due to buggy interaction between existential types and generic associated types;

Maybe my example only works because i32 is Copy, and perhaps more complex examples would fail, but it seems like a way to avoid returning a boxed future while not having to manually define which future to return.

Of course, there is still a lot of noise to type, such as type GetFuture<'a> = impl Future<Output = i32> + 'a + Send; plus the extra lifetime annotations (which real-async-trait also requires, btw.). On the plus side, I have full control over which bounds I want my future to fulfil (both in the trait definition and in each implementation of the trait).
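For comparison, a variant of the same idea that sticks to a hand-written future, so it needs no impl Trait in the associated type. This is a sketch under a few assumptions: generic associated types have since been stabilized (Rust 1.65), so no feature gate is used; the signature is simplified to a single lifetime 'a, relying on shared references being shortened to a common lifetime at the call site; and SumFuture and BorrowingGetAnInteger are hypothetical names.

```rust
use std::future::Future;
use std::pin::{pin, Pin};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

trait GetAnInteger {
    type GetFuture<'a>: Future<Output = i32> + 'a
    where
        Self: 'a;
    // One lifetime for both references; callers may pass references with
    // different lifetimes, which are shortened to a common one.
    fn get<'a>(&'a self, offset: &'a i32) -> Self::GetFuture<'a>;
}

/// Hand-written future that borrows both inputs instead of copying them.
struct SumFuture<'a> {
    value: &'a i32,
    offset: &'a i32,
}

impl Future for SumFuture<'_> {
    type Output = i32;
    fn poll(self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<i32> {
        Poll::Ready(*self.value + *self.offset)
    }
}

struct BorrowingGetAnInteger {
    value: i32,
}

impl GetAnInteger for BorrowingGetAnInteger {
    type GetFuture<'a> = SumFuture<'a> where Self: 'a;
    fn get<'a>(&'a self, offset: &'a i32) -> Self::GetFuture<'a> {
        SumFuture { value: &self.value, offset }
    }
}

struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Toy busy-polling executor for the example.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}
```

Because the future really holds the two references here, this version does not depend on i32 being Copy.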

Yeah, your lifetime signature is more complete, but won't be enough for multi-lifetime types (when they can't all be shrunk-unified into one). If you want to encounter the limitation, try unsugaring something like:

async fn foo<'a, 'b> (it: &'a mut &'b ())
{
    drop(it);
}

The only correct signature of it would be:

type FooRet<'args, 'a, 'b> = impl 'args + Future<Output = ()>;
fn foo<'args, 'a, 'b> (it: &'a mut &'b ())
  -> FooRet<'args, 'a, 'b>
where
    'a : 'args,
    'b : 'args,
{
    async move {
        drop(it);
    }
}

I get:

error[E0477]: the type `impl Future` does not fulfill the required lifetime
 --> src/main.rs:5:30
  |
5 | type FooRet<'args, 'a, 'b> = impl 'args + Future<Output = ()>;
  |                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  |
note: type must outlive the lifetime `'args` as defined on the item at 5:13
 --> src/main.rs:5:13
  |
5 | type FooRet<'args, 'a, 'b> = impl 'args + Future<Output = ()>;
  |             ^^^^^

But why? Is this due to an incomplete compiler feature?

Why do you include 'a and 'b as lifetime arguments to FooRet? I tried this:

type FooRet<'args> = impl 'args + Future<Output = ()>;
fn foo<'args, 'a, 'b> (it: &'a mut &'b ())
  -> FooRet<'args>
where
    'a : 'args,
    'b : 'args,
{
    async move {
        drop(it);
    }
}

Then I get the same error plus an additional error as follows:

error[E0700]: hidden type for `impl Trait` captures lifetime that does not appear in bounds
 --> src/main.rs:7:6
  |
7 |   -> FooRet<'args>
  |      ^^^^^^^^^^^^^
  |
note: hidden type `impl Future` captures the lifetime `'b` as defined on the function body at 6:19
 --> src/main.rs:6:19
  |
6 | fn foo<'args, 'a, 'b> (it: &'a mut &'b ())
  |                   ^^

But when I change it: &'a mut &'b () to it: &'a (), it2: &'b (), then everything works fine. Why? :exploding_head:

The story of lifetimes in impl Trait return types is a bit complicated.

1951-expand-impl-trait - The Rust RFC Book

The reason why &'a mut &'b … won’t work, but (&'a …, &'b …) or &'a &'b … will is that in the latter case(s), the lifetimes 'a and 'b can be coerced to 'args by covariance before the reference becomes part of that impl 'args + Future<…> return type. OTOH, &'a mut &'b … is invariant in 'b. The best general workaround is to introduce a new trait

trait CapturesLifetime<'a> {}
impl<T: ?Sized> CapturesLifetime<'_> for T {}

and then just list all of your lifetimes as extra trait bounds:

type FooRet<'a, 'b> = impl CapturesLifetime<'a> + CapturesLifetime<'b> + Future<Output = ()>;
fn foo<'a, 'b> (it: &'a mut &'b ()) -> FooRet<'a, 'b> {
    async move {
        drop(it);
    }
}

This also gets rid of the need for introducing an additional 'args lifetime.
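A sketch of the same trick without any feature gate, writing the impl … type directly in the function signature. The function read_through and the helpers NoopWake and block_on are hypothetical names for this example; the future returns the integer behind the nested reference so that the result can be checked.

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Marker trait whose only job is to mention a lifetime in the bounds.
trait CapturesLifetime<'a> {}
impl<T: ?Sized> CapturesLifetime<'_> for T {}

/// `&'a mut &'b i32` is invariant in `'b`, so both lifetimes are listed
/// explicitly instead of being unified into one outlives bound.
fn read_through<'a, 'b>(
    it: &'a mut &'b i32,
) -> impl Future<Output = i32> + CapturesLifetime<'a> + CapturesLifetime<'b> {
    async move { **it }
}

struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Toy busy-polling executor for the example.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}
```

The marker bounds add no behaviour; they only make 'a and 'b appear in the bounds of the opaque type so that the hidden future is allowed to capture them.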

Sorry, the 'a : 'args kind of bounds need to be propagated to the existential as well:

type FooRet<'args, 'a : 'args, 'b : 'args> = impl 'args + Future…;

That being said,

from there we can even try to skip the CapturesLifetime<'lt> hack (thanks to the intro-generics, it shouldn't be necessary):

type FooRet<'a, 'b> = impl Future<Output = ()>;

The only remaining question being the "outlives lifetime": it is elided here, and to my surprise (I assumed impl … without any lifetimes mentioned there would have stood for impl 'static + …, which it doesn't), this snippet Just Works™.

So feel free to go with that simpler(-because-more-implicit) example. But know that its semantics are equivalent to those featuring 'args, the "intersection" lifetime.

Actually… I’m just noticing that the method with impl CapturesLifetime<'a> + CapturesLifetime<'b> + Future<Output = ()> is only necessary if you don’t use a type_alias_impl_trait type. I.e. it’s useful on stable Rust when directly writing the signature as fn foo<'a, 'b>(it: &'a mut &'b ()) -> impl …. With type_alias_impl_trait, the lifetime arguments are listed explicitly anyway, so you just need to do

type FooRet<'a, 'b> = impl Future<Output = ()>;
fn foo<'a, 'b> (it: &'a mut &'b ())
  -> FooRet<'a, 'b>

{
    async move {
        drop(it);
    }
}

No need for 'args lifetime or any other workarounds at all.

Edit: Oh, I should’ve read @Yandros’s previous comment completely, first :sweat_smile:

We reached the same observation almost at the same time :grinning_face_with_smiling_eyes:

Yeah, impl without lifetimes is not the same as 'static. The rules are different than the rules for dyn for example. And they’re also in some ways weird and inconsistent/buggy, etc… read for example the discussion in this thread

(One observation in this thread: An impl 'a + … lifetime is implemented in a way that checks that the actual returned type T does fulfill T: 'a; but the type alias type does not necessarily fulfill that bound. If the -> impl … in the function signature mentions other lifetimes 'l that don’t fulfill 'l: 'a, that’s somehow fine, as long as what’s actually returned doesn’t use that lifetime. Wait… you took part in this discussion, I just wanted to link this example; who am I telling this :see_no_evil: )


Not really… I read the first half of your answer (the part about FooRet<'args, 'a : 'args, 'b : 'args>), tested that, realized that for type_alias_impl_trait all the listed arguments count anyways, and then wrote my answer without even checking the remainder of yours.

Now that's a lot for me to digest. The term "covariance" is new to me, and I just had to look it up in the Rustonomicon. I'll need some time to process all that. I really have to read (and understand) more about variance first.

With those changes it works here!

That also works for me, and is much shorter. I also assumed if I do not mention any lifetime it would be 'static, but it isn't. So that is good to know.

But when I combine it with my trait example, it does not work:

#![feature(type_alias_impl_trait)]
#![feature(generic_associated_types)]
use std::future::Future;

trait GetAnInteger {
    type GetFuture<'a, 'b>: Future<Output = i32>; // NOTE: Could add `+ Sync` here
    fn get<'a, 'b>(&'a self, offset: &'b i32) -> Self::GetFuture<'a, 'b>;
}

struct ManualGetAnInteger {
    value: i32,
}

impl GetAnInteger for ManualGetAnInteger {
    type GetFuture<'a, 'b> = std::future::Ready<i32>;
    fn get<'a, 'b>(&'a self, offset: &'b i32) -> Self::GetFuture<'a, 'b> {
        std::future::ready(self.value + *offset)
    }
}

struct AutomaticGetAnInteger {
    value: i32,
}

async fn some_async_stuff() {
    println!("Let's pretend we do some async stuff here.");
}

impl GetAnInteger for AutomaticGetAnInteger {
    type GetFuture<'a, 'b> = impl Future<Output = i32> + Send;
    fn get<'a, 'b>(&'a self, offset: &'b i32) -> Self::GetFuture<'a, 'b> {
        async {
            // code might be added here in future
            some_async_stuff().await;
            // code might be added here in future
            self.value + *offset
        }
    }
}

#[tokio::main]
async fn main() {
    let getter1 = ManualGetAnInteger { value: 17 };
    println!("Got value: {}", getter1.get(&100).await);
    let getter2 = AutomaticGetAnInteger { value: 18 };
    println!("Got value: {}", getter2.get(&200).await);
}

I get the following error:

error[E0700]: hidden type for `impl Trait` captures lifetime that does not appear in bounds
  --> src/main.rs:30:30
   |
30 |     type GetFuture<'a, 'b> = impl Future<Output = i32> + Send;
   |                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   |
note: hidden type `impl Future` captures lifetime smaller than the function body
  --> src/main.rs:30:30
   |
30 |     type GetFuture<'a, 'b> = impl Future<Output = i32> + Send;
   |                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

By the way, my rustc version is rustc 1.57.0-nightly (485ced56b 2021-10-07).

Do you get that with async move too?

Note that my explanation is off. I’m explaining why

use std::future::Future;

fn foo<'args, 'a, 'b> (it: &'a mut &'b ())
  -> impl 'args + Future<Output = ()>
where
    'a : 'args,
    'b : 'args,
{
    async move {
        drop(it);
    }
}

doesn’t work, not why

#![feature(type_alias_impl_trait)]

use std::future::Future;

type FooRet<'args, 'a, 'b> = impl 'args + Future<Output = ()>;
fn foo<'args, 'a, 'b> (it: &'a mut &'b ())
  -> FooRet<'args, 'a, 'b>
where
    'a : 'args,
    'b : 'args,
{
    async move {
        drop(it);
    }
}

doesn’t work. I only noticed my confusion later. As for why the latter doesn’t work, the answer from @Yandros is correct: the 'a: 'args and 'b: 'args bounds are simply missing from the type alias.

The former (where my explanation applies) also gives a different error message:

error[E0700]: hidden type for `impl Trait` captures lifetime that does not appear in bounds
 --> src/lib.rs:4:6
  |
4 |   -> impl 'args + Future<Output = ()>
  |      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  |
note: hidden type `impl Future` captures the lifetime `'b` as defined on the function body at 3:19
 --> src/lib.rs:3:19
  |
3 | fn foo<'args, 'a, 'b> (it: &'a mut &'b ())
  |                   ^^

@Yandros With async move it works! Thanks to both of you for all that info. Will look more into that later.
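A sketch of the async move fix in a form that needs no feature gate, using impl Future directly in the return type. My understanding, which is an assumption and not something stated in this thread, is that a plain async block may borrow the local variables self and offset (which are themselves references living only for the function call), whereas async move moves those references into the future. Getter, NoopWake and block_on are hypothetical names for this example.

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

struct Getter {
    value: i32,
}

impl Getter {
    /// The returned future holds both references, so it is bounded by `'a`.
    fn get<'a>(&'a self, offset: &'a i32) -> impl Future<Output = i32> + 'a {
        // `move` transfers the two references into the future instead of
        // borrowing the function-local variables that hold them.
        async move { self.value + *offset }
    }
}

struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Toy busy-polling executor for the example.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}
```

Moving a shared reference into the future only copies the reference, so the caller keeps full access to the getter and the offset afterwards.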