Avoiding heap allocations in (some) async trait methods

@Yandros I'm slowly trying to understand your post. I didn't get much farther than the first few paragraphs, but I think I understood a bit. I think I messed up the direction of which lifetime must outlive which other lifetime a couple of times. (The fact that the subtype relationship runs in the opposite direction added even more confusion.)

I guess what I should have tried to write is:

type FooRet<'c> = impl Future<Output = ()>;
fn foo<'a, 'b, 'c>(it: &'a mut &'b ()) -> FooRet<'c>
where
    'a: 'c,
    'b: 'c,

But that's causing an error:

error[E0700]: hidden type for `impl Trait` captures lifetime that does not appear in bounds
 --> src/main.rs:5:43
  |
5 | fn foo<'a, 'b, 'c>(it: &'a mut &'b ()) -> FooRet<'c>
  |                                           ^^^^^^^^^^
  |
note: hidden type `impl Future` captures the lifetime `'b` as defined on the function body at 5:12
 --> src/main.rs:5:12
  |
5 | fn foo<'a, 'b, 'c>(it: &'a mut &'b ()) -> FooRet<'c>
  |            ^^

Trying to work around that error, I messed up the direction of dependency.

I think. :sweat_smile:

So, error E0700 is an infamous one that stems from the fact that you cannot express the type &'a mut &'b () with the lifetime 'c alone. But do not despair, you are on the right path!

  • Just so that you can see you are on the right path: if you replace FooRet<'c> with BoxFuture<'c, ()> (wrapping the async move in a Box::pin() call), your code will compile. That means you got the + 'region_of_usability lifetime bound right: for any region 'c contained within both 'a and 'b (and thus within their intersection), the returned object is necessarily safe to use within that region 'c.

  • It's only on the type alias side that you get the error: while your type is necessarily usable within 'c, it carries extra lifetime information (the 'b lifetime) which cannot be discarded and which cannot be expressed with 'c alone.

    So you have to keep carrying that "vestigial" 'b parameter in the type alias:

    - type FooRet<'c    > = impl Future<Output = ()>;
    + type FooRet<'c, 'b> = impl Future<Output = ()>;
    

See also:

In some real code, I just encountered a case where I have to use dyn with multiple lifetimes (I'm currently experimenting with how to use streams). Actually, I was forced to introduce a single lifetime, as dyn apparently allows only one lifetime to be specified. This is what I had to do:

fn transactions_exist<'a: 'ret, 'b: 'ret, 'c: 'ret, 'ret>(
    &'a self,
    ids: impl 'b + Iterator<Item = &'c TransactionId> + Send,
) -> Box<dyn 'ret + Stream<Item = std::io::Result<bool>> + Send>
where
    Self: Sync,
{
    Box::new(
        futures::stream::iter(ids).then(move |id|
            self.transaction_exists(id)
        )
    )
}

So dealing with impl and with dyn seems to work very differently, and my "naive" approach of coming up with a single new lifetime (which caused E0700 when using impl) works with dyn (and even seems to be required?). That is a bit confusing. If I try dyn + 'a + 'b + 'c + …, then I get:

error[E0226]: only a single explicit lifetime bound is permitted
   --> src/xyz/mod.rs:212:23
    |
212 |     ) -> Box<dyn 'a + 'b + 'c + Stream<Item = IoResult<bool>> + Send>
    |                       ^^

So do I get it right that dyn and impl need to be used in completely different ways when it comes to lifetimes?

Update: Just before falling asleep, I figured I mixed up the direction of the relationship again: dyn 'a + 'b + … would make no sense anyway, as the dyn object doesn't need to live as long as the arguments to my function; it's the other way around: the arguments must live at least as long as the returned boxed dyn object. :crazy_face:

Nonetheless, dyn and impl seem to work differently, because it's okay to come up with a new single lifetime for dyn, while this doesn't work with impl.

There is an inconsistency between dyn and impl, yes (which is something I personally hope gets solved in the language):

  • thanks to type erasure, dyn only cares about the 'region_of_usability,

  • whereas impl does need access to the larger invariant lifetimes nonetheless, since it's not really erasing the type, just hiding it.

Yep, that's the way to express the 'intersection lifetime, as I have been calling it, which indeed represents a region within which the returned item will always be valid to use :+1:

3 Likes

I wonder if in the concrete example, I can use a single lifetime for the two lifetimes 'b and 'c, because the iterator cannot live longer than its items anyway:

fn transactions_exist<'a: 'ret, 'b: 'ret, 'ret>(
    &'a self,
    ids: impl 'b + Iterator<Item = &'b TransactionId> + Send,
) -> Box<dyn 'ret + Stream<Item = std::io::Result<bool>> + Send>
where
    Self: Sync,
{ /* … */ }

The code compiles (so far). But does that restrict the argument type to those iterators that live exactly as long as their items (no shorter, though still longer than 'ret)? That I wouldn't want. Thus I assume this simplification is a bad idea?

Note that impl in the last two examples is in argument position, so it will behave differently once more :rofl:


I.e., writing it without impl, it should be equivalent to this:

fn transactions_exist<'a, 'b, 'ret, I>(&'a self, ids: I)
    -> Box<dyn 'ret + Stream<Item = std::io::Result<bool>> + Send>
where
    'a: 'ret,
    'b: 'ret,
    I: 'b + Iterator<Item = &'b TransactionId> + Send,
    Self: Sync,
{ /* … */ }

I assume that would force the iterator type I to live as long as each Item (of the iterator), which is what I don't want, right? Or could the iterator items live longer than 'b here? I'm not sure exactly what the syntax means.


Update: I just figured out that I probably need two different lifetimes for the iterator itself and its items, as the following example shows:

// The following doesn't work:
// fn accept_iter<'a>(iter: impl 'a + Iterator<Item = &'a str>) {
//
// Instead we need:
fn accept_iter<'a, 'b>(iter: impl 'a + Iterator<Item = &'b str>) {
    for s in iter {
        println!("Got: {}", s);
    }
}
fn main() {
    let v = vec!["static1", "static2"];
    let dummy: i32 = 99;
    struct Iter<'dummy> {
        storage: Vec<&'static str>,
        index: usize,
        _dummy: &'dummy i32,
    }
    impl<'a> Iterator for Iter<'a> {
        type Item = &'static str;
        fn next(&mut self) -> Option<Self::Item> {
            if self.index < self.storage.len() {
                let result = self.storage[self.index];
                self.index += 1;
                Some(result)
            } else {
                None
            }
        }
    }
    let iter = Iter {
        storage: v,
        index: 0,
        _dummy: &dummy,
    };
    accept_iter(iter);
    drop(dummy);
}

If I use a single lifetime, then _dummy is expected to be 'static too.

This is very evil, as the following program creates neither a compile-time nor a run-time error:

fn accept_iter<'a>(iter: impl 'a + Iterator<Item = &'a str>) {
    for s in iter {
        println!("Got: {}", s);
    }
}
fn main() {
    let v = vec!["static1", "static2"];
    accept_iter(v.into_iter());
}

Thus, I might believe my implementation of accept_iter with only one lifetime 'a is sound, which it is not.

Is this common pitfall regarding lifetimes described somewhere? I think I stumbled upon it multiple times.

The official documentation never delves too far w.r.t. lifetime (anti-)patterns and idioms, so it's rather the occasional blog post which may be more useful in that regard, at the cost of discoverability of the very post.

Related to this instance, rather than a pitfall, there is a rule of thumb: if you can use distinct lifetime parameters, then go for it; use as many distinct lifetime parameters as possible. The very mechanism of lifetimes is that when a lifetime appears repeated in several places of a signature, it imposes equality constraints.

  • A corollary of that rule of thumb is that lifetime elision thus very often does the right thing, and thus that lifetimes should rarely be named (although the corollary can't be applied to the situation in this thread, where we do need to name all the input lifetimes so that they all appear in the Future-existential return type).

  • A tangential version of this rule of thumb is that explicit bounds on lifetimes (e.g., 'a: 'b) ought to be avoided, since the necessary ones are almost always implicitly generated, and, again, if they are misused they may over-constrain the signature (cf. the signature you had which reversed the bound and thus implicitly featured lifetime equality).

And, as your &'static Thing-yielding Iter<'lt> example showcases, there is no reason that a borrowing entity such as Iter<'lt> should be yielding items bound by that very same lifetime.

In this area, especially regarding your code that compiles, you might have been surprised by a non-borrowing entity such as the result of vec! meeting an impl 'static bound. But this is actually correct, since impl 'region_of_usability (or T : 'region_of_usability) is the way to express that, when you own an entity of that type, you are allowed to use it within that whole 'region_of_usability, even though, in practice, you rarely go that far. And, indeed, your Vec<&'static str>, much like a Vec<u8>, a String, or a Vec<String>, can be owned indefinitely long, i.e., within the never-ending 'static region, and so they are all very much 'static. This is different from, say, a Vec<&'short u8>, which may contain dangling references beyond that 'short region, and thus all we have is that it is 'short (more generally, it is 'c for any 'c where 'short ⊇ 'c).

So, back to the "where can I read about these things" question, you may be interested by the following blog post (if you haven't seen it already):

3 Likes

I did read through it some time ago, but I didn't remember having seen this advice before:

It seems to make sense though, because:

That is very helpful to know / keep in mind.

I always thought of &'x T meaning that the reference lives at least as long as 'x. But apparently that is wrong, and the reference lives exactly as long as 'x.

Lifetimes seem to be the issue most people stumble upon (even when you think you have understood them), so maybe it would be wise to extend the documentation in that regard and to emphasize that in many cases you need distinct lifetime parameters. (I also remember a thread in this forum where someone was confused about why he/she can't use a single lifetime parameter.) But perhaps I was also not focussed enough when learning the language and missed some crucial parts.

Well, perhaps most in this direction, and also directly related to your observation

is the

5) if it compiles then my lifetime annotations are correct

which actually features two examples where a problem is created by two lifetimes in a signature being the same (without any need for that in the implementation).


This interpretation of yours may have arisen because &'a T is covariant in 'a. Thus, even if you have two references &'a Foo and &'b Bar with two different lifetimes, you can still call a function fn baz<'c>(f: &'c Foo, b: &'c Bar) with them, because &'a Foo can coerce to &'c Foo and &'b Bar can coerce to &'c Bar (as long as 'a: 'c and 'b: 'c).

1 Like

From my short time experimenting with Haskell, I often had the effect that "if it compiles, it is correct" (in Haskell!). Of course, that is an exaggeration.

Maybe Rust is different in that matter. I have meanwhile stumbled upon a lot of cases where code compiled, but was, in fact, very wrong. Rust (without unsafe) guarantees memory safety (if the code compiles), but it doesn't mean that the compiler helps you specifically to get your code right.

Don't get me wrong, I still love Rust's safety guarantees, because memory errors are most difficult to debug (and if I do not use unsafe, then I should not run into any of these). But I do have to keep in mind that compile-time checks are indeed limited and won't fix all my problems and mistakes.

Exactly (without having known the term "covariance" in that context at that time). I made that (wrong) interpretation due to observing Rust's behavior when calling functions.

AFAIK, Rust is also often described this way. This kind of statement is of course limited. It’s more of a “if you know what you want, you can write it down, tweak it until it compiles and then it usually works”. With lifetimes, people often don’t know what they want at all, because they don’t understand lifetimes. When the approach is, try out random ways of assigning lifetimes until you find the first one that happens to compile, then it’s unlikely that the signature you found actually is the “correct” one. (This is especially true if you didn’t even write any uses of your function yet, how is the compiler supposed to be able to check that your function signature fits both the implementation of the function and the ways you’d like to use it, if the ways you’d like to use it aren’t even in your code yet?)

Maybe you could also view this as a rule of thumb: If you give incorrect type/function signatures, it’s hard to end up with the right thing. The compiler usually prefers to suggest you changing your implementation to fit the signature than to change the signature to fit the implementation. This applies to Haskell, too; but Haskell has the advantage that type signatures are optional – the language is better suited for type inference and the compiler can usually tell you the correct most general type signature to put on your function after you wrote it. Rust can’t do that; on one hand due to some language mechanisms like method resolution necessarily requiring types to already be known; on the other hand because the compiler currently won’t really accept you leaving out function signatures, even temporarily.

I imagine in the future, we could write a function without any lifetime annotations first (which would still result in a little unobtrusive error informing us that we’ll have to put them in eventually), but you can keep coding until all other errors/warning are gone, and then ask the compiler to fill in the lifetime details in function signatures for you in the most general manner possible, taking the function implementation into account.

1 Like

Maybe it's more: "In Rust you make more run-time and more compile-time errors, but the number of compile-time errors is so much bigger that it looks like the compiler catches all errors." :joy:

It's not clear what the “more” refers to. Compare Rust with any language that has a weaker type system, e.g. Java or Python, and there are clearly far fewer run-time errors you'll encounter when programming in Rust. Even compared to Haskell, Rust's standard library prefers to avoid panicking/fallible functions; in Haskell many operations will throw an exception where Rust would use Result or Option (typical example: head and tail for lists). This can result in fewer run-time errors because you'll never forget that a function can fail. (Another thing: Rust checks pattern matches for exhaustiveness by default, while Haskell inserts a catch-all case that throws an exception.)

Don't worry, I was joking. Seriously: I feel like I make fewer errors in Rust compared to other languages I have been working in (except Haskell, which may be a special case anyway).

(But note that Haskell can't be as efficient as Rust, I think, so it's not really good to compare these here.)


And ever since I started working with Rust, I don't want to go back to any other language with a weaker type system (with the exception of untyped languages, which I find beautiful in their own way).


With my joke I wanted to express that just because a certain language throws a lot of compiler errors to you as a newcomer, that doesn't necessarily mean that language keeps you safe from errors that unfold at runtime.

Anyway, to get back to the original topic of this thread (async trait methods):

I was happy to read the Lang team October update from October 8th, 2021:

  • Async fundamentals update:
    • What is it? Async fn in traits, async drop, async closures
    • Have designated an MVP version of async functions in traits that we intend to stabilize first, and done a lot of exploration on next steps (read up on that in the ever evolving evaluation doc).

So maybe my worries will be solved soon (at least in some regards, as traits with async fn might not be object safe at first).

Really looking forward to it! Rust is awesome!! :smiley:

Safe Rust can't protect you from all errors in your program logic. What it does do is protect you from difficult-to-reason-about classes of temporal-sequencing and memory-use errors that historically have led to malware-exploitable vulnerabilities.

I think this always depends on what you're comparing against. Compared to C++, Rust does protect you against UB / memory unsafety; compared to memory-safe languages, that's not much of a difference. Rust does offer full protection against memory unsafety and data races; beyond that it has a strong type system which doesn't fully protect against logic errors, yet still catches many, many, many errors at compile time that would become run-time exceptions or misbehavior in other languages, particularly when you compare to dynamically typed languages, e.g. Python or JavaScript.

Rust also offers more protection against errors due to non-thread-safe code. Race conditions are still possible in Rust, but only when you're e.g. explicitly using atomics; there's no risk of running into errors due to concurrent usage of code that wasn't meant to be used from multiple threads.

My main point is: unless you're coming from an unsafe language like C++, memory safety isn't really anything new. Observations like “if it compiles, it runs correctly” describe what strong typing does to your coding experience, and they are particularly valid when comparing to more weakly typed but memory-safe languages; the observation has little to nothing to do with Rust's memory safety guarantees (though of course the tools that enable memory safety in Rust can also help catch other logic bugs).

2 Likes

I just re-read that MVP:

  • No support for dyn
  • […]
  • No ability to bound the resulting futures (e.g., to require that they are Send )
    • […] the [only] limitation is that one cannot write generic code that invokes spawn.
    • Workaround: do the desugaring manually when required, which would give a name for the relevant future.

I personally don't mind the missing dyn support, but if the trait method returns a ?Send future, then this limitation might be a big one, leading to "manual desugaring" again.

And that does get pretty ugly, as I pointed out here:

Also things get even more verbose when there are default implementations for trait methods, as I said here:

That is, I have to write:

#![feature(type_alias_impl_trait)]
#![feature(associated_type_defaults)]

use std::future::Future;

type DefaultFoo<T: ?Sized> = impl Future<Output = ()>;
trait Bar {
    type Foo: Future<Output = ()> + Send = DefaultFoo<Self>;
    fn foo(&self) -> DefaultFoo<Self> {
        async move {
            ()
        }
    }
}

(And that example doesn't even deal with extra lifetimes yet.)

So considering that I need my futures to be Send (because I want to write generic code, even if I can live without dyn), the currently planned MVP might not help me much, because I'd end up with the same ugly code.

Nonetheless, I appreciate the progress in that matter. After all, it's a first step.

As you pointed out, it is possible to avoid heap allocations in async trait methods (using unstable features). However, it pretty much bloats up the syntax, as I said earlier in this thread.

Nonetheless, I decided to go that way and to use nameable existential (associated) types. If you're interested, see this post for a use case where each async trait method shouldn't do any unnecessary heap allocations (as I might have a lot of these calls), and how I applied the method.

Thanks again for bringing that up in the first place! :+1: It has been pretty helpful and seems to be usable in practice (if using unstable features is an option).

1 Like

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.