Using iterators with callback APIs

Some Rust APIs like to take a closure and give you callbacks instead of returning an iterator type. For example, in git_repository, the for_each_to_obtain_tree method calls into your closure repeatedly, each time with a different Change object. However, I'd like to process this stream of Change objects with normal Iterator methods like filter, map, etc. Is this possible? In this case the object is passed in by move, so maybe I could move it back out of the closure, or clone it out of the closure, but Change is generic over a couple of lifetimes that aren't specified by the arguments I pass to for_each_to_obtain_tree. ('new comes from other, but 'a and 'old, AFAICT, are decided by the internals of for_each_to_obtain_tree... although I'm not sure how that works, because they are in the signature of for_each_to_obtain_tree, which seems circular...)

Looking at the function signature:

pub fn for_each_to_obtain_tree<'new, E>(
    &mut self,
    other: &Tree<'new>,
    for_each: impl FnMut(Change<'_, 'old, 'new>) -> Result<Action, E>
) -> Result<(), Error> where
    E: Error + Sync + Send + 'static,

'_ in FnMut(...) is shorthand for a higher-ranked trait bound: It represents a requirement that the argument cannot escape the closure body. In this case, it's because Change.location points to a buffer that for_each_to_obtain_tree() is altering between calls to the closure.
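To make the "cannot escape" point concrete, here is a minimal sketch. The for_each_prefix function and its scratch buffer are hypothetical stand-ins invented for illustration, not anything from git_repository; the only thing it shares with the real method is the shape of the callback bound.

```rust
/// Hypothetical producer: calls `f` once per prefix of `word`, handing out a
/// borrow of a scratch buffer that is modified between calls.
fn for_each_prefix(word: &str, mut f: impl FnMut(&str)) {
    let mut buf = String::new();
    for ch in word.chars() {
        buf.push(ch); // the buffer changes between callbacks
        f(&buf); // this borrow is only valid for the duration of this call
    }
}

/// Work done inside the closure is fine, and so is copying data out.
fn prefix_lengths(word: &str) -> Vec<usize> {
    let mut lens = Vec::new();
    // let mut escaped: Vec<&str> = Vec::new();
    for_each_prefix(word, |prefix| {
        // escaped.push(prefix); // rejected: the borrow would outlive the call
        lens.push(prefix.len());
    });
    lens
}
```

Because `impl FnMut(&str)` means the closure must accept a borrow of any lifetime, the closure body cannot assume the borrow lasts past a single call, which is what lets for_each_prefix keep mutating its buffer.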


Are you saying this is some sort of convention, or does '_ actually have a special, different meaning inside FnMut than it normally does? I thought '_ was always just a stand-in for a regular lifetime.

So tying into our other discussion this could in theory be changed to an API based on LendingIterator? But also curious if there's anything I can do without changing the original crate, to be able to use Iterator methods in this situation?

As a general rule, iterator APIs are favored in Rust, and so if a function takes a callback rather than returning an Iterator, there will be some reason why it can't return an iterator, which will also be a reason why it can't be adapted into an iterator.


As I understand it, the Fn trait syntax does treat '_ somewhat specially.

Specifically, given something like Fn(&u32) -> &u32, that trait is Fn(&'_ u32) -> &'_ u32, which in turn is for<'a> Fn(&'a u32) -> &'a u32. The '_ within a Fn trait signature is the elided lifetime of that Fn, not of the surrounding signature.

In order to adapt "internal iteration" (for_each(impl FnMut)) to "external iteration", you need to collect the items into something which can be externally iterated. It looks like while Change is necessarily tied to 'for_each here for the lifetime of the .location: &BStr, you could extract the Event into a vector just fine, and even .to_owned() the location if you need it. Once you have Vec<Event<'old, 'new>>, you can operate on that like normal.

If "tracking" is disabled, it looks like the location is always empty anyway, so whatever information it carries is likely non-critical.
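A rough sketch of the collect-then-iterate approach described above. Everything here is a hypothetical stand-in written for illustration: Event, Change, and for_each_change are simplified models, not the real git_repository types or signatures, and string paths replace BStr.

```rust
/// Hypothetical stand-in for the part of a change that can be kept.
#[derive(Debug, Clone, PartialEq)]
enum Event {
    Added(u32),
    Deleted(u32),
}

/// Hypothetical stand-in for `Change`: `location` borrows a buffer that the
/// producer rewrites between callbacks.
struct Change<'a> {
    event: Event,
    location: &'a str,
}

/// Hypothetical callback-style producer (internal iteration).
fn for_each_change(mut f: impl FnMut(Change<'_>)) {
    let mut path = String::new();
    for (i, name) in ["a.txt", "b.txt", "c.txt"].iter().enumerate() {
        path.clear();
        path.push_str("src/");
        path.push_str(name);
        let id = i as u32;
        let event = if i % 2 == 0 { Event::Added(id) } else { Event::Deleted(id) };
        f(Change { event, location: &path });
    }
}

/// Copy everything that must outlive the callback into owned values.
fn collect_changes() -> Vec<(Event, String)> {
    let mut owned = Vec::new();
    for_each_change(|change| owned.push((change.event, change.location.to_owned())));
    owned
}

/// Once collected, ordinary Iterator adapters apply.
fn added_paths() -> Vec<String> {
    collect_changes()
        .into_iter()
        .filter(|(event, _)| matches!(event, Event::Added(_)))
        .map(|(_, path)| path)
        .collect()
}
```

The cost is one allocation per location plus the Vec itself, so this trades the streaming behavior of the callback API for the convenience of external iteration.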

In a perfect world, yes, but sometimes, it'll just be because an internal iterator is easier to implement than a next-based external iterator. There's a reason yield-based generator syntax is a nice usability feature.


To be clear you are saying Fn(&u32) -> &u32 is just syntactic sugar for Fn(&'_ u32) -> &'_ u32 which is just syntactic sugar for for<'a> Fn(&'a u32) -> &'a u32 ?


Yes*.

Caveats:

  • I didn't actually test that this was true. It matches my understanding and the usage here, but I've been burnt by skipping checking with the compiler before.
  • Lifetime elision doesn't really work that way, where eliding the lifetime is sugar for '_; '_ is just the explicit form equivalent to eliding the lifetime. '_ is a way to name the elided lifetime, but can't be used to conjure or influence the elided lifetime in any way; it's just a way to refer to the elided lifetime which already exists.

So yes, those forms are all (as far as I'm aware) equivalent.
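One way to let the compiler check the claim, as a small sketch with made-up function names: each helper below spells the bound differently, and each calls the function with a borrow of its own local variable, which only type-checks if the bound is higher-ranked (the caller cannot name a lifetime that lives inside the helper).

```rust
/// Fully elided lifetimes in the Fn sugar.
fn call_elided(f: impl Fn(&u32) -> &u32) -> u32 {
    let local = 7;
    *f(&local)
}

/// The same bound with the elided lifetime named as `'_`.
fn call_underscore(f: impl Fn(&'_ u32) -> &'_ u32) -> u32 {
    let local = 7;
    *f(&local)
}

/// The same bound with an explicit higher-ranked binder.
fn call_hrtb(f: impl for<'a> Fn(&'a u32) -> &'a u32) -> u32 {
    let local = 7;
    *f(&local)
}

/// A function item that is valid for every input lifetime.
fn identity(x: &u32) -> &u32 {
    x
}
```

If all three helpers accept identity, that is at least consistent with the three spellings describing the same bound.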

You can think of the Fn traits as having an implied for<> binder on them as part of their function signature syntax, and '_ as referring to the elided lifetime of whatever the tightest <> bound is, whether that be for a for<> Fn trait bound or a fn f<> definition or an impl<> signature.

Where it becomes fun is when mixing <> binders which don't introduce a fresh unique elided lifetime, at which point I'm not sure exactly when '_ is an error and when it's allowed to refer to an external binder's elided lifetime.


That's not entirely true; trait object lifetimes behave differently when elided than when inferred via an explicit '_.


How so?

Those have context-sensitive defaults when (completely) elided. In brief (i.e. incompletely), it's almost always inferred inside a function (like '_); elsewhere it's 'static for Box<dyn Tr> and similar, or "same as the outer lifetime" for &dyn Tr and similar.

So for example Box<dyn Tr> is 'static as a fn arg or return instead of a fresh generic (Box<dyn Tr + '_> arg) or the same as some input lifetime via the usual lifetime elision (Box<dyn Tr + '_> return).

I can add more detail when I'm not on mobile, if you'd like.


I am familiar with &dyn Foo being short for &dyn Foo + 'static, and I think you are saying that in some circumstances &dyn Foo and &dyn Foo + '_ will not behave the same -- but that's about all I understand ;p If you could give a worked example I think that would help. Is it that &dyn Foo strictly assumes 'static but that &dyn Foo + '_ allows other possibilities to be inferred?

Box<dyn Foo> is the one that assumes 'static.

But let's have some examples. Box is the stricter, more-likely-to-be-problematic one, so most examples will be with Box.


Here's perhaps the most common place this distinction pops up, returning a type-erased parameter:

// The example itself is contrived though
fn foo<T: Display + Clone>(t: &T) -> Box<dyn Display> {
    Box::new(t.clone())
}
error[E0310]: the parameter type `T` may not live long enough
 --> src/lib.rs:4:5
  |
4 |     Box::new(t.clone())
  |     ^^^^^^^^^^^^^^^^^^^ ...so that the type `T` will meet its required lifetime bounds
  |
help: consider adding an explicit lifetime bound...
  |
3 | fn foo<T: Display + Clone + 'static>(t: &T) -> Box<dyn Display> {
  |                           +++++++++

What's the problem? The return is a Box<dyn Display + 'static>, but I could pass in a &&'local str for example.

Admission time: taking a Clone and a reference was somewhat artificial, but I've arranged it that way to show an alternative way to fix the compilation error -- use '_:

fn foo<T: Display + Clone>(t: &T) -> Box<dyn Display + '_> {
    Box::new(t.clone())
}

This shows that complete elision and '_ are not the same thing for the dyn lifetime. The '_ here follows the normal lifetime elision rules, and has the same lifetime as the &T reference. (The compiler knows that T is valid for at least that long, because the reference exists -- it would be UB if the T type wasn't valid at least as long as a reference to it. That is, the presence of a &'a T implies T: 'a, which in turn allows the coercion to dyn Display + 'a.)

(Without an input lifetime, you can still avoid 'static, but not by using '_.)
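For the case without an input reference, one option is to name the lifetime yourself. This is a sketch with a hypothetical erase function: the bound T: Display + 'a is what permits the coercion to dyn Display + 'a.

```rust
use std::fmt::Display;

/// Takes `T` by value, so there is no input reference for `'_` to pick up.
/// A named lifetime parameter plus a `T: 'a` bound does the job instead.
fn erase<'a, T: Display + 'a>(t: T) -> Box<dyn Display + 'a> {
    Box::new(t)
}
```

Passing a borrowed value such as a &str taken from a local String then yields a box that is only valid for as long as that borrow, rather than demanding 'static.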


We can turn this around and look at Box in input position too.

// Contrived again
fn foo(d: Box<dyn Display>) {
    println!("{d}");
}

fn main() {
    let local = "Happy new year".to_string();
    foo(Box::new(&*local));
}

foo wants a Box<dyn Display + 'static>, so we can't type-erase the non-'static &'local str and have things work. This time using '_ makes the lifetime into a fresh generic:

fn foo(d: Box<dyn Display + '_>) {
// Same as
fn foo<'a>(d: Box<dyn Display + 'a>) {

So now you might just be like "Box<dyn Foo> => Box<dyn Foo + 'static>, got it!" And I think that's a common approximation for many programmers. However, there is an exception: In expressions (i.e. function bodies), Box<dyn Foo> actually does act like Box<dyn Foo + '_>. For example:

    // This compiles!  But it cannot be a `Box<dyn Display + 'static>`
    let local = "Happy new year".to_string();
    let d: Box<dyn Display> = Box::new(&*local);

This can be confusing when you try to pass your Box<dyn Display /* + '_ */> to something taking a Box<dyn Display /* + 'static */> parameter and get a lifetime error, even though the annotations match syntactically.
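A compact sketch of that mismatch, using hypothetical render_any and render_static functions. The commented-out call is the one expected to fail with a lifetime error.

```rust
use std::fmt::Display;

/// Accepts a box with any dyn lifetime (a fresh generic lifetime).
fn render_any(d: Box<dyn Display + '_>) -> String {
    d.to_string()
}

/// In a signature, the elided dyn lifetime is `'static`.
#[allow(dead_code)]
fn render_static(d: Box<dyn Display>) -> String {
    d.to_string()
}

fn demo() -> String {
    let local = "Happy new year".to_string();
    // Inside a function body the dyn lifetime is inferred, so this box may
    // borrow from `local` even though the annotation looks identical to the
    // parameter type of `render_static`.
    let d: Box<dyn Display> = Box::new(&*local);
    // render_static(d); // rejected: would require `dyn Display + 'static`
    render_any(d)
}
```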


Now let's look at &dyn Foo. A &'a dyn Foo is usually a &'a (dyn Foo + 'a). This is a problem less frequently because it's odd to have a constraint where you would need a &'a (dyn Foo + 'static), say, and dyn lifetimes are covariant [1], so if you start from a Box<dyn Foo> it's easy to get a &'local (dyn Foo + 'local) out of it. So the examples are going to be more contrived.

In fact, I think this is such a non-problem I'm going to hide the discussion of &dyn Trait nuances from the non-curious, because you're probably never going to have to care.

&dyn Trait stuff

Let's just look at some signatures to see how full elision and '_ differ:

fn one_a(_: &dyn Foo)         // fn one_a<'a>(_: &'a (dyn Foo + 'a))
fn one_b(_: &(dyn Foo + '_))  // fn one_b<'a, 'b>(_: &'a (dyn Foo + 'b))

&'a (dyn Foo + 'b) can coerce to &'a (dyn Foo + 'a) and you probably never need the 'b to be longer, so although these are different, they're not really meaningfully different.

fn two_a(&self) -> &dyn Foo
fn two_b(&self) -> &(dyn Foo + '_)

Because both the reference and dyn lifetimes are elided in the return position of two_b, they're going to end up with the same concrete lifetime -- so this case is exactly the same as two_a.

fn three_a<'x>(&self, _: &'x str) -> &'x dyn Foo
fn three_b<'x>(&self, _: &'x str) -> &'x (dyn Foo + '_)

These ones are different because three_a returns a &'x (dyn Foo + 'x) but three_b's dyn lifetime is the same as the &self reference. This also implies the lifetime on &self is longer than 'x, which is not implied by the first signature. (The three_b signature doesn't really make sense in this example.)


The approximation "&'x dyn Foo is &'x (dyn Foo + 'x)" works even better than the "Box implies 'static" rule, but there's a similar exception in function bodies... only it's even weirder IMO:

fn foo<'x>() {
    let a: &dyn Foo = todo!();    // &'_ (dyn Foo + '_)
    let b: &'x dyn Foo = todo!();  // &'x (dyn Foo + 'x)
}

So the rule is that if the reference lifetime is inferred, the dyn lifetime is inferred independently, but if the reference lifetime is explicit, you get the normal "reference lifetime matches dyn lifetime" that happens outside of function bodies. Playground.

...but I've never seen this actually matter or trip someone up.


A summary of where '_ and full elision differ:

Type \ Context:        static     [G]AT    fn input      fn output       fn body
Box<dyn Tr + '_>       'static    E0637    Fresh Param   Elision Rules   Inferred
&(dyn Trait + '_)      'static    E0637    Fresh Param   Elision Rules   Inferred
&'a (dyn Trait + '_)   'static    E0637    Fresh Param   Elision Rules   Inferred

Type \ Context:        static           [G]AT     fn input    fn output     fn body
Box<dyn Tr>            'static          'static   'static     'static       Inferred
&dyn Trait             'static          E0637     Reference   (Reference)   Inferred
&'a dyn Trait          ('a = 'static)   'a        'a          'a            'a

Since the &dyn Trait full elision is problematic so much less than Box<dyn Trait>, a practical approximation in my opinion is

  • Box<dyn Trait>: Inferred in function bodies, 'static elsewhere
  • &dyn Trait: Inferred in function bodies, same as outer reference elsewhere

And for other types like Arc<dyn Trait> or Ref<'a, dyn Trait>, a good rule of thumb is

  • If it has no lifetime parameter like Arc, it (definitely) acts like Box
  • If it has a generic lifetime like Ref, it probably acts like &
    • It is possible for it to act like Box, but that's probably getting into the weeds again
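The Arc half of that rule of thumb can be sketched the same way as the Box examples. The describe and describe_static functions are hypothetical names for illustration.

```rust
use std::fmt::Display;
use std::sync::Arc;

/// With `'_`, the dyn lifetime is a fresh generic, so borrowed data is fine.
fn describe(d: Arc<dyn Display + '_>) -> String {
    d.to_string()
}

/// With full elision in a signature, `Arc<dyn Display>` means
/// `Arc<dyn Display + 'static>`, just like `Box<dyn Display>`.
#[allow(dead_code)]
fn describe_static(d: Arc<dyn Display>) -> String {
    d.to_string()
}
```

Calling describe with an Arc wrapping a &str borrowed from a local String is accepted, while the same argument to describe_static would be expected to fail with a lifetime error.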

  [1] in surprising ways even, but I won't get into that now


Thanks for the thorough answer! I think I mostly can parse this but it's unclear to me what the static column represents? The others make sense to me. Is it how it will behave if used to annotate a static variable declaration?

Right, you can have a static: &str (or const) for example and a 'static lifetime will be used.

    fn foo<'x>(&'x self) -> (&'static str, &'static str) {
        static S: &str = "";
        const C: &str = "";
        (S, C)
    }
