Weird borrow checker error: when 'XXX' is dropped and runs the `Drop` code for type `Vec`

velvia · July 9, 2021, 6:53pm

Hi, I'm not new to Rust but have been stumped for a day by a weird borrow checker error. Unfortunately the code is quite complex and involves async and the Geos crate, so I can't test it in playground, but I'll try to simplify as much as possible. Just trying to get insights on the error and where to go.

The code, a bit simplified:

use geos::{Geometry as GGeometry, PreparedGeometry, STRtree, SpatialIndex};

#[derive(Clone)]
pub struct GeomAndId<'a> {
    pub geom: GGeometry<'a>,
    pub id: u128,
}

pub struct GeomIndex<'a> {
    rtree: STRtree<'a, u128>, // Find approximate ID for point
    geoms: HashMap<u128, PreparedGeometry<'a>>, // For finding precise contains/within
}

impl<'a> GeomIndex<'a> {
    /// Tries to populate a new STRtree from a bunch of GeomAndId's
    pub fn try_populate(
        geoms_and_ids: &'a [GeomAndId<'a>],
    ) -> Result<GeomIndex<'a>, MyError> {
        let mut rtree = STRtree::with_capacity(geoms_and_ids.len())?;
        let mut geoms = HashMap::new();
        for g in geoms_and_ids {
            rtree.insert(&g.geom, g.id);
            geoms.insert(g.id, g.geom.to_prepared_geom()?); // Prepared geometries are faster for tests
        }
        Ok(GeomIndex { rtree, geoms })
    }
}

async fn foo(...) -> Result<usize, MyError> {
    let origin_geoms: Vec<GeomAndId<'a>> = preprocess_custom_geometries(....)?;   // Derives geometries from other inputs
    let origin_index = GeomIndex::try_populate(&origin_geoms[..])?;  // origin_index borrows from origin_geoms
    /// irrelevant code
    Ok(call_async_aggregation_func(..., origin_index).await?)
}

origin_geoms is only referenced in one place - the line to call GeomIndex::try_populate(), which must borrow it because it creates PreparedGeometry, which refers to the original geometry passed in try_populate().

The error:

error[E0597]: `origin_geoms` does not live long enough
    --> src/.../mod.rs:1142:49
     |
1142 |     let origin_index = GeomIndex::try_populate(&origin_geoms[..])?;  // origin_index borrows from origin_geoms
     |                                                 ^^^^^^^^^^^^ borrowed value does not live long enough
...
1205 | }
     | -
     | |
     | `origin_geoms` dropped here while still borrowed
     | borrow might be used here, when `origin_geoms` is dropped and runs the `Drop` code for type `Vec`

I'm not understanding the specific error message. The "dropped here while still borrowed" occurs at the very end of the async fn foo. origin_geoms is only borrowed in one place. It is true that origin_index refers to origin_geom's geometries, but once the call_async_aggregation_func async code returns, it is all done and both origin_index and origin_geoms are not needed anymore. There is no leakage of references to the result, which is a basic aggregation count and does not include any geometries at all.

The error seems to imply that the Drop implementation for Vec<Geoms_And_Ids> somehow must borrow the value again, which I don't understand. Why would the drop borrow the value again?

Many thanks.

Yandros · July 9, 2021, 7:41pm

Minimal repro

type Unshrinkable<'lt> =
    ::core::marker::PhantomData<fn(&()) -> &mut &'lt ()>
;

struct Geometry<'lt> (
    Unshrinkable<'lt>,
);
impl Drop for Geometry<'_> { fn drop(&mut self) {} }

struct S<'lt> (
    &'lt (), // whatever
);

impl<'lt> S<'lt> {
    /// Key problematic signature
    fn new (_: &'lt Geometry<'lt>)
      -> S<'lt>
    {
        S(&()) // whatever
    }
}

fn main ()
{
    let geo: Geometry<'_> = Geometry(Default::default());
    let _ = S::new(&geo);
}

EDIT: in case that wasn't clear: S<'_> plays no role whatsoever, the issue is in some function signature requiring it be given a &'lt Geometry<'lt>, which is an impossible to fabricate type.
Playground

Quick explanation

This is rarely what you want: nested borrows should not be using the same lifetime, as showcased by the previous minimal repro: by writing that the outer borrow should live as long as the inner one, when this inner borrow cannot shrink (called "invariant lifetime"), it means that the outer borrow is necessarily borrowing the borrowee for its whole lifetime, spanning even beyond the point where the borrowee is being dropped.

You can see how that is problematic, when the drop point requires a &mut Self access, i.e., unique unaliased access to Self. This conflicts with that other borrow that tries to span over too large of a lifetime.

Solution

The system is currently overconstrained, w.r.t. lifetimes, because it is requiring that too many lifetimes be equal to one another.

The solution is thus to loosen such constraints, by allowing those lifetimes to be distinct. This is achieved by giving each one a name. I'd thus start off doing the following changes:

    pub fn try_populate(
-       geoms_and_ids: &'a [GeomAndId<'a>],
+       geoms_and_ids: &'geoms_and_ids [GeomAndId<'geom>],

thus:

- impl<'a> GeomIndex<'a> {
+ impl<'geoms_and_ids, 'geom> GeomIndex<'geoms_and_ids, 'geom>
+ where
+     'geom /* ≥ */ : 'geoms_and_ids, // 'geom ⊇ 'geoms_and_ids
+ {

thus:

- pub struct GeomIndex<'a> {
-     rtree: STRtree<'a, u128>, // Find approximate ID for point
-     geoms: HashMap<u128, PreparedGeometry<'a>>, // For finding precise contains/within
- }
+ pub struct GeomIndex<'geoms_and_ids, 'geom> {
+     rtree: STRtree<'geom, u128>, // Find approximate ID for point
+     geoms: HashMap<u128, PreparedGeometry<'geoms_and_ids>>, // For finding precise contains/within
+ }

Let's see if by propagating that across the code base the error is fixed

velvia · July 9, 2021, 8:33pm

Thank you @Yandros ... that's a great explanation and it makes sense now. Using the two lifetimes does seem to solve that immediate issue, though now the problem is shifted. I have an async function that uses this GeomIndex, which returns something not dependent on the input lifetime.

async fn aggregate(...., index: GeomIndex<'geoms_and_ids, 'geom>) -> Result<usize, ....> {
    ...
}

The problem now is that I get error: this parameter and the returned future are declared with different lifetimes... Rust must think the Future returned by the async fn must somehow take this input Geomindex lifetime into account, and I want to convince the compiler that my output has an independent lifetime (because it does, it is an aggregation that does not borrow the contents of GeomIndex).... how do I do that? Many thanks

alilleybrinker · July 9, 2021, 9:43pm

If you don’t specify lifetimes, one of the inference rules says that when there’s one input reference that the lifetime of that reference will be assigned to any borrowed data in the output type. So perhaps the returned borrow is being assigned that lifetime, “tying” it as you say you don’t want. Specifying the lifetime in the return type with its own name might help, though it raises the question (which is hard to answer without more code) of what the borrowed returned data is borrowed from.

Yandros · July 12, 2021, 1:05pm

Sadly, async fn sugar indeed conservatively assumes that the input will be borrowed from, since the async { … } generator / state machine "starts immediately" / the whole body of the function is lazily evaluated, meaning the borrows need to be upheld until the future is .awaited:

async
fn example<'thing> (thing: &'thing Thing)
  -> Ret
{
    todo!("FUNCTION BODY…");
}

becomes:

fn example<'thing, 'returned_future> (thing: &'thing Thing)
  -> impl 'returned_future + Future<Output = Ret>
where
    &'thing Thing : 'returned_future,
               //   +-<----------------<---------------<------+
               //   |                                         |
{              // vvvvv                                       ^
    async move /* thing */ {                               // |
        // make it so each and every parameter is captured    |
        let _ = &thing; // ->---------->----------------->----+ 
        todo!("FUNCTION BODY…");
    }
}

FWIW: you can automagically generate this unsugaring by using the showme feature of my ::fix-hidden-lifetime-bug crate.

The key thing (heh) here is that the span during which the returned future is usable, that is, 'returned_fut, must satisfy that each and every capture is, in and of itself, usable during that span / region. Thus, if the returned future / async … generator captures thing: &'thing Thing, then we end up with that &'thing Thing : 'returned_future constraint (and so on for the other function params).

That constraint becomes:

Thing : 'returned_fut, i.e., the referee / borrowee must, itself, be guaranteed not to dangle during 'returned_fut;
and 'thing : 'returned_fut, i.e., 'thing ≥ 'returned_fut (or more precisely, 'thing ⊇ 'returned_fut: the span of usabily of the returned future must be contained within / cannot escape the span of the borrow of *thing, lest that borrow dangles).

By the way, in order for &'thing Thing to be well-formed, we also have an automatic Thing : 'thing constraint. Thus, since 'thing : 'returned_fut, we realize tht the first Thing : 'returned_fut constraint is redundant / superfluous)

All that to say that we end up with the returned future not being able to outlive 'thing, and so on for each lifetime appearing among the function parameters' types.

Again, this makes sense since the body of that async fn, i.e., the body of the async generator / state machine is only reached when the future is polled, which can be arbitrarily late among the future's lifetime:

let fut;
{
    let thing: Thing = …;
    fut = example(&thing);
}
… fut.await; // <- body of `async fn example` is only reached at this point

So, if thing is (captured since potentially) used in that async fn body, Rust ought to cause a compilation error to prevent that use-after-free bug from making it to the runtime.

Hence your error.

You have two solutions –which are somewhat the same when we think about it–.

Do not use thing inside the future's block, and make sure Rust knows that you did it on purpose: as I mentioned, Rust assumes that any parameter to an async fn may be mentioned by the function's body, and thus makes each be captured by the returned future. This is so that if the first version of your not-totally implemented API forgot to use thing in its function body, and if a downstream user of your library used that function, then a later change in the implementation only does not cause a change in the function's signature, not even in its lifetimes, maintaining a robust abstraction boundary that avoidd breakage on such downstream code.

So in order to change that, you need to manually perform that equivalent unsugaring I showcased above:

fn example<'thing, 'returned_future> (thing: &'thing Thing)
  -> impl 'returned_future + Future<Output = Ret>
where
    &'thing Thing : 'returned_future,
               //   +-<----------------<---------------<------+
               //   |                                         |
{              // vvvvv                                       ^
    async move /* thing */ {                               // |
        // make it so each and every parameter is captured    |
        let _ = &thing; // ->---------->----------------->----+ 
        todo!("FUNCTION BODY…");
    }
}

and remove any mentions to thing inside the async move { … } block:

- async move /* thing */ {
+ async move /* no-thing */ {
-     // make it so each and every parameter is captured
-     let _ = &thing;
      todo!("FUNCTION BODY…");
  }

(assuming FUNCTION BODY… does not mention thing either).

with that, you can finally remove the bounds regarding the thing: &'thing Thing capture:

  fn example<'thing, 'returned_future> (thing: &'thing Thing)
    -> impl 'returned_future + Future<Output = Ret>
  where
-     &'thing Thing : 'returned_future,
  {
      async move /* no-thing */ {
          todo!("FUNCTION BODY…");
      }
  }

which can finally be cleaned up to:

//                  don't care, thus don't even name it
//                  vv
fn example (thing: &'_ Thing)
  -> impl 'static + Future<Output = Ret>
//        ^^^^^^^
//        the unconstrained / universal / never-ending span
{
    async move { … }
}

But not using thing at all in the body of example is a bit restrictive; the key thing to do was not to use the borrow of *thing inside the async move { … } block.

This means that if we can derive some data off *thing that, itself, does not borrow *thing and is thus no longer lifetime-bound, then:

We can hoist and perform that operation before and thus outside that async move { … } block.

For instance, assuming Thing has a Copy field, such as age: u8, we can do:
```
fn example (thing: &'_ Thing)
  -> impl 'static + Future<Output = Ret>
{
    let age = thing.age; /* dereference the borrow *before* returning the future */
    return async move /* age */ {
        todo!("REST OF THE BODY, using age = {}", age);
    };
}
```
This is the key thing to realize: the real body of example, usually, is that single return-the-compiler-generated-state-machine statement. But we can do some (non-blocking!) pre-processing before that, and only then move / transfer ownership of the newly computed stuff by that preprocessing step into the to-be-returned future / generator block.

But what if the field is not Copy? Well, the same works with Clone types, such as a hypothetical name: String field:
```
let name: String = thing.name.clone();
return async move /* name, … */ {
    /* stuff with `name`… */
};
```
And if Clone is deemed too expensive, that's when a good old layer of Arc reference-counting comes in handy:
```
struct Thing {
-   long_name: String,
+   long_name: Arc<String>, // or, here, `Arc<str>` to optimize
}
```
and then:
```
let name = Arc::clone(&thing.name);
async move { … }
```
This usually leads to the pattern whereby Thing is just a wrapper around a Arc<ThingFields>, and then the function body becomes:
```
#[derive(/* Arc */ Clone)]
struct Thing(Arc<ThingData>);

struct ThingData { long_name: String, … }

fn example (thing: &'_ Thing)
  -> impl 'static + Future…
{
    let thing: Thing = thing.clone();
    async move { … }
}
```

velvia · July 13, 2021, 1:15am

@Yandros thanks so much again for an extremely detailed, really good explanation. It makes sense I suppose that the compiler has to make sure that items returned that may depend on input does not outlive it, though it's conservative (and maybe unnecessarily conservative in my case).

For a bit more detail (because there isn't an async move explicitly needed here), the processing is something like:

async fn example (thing: Thing<'a>)
  -> Ret
{
    let results = Vec::new();
    while let Some(item) = future_stream.next().await {
        results.push(do_something_with(&thing));
    }
    Ok(results)
}

There is an error basically that thing is held across on await point at do_something. Which is true I suppose, but in this case is it enough to use Arc/clone for thing before the while let loop? It's not clear where I'd need to break up thing so that it's not waited across await points.

Note that in my case I'm not actually passing a ref/pointer to Thing, rather it is that I am passing in Thing directly (it is moved into the example fn), however Thing has a lifetime of 'a let's say. Does that still constrain the output Ret to have a lifetime implicitly such that 'a: 'ret ?

Thanks so much again

system · October 11, 2021, 1:16am

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.

Topic		Replies	Views
Struggling with borrow checker - mutable borrow of Vec help	5	1247	January 12, 2023
Beginner question: borrow checker issue help	4	475	August 4, 2019
Should be trivial, but can't see how to solve this borrow checker error help	4	337	November 7, 2020
Borrow check trouble due to return from for loop help	4	452	July 11, 2021
Fighting with borrow checker help	9	633	October 30, 2020

Weird borrow checker error: when 'XXX' is dropped and runs the `Drop` code for type `Vec`

Minimal repro

Quick explanation

Solution

Related Topics