Lifetime and iter

I have the following minimized code that produces a lifetime compiler error. It fails only when iter() is involved.


use std::collections::HashMap;

struct Person<'a> {
    name: &'a str,
}

struct PersonInfo<'a> {
    info: &'a str,
}

struct Model<'a> {
    persons: Vec<Person<'a>>,
    person_info_map: HashMap<String, PersonInfo<'a>>,
}

impl<'a> Model<'a> {
    fn build(&'a mut self) -> Vec<&'a PersonInfo<'a>> {
        let person_info_map = &mut self.person_info_map;
        self.persons.iter().map(|person| {
            update_info(person, person_info_map)
        }).collect()
    }
}

fn update_info<'a>(
    person: &'a Person,
    info_map: &'a mut HashMap<String, PersonInfo<'a>>,
) ->  &'a PersonInfo<'a> {
    let name = person.name.to_string();
    let entry = info_map.entry(name);
    entry.or_insert_with(|| { PersonInfo { info: person.name }})
}

The error seems to be centered around:

note: first, the lifetime cannot outlive the lifetime `'_` as defined on the body at 22:33...
  --> src/lib.rs:22:33
   |
22 |         self.persons.iter().map(|person| {
   |                                 ^^^^^^^^

But I think the references returned by iter() should be able to live for the same lifetime as the collection being iterated over (which has the lifetime of 'a). Why does person have the '_ lifetime and not 'a?

If I add person: Person<'a>, to the Model struct, then update_info(&self.person, person_info_map) (added to build) compiles just fine.

Any tips on fixing this?

When you pass an exclusive reference (&mut) into a function and get a reference back out, the exclusive borrow lasts for the lifetime of the returned reference, even if the returned reference is not &mut. Here's a version that illustrates how the first function call impedes any further function calls within the lifetime of the returned value, due to the exclusive borrow.
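
As a minimal sketch of that rule (the `get_or_insert` helper and the `u32` values are made up for illustration and are not from the original code), the following compiles only because the first returned reference is no longer used by the time of the second call:

```rust
use std::collections::HashMap;

// Hypothetical helper: takes `&mut` and hands back a shared reference.
// The returned `&'m u32` keeps the `&'m mut` borrow of `map` alive.
fn get_or_insert<'m>(map: &'m mut HashMap<String, u32>, key: &str) -> &'m u32 {
    map.entry(key.to_string()).or_insert(0)
}

fn demo() -> (u32, u32) {
    let mut map = HashMap::new();

    let first = get_or_insert(&mut map, "a");
    // Calling `get_or_insert(&mut map, "b")` at this point, while `first`
    // is still used below, is rejected: `map` is still exclusively borrowed.
    let first_value = *first; // last use of `first`, so the borrow ends here

    let second = get_or_insert(&mut map, "b"); // accepted now
    (first_value, *second)
}
```

Collecting the returned references into a Vec keeps every one of them alive at once, which is why the closure in build cannot call update_info more than once.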

But beyond this, you're trying to build a Vec of references to the members of self.person_info_map while simultaneously modifying self.person_info_map. This isn't going to work: what if the HashMap runs out of capacity and has to reallocate halfway through, moving its contents in the process? Your vector of references would dangle.

Instead you can apply all your updates, and then gather your vector of references afterwards. Here's a version which satisfies the borrow checker.

There were also some issues with using the same lifetime in too many places. If you use 'a three times in the same function signature, that requires all three lifetimes to be exactly the same. In build, for example, you should just elide the lifetime of the outermost references, and the compiler will do the right thing. The elision rules don't apply to update_info, but you can still divorce the lifetimes. (I'm not sure the function is still useful after the rest of the modifications, but left it in anyway.)
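
To illustrate the point about build, here is a small sketch with a made-up `Counter` type (not part of the original code). With the outer lifetime elided, the method can be called repeatedly:

```rust
struct Counter<'a> {
    label: &'a str,
    hits: u32,
}

impl<'a> Counter<'a> {
    // The lifetime of `&mut self` is elided, so each call borrows the
    // counter only for as long as the returned reference is in use.
    // Writing `&'a mut self` instead would tie the borrow to the lifetime
    // parameter of the type itself, and the second call in `demo` would
    // then be rejected as a second mutable borrow.
    fn hit(&mut self) -> &u32 {
        self.hits += 1;
        &self.hits
    }
}

fn demo() -> u32 {
    let label = String::from("requests");
    let mut counter = Counter { label: &label, hits: 0 };
    counter.hit();
    let hits = *counter.hit();
    assert_eq!(counter.label, "requests");
    hits
}
```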


If you have a type T that already contains an &'a U reference, and you want to give away further borrows pointing into that inner reference, then there's no need to also require &'a self in the implementation of the outer type T. It's useless and most often outright wrong, because the outer container cannot outlive the inner value and usually lives for a shorter time.

You can just leave off the lifetime annotations entirely instead, except that you have to specify that the return type is &'a U (or some projection thereof), and your code will compile — provided the implementation of the method is consistent with returning a reference living at least for 'a, which it is when you simply reborrow or copy the inner reference.
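
A small sketch of this pattern, reusing the Person type from the question (the `name` accessor and the `first_name_of` function are hypothetical additions):

```rust
struct Person<'a> {
    name: &'a str,
}

impl<'a> Person<'a> {
    // No lifetime on `&self`; only the return type names `'a`.
    // The body just copies the inner reference out.
    fn name(&self) -> &'a str {
        self.name
    }
}

fn first_name_of(text: &str) -> &str {
    // `person` is a local and is dropped when this function returns, yet
    // the result is still valid because it borrows from `text`, not from
    // `person`.
    let person = Person { name: text };
    person.name().split(' ').next().unwrap_or("")
}
```

If `name` returned a reference tied to `&self` instead, `first_name_of` would not compile, since the result would borrow from the local `person`.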

Thank you so much for a detailed explanation and solution.

In retrospect, not being able to use iter() in combination with an exclusive borrow was obvious (your example made it clear). A lesson learned!

I will try applying the suggested solution to the real problem I am dealing with, which involves a few more layers (has a tree-like structure whose "Info" needs to be collected).

@quinedot When I tried to apply the solution you suggested to my actual code, I ran into other problems (I had over-minimized the code sample). So I am trying to understand how to think about lifetimes in this context. I am especially interested in your thinking process for adding 'b (and its relationship with 'a).

So I tried to give the lifetimes more meaningful names. Here is your working solution, except that it replaces 'a with 'model (I could have used more specific lifetime names for Person etc.). In my mind, the contents of Model live for the 'model lifetime, and so do the contents of each field (such as an individual Person element). What would be a semantically meaningful name for 'b?

use std::collections::HashMap;

struct Person<'model> {
    name: &'model str,
}

struct PersonInfo<'model> {
    info: &'model str,
}

struct Model<'model> {
    persons: Vec<Person<'model>>,
    person_info_map: HashMap<String, PersonInfo<'model>>,
}

impl<'model> Model<'model> {
    fn build(&mut self) -> Vec<&PersonInfo<'model>> {
        for person in &self.persons {
            update_info(person, &mut self.person_info_map);
        }

        let person_info_map = &self.person_info_map;
        self.persons
            .iter()
            .flat_map(move |person| person_info_map.get(person.name))
            .collect()
    }
}

fn update_info<'model: 'b, 'b>(
    person: &'b Person<'model>,
    info_map: &'b mut HashMap<String, PersonInfo<'model>>,
) ->  &'b PersonInfo<'model> {
    let name = person.name.to_string();
    let entry = info_map.entry(name);
    entry.or_insert_with(|| { PersonInfo { info: person.name }})
}

There was no need for the borrows from self to be tied to the lifetime of the references inside the Model. When they were all 'a in the update_info declaration, that meant they all had to be the same. This is less flexible and can often lead to situations where the lifetime constraints cannot be met. You almost always want to allow the outside borrows to be shorter than the inside lifetime, so that's why I added the 'b.

The same reasoning is why I removed some of the lifetimes on the build() declaration. I could have given new named lifetimes ('b, 'c) to each elided reference instead, and the effect would have been the same. On update_info, there is more than one "input lifetime" (the &Person and the &mut HashMap), so you have to use names in order to tell the compiler which one you intend to match the output lifetime (the return value lifetime).
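
As a sketch of that equivalence, here are two read-only methods on a cut-down Model (the `infos` and `infos_named` methods and the `demo` function are hypothetical, written only for this comparison). The first relies on elision; the second spells out the lifetime the compiler would pick:

```rust
use std::collections::HashMap;

struct PersonInfo<'a> {
    info: &'a str,
}

struct Model<'a> {
    person_info_map: HashMap<String, PersonInfo<'a>>,
}

impl<'a> Model<'a> {
    // Elided: the output reference gets the lifetime of `&self`.
    fn infos(&self) -> Vec<&PersonInfo<'a>> {
        self.person_info_map.values().collect()
    }

    // The same signature with the elided lifetime written out as `'b`.
    fn infos_named<'b>(&'b self) -> Vec<&'b PersonInfo<'a>> {
        self.person_info_map.values().collect()
    }
}

fn demo() -> (usize, String) {
    let text = String::from("ann");
    let mut model = Model { person_info_map: HashMap::new() };
    model
        .person_info_map
        .insert(text.clone(), PersonInfo { info: &text });
    let a = model.infos();
    let b = model.infos_named();
    (a.len() + b.len(), b[0].info.to_string())
}
```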

More details on lifetime elision can be found here.

I was in fact a bit sloppy/lazy in putting 'b on both input lifetimes in update_info; a better version looks like this (eliding the &Person lifetime):

// I just chose `'hm_borrow` as it's a borrow of the `HashMap`
fn update_info<'model: 'hm_borrow, 'hm_borrow>(
    person: &Person<'model>,
    info_map: &'hm_borrow mut HashMap<String, PersonInfo<'model>>,
) ->  &'hm_borrow PersonInfo<'model> {
    let name = person.name.to_string();
    let entry = info_map.entry(name);
    entry.or_insert_with(|| { PersonInfo { info: person.name }})
}
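
One practical difference, sketched below with a hypothetical `demo` caller: because the &Person borrow is no longer tied to 'hm_borrow, the Person value can go out of scope while the returned reference is still in use. With 'b on both inputs, the same caller would be rejected, because the borrow of `person` would have to last as long as `info`.

```rust
use std::collections::HashMap;

struct Person<'model> {
    name: &'model str,
}

struct PersonInfo<'model> {
    info: &'model str,
}

fn update_info<'model: 'hm_borrow, 'hm_borrow>(
    person: &Person<'model>,
    info_map: &'hm_borrow mut HashMap<String, PersonInfo<'model>>,
) -> &'hm_borrow PersonInfo<'model> {
    let name = person.name.to_string();
    info_map
        .entry(name)
        .or_insert_with(|| PersonInfo { info: person.name })
}

fn demo() -> String {
    let text = String::from("ann");
    let mut map = HashMap::new();
    let info = {
        // `person` lives only inside this block.
        let person = Person { name: &text };
        update_info(&person, &mut map)
    };
    // `person` is gone, but `info` borrows only from `map` and `text`.
    info.info.to_string()
}
```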

This article on lifetime misconceptions may help guide you on how to think about these problems. #4 and #5 are pretty relevant to this thread, although the example in #5 is a situation where elision was the wrong choice, while our conversation has been more about a case where elision is the right choice.
