Beginner: trying to understand lifetimes

I often see people using "scope" and "lifetime" to explain references and borrowing. What makes this difficult for me is that most of these explanations use the two terms interchangeably, which confuses me. I'd like to hear a clear distinction between the two, if they are different. Thanks

There is one more doubt of mine on the same topic. What does the compiler do with all the generic lifetime parameters that we define? I understand that the compiler uses these lifetime parameters to check the relationships between the lifetimes of various references, to ensure there are no data races at runtime. But what exactly does the compiler pass to these lifetime parameters (and if it passes nothing, why are they called parameters)? And how does it compare the passed "lifetimes" to determine whether the program is free of data races?
Here I just want to get a better mental model of the lifetimes that get passed into the generic lifetime params.

And is there a need for explicit lifetime parameters in functions where there are only local references, i.e., no references as arguments or return values?

6 Likes

Good questions,

Lifetimes are just scopes. They are not lexical scopes, though, but invisible scopes defined by where references are used.
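A minimal sketch of what an "invisible scope" looks like in practice, assuming a compiler with non-lexical lifetimes (Rust 2018 or later). The function name `greet` and the variables are made up for this illustration:

```rust
// Hypothetical helper, invented for this illustration.
fn greet() -> String {
    let mut s = String::from("hello");

    let r = &s;        // shared borrow of `s` starts here
    let len = r.len(); // last use of `r`: the borrow's lifetime ends here

    // `r` is still in lexical scope, but its lifetime is already over,
    // so mutating `s` is accepted.
    s.push_str(", world");

    format!("{} ({} bytes before the push)", s, len)
}

fn main() {
    println!("{}", greet());
}
```

If `r` were used again after the `push_str` call, its lifetime would stretch across the mutation and the compiler would reject the code.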

The compiler uses them to place constraints on the inputs and outputs of functions. These constraints limit what values you can send to functions and how you can use the outputs of functions.

The compiler uses a constraints solver to verify that all of the lifetime constraints are valid.

The compiler uses the check from before, plus the Send and Sync traits, to verify that you are using your types in a thread-safe way. The lifetimes check that any references will be valid. Send and Sync check that the type can be sent or shared across thread boundaries.
Together they make sure that you are accessing data in a thread-safe way, which is what prevents data races.
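As a rough sketch of the two checks working together, here is a borrowed slice shared with two scoped threads. It assumes Rust 1.63 or later for `std::thread::scope`; the helper name `parallel_sum` is invented for this example:

```rust
use std::thread;

// Hypothetical helper: sums the two halves of a slice on two scoped threads.
fn parallel_sum(data: &[i32]) -> i32 {
    let (left, right) = data.split_at(data.len() / 2);

    // `thread::scope` joins every spawned thread before it returns, which is
    // what lets the threads borrow `data` instead of owning it (lifetime check).
    thread::scope(|s| {
        // Shared references to `i32` data may cross thread boundaries
        // because `i32` is `Sync` (Send/Sync check).
        let a = s.spawn(|| left.iter().sum::<i32>());
        let b = s.spawn(|| right.iter().sum::<i32>());
        a.join().unwrap() + b.join().unwrap()
    })
}

fn main() {
    let numbers = vec![1, 2, 3, 4];
    println!("{}", parallel_sum(&numbers));
}
```

Replacing the slice with something that is not `Sync`, such as a slice of `Cell<i32>`, should make the `spawn` calls fail to compile, which is the Send/Sync half of the check.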

Don't explicitly annotate lifetimes until Rust complains about it. Then think about it carefully to make sure you understand why Rust is complaining, and annotate the lifetimes. If you don't understand, just come here and someone will help you!

Lifetimes are just like type parameters, except that they are erased instead of monomorphized. They represent constraints that the inputs and outputs to functions or the fields of types must adhere to.
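To make the comparison with type parameters concrete, here is a small sketch. The function `longest` is a common teaching example and not something from this thread:

```rust
// The single parameter 'a ties both inputs to the output: the result
// may not be used for longer than either argument is valid.
fn longest<'a>(x: &'a str, y: &'a str) -> &'a str {
    if x.len() >= y.len() { x } else { y }
}

fn main() {
    let outer = String::from("a long-lived string");
    let result;
    {
        let inner = String::from("short");
        // For this call the compiler picks a concrete 'a that fits inside
        // the region where both `outer` and `inner` are valid.
        result = longest(&outer, &inner);
        println!("{}", result);
    }
    // Using `result` here would be rejected: `inner` is already dropped.
}
```

Unlike a generic type parameter, the choice of `'a` does not produce a separate compiled copy of `longest`; the lifetime only takes part in checking and is gone from the generated code.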

6 Likes

One thing that I want to get cleared up right away: when we provide the lifetime params, we're just giving the compiler some sort of "rules" regarding the relationship between the references, and it has to check at compile time whether or not these rules hold. Sometimes the compiler can work out by itself (implicitly) what the relationship between the references should be and checks for it; when it can't, it relies on us to state the valid relationship. Is this right?

Can I consider these "invisible scopes" to be the set of lines where the reference is actually used or referred to in the code (and also all the lines in between the first and last of the lines in the set)?

Correct

Yes

struct Something {
    a: String,
    b: String,
}

impl Something {
    fn c(&self) -> &str {
        self.a.as_str()
    }
}

fn main() {
    let d = Something{
        a:String::from("Hello"),
        b:String::from("Bye"),
    };
    let e = d.c();
    let f = d.c();
    println!("{}", e);
    println!("{}", f);
    println!("{}, {}", e, f);
}

I understood that the lifetimes are passed to the generic lifetime parameters by the compiler. But I am not able to understand what might be those lifetimes that are passed. I think it'd be better to understand the concept by taking a piece of code and running through it like a compiler and figuring out the lifetimes of the various references in the code. But I am not able to figure out the lifetimes of all the references. So please can anyone help me do this considering the above code. The code may not even compile (I just wrote it). But if that's the case; then even better, I can learn why lifetime errors occur and how to correct them :slight_smile:

And also can anyone please let me know what kinds of checks are performed on the various lifetimes (that are already determined by the compiler with or without our help) to say that the code doesn't have any problems (data races), again taking the above example play code.
Thanks

Good news: your code compiles :slight_smile: I'm going to show my analysis of the code, with the caveat that I don't know if my logic matches the compiler's logic exactly, but I think I'll be close enough to help in understanding.

Your Something struct doesn't hold any references, so there aren't any generic lifetime parameters needed on the struct definition.

The c method has a parameter that's an immutable reference to self and a return type that's a string slice. Because of lifetime elision, adding lifetime parameters to these references isn't required, but we're allowed to annotate the lifetimes if we want to (just as we could annotate types with let definitions if we want to). The elided lifetime parameters correspond to this code:

impl Something {
    fn c<'a>(&'a self) -> &'a str {
        self.a.as_str()
    }
}

which says the returned string slice lives as long as self does.

Now, on to main-- I'm going to try to annotate the concrete lifetimes with some colored lines that I've been playing with for the as-yet-unpublished Unit 4 of the Rust in Motion video course I'm working on (shameless self promotion!)

The concrete lifetime of the Something instance owned by d starts where d is declared and lasts until d goes out of scope at the end of main, illustrated here by the solid blue line along the left:

The first time we call c, it returns a string slice reference that is valid from where we bind that reference to the variable e until the last usage of e, in the final println!. I've annotated that reference's lifetime with a blue dotted line, dotted because it's a borrowed value rather than an owned one:

Similarly, the second time we call c, it returns a string slice reference that is valid from where we bind that reference to the variable f until the last usage of f, also in the final println!. Illustrated here again with a dotted blue line:

The compiler looks at the lifetimes of the owned value and the references to it, sees that the references don't outlive the value they're referencing, and therefore declares that this code is valid because all of the references will always be valid.

If we change this function a bit we can make it more interesting and see the compiler preventing a problem.

fn main() {

    let e = {

        let d = Something {
            a: String::from("Hello"),
            b: String::from("Bye"),
        };

        d.c()

    };

    println!("{}", e);
}

Here, the concrete lifetime of the Something instance owned by d starts where d is declared and ends at the end of the inner scope where d is cleaned up, illustrated here by a solid blue line:

This code is trying to bind the reference returned by the call to c to the variable e, so the lifetime of the reference would need to last from where e is created until the last time e is used, in the final println!, illustrated here by a blue dotted line:

The compiler looks at this code and sees that the reference outlives the value it is pointing to, and the compiler rejects this code as invalid with this error message:

error[E0597]: `d` does not live long enough
  --> src/main.rs:21:9
   |
14 |     let e = {
   |         - borrow later stored here
...
21 |         d.c()
   |         ^ borrowed value does not live long enough
22 | 
23 |     };
   |     - `d` dropped here while still borrowed

What this is saying is that because d is cleaned up at the end of the inner scope on line 23, the reference in e would be invalid (known as a "dangling reference") and wouldn't point to what we want it to point to when we use e in the println!.
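For completeness, a sketch of two ways the rejected code could be changed so that it compiles. Both are illustrations under my own assumptions about what the program should do; the variable names `owned` and `inner` are invented here:

```rust
struct Something {
    a: String,
    #[allow(dead_code)] // `b` is unused in this sketch
    b: String,
}

impl Something {
    fn c(&self) -> &str {
        self.a.as_str()
    }
}

fn main() {
    // Option 1: declare `d` in the outer scope so it outlives `e`.
    let d = Something {
        a: String::from("Hello"),
        b: String::from("Bye"),
    };
    let e = { d.c() };
    println!("{}", e);

    // Option 2: copy the data into an owned String, so no borrow
    // has to escape the inner scope.
    let owned = {
        let inner = Something {
            a: String::from("Hello"),
            b: String::from("Bye"),
        };
        inner.c().to_string()
    };
    println!("{}", owned);
}
```

Option 1 keeps the reference and extends the life of the value it points to; option 2 gives up the reference and pays for a copy instead.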

Whew! Did that help at all?

13 Likes

It does not "pass" anything to these lifetime parameters; it simply cross-correlates the parameters by name as input "variables" to a sat-solver that determines whether the expressed (and implied) symbolic relationships can be satisfied.

3 Likes

@carols10cents Thanks for the awesome explanation :smile:

Is this the way lifetimes are decided after adopting the NLL (non-lexical lifetimes)? i.e, from the declaration to the last usage? And are lifetimes under the NLL scheme always continuous lines? What I mean is that can there be lifetimes that aren't continuous? Like the lifetime consists of two pieces of discontinuous blue lines? This'll really clear up a few things for me.

According to my understanding, lifetimes not only help to detect dangling pointers but also detect cases where two or more mutable references to the same "object" have overlapping lifetimes, i.e. their blue lines overlap. Right? I just want to get this confirmed and use it as a base to build my understanding of lifetimes. It's just that my ideas about lifetimes have been proved wrong many times :sweat_smile:

In the above example, according to lifetime elision, the compiler expects the lifetime of the &self to be the same as the lifetime of the returned &str. Is that right? Now, I don't understand what the lifetime of the &self within the method c is. And should the lifetimes be exactly the same, or is there any margin for this? I've heard of subtyping but never fully understood it. And is it any different for mutable references, or do the rules remain the same?

Thanks.

Lifetimes are always continuous.

Yes, lifetimes are used to detect aliasing mutable (unique) references.


Have you read the Rustonomicon? If not, I recommend you do, as it explains subtyping and the finer details of lifetimes.

Yup! I decided not to talk about how they worked before; I wasn't sure how much of the history you were interested in.

As @RustyYato said, nope! You can have multiple lifetimes that start and end, but one lifetime doesn't consist of 2 discontinuous lines. Can you elaborate a bit on what needs clearing up with this information?

Yup, you're right! You might find this post by @Manishearth and this post by @nikomatsakis interesting on the topic of multiple mutable references.

It's more saying that the lifetime of the returned &str is related to the lifetime of &self, and the returned &str will be valid as long as self is valid.

The reference to self used in the method c becomes valid at the beginning of c and goes out of scope at the end of c. It's tied to the lifetime of self, that is, &self can't outlive self, which in this case means we can't call c without having a currently valid Something instance to call it on.

The actual time that the reference is valid can be shorter than the time that the value it's referencing is valid, but not longer.

The only difference is that you can only have one mutable reference valid at a particular point, whereas you can have multiple immutable references valid at a particular point.
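A small sketch of that difference. The function name `shared_then_mutable` is made up for this example:

```rust
// Hypothetical helper, invented for this illustration.
fn shared_then_mutable() -> Vec<i32> {
    let mut v = vec![1, 2, 3];

    // Any number of shared references may be valid at the same point.
    let first = &v[0];
    let last = &v[2];
    let sum = *first + *last; // last use of both shared borrows

    // Only one mutable reference may be valid at a time. This is accepted
    // because `first` and `last` are not used after this point.
    let m = &mut v;
    m.push(sum);

    // Uncommenting the next line keeps `first` valid across the `&mut`
    // borrow, which the compiler rejects with error E0502:
    // println!("{}", first);

    v
}

fn main() {
    println!("{:?}", shared_then_mutable());
}
```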

Have I made things better or worse? :slight_smile: Please keep asking questions if more come up!

3 Likes

The blue lines are a fantastic idea @carols10cents!

I'm trying to understand the combination of these two statements:

The first one makes it sound like both e and f have lifetimes equal to d, whereas the second says that it's due to assigning the variable.

Would it be correct to say that we're talking about two different lifetimes here:

  1. the maximum potential lifetime for any call to d.c() - which will be equal to d
  2. the lifetime of variable assignments. In this case the assignments to e and f which happen to be bound to the return from d.c() but the variable assignment lifetime is its own thing

I'm not sure if I'm complicating things, but here's what has helped me understand this (as in "What @TomP said but with other words"):

Whenever you write down lifetimes, you're actually writing down constraints (i.e. inequalities) that need to be satisfied for the code to be correct. So when you have a function c(&self) -> &str you're writing down the constraint "The output reference cannot be valid longer than the input reference". In a formal way, you're writing c(&'a self) -> &'b str, and the constraint you're putting on here is that 'b is contained in 'a ("contained" as part of the timeline, like the blue bars above). It could be written as 'a: 'b (Rust notation, read as "'a outlives 'b").

The "only" job the compiler does here is collecting all those constraints and see if it can satisfy them.
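The constraint described above can also be written out by hand. This is only a sketch to show the notation; in real code you would normally rely on elision, and the trimmed-down `Something` here is my own simplification:

```rust
struct Something {
    a: String,
}

impl Something {
    // Two separate lifetime parameters plus the explicit constraint
    // `'a: 'b`: the input reference must be valid at least as long
    // as the output reference.
    fn c<'a, 'b>(&'a self) -> &'b str
    where
        'a: 'b,
    {
        self.a.as_str()
    }
}

fn main() {
    let d = Something { a: String::from("Hello") };
    let e = d.c();
    println!("{}", e);
}
```

Removing the `where 'a: 'b` clause should make the body fail to compile, because nothing would then relate the returned `&'b str` to the borrow of `self`.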

3 Likes

Does c(&'a self) mean that lifetime 'a is equal to the lifetime of self?
Or is 'a the lifetime of a reference to self?

For the original code to work, it seems 'a has to be the lifetime of self, since the reference &self doesn't outlive the execution of the c function and thus wouldn't live long enough. But syntactically this lifetime looks more like the lifetime of a reference to self. It is a bit confusing.

This is exactly what confused me over here - and despite the epic help I got from @RustyYato (and others like @Yandros and @OptimisticPeach ) - I'm still not completely clear about it!!

Would love to see more explanations dealing specifically with lifetimes of methods and how self affects it... even this statement above raises some eyebrows for me:

So if there were generic lifetime parameters - how does that change things?

I feel like having gone through that other thread I could maybe partially answer some of this myself - but it's still murky, and would love more elaboration/explanation from seasoned Rust developers :slight_smile:

This simple question always trips me up again. I think it's because of the fundamental problem that we sometimes talk about things that are rejected (i.e. constraint systems that are unsolvable) with a vocabulary that suggests our statements are fact. But they are "impossible facts" as in "To make our system consistent, we need this fact to be true", which in turn implies that we did not necessarily get our facts wrong, but that the system is indeed inconsistent. There are 3 subtly different meanings of the word "lifetime" for me: 1) the time a certain thing exists in memory (i.e. has an owner), 2) the time a reference can be shown to be valid for (bounded by the first one), and 3) the time the variable holding a reference is active (was: lexical scope, now: some other scope determined by magic usage). And, to reiterate, finding conflicts between these 3 is borrowck's job, and in case of a conflict it's pretty hard to talk about them when all are called "lifetime".

AAAnnnyyyways, when you write &'a self what you say is "a reference that is valid for the time 'a", which in itself is meaningless; it only gets meaning as part of the constraint system. For the constraint system to be consistent, 'a must be bounded by the time self exists in memory, and any variable holding a &'a self can only be active during 'a.

(Note: Not sure if "active" is the right word to use, but since NLL I try not to use "scope" anymore).

4 Likes

The latter.

Example

use ::core::mem::drop as do_stuff_with;

struct Foo (());
impl Foo {
    fn does_not_return_borrow<'borrow> (&'borrow self) {}
    fn returns_borrow<'borrow> (&'borrow self) -> &'borrow ()
    {
        &self.0
    }
}

fn main ()
{
    let ret3 = {
        let foo = Foo(());
        let ret1 = foo.does_not_return_borrow();
        let ret2 = foo.returns_borrow();
        do_stuff_with(ret2);
        let ret3 = foo.returns_borrow();
        do_stuff_with(ret1);
        ret3
    };
    do_stuff_with(ret3);
}

leads to:

image

The important things are:

  • the lack of lifetime parameter in output / return position of the first method leads to the first borrow of foo not being connected to the lifetime of ret1, thus ending very quickly,

    • This is typical of mutation methods since by mutating the object they usually do not need to return anything;
  • having the lifetime parameter of &self in return position leads to the second borrow of foo being connected to ret2, as well as the third borrow of foo being connected to ret3.

  • with NLL, the borrow ends with the last usage of the variable / binding holding it:

    • the second borrow ends with do_stuff_with(ret2),

    • the first borrow ends with do_stuff_with(ret1),

      • it is allowed to overlap with the second borrow of foo, since neither borrow is exclusive (that would be a &mut borrow);
  • since ret3 is the last expression of the block, it "exits" it. But foo dies at the end of that scope, hence the red zone that would lead to a use after free bug (and vulnerability!)

3 Likes

I love this thread, it's really getting me to clarify how I explain lifetimes!

Yes, in the first statement of mine you quoted here I should have said "the returned string slice is allowed to live as long as self does but no longer". I would call the lifetime meaning #1 that you have here "a generic lifetime" or maybe "a lifetime constraint" and the lifetime meaning #2 "the concrete lifetime of a variable binding".

So one 'a by itself doesn't really mean anything-- annotated lifetime parameters only really have meaning when they're used two or more times to show how references are related to each other. Annotated lifetime parameters are also a kind of generic, like a generic type parameter, but instead of saying "there will be some concrete type filled in here", they mean "there will be some concrete lifetime filled in here that meets these constraints".


Ok, so, say that the definition of Something was this instead:

struct Something<'a> {
    name: String,
    job_title: &'a str,
}

That is, an instance of Something owns a String in its name field but holds a reference in its job_title field. It also says "instances of Something can't live longer than the reference stored in job_title lives", which is pretty straightforward and you might think an unnecessary constraint to have to state. But the annotated lifetime is here to say that Something holds references somewhere, so that when you see Something used in a signature like:

fn hire<'a>(new_employee: Something<'a>) {}

you know there are references involved.

Now if Something was defined as this, it's a little bit more interesting:

struct Something<'a> {
    name: &'a str,
    job_title: &'a str,
}

This is saying instances of Something hold two references whose lifetimes are related; instances of Something are not allowed to outlive the shorter of the two references' lifetimes.
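A sketch of that "shorter of the two" rule in action. The helper `describe` and the example values are invented for this illustration:

```rust
struct Something<'a> {
    name: &'a str,
    job_title: &'a str,
}

// Hypothetical helper invented for this sketch.
fn describe(s: &Something<'_>) -> String {
    format!("{} ({})", s.name, s.job_title)
}

fn main() {
    let name = String::from("Ferris"); // valid until the end of main
    let description = {
        let job_title = String::from("Mascot"); // valid only inside this block
        // 'a has to fit inside the region where both strings are valid,
        // so `employee` cannot leave this block.
        let employee = Something {
            name: &name,
            job_title: &job_title,
        };
        describe(&employee) // an owned String, so this value may escape
    };
    println!("{}", description);
}
```

Trying to return `employee` itself from the inner block should be rejected, because `job_title` is dropped at the end of that block.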

Did that clear anything up? Keep the questions coming!!

4 Likes

Note you don't have to necessarily think of lifetimes as regions. I think it is very natural (and useful!) to think of them as sets of read/write locks on the variables in a single function's body, and I believe this model is easier to reason about than lifetimes. I wrote a blog series about this at some point.

I wish I had time to get back to it! The next post in the series was going to be all about lifetime inference, which is precisely the answer to this question:

To sort of summarize what lifetime inference looks like in my model:

  • The compiler would convert a function body into some MIR-like form and unroll every loop to two iterations (so that the same borrow expression on different iterations can be distinguished)
  • A label is associated with each borrow expression (&place or &mut place) that appears in this MIR. These labels represent locks. Labels also exist for the current function's lifetime parameters.
  • Regular type inference runs across the MIR and fills in all of the type parameters of all values (even temporaries), leaving only lifetime parameters undecided.
  • Every lifetime parameter of every value in the function body, and every lifetime parameter of every function called is given a fresh "lifetime variable" (e.g. ?'0) to describe the locks it holds. A typical function body may have hundreds of lifetime variables, many of which will ultimately be redundant.
  • Each thing that happens in the function body (as well as where bounds of functions called) generates constraints such as ?'4 isSubsetOf ?'5 (meaning "every lock held by ?'5 is also held by ?'4"). All constraints are of this form.
  • Lifetime inference is the process of finding the minimal set of locks that must be held by every lifetime variable. Each lifetime variable is initially assigned an empty set, except for function parameters (which hold the labels that appear in the function signature), and borrow expressions (which each hold a set with one label). The constraints are used to propagate them.

At the end, once this minimal set is found, the borrow checker can find places in the function body where multiple conflicting locks on the same thing are held, and can deny the return value from holding locks that don't appear in the signature.
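The propagation step in the summary above can be sketched as a toy fixed-point loop. This is not how rustc is implemented; the `Locks` alias, the `infer` function, the label `"shared(x)"` and the index-based variables are all assumptions made for this illustration. Only the reading of `isSubsetOf` is taken from the summary:

```rust
use std::collections::BTreeSet;

// A lifetime variable is modeled as the set of lock labels it holds.
type Locks = BTreeSet<&'static str>;

// Toy fixed-point propagation. Each pair `(a, b)` is read as
// "?'a isSubsetOf ?'b": every lock held by `b` must also be held by `a`.
fn infer(mut vars: Vec<Locks>, constraints: &[(usize, usize)]) -> Vec<Locks> {
    let mut changed = true;
    while changed {
        changed = false;
        for &(a, b) in constraints {
            let from_b: Vec<&'static str> = vars[b].iter().copied().collect();
            for lock in from_b {
                // `insert` returns true only if the lock was not present yet.
                if vars[a].insert(lock) {
                    changed = true;
                }
            }
        }
    }
    vars
}

fn main() {
    // ?'0 stands for a borrow expression holding one label;
    // ?'1 and ?'2 start out empty.
    let vars = vec![Locks::from(["shared(x)"]), Locks::new(), Locks::new()];
    // ?'2 isSubsetOf ?'1, and ?'1 isSubsetOf ?'0.
    let solved = infer(vars, &[(2, 1), (1, 0)]);
    for (i, locks) in solved.iter().enumerate() {
        println!("?'{} holds {:?}", i, locks);
    }
}
```

Because sets only ever grow and the loop stops when nothing changes, the result is the smallest assignment that satisfies every constraint, which is the "minimal set of locks" mentioned above. A borrow checker built on this model would then look for conflicting locks in the solved sets.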

2 Likes

Also, lifetimes (in the sense you've been discussing; not my "set of locks" definition) can have multiple disconnected lines for their lifetime. At least, in textual form. The set of nodes on the MIR (which has a graphlike structure) must still be connected.

fn main() {
    let mut found = None;
    let mut vec = vec![0, 2, 4];

    for i in 0..vec.len() {
        if i == 2 {
            found = Some(&mut vec[2]);    // |
            break;                        // |
        }

        // safe to use vec here, thanks
        // to the unconditional break!
        println!("Printing: {}", vec[i]);
    }
                                          // |
    *found.unwrap() += 6;                 // |
    println!("Mutated: {:?}", vec[2]);
}

The comments on the right margin indicate the lifetime of the mutable borrow.

2 Likes

That is some tricky flow control! I might point out that when this loop is unrolled, you will actually see the same continuous lifetime as the other examples.

1 Like