Semantics/Workings of explicit lifetimes


#1

Hello.

I am trying to get a deeper understanding of the lifetime concept, but I am a little confused by explicit lifetime annotations on references. I get the concept, what they are needed for and how one should use them, but I am still not totally sure about the semantics: what does the lifetime 'a in an expression like fn as_str<'a>(data: &'a u32) -> &'a str (an example from the Rustonomicon) actually mean? I got several explanations from different sources:

  1. The borrow checker documentation suggests to me that 'a is actually a name for the scope of the value to which the reference data is pointing. When an assignment of the returned reference is made, e.g. x = as_str(data), it is checked that the scope of x (distinct from its associated generic lifetime) is smaller than 'a. (This seems to me to be the most likely explanation, but I still want to verify my opinion, as the doc is rather lengthy and complicated for me.)
  2. The Rust Book, 2nd edition, suggests that those lifetime parameters have no meaning on their own, but just express relations between the lifetimes of references. The actual meaning is inferred from the stated constraints. Using the phrasing of the book, I would translate the declared constraints into "the returned value and the input argument data will both live at least as long as 'a". To me this is a little ambiguous; especially the “at least” part is confusing. From the additional explanations, I would understand that the returned reference gets the lifetime 'a assigned, which is inferred from the lifetimes of the input (data in the example). It is still a little unclear whether the “input lifetime” refers to the scope where the given references have been declared, or to the scope of the pointed-to values.
  3. The Nomicon ( doc.rust-lang .org/nomicon/lifetimes.html), as I read it, states that the lifetimes of references are inferred by the borrow checker as the minimum time for which the borrowed value is used. Here I am totally unclear how lifetimes are measured (in scopes!?) and how this inference would work. (I get this understanding from the section at the beginning, where ‘desugared’ examples are presented.) I would see this as contradicting variant 1.
  4. A blog post (medium .com/@bugaevc/understanding-rust-ownership-borrowing-lifetimes-ff9ee9f79a9c) suggests that “References, among other objects, have lifetimes too, and those can be different to the lifetime of the borrow they represent (called the associated lifetime). […] A borrow may last longer than the reference it is controlled by”. The distinction between the lifetime of the reference and 'a as its associated lifetime makes sense to me. Still, this post suggests that the checker somehow infers a lifetime of a borrow. In particular, if I create a new reference from a previous one (e.g. let x=3; let newRef; { let oldRef = &x; let newRef = oldRef;} ...), it tracks some kind of “borrow” object, which is the same for oldRef and newRef. This notion is not too clear to me, especially for more complicated settings. Apart from that, it also suggests, similarly to the Nomicon, that the “Rust compiler tries to make the borrow lifetime as short as possible” and infers a lifetime of the borrow (suggesting a kind of reverse checking, by propagating the lifetime of the usage of the returned value back to the input arguments and validating that they live long enough).

Therefore, my question: what is the actual concept? Maybe even a different one? I hope you can understand my confusion from the explanations above. It is not that I do not get, in principle, why explicit lifetimes need to be declared and how to use them; it is rather about understanding how they work…
(P.S.: sorry for the blanks in the links, but as a noob I am sadly not allowed to use more than two links…)


#2

A better, IMO, explanation that I’ve seen is http://arthurtw.github.io/2014/11/30/rust-borrow-lifetimes.html. This is a notoriously difficult topic/concept to explain and the overload of terms across various resources doesn’t help. I do think it’s important to draw the distinction between a value’s lifetime/scope and a borrow’s lifetime/scope - they’re not the same thing necessarily (and most often aren’t). I think the blog above does a good job trying to differentiate the two.


#3

Hi, I’m the author of the article from your variant 4, and evidently my explanation isn’t good enough, so here’s another attempt:

There are two sides to the fn as_str<'a>(data: &'a u32) -> &'a str story: the callee side and the caller side.

To the callee, 'a is just some lifetime. It’s any lifetime. Whatever lifetime the caller wishes to use as_str with, they can, provided they have a valid &'a u32 reference to pass as an argument. This is to say that as_str is generic over 'a, the same way fn foo<T>() is generic over T. You can read fn as_str<'a>(data: &'a u32) -> &'a str as “whatever lifetime 'a is, if you hand me an &'a u32, I’ll give you back an &'a str — a &str that lives [at least] as long”. Most importantly, the callee doesn’t get to choose what 'a is, exactly; that decision is made at the caller side.
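
A minimal sketch of "generic over 'a", assuming a hypothetical function first_word (not from the thread) in place of as_str, since the Nomicon's as_str is deliberately an example that does not compile. The same function is used once with a 'static input and once with a borrow of a local String; the callee never decides which:

```rust
// Hypothetical stand-in for as_str: the result is a sub-slice of the input,
// so it borrows from the same data and carries the same lifetime 'a.
fn first_word<'a>(text: &'a str) -> &'a str {
    text.split_whitespace().next().unwrap_or("")
}

// Here the caller's 'a is some region inside this function, because `local`
// does not live longer than that. The result is copied out as an owned String.
fn first_word_of_local() -> String {
    let local = String::from("hello world");
    let word = first_word(&local);
    word.to_owned()
}

fn main() {
    // Here the caller picks 'a = 'static, because a string literal lives forever.
    let from_static: &'static str = first_word("static text");
    println!("{} {}", from_static, first_word_of_local());
}
```

In both calls the body of first_word is checked only once, against an arbitrary 'a.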

(to be continued)


#5

Thanks, bugaevc for the reply.
I have some C++ background and am familiar with generics. I understand that a specific lifetime gets filled in by the caller at the call site. My problem with lifetimes, though, is to figure out which substitution actually occurs. Consider a typed example:

struct Foo<T> { value: T }
let foo = Foo { value: 5 };

In this case T gets substituted with “integer”, because 5 is an integer. But in the case of lifetimes, I do not quite grasp what the equivalent of “integer” is. It is often written that one can reason about lifetimes as scopes of variables. Maybe I can simplify the question for a while with the following example:

fn as_str<'a>(data: &'a u32) -> &'a str // imagine its proper definition, not important here
{ //let me refer to this scope as scopeA
    let the_data = 123;
    { //let me refer to this scope as scopeB
         let y  = &x;
         { //scopeC
               let theString : &'b str = as_str(&the_data);
         }
    }
}

What in the end gets plugged into the generic lifetime parameter 'a of the function? scopeA, as it is the time for which the_data, to which y is referring, is alive? scopeB, because it is the time for which y itself is alive? Or rather scopeC, because the returned reference is alive for that time?


#6

(continued)

Now, on the caller side, the caller gets to pick what 'a it wants to use fn as_str<'a>(data: &'a u32) -> &'a str with. It used to be possible on older versions of Rust to even supply an explicit lifetime like so:

fn caller<'b>(input: &'b u32) {
    as_str::<'b>(input);
}

In this case, we say explicitly that we want to use as_str() with 'a being set to our 'b (which is, in turn, picked by whoever calls the caller).

But most of the time, you do not specify 'a explicitly; instead, it gets inferred by the compiler. The way the function is used constrains what 'a can be: [the following is a simplification] parameters’ lifetimes need to live at least as long as 'a, and 'a needs to live at least as long as the function’s result. If that result is passed to some other function, there will be more constraints, and so on. Out of all possible 'a that satisfy all of the constraints, the compiler picks the shortest one. That’s it.

If you reason about a particular piece of code in this way, you can infer what actual lifetime 'a means just like a compiler does.
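
A minimal sketch of that reasoning, assuming two hypothetical helper functions (pick_field and pass_through, not from the thread). The comments spell out the constraints in the simplified form described above; they are a reading aid, not what the compiler literally records:

```rust
// Hypothetical helpers: each returns a reference tied to its input.
fn pick_field<'a>(pair: &'a (u32, u32)) -> &'a u32 {
    &pair.0
}

fn pass_through<'b>(r: &'b u32) -> &'b u32 {
    r
}

fn chained() -> u32 {
    let pair = (7, 9);               // upper bound: no borrow may outlive `pair`
    let field = pick_field(&pair);   // 'a must cover every later use of `field`
    let again = pass_through(field); // adds: 'a outlives 'b, 'b covers uses of `again`
    *again + 1                       // last use: the lower bound for both 'a and 'b
}

fn main() {
    println!("{}", chained());
}
```

Any 'a between "up to the last use of again" and "as long as pair lives" satisfies the constraints, which is why it is safe for the compiler to settle on the shortest one.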

“When an assignment of the returned reference is made e.g. x = as_str(data), it is checked, that the scope of x (distinct from its associated generic lifetime) is smaller than 'a”

Indeed, the scope/lifetime of x itself must always be smaller than or equal to its associated lifetime (to quote myself, “A borrow may last longer than the reference it is controlled by”), which in turn must be smaller than or equal to the 'a that gets inferred (since 'a itself is the associated lifetime of data, the callee argument), which in turn must be smaller than or equal to the associated lifetime of data, the caller variable that gets passed as the data argument.

“those lifetime parameters have no meaning for them alone, but just express relations between the lifetimes of references”

They do indeed express the relations/constraints between lifetimes, but to say that they have no meaning to themselves may be an overstatement. Would you also say that in fn transform<T>(t: T) -> T that T has no meaning and just expresses the relationship between the input and output types? This largely depends on the point of view; as far as the caller is concerned, yes, it just says that the types are the same, but for the callee T is an actual, although kind of unknown, type.

Or not. I’d say it’s both kind of a type and kind of a constraint/relation. That’s how generics work, though, nothing specific to lifetimes here.

“the returned value and the input argument data will both live at least as long as 'a”

From the callee’s point of view, yes. This is valid, for example:

fn callee<'a>(foo: &'a str) -> &'a str {
    "bar"
}

The string literal "bar" has the lifetime 'static, which is longer than 'a (or the same as 'a, if 'a is 'static itself).

But from the caller’s point of view, the argument it provides must live at least as long as 'a, but the return value of callee may not outlive 'a (i.e., its lifetime is shorter than or equal to 'a). E.g., this is valid:

let my_ref = &35;
{
    let res = callee(my_ref);
}

But then again, this is how subtyping works, it’s not specific to lifetimes in particular. Check out this piece of Java for comparison:

class Foo {}
class Bar extends Foo {}

// ...

<T, T2 extends T> T callee(T arg1, T2 arg2) {
    return condition() ? arg1 : arg2;
}

// caller:
Foo res = callee(new Bar(), new Bar());

Here, the T will be inferred to be either Foo or Bar (Java will probably require you to disambiguate), but from the caller side, the type of the res must be either T or broader, whereas from the callee side the type of the return value must be either T or “narrower” (its descendant).

“it tracks some kind of “borrow” object, which is the same for oldRef and newRef. This notion not too clear to me, especially for more complicated settings.”

I never said there is a “borrow object”. I keep talking about “the lifetime of the borrow”, but what I mean by that is that a variable is borrowed for that lifetime. In your snippet (I removed the extra let),

let x = 3;
let newRef;
{
   let oldRef = &x;
   newRef = oldRef;
}

the borrow of x (there’s no object, just the fact that x is borrowed), as held by oldRef and newRef, lasts from the &x up until the end (where oldRef is dropped).
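
The same snippet as a runnable sketch, wrapped in a hypothetical function read_through_new_ref and with snake_case names, to show that the borrow outlasts old_ref:

```rust
fn read_through_new_ref() -> i32 {
    let x = 3;
    let new_ref;
    {
        let old_ref = &x;
        new_ref = old_ref; // copies the reference; no second borrow expression is written
    } // old_ref goes out of scope here, but x stays borrowed
    *new_ref // still valid: x is alive, and new_ref holds the borrow
}

fn main() {
    println!("{}", read_through_new_ref());
}
```

If the borrow ended together with old_ref, reading through new_ref after the inner block would have to be rejected; it is accepted.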


#7

I don’t quite understand what are x and y and how they are related to the rest, and what is 'b? It’d perhaps be better if you provided an actual example (that compiles) that you’re unsure about.


#8

My non-expert understanding:

I will assume you meant (note: I also added some additional lifetimes for the sake of discussion)

fn as_str<'a>(data: &'a u32) -> &'a str // imagine its proper definition, not important here
{ // scopeA
    let data = 123;
    { // scopeB
         let borrow  = &'b data; // CHANGE: data instead of x
         { // scopeC
               let string : &'d str = as_str::<'c>(borrow);  // CHANGE: the borrow from scopeB instead of a new one
               return string;
         }
    }
}

When the borrow checker reads this it forms a number of constraints (this is just my mental model; it may be off-base!)

Note: lifeof(x) is the lexical region from the declaration of x to the point in time when it is deallocated (in reverse order at the end of its scope).

  • lifeof(data) < scopeA
  • 'b < lifeof(data) due to the definition of borrow
  • When as_str is called recursively, it gets a fresh lifetime argument 'c.
  • 'c <= 'b (when borrow is supplied as an argument)
  • 'd <= 'c (when string is assigned).
  • 'a <= 'd (when string is returned).

(note: the last three are inequalities rather than equalities because variance permits coercions of a borrow to a shorter lifetime)

And then it tries to prove that these constraints can be satisfied for all possible 'a, since that’s the claim our signature makes. It fails because, piecing together the inequalities, we have that 'a < scopeA, which would fail if 'a was 'static (i.e. the unbounded lifetime).


#9

Sorry for the bad code example. I saw that you had started typing, @bugaevc, and for that reason tried to hurry up, to avoid unnecessary work. ExpHP’s idea of my request was close. In order to make the thread more meaningful, I give a belated compiling example (I reworked the as_str function, because the example function from the Nomicon was an example of how things don’t work):

struct Foo {
	x: i32,
}

fn return_some_ref<'a> ( foo : &'a Foo) -> &'a i32 {
	&foo.x
}

fn main() 
{//scopeA
	let data = Foo{x:123};
	{//scopeB
		let borrow = &data;
		{//scopeC
			let returnval : & i32;// for discussion: &'b i32
			returnval = return_some_ref(borrow);
		}
	}
}

So the question was which scope will be substituted for 'b.
After going over your elaborations and @vitalyd’s link (which I liked a lot, thank you), I would guess that scopeB is actually the one that explains the final meaning of 'b. But maybe that question is actually wrong. It comes from the notion that lifetimes are indeed generic parameters, and that there needs to be something which I can substitute for them. The checking might instead truly work by inference based on the symbol 'a itself (similar to ExpHP’s intuitive understanding). This in the end comes back close to the opening question: is a substitution done for the checking (if so, which one), or rather some kind of inference?


#10

It’s important to also know that lifetimes have variance/subtyping rules. Namely, for immutable references, a longer lifetime is a subtype of a shorter one. So your returnval has its own scope, call it C there. return_some_ref returns you a reference of a longer lifetime, but that’s fine because of the subtyping relationship. Sometimes we refer to this as “squeezing/shrinking” the lifetime. Note that mutable references are more restricted: &'a mut T can still be shrunk in 'a itself, but it is invariant in T, so a lifetime that appears inside T (as in &mut &'x u8) cannot be substituted with a longer or shorter one.
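
A small sketch of the shrinking for shared references, assuming a hypothetical function shorten and a hypothetical static ANSWER (neither is from the thread):

```rust
static ANSWER: i32 = 42;

// 'long: 'short reads "'long outlives 'short". The body needs no cast:
// a &'long i32 is accepted wherever a &'short i32 is expected.
fn shorten<'long: 'short, 'short>(r: &'long i32) -> &'short i32 {
    r
}

fn mix_static_and_local() -> i32 {
    let local = 1;
    let long: &'static i32 = &ANSWER;
    // Both array elements must have one common type, so the 'static
    // reference is shrunk to the shorter lifetime of the borrow of `local`.
    let both: [&i32; 2] = [long, &local];
    *both[0] + *both[1]
}

fn main() {
    println!("{} {}", mix_static_and_local(), shorten(&ANSWER));
}
```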


#11

(Note, this describes my mental model, not necessarily the actual semantics)
I prefer to call the <'a> thing (on both functions and structs) the borrow context.

The borrow context consists of two things:

  • Which variables are borrowed
    • and whether mutably or immutably
  • When does the borrow end.

I’ll use the following notation:

  • 'a is a generic borrow context, like in &'a u8
  • '{x borrowed mutably until line 27} is a specific borrow context, like &'{x borrowed mutably until line 27} u8.

So, let’s desugar your example:

fn return_some_ref<'a> ( foo : &'a Foo) -> &'a i32 {
    &foo.x
}

fn main() 
{ //scopeA
    let data = Foo{x:123};
    { //scopeB
        let borrow: &'{`data` borrowed immutably until end of ScopeB} Foo = &data;
        { //scopeC
            let returnval: &'{data immutably until end of ScopeB} i32;
            returnval = return_some_ref::<'{data immutably until end of ScopeB}>(borrow);
        }
    }
}

(Side note about non-lexical lifetimes (NLL): after NLL, the “until end of ScopeB” will change to “until end of ScopeC”. Right now what matters is how long borrow is in scope; after NLL what will matter is its last actual usage.)

I think what’s most important is what is borrowed, so I’ll omit the how long part from now.

Now a funnier example with a function with overrestrictive bounds:

fn foo<'a>(a: &'a mut u8, b: &'a u8) -> &'a u8 {
    *a += 1;
    b
}

fn main() {
    let mut x = 0;
    let y = 1;
    let res: &'{x borrowed mutably and y immutably} u8 = foo(&mut x, &y);
    // println!("{}", x); // error
}

See how res “knows” about all of these borrows! That’s handy, because you can create a struct with multiple references, but just one lifetime parameter:

struct Car<'a> {
    model: &'a str,
    make: &'a str,
}

fn main() {
    let model = "Corolla".to_owned();
    let make = "Toyota".to_owned();
    let car: Car<'{model and make immutably borrowed}> = Car { model: &model, make: &make };
}
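
The '{…} notation above is not valid Rust, so here is a compiling sketch of the same Car example. The describe and build_and_describe functions are hypothetical additions for illustration:

```rust
struct Car<'a> {
    model: &'a str,
    make: &'a str,
}

// Reads both fields; '_ lets the compiler fill in the lifetime.
fn describe(car: &Car<'_>) -> String {
    format!("{} {}", car.make, car.model)
}

fn build_and_describe() -> String {
    let model = "Corolla".to_owned();
    let make = "Toyota".to_owned();
    // One 'a covers both borrows, so `car` keeps `model` and `make`
    // borrowed for as long as `car` itself is in use.
    let car = Car { model: &model, make: &make };
    describe(&car)
}

fn main() {
    println!("{}", build_and_describe());
}
```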

Let’s go with one more example that illustrates what’s valid inside the function:

fn foo<'a, 'b>(a: &'a u8, b: &'b u8) -> &'a u8 {
    let x = 5;
    match whatever {
        // We only know about the return value that it may borrow from
        // the context 'a. So let's every time ask ourselves a question,
        // is that particular return value valid given only that assumption? 

        A => a, // valid, a is by definition valid in the context 'a,
        B => b, // invalid, we don't know if borrows from context 'a 
                // are sufficient to make b valid
        C => &5, // (this is &'static u8)
                // valid, &'static reference is always valid
        D => &x // invalid.
                // Borrow of `x` would be required to make it valid,
                // but `x` is not in context 'a (which is chosen by caller)
                // (`x` also lives too short, as 'a may be as long as caller chooses).
    }
}
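
A compiling sketch that keeps only the two valid arms, assuming a hypothetical enum Choice in place of whatever and a hypothetical function name pick:

```rust
enum Choice {
    FromA,
    Constant,
}

fn pick<'a, 'b>(a: &'a u8, _b: &'b u8, choice: Choice) -> &'a u8 {
    match choice {
        Choice::FromA => a,     // valid in 'a by definition
        Choice::Constant => &5, // a &'static u8, which is accepted as &'a u8
    }
}

fn main() {
    let a = 1u8;
    let result;
    {
        let b = 2u8;
        result = pick(&a, &b, Choice::FromA);
    } // b is gone here; the signature promises the result only borrows from 'a
    println!("{}", result);
}
```

Because the return type mentions only 'a, the caller may keep the result after b has gone out of scope, which is exactly why returning b inside the function has to be rejected.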

#12

“Actually maybe that question is actually wrong. It comes from the notion, that lifetimes indeed are generic parameters, and there needs to be something, which i can substitute them for. Although the checking might truly work with inference based on the symbol 'a itself (similar to ExpHPs intuitive understanding). This in the end comes back close to the opening question: is rather a substitution done for the checking (if so, which one) or some kind of inference?”

Hmm. This is a very nuanced question. I’m not even entirely sure I understand what the difference is!

I just now had the thought to look at the documentation of libborrowck myself (as linked in the OP), and I’m surprised at how many strings I cannot find a single instance of with Ctrl+F:

generic
param[eter]
constra[int]
subst[itute]

and the answer probably is in there somewhere (even if it’s just strongly implied), but too much of it is written in a language so alien to me that I can’t digest it well enough to figure it out.


#13

Lifetime parameters are substituted with concrete scopes automatically by the compiler. So whereas type generics allow you to specify the types, lifetime parameters only let you specify their relationships (or constraints) - you don’t pick a specific one manually, compiler does it. There are rules around various types of references (mutable vs immutable), paths of a reference (eg borrowing a struct mutably freezes it and its fields, or anything reachable from it to be exact whereas borrowing fields individually is ok), and the compiler enforces all of these. I think the compiler also sometimes infers regions for a reference, most notably with closures taking references as arguments, and those sometimes require weird hacks to make the compiler satisfied (because it may infer or assign the wrong lifetime).


#14

Let’s see. There are three constraints:

  • data lives for scopeA, and we borrow and pass it as an &'a Foo, so scopeA >= 'a
  • we save the return value of the function, which is &'a i32, into a &'b i32, so 'a >= 'b
  • the variable returnval lives for all of scopeC, so 'b >= scopeC

These constraints mean that scopeA >= 'a >= 'b >= scopeC. Out of all possible 'a and 'b that satisfy this, the compiler picks the smallest ones, which in this case means that scopeC gets substituted for both 'a and 'b.
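
A runnable sketch consistent with that conclusion, assuming a hypothetical wrapper function read_then_mutate around the example from post #9. If the borrow had to last for all of scopeA, mutating data at the end would be rejected:

```rust
struct Foo {
    x: i32,
}

fn return_some_ref<'a>(foo: &'a Foo) -> &'a i32 {
    &foo.x
}

fn read_then_mutate() -> (i32, i32) {
    // scopeA
    let mut data = Foo { x: 123 };
    let seen;
    {
        // scopeB
        let borrow = &data;
        {
            // scopeC
            let returnval: &i32 = return_some_ref(borrow);
            seen = *returnval; // last use of the borrow
        }
    }
    // Still inside scopeA, but the borrow is over, so mutation is allowed.
    data.x += 1;
    (seen, data.x)
}

fn main() {
    println!("{:?}", read_then_mutate());
}
```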