Common mis-explanation about Foo<'a> notation?

I see statements like the following all over the internet. I think this is a misconception, which I found confusing, so I'm asking if I have this right.

Here's an example from a recent blog post:

    struct SubstringMatch<'a> {
        start: usize,
        text: &'a str,
    }

The compiler will force you to spell out the lifetimes, commonly <'a> by convention, which in this case dictates that the entire SubstringMatch struct can only be alive as long as the &str in the text field is.

I don't think it's true that Foo<'a> means "the Foo struct can only be alive as long as 'a is". It's simply a statement that Foo is generic on a lifetime called 'a.

The explanation is confusing because it implies you could choose to write a struct that wasn't limited by its contained reference lifetimes, simply by not mentioning them in the <>, which is impossible.

Structures inherently have to have a lifetime longer than any references they contain. It's not the contents of the <> notation that makes that so. Of course, it's not incorrect to say that Foo<'a, 'b> has to outlive 'a and 'b, but that's not because of the <'a, 'b> itself, it's because of the references inside Foo. It's a good rule of thumb for reading Foo<'a, 'b> to notice that it must outlive 'a and 'b, but that's not what it means.

As a counterexample, I can put a &'static in Foo and not have to write Foo<'static> (in fact that's invalid). But every Foo's lifetime must nevertheless be 'static.

I, at least, found the notation much easier to understand once I realized this, so I wanted to confirm this is accurate, and if so, ask people writing about Rust not to propagate this explanation. What do you think?

Types with lifetimes are only valid within that lifetime, mostly, so it's a useful statement in a general sense. ("Mostly" because it's possible for the destructor of a type to not require the lifetime to be alive, for example trivial destructors such as those of references do not require their lifetime to be alive.)

Depending on the context[1] and one's perspective, it can be accurate to say the use of a value keeps the lifetime (and the borrow) alive, instead.

These statements all seem backwards? I can put a &'static _ in a local that definitely doesn't outlive 'static.
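A minimal sketch of that point (the function name `local_static` is made up for illustration): a `&'static str` can sit in a local whose own scope is a single block. The `'static` describes how long the referent stays valid, not how long the variable holding the reference exists.

```rust
// Hypothetical helper, not from the thread.
fn local_static() -> usize {
    let len;
    {
        // `local` goes out of scope at the end of this block,
        // yet its type is still `&'static str`.
        let local: &'static str = "hello";
        len = local.len();
    }
    len
}
```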

If you meant "the type will meet a : 'a + 'b" bound, i.e. "meet a : 'a bound and a : 'b bound", that's also incorrect. A type parameterized by 'a and 'b can only meet a bound in the intersection of the two lifetimes, not the union.
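One way to see the "intersection" rule in code, as a sketch with hypothetical names (`Pair`, `erase`): coercing to `dyn Debug + 'x` requires `Pair<'a, 'b>: 'x`, and the compiler only accepts that when both `'a: 'x` and `'b: 'x` are known.

```rust
use std::fmt::Debug;

// Hypothetical type with two independent lifetime parameters.
#[derive(Debug)]
struct Pair<'a, 'b> {
    first: &'a str,
    second: &'b str,
}

// Boxing as `dyn Debug + 'x` requires `Pair<'a, 'b>: 'x`.
// Removing either `'a: 'x` or `'b: 'x` from this signature
// makes the function fail to compile.
fn erase<'x, 'a: 'x, 'b: 'x>(pair: Pair<'a, 'b>) -> Box<dyn Debug + 'x> {
    Box::new(pair)
}
```

So the strongest bound `Pair<'a, 'b>` can meet is one that both `'a` and `'b` outlive.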


  1. namely when the lifetime is inferred


I think it's safe to say that syntactically, Foo<'a> is generic over a lifetime, and semantically, Foo<'a> cannot outlive the string it is referencing. Both statements are true from slightly different perspectives.

What do you mean by this? Both are valid: Rust Playground

Bar, which contains an internal 'static lifetime, is less flexible than Foo<'a>, because Foo<'a> can reference temporary strings and Bar cannot: Rust Playground
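The Playground contents are not reproduced here. As a rough sketch, assuming `Foo` and `Bar` are defined as below (the helper function names are made up), the difference looks like this:

```rust
// Assumed definitions; the linked Playground may differ.
struct Foo<'a> {
    text: &'a str,
}

struct Bar {
    text: &'static str,
}

fn foo_from_temporary() -> usize {
    let owned = String::from("temporary");
    // `Foo<'a>` can borrow from a short-lived local String.
    let foo = Foo { text: &owned };
    // let bar = Bar { text: &owned }; // rejected: `owned` does not live long enough
    foo.text.len()
}

fn bar_from_literal() -> usize {
    // `Bar` only accepts data that is valid for `'static`, such as a literal.
    let bar = Bar { text: "literal" };
    bar.text.len()
}
```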

@quinedot Yes, sorry, those lifetime statements were backwards; of course a struct has a shorter lifetime than any references it contains. I can only say the coffee hadn't kicked in yet. :slight_smile:

But my point is just that the relationship is inherent in the struct, not something introduced by the <> syntax. Of course a struct can't be allowed to outlive the references inside it. That's not because of the <'a, 'b> on the struct.

It's like saying a function foo<T>(x: T) takes a type T because it has a <T> on it. No, it takes a type T because it has an argument of type T. The <T> is just there to lexically introduce the generic name to be used in the argument. Similarly, a struct can't outlive 'a because it contains a &'a, not because it has <'a> on it.

@parasyte I meant this is invalid:

struct Foo<'static> {
    ...
}

1 | struct Foo<'static> {
  |            ^^^^^^^ 'static is a reserved lifetime name

If Foo<'x> was really a directive about Foo's lifetime, then you'd expect this to be valid.

Perhaps what's confusing to me is explaining what is really just a syntactic feature as if it had semantics of its own. It introduces redundancy in the explanation which makes the causality harder to see. (Or perhaps I still don't really understand this! :slight_smile:)

struct Foo<'a> is a declaration of a generic parameter name. Just like when implementing, the declaration of the parameter name occurs in impl<'a>:

impl<'a> Foo<'a>
//   ^^      ^^
//    │       │
//    │       └── Use of generic parameter
//    └────────── Declaration of generic parameter

The error message you are pointing out as invalid is syntactically trying to give a generic parameter a reserved name.
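A compilable sketch of declaration versus use (the method `text` and the function `text_outlives_foo` are invented for illustration). The name declared on the `impl` is independent of the one declared on the struct, and the method's return type is tied to the parameter rather than to `&self`:

```rust
struct Foo<'a> {
    text: &'a str,
}

// `impl<'b>` declares a fresh name; `Foo<'b>` uses it.
impl<'b> Foo<'b> {
    fn text(&self) -> &'b str {
        self.text
    }
}

fn text_outlives_foo(source: &str) -> &str {
    let foo = Foo { text: source };
    // The returned reference is tied to `source`, not to the local `foo`,
    // so it may be returned after `foo` is gone.
    foo.text()
}
```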

@parasyte Yes, that's exactly what I'm saying. Read the original statement from the blog where it says the <'a> dictates what the lifetime is. It doesn't dictate anything, it just introduces an identifier to be used later in something meaningful (namely, a reference's lifetime).

The lifetime dictates validity, though.

Perhaps you are battling the common terminology overload problem. The lifetime annotation and liveness scope are different concepts but can commonly be confused and used interchangeably.


No, I get that difference. I'm really battling the conflation of generic identifier introductions (which is the same syntax <x> everywhere, for structs and functions alike) with the semantics of the uses of the identifiers. For me at least, this text from The Book seems not completely accurate, and this may be the source of all the others:

struct ImportantExcerpt<'a> {
    part: &'a str,
}

As with generic data types, we declare the name of the generic lifetime parameter inside angle brackets after the name of the struct so we can use the lifetime parameter in the body of the struct definition. [Accurate.] This annotation means an instance of ImportantExcerpt can't outlive the reference it holds in its part field. [Inaccurate. It is the part: &'a that does that, not the <'a>.]

I agree with you and updating that para would be nice. However, note that you can't declare a lifetime annotation on the struct unless you use it for one of the fields (you'll get a compilation error), so the decl does imply that it takes effect (assuming the code compiles).
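For reference, a small sketch of that compilation error (the struct names `Unused` and `Used` and the function `describe` are hypothetical). A declared but unused lifetime parameter is rejected with error E0392, while the same declaration is accepted once a field uses it:

```rust
// struct Unused<'a> { start: usize }
// ^ rejected with error E0392: `'a` is declared but never used.

// Accepted: the declared parameter is used by the `text` field.
struct Used<'a> {
    start: usize,
    text: &'a str,
}

fn describe(m: &Used<'_>) -> String {
    format!("{}@{}", m.text, m.start)
}
```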

I can understand that point of view, but note that outlives relationships are defined syntactically. Consider this example:

use std::any::Any;

trait Trait {
    type Gat<'a>;
}

struct Example<'a, T: Trait>(T::Gat<'a>);

impl<'a, T: Trait + 'static> Example<'a, T> where for<'any> T::Gat<'any>: 'static {
    fn example(self) -> Box<dyn Any> {
        Box::new(self)
    }
}

It fails because the 'a parameter on Example doesn't meet a 'static bound, even though the types of all of the fields do.

One can imagine a language where Example<'a, T> is valid outside of 'a, but it isn't the case in Rust.


As for the book...

This annotation means an instance of ImportantExcerpt can't outlive the reference it holds in its part field.

...this could be updated to be more thorough/accurate. But frankly the book conflates Rust lifetimes and liveness scopes to such an extent that this is a minor side note compared to the inaccuracies elsewhere. The borrowing chapter needs an overhaul IMNSHO.

Your argument makes sense on the one hand: The presence of a generic parameter declaration isn't what restricts validity of the struct, but the actual reference contained inside does. Ok, I can see that perspective, and I see how that conclusion came about.

But on the other hand, the generic parameter is the contract by which the compiler makes its validity proofs. There are cases where the lifetime parameter is not used in any real reference at all (PhantomData being quite common to "represent" a borrow without actually holding one).
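A sketch of the PhantomData case, using an invented `Cursor` type that stores only an index. No real reference is held, yet the borrow checker treats the value as borrowing the buffer for `'a`:

```rust
use std::marker::PhantomData;

// Hypothetical type: holds no reference, only a marker for the borrow.
struct Cursor<'a> {
    index: usize,
    _borrow: PhantomData<&'a [u8]>,
}

fn cursor_at<'a>(_buf: &'a [u8], index: usize) -> Cursor<'a> {
    Cursor { index, _borrow: PhantomData }
}

fn demo_cursor() -> usize {
    let mut buf = vec![1u8, 2, 3];
    let cursor = cursor_at(&buf, 1);
    // buf.push(4); // rejected here: `buf` is still borrowed through `cursor`
    let i = cursor.index;
    buf.push(4); // accepted: `cursor` is not used past this point
    i + buf.len()
}
```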

I posted in another thread a while back where a very similar discussion was taking place: Borrow checker and structs containing references - #8 by parasyte And I want to bring up the sources that I linked in that thread again, because they are very relevant here:

Lifetime notation · baby steps describes the initial motivation for introducing this syntax. I think it does an excellent job providing the stark contrast between how things were before the generic parameters were introduced, and where we are today.

Reading the historical context is one thing, but knowing that the generic parameter is fundamentally how the borrow checker operates with structs is really quite important. The reasoning for the proofs has pretty much always existed, but the syntax makes it explicit for readers and provides the ability for structs to contain multiple disjoint references.

And Lifetime notation redux · baby steps brings it up to nearly the modern syntax.

My thoughts here are that the declaration of the lifetime parameter has everything to do with instance validity requirements. And that's because we don't have any other tools to do the same job.


Thanks for the links, both of you! I wish the book and reference would link more often to the relevant RFCs, as they seem often to be the only record of the true details of what is happening. And I love historical context!

BTW, I did look in the Reference, and I was surprised to find that (as far as I could tell), while it refers to lifetimes at least 20 times, it never says what a lifetime is. It's not even in the glossary. I know it's a work in progress, but that seems fairly basic (though maybe this thread just demonstrated it's by no means simple...).


Oh, and I forgot to say: the PhantomData example, to me, reinforces my argument. If the mere presence of the generic parameter was sufficient to enforce the lifetime, you wouldn't need to put in the PhantomData field. No? This is even what the docs for PhantomData say:

The intention is that the underlying data is only valid for the lifetime 'a, so Slice should not outlive 'a. However, this intent is not expressed in the code, since there are no uses of the lifetime 'a and hence it is not clear what data it applies to.

(emphasis mine)

You are right, the presence of the parameter on the declaration is insufficient. But it is necessary for users. Consider that private fields can hold the reference (or the representation of the reference, as in PhantomData).

The [public] docs for the following declarations would not tell me that SubstringMatch temporarily borrows its input, nor which input it borrows:

// Note the lack of a lifetime parameter.
pub struct SubstringMatch {
    // Private fields will not be documented by default
    start: usize,
    text: &str,
}

// Which input is borrowed by the return value?
pub fn substr(haystack: &str, needle: &str) -> SubstringMatch {
    todo!()
}

With the syntactic annotation, it all becomes very clear. No one has to read the code to make the discovery. It is all apparent at the type-level.

pub struct SubstringMatch<'a> {
    start: usize,
    text: &'a str,
}

// Disambiguation is a Very Good Thing, actually!
pub fn substr<'a>(haystack: &'a str, needle: &str) -> SubstringMatch<'a> {
    todo!()
}
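
For illustration, here is one way the `todo!()` could be filled in. This is a sketch, not the blog author's code: it assumes `substr` finds the first occurrence of `needle`, and it returns an `Option` (a change from the signature above) to handle the no-match case. The `demo_substr` function is made up to show that the result borrows only from `haystack`:

```rust
pub struct SubstringMatch<'a> {
    pub start: usize,
    pub text: &'a str,
}

// Assumed behaviour: first match of `needle` in `haystack`, if any.
pub fn substr<'a>(haystack: &'a str, needle: &str) -> Option<SubstringMatch<'a>> {
    let start = haystack.find(needle)?;
    Some(SubstringMatch {
        start,
        text: &haystack[start..start + needle.len()],
    })
}

fn demo_substr() -> (usize, String) {
    let haystack = String::from("hello world");
    let found = {
        let needle = String::from("world");
        let found = substr(&haystack, &needle).unwrap();
        found
        // `needle` is dropped here; that is accepted because the
        // signature ties the result to `haystack` only.
    };
    (found.start, found.text.to_string())
}
```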

The reason type and lifetime parameters must be used[1] is so that variance can be inferred and bivariance rejected.[2] But variance doesn't affect outlives bounds.

See my previous example where the generic parameter enforces a lifetime even though the field meets a static bound.


  1. or constrained

  2. unless constrained
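
To make "variance is inferred" concrete, a sketch with hypothetical types: both structs declare the same `<'a>`, but how the field uses it decides whether the lifetime may be shortened.

```rust
use std::cell::Cell;

// Covariant in `'a`: the field is a plain shared reference.
struct Covariant<'a> {
    text: &'a str,
}

// Invariant in `'a`: `Cell<T>` is invariant in `T`.
struct Invariant<'a> {
    text: Cell<&'a str>,
}

// Accepted: a covariant lifetime can be shortened.
fn shorten<'short, 'long: 'short>(c: Covariant<'long>) -> Covariant<'short> {
    c
}

// fn shorten_invariant<'short, 'long: 'short>(c: Invariant<'long>) -> Invariant<'short> { c }
// ^ rejected: the lifetime of an invariant parameter cannot be changed.

fn read<'a>(i: &Invariant<'a>) -> &'a str {
    i.text.get()
}
```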

I feel like I'm starting to drag this on too long, but...

@parasyte I get what you're saying, but notice that you took 'a off of the reference as well. I would say it's the text: &'a str, the haystack: &'a str that tells you what you need to know. But of course, the <'a> is essential to the meaning of those declarations; here's my take on that.

From my still potentially flawed viewpoint (because I haven't fully grokked @quinedot's example), the <'a> is "only" there to lexically disambiguate the identifier in nested declarations. In other words, this would work fine:

pub struct SubstringMatch { // no <>
    start: usize,
    text: &'a str,
    another_field: &'b str,
}

Just by definition, we know text and another_field have lifetimes, and they must outlive the struct. The actual new information we're declaring is that those fields are allowed to have different lifetimes. Which you can see because they have different names.

But you need to be able to associate the lifetimes with those of a higher-level item:

// alternative made-up notation to hook up the lifetimes
pub fn substr(haystack: &'h str, needle: &str) -> SubstringMatch('a = 'h) {
    todo!()
}

(Which kinda takes us back to your 2012 post.) So the <'a, 'b> lets us do this mapping in a more convenient way using the same mechanism as the other generic parameters.

But @quinedot's example may be telling me that's an oversimplification.

A lot to think about. Thank you for being so generous with your time!

Yes, because it's irrelevant. To wit, the documentation would not include a field named text, so you wouldn't be able to see whether it has a lifetime name or even a borrow at all.

The reason I bring up documentation here is to help build an intuition for how the compiler is doing its analysis. [1] A reader can do something very similar by looking at type declarations. Removing the <'a> removes the essence.

Said another way, the constituents of the struct do not matter as much as its generic parameters. And that, I claim, is why PhantomData can substitute any "real" reference. The compiler treats them identically for the purposes of the borrow checker. Therefore, we can ignore all struct fields and glean enough meaning through its generic parameters alone.


Ninja edit: I also recall being very confused by lifetime annotation names when I was initially getting used to the language. Admittedly, it took way too long for me to realize that the name given in a struct definition was entirely unrelated to the name given in an impl block. You can name these anything you want! It doesn't have to be 'a in both places:

// I name the lifetime `'a` here...
pub struct SubstringMatch<'a> {
    start: usize,
    text: &'a str,
}

// But I name it `'b` here!
pub fn substr<'b>(haystack: &'b str, needle: &str) -> SubstringMatch<'b> {
    todo!()
}

What matters is that the syntax can express how the lifetime is "wired up" when calling functions that use them. The made-up syntax you provided brought me back to those earlier days, when I finally understood what it means that naming really is hard.


  1. Naive and grossly inadequate, to be sure. But I feel like this conversation is still somewhere in syntactic land, and not semantic land.


A small point to add to what @parasyte just said:

As you may know, the Rust compiler, and therefore the mental model we must keep in mind when coding, only uses function declarations to do type checking, including lifetime checking, in code that calls those functions. It doesn't look inside the functions, and knowing this really helps understand its limitations and error messages.
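
A sketch of what "only the declaration is checked" means in practice (all names here are invented). Both functions have the same body, but callers are checked against the signatures alone:

```rust
// The signature says the result may borrow from either argument.
fn first<'a>(a: &'a str, _b: &'a str) -> &'a str {
    a
}

// The signature says the result borrows from `a` only.
fn first_only<'a>(a: &'a str, _b: &str) -> &'a str {
    a
}

fn demo_signatures() -> String {
    let a = String::from("kept");
    let result = {
        let b = String::from("dropped");
        // let r = first(&a, &b);
        // ^ letting that `r` escape this block is rejected (`b` does not
        //   live long enough), even though the body never returns `b`.
        let r = first_only(&a, &b);
        r
    };
    result.to_string()
}
```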

In the same way, the compiler only looks at a struct's public declaration, not including fields, to do its type checking of code using that struct. So therefore our mental model should take this into account as well. In this sense, the fact that a field in the struct has a reference is something we don't need to know. OTOH, to understand lifetimes overall we do need to know that something in that struct uses the lifetime in the struct declaration, which is your point I believe.


It relies on various things which are invisibly based on the fields.

  • variance
  • inferred outlives bounds
  • auto traits (including Sized)
  • drop glue (relevant for borrow check)
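
As a sketch of the auto-trait item in the list above (type and function names are hypothetical): the two declarations below look alike from the outside, with no generic parameters at all, yet only one of them is `Send`, purely because of its field.

```rust
use std::rc::Rc;

struct Plain {
    value: u32,
}

struct Shared {
    value: Rc<u32>,
}

fn require_send<T: Send>(value: T) -> T {
    value
}

fn plain_is_send() -> u32 {
    // Accepted: every field of `Plain` is `Send`.
    require_send(Plain { value: 7 }).value
}

fn shared_value() -> u32 {
    let shared = Shared { value: Rc::new(7) };
    // require_send(shared); // rejected: `Rc<u32>` is not `Send`, so neither is `Shared`
    *shared.value
}
```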

Good to know! Thanks for the correction. It's always more complex than I thought it was.