Lifetimes in Structs

What is the relation between the Person struct and name (&'a str)? What intuition should one keep in mind, in general, about references and lifetimes when writing code that deals with them?

pub struct Person<'a> {
    name: &'a str,
    age: i32,
}

As a rule, one would not create a struct that looks like that.

That's because name here is a borrowed string (one that someone else owns), which makes Person something that tracks one (or more) such borrows.

Sometimes that is exactly what you need, but as in real life, you don't borrow when you can own instead.

From the book, my emphasis:

This struct has the single field part that holds a string slice, which is a reference. As with generic data types, we declare the name of the generic lifetime parameter inside angle brackets after the name of the struct so that we can use the lifetime parameter in the body of the struct definition. This annotation means an instance of ImportantExcerpt can’t outlive the reference it holds in its part field.

In your example, ImportantExcerpt corresponds to Person and the part field corresponds to name.
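A minimal sketch of what that means in practice, using the Person<'a> from the question. The helper describe is hypothetical and only there to use the struct; the commented-out line shows the kind of code the borrow checker would reject.

```rust
// The struct from the question: it stores a borrowed view of someone else's text.
pub struct Person<'a> {
    name: &'a str,
    age: i32,
}

// Hypothetical helper: formats a Person without taking ownership of anything.
fn describe(p: &Person<'_>) -> String {
    format!("{} is {}", p.name, p.age)
}

fn main() {
    let owner = String::from("Alice"); // `owner` owns the text
    let person = Person { name: &owner, age: 30 }; // `person` only borrows it
    println!("{}", describe(&person));
    // `person` can't outlive the String it borrows from:
    // drop(owner); describe(&person); // would be rejected by the borrow checker
}
```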

I guess there could be two reasons to do this. One is to avoid a copy. Of course, you should not create a str just to put it in a struct; usually you are borrowing it from a different type. While that is a valid reason, the benefits are small and situational.

The other reason, which I feel is more important, is the guarantee that the original value cannot be dropped before the struct that borrows from it. Of course, this can also be achieved with PhantomData; you don't have to actually borrow something real.
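A rough sketch of the PhantomData approach, assuming a hypothetical Handle type that stores only an index but should still be tied to the Vec it came from. The PhantomData<&'a Vec<String>> field makes the type behave as if it borrowed the Vec, without holding a real reference.

```rust
use std::marker::PhantomData;

// Hypothetical type: stores only an index, but carries the lifetime 'a
// of the Vec it was created from, as if it borrowed that Vec.
struct Handle<'a> {
    index: usize,
    _borrow: PhantomData<&'a Vec<String>>,
}

// Hypothetical constructor: the returned Handle is tied to `names`.
fn handle_for<'a>(names: &'a Vec<String>, index: usize) -> Handle<'a> {
    assert!(index < names.len());
    Handle { index, _borrow: PhantomData }
}

fn main() {
    let names = vec![String::from("Alice"), String::from("Bob")];
    let h = handle_for(&names, 1);
    println!("handle points at index {}", h.index);
    // drop(names); println!("{}", h.index); // rejected: `names` is still borrowed via `h`
}
```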

If you are not sure of the benefits, it's better to avoid lifetimes in structs.

You have algorithms and data structures. References are part of algorithms, so if you're not defining the details of an algorithm, you should generally not have any. (&'static and static Arc/Rc don't count.)

It's a design error.

It says that Person doesn't store the name. The name isn't a string, but a string view. The data for the string view is borrowed and the view must be temporary.

This virally changes Person from being a normal struct to being a temporary view itself. It irreversibly infects everything using the Person<'temporary_data_not_stored_in_here> with the requirement to restrict usage of all such types to only the limited temporary scope their data has been borrowed from, with absolutely no way to extend that scope. Nothing will ever be allowed to be used outside of the scope of the loan.

This mistake is a recipe for fighting the borrow checker. Do not put references in structs.

It's a misconception that this is needed to prevent copies. Owning strings are moved without copying the data. In most cases where this borrowing mistake happens, the user already has an owning string to use; they just can't move it into the right place because the types are declared incorrectly. Rust generally can't store string data directly on the stack, so borrowing from some String pointlessly ties the view struct to a variable on the stack (which itself holds no data other than the pointers).
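A small sketch of the owning alternative, checking the claim that a move doesn't copy the text: the heap buffer's address is the same before and after the String is moved into Person.

```rust
// The owning version of Person: it stores its own String.
struct Person {
    name: String,
    age: i32,
}

fn main() {
    let name = String::from("Alice");
    let buffer_before = name.as_ptr(); // address of the heap-allocated text
    // Moving only transfers the String's pointer, length and capacity.
    let person = Person { name, age: 30 };
    assert_eq!(person.name.as_ptr(), buffer_before); // same buffer, no copy
    println!("{} ({}) kept the same buffer", person.name, person.age);
}
```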

There are lots of techniques for handling strings without copying where it matters (interning, Rc<str>, borrowing in function arguments and actual temporary views of data where appropriate) but they don't apply to the basic Person example.
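As one illustration of those techniques (not needed for the basic Person example), here is a sketch of sharing a name through Rc<str>: cloning the Rc only bumps a reference count, and both values point at the same text.

```rust
use std::rc::Rc;

// Person sharing its name via Rc<str> instead of owning a private copy.
struct Person {
    name: Rc<str>,
    age: i32,
}

fn main() {
    let name: Rc<str> = Rc::from("Alice"); // one allocation holding the text
    let a = Person { name: Rc::clone(&name), age: 30 };
    let b = Person { name: Rc::clone(&name), age: 31 };
    assert!(Rc::ptr_eq(&a.name, &b.name)); // same text, not a copy
    assert_eq!(Rc::strong_count(&name), 3); // `name`, `a.name`, `b.name`
    println!("{} ({}) / {} ({})", a.name, a.age, b.name, b.age);
}
```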

Ownership in Rust is a separate concern from copying or even performance. You have to get ownership right to get the program to compile. You can't ignore ownership or declare it incorrectly in some vague hope of things being somehow faster. To the compiler, borrowing is a correctness feature. When the compiler sees references, it interprets them as a request to restrict usage of the data, forbid such pointers from escaping, and freeze the source of the loan for as long as the reference may be used. That's the primary functionality of Rust's references, and if that's not what you want to happen, then you just can't use references.
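A sketch of the "freeze the source of the loan" behaviour, using the borrowing Person<'a> from the question and a hypothetical greet helper: the String can't be mutated while the Person may still be used, but can be again once its last use is over.

```rust
struct Person<'a> {
    name: &'a str,
    age: i32,
}

// Hypothetical helper: returns an owned String, so it keeps no borrow alive.
fn greet(p: &Person<'_>) -> String {
    format!("Hi {} ({})", p.name, p.age)
}

fn main() {
    let mut name = String::from("Ann");
    let person = Person { name: &name, age: 40 };
    // name.push_str("ie"); // rejected here: `person` is used below, so `name` is frozen
    let greeting = greet(&person); // last use of `person`; the loan ends here
    name.push_str("ie"); // accepted: nothing borrows `name` any more
    assert_eq!(greeting, "Hi Ann (40)");
    assert_eq!(name, "Annie");
}
```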

Note that storing and passing "by reference" doesn't need Rust's (temporary) references. Moving a String passes the text data by reference. Box<str> has the same representation as &str; they differ only in ownership.
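A quick check of the representation claim, as a sketch: both Box<str> and &str are fat pointers (data pointer plus length), so they have the same size, but only the Box owns the text.

```rust
use std::mem::size_of;

fn main() {
    // Both are a (pointer, length) pair; only ownership differs.
    assert_eq!(size_of::<Box<str>>(), size_of::<&str>());
    assert_eq!(size_of::<&str>(), 2 * size_of::<usize>());

    // The Box owns the text; the &str is a temporary view borrowed from it.
    let owned: Box<str> = String::from("Alice").into_boxed_str();
    let view: &str = &owned;
    assert_eq!(view, "Alice");
}
```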


The Rust book says:

One detail we didn’t discuss in the “References and Borrowing” section in Chapter 4 is that every reference in Rust has a lifetime, which is the scope for which that reference is valid. Most of the time, lifetimes are implicit and inferred, just like most of the time, types are inferred.
Validating References with Lifetimes - The Rust Programming Language

So every reference has a lifetime. Sometimes the lifetime is elided and doesn't have a name. Sometimes you (have to) give it a name.
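For a plain function, the elided and named forms mean the same thing. A minimal sketch with two hypothetical functions that differ only in whether the lifetime is written out:

```rust
// Elided: the compiler infers that the output borrows from `s`.
fn first_word(s: &str) -> &str {
    s.split_whitespace().next().unwrap_or("")
}

// Named: the same signature with the lifetime spelled out.
fn first_word_named<'a>(s: &'a str) -> &'a str {
    s.split_whitespace().next().unwrap_or("")
}

fn main() {
    let text = String::from("hello world");
    assert_eq!(first_word(&text), "hello");
    assert_eq!(first_word_named(&text), "hello");
}
```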

For structs, Rust requires that lifetimes be named. This is what is happening here: the struct contains a reference, and the lifetime of that reference is named 'a.

I guess the reason Rust demands a name for the lifetime is that you might later need to refer to it by name.

Imagine a function (first without lifetime annotations):

impl Person {
    fn new(name: &str, some_text: &str) -> Person {
        ...
    }
}

How would the compiler know from the function signature against which parameter lifetime the lifetime of the reference in Person should be checked?

Since the lifetime has the name 'a, you can use that name to bind those lifetimes:

impl <'a> Person<'a> {
    fn new<'other>(name: &'a str, some_text: &'other str) -> Person {
        ...
    }
}

Now the compiler knows that the reference in the returned Person can't outlive name (or rather, what name refers to).


The type of elision that the book is talking about in the quote applies to structs too. You have to give the lifetime in the struct definition, but not necessarily elsewhere. Depending on the exact situation, you may be able to use <'_> or just completely elide the lifetime.

// This will give a warning but will still compile.  The warning
// is pretty new; it used to be silently accepted.
fn elide_away(name: &str) -> Person { ... }

// This is the preferred way to write it.
fn indicate_but_do_not_name(name: &str) -> Person<'_> { ... }

// This is the explicit version.
fn name_it<'s>(name: &'s str) -> Person<'s> { ... }

// You're not allowed to completely elide the lifetime here,
// but you still don't have to name it.
impl Person<'_> {}

With references, you always have & to clue you in that there's an elided lifetime. With structs, you don't, which is why Person<'_> is preferable to just Person.
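A runnable sketch of the compiling forms above. The function bodies were elided in the original, so the trivial constructor bodies here (age: 0) are assumptions, as is the name accessor in the impl block.

```rust
pub struct Person<'a> {
    name: &'a str,
    age: i32,
}

// Preferred: `'_` shows that the result borrows from `name`.
fn indicate_but_do_not_name(name: &str) -> Person<'_> {
    Person { name, age: 0 }
}

// Explicit: the same relationship with a named lifetime.
fn name_it<'s>(name: &'s str) -> Person<'s> {
    Person { name, age: 0 }
}

// The impl needs `'_`, but the lifetime never has to be named.
impl Person<'_> {
    // Hypothetical accessor: the output borrows from `&self`.
    fn name(&self) -> &str {
        self.name
    }
}

fn main() {
    let owner = String::from("Alice");
    let a = indicate_but_do_not_name(&owner);
    let b = name_it(&owner);
    assert_eq!(a.name(), b.name());
    println!("{} {}", a.age, b.age);
}
```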

Your new example doesn't allow elision in the return type because it's ambiguous. The same would be true if you returned &str; it's not a struct specific thing. And this still doesn't work:

impl<'a> Person<'a> {
    fn new<'other>(name: &'a str, some_text: &'other str) -> Person {

But this does:

impl<'a> Person<'a> {
    fn new<'other>(name: &'a str, some_text: &'other str) -> Self {

Because Self is an alias for Person<'a> specifically here.
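A complete version of the new example that compiles, as a sketch. The body is an assumption (it derives age from some_text just so the parameter is used); the point is that only name's lifetime flows into the result, so the Person can outlive the String behind some_text.

```rust
pub struct Person<'a> {
    name: &'a str,
    age: i32,
}

impl<'a> Person<'a> {
    // `Self` is Person<'a>; `some_text` has an unrelated lifetime 'other.
    fn new<'other>(name: &'a str, some_text: &'other str) -> Self {
        // Hypothetical use of `some_text`: it is read, never stored.
        Person { name, age: some_text.len() as i32 }
    }
}

fn main() {
    let name = String::from("Alice");
    let person = {
        let temp = String::from("temporary"); // dropped at the end of this block
        Person::new(&name, &temp)
    };
    // `person` outlives `temp`: only `name` is borrowed by the result.
    assert_eq!(person.name, "Alice");
    assert_eq!(person.age, 9);
}
```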

More about lifetime elision in function signatures.
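The same ambiguity with a plain &str return, as a sketch with a hypothetical pick function: with two reference parameters and no self, elision can't choose one, so `fn pick(name: &str, other: &str) -> &str` is rejected. Naming the lifetime fixes it.

```rust
// The result borrows only from `name`; `_other` gets its own elided lifetime.
fn pick<'a>(name: &'a str, _other: &str) -> &'a str {
    name
}

fn main() {
    let name = String::from("Alice");
    let result;
    {
        let other = String::from("short-lived");
        result = pick(&name, &other);
    } // `other` is dropped here, but `result` only borrows from `name`
    assert_eq!(result, "Alice");
}
```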

Some people wish you didn't have to name lifetimes in the struct definition too, but others feel (rather strongly) that knowing about the lifetimes is too important to be invisible in that context. It's also unclear what multiple elided or unnamed lifetimes in the fields would mean (distinct lifetimes or all one lifetime); it could also make field reordering a breaking change, etc.
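To illustrate why "distinct lifetimes or all one lifetime" matters, here is a sketch with hypothetical Pair and SamePair types. With two lifetimes, the first field keeps its longer lifetime even when the second borrows a short-lived local; with one shared lifetime, the same function would not compile.

```rust
// Two independent lifetimes: each field remembers where it borrows from.
struct Pair<'a, 'b> {
    first: &'a str,
    second: &'b str,
}

// Compiles: `first` keeps lifetime 'a, even though `second` borrows a local.
fn keep_first<'a>(long: &'a str) -> &'a str {
    let local = String::from("local");
    let pair = Pair { first: long, second: &local };
    println!("second = {}", pair.second);
    pair.first
}

// With one shared lifetime the same body would be rejected:
// struct SamePair<'a> { first: &'a str, second: &'a str }
// 'a would be shortened to `local`'s scope, so `pair.first` couldn't be returned.

fn main() {
    let text = String::from("outer");
    assert_eq!(keep_first(&text), "outer");
}
```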
