Are lifetimes in structs an anti-pattern? Resources for learning more about ownership, borrowing and how (not) to structure your data in Rust

Hello everyone,

I'm new to Rust and am rather fond of the language for the most part.

However, one of the stumbling blocks I have encountered is lifetimes in structs.
The concept of lifetimes makes sense to me and (I think) I can reason about them in my program as long as they are only used in functions.

But, having a struct with lifetimes causes a lot of headaches and after hours of failing to make the borrow checker happy, I've decided to look for a different approach.

I've read in multiple threads in this forum that structs with lifetimes are generally considered a bad idea/beginner mistake/anti-pattern.

I have studied the Smart Pointers chapters in the Rust book, but am looking for resources that help me develop a deeper understanding of Rust's ownership model and how to approach designing my data structures. I don't want to just naively replace all my &'as with Rc<...>s without a more fundamental understanding of the language and how to use it.

Can anyone recommend some books (hard or soft copy), articles, video lectures or even code repos on the topic?

Thank you for your time

2 Likes

The key to understanding is two-fold:

  1. You should realize that lifetimes are not magic; "lifetimes in structs" is not a separate concept, and is not different from "lifetimes in function" or "lifetimes in enums" in any fundamental or deep way.

    The only real difference is that when you define a lifetime parameter on any user-defined type (not just structs, but enums and unions as well), it can't be inferred the way it can when it's on a function item, because there is simply no context to infer it from.

    A lifetime on a UDT ultimately just means that your type contains a reference, either directly or transitively, and you should basically treat it as if it were a reference itself.

  2. References don't own values, and so if you need your UDT to own the contained data (the most mainstream use case), you can't use references for that, and you don't need any lifetimes.

    You will need references (and thus, lifetimes) on your UDT if you want your UDT to be a view or "manager" of some sort, operating on already existing data. Examples when you need a lifetime annotation on a UDT are:

    • by-reference (i.e., non-consuming) iterators
    • RAII guards such as DB transactions

    Examples when you do not need references and lifetimes in your UDT include:

    • A newtype wrapper that validates the invariants of some other "raw" type (e.g. a String) and that you want to create and return to a caller;
    • An implementation of a data structure or collection of some sort.
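To make the contrast in point 2 concrete, here is a minimal sketch of both cases. The names `NonEmptyName` and `EveryOther` are made up for illustration and do not come from any library: the first is an owning newtype with no lifetime, the second is a by-reference iterator that needs one because it only views data owned elsewhere.

```rust
/// Owning newtype: validates a raw String once and then owns it.
/// No reference is stored, so no lifetime parameter is needed.
pub struct NonEmptyName(String);

impl NonEmptyName {
    /// Returns None if the raw string is empty or only whitespace.
    pub fn new(raw: String) -> Option<Self> {
        if raw.trim().is_empty() {
            None
        } else {
            Some(Self(raw))
        }
    }

    pub fn as_str(&self) -> &str {
        &self.0
    }
}

/// Borrowing iterator: a short-lived view over a slice owned by someone else.
/// The lifetime 'a says "this struct must not outlive the slice it looks at".
pub struct EveryOther<'a, T> {
    items: &'a [T],
    pos: usize,
}

impl<'a, T> EveryOther<'a, T> {
    pub fn new(items: &'a [T]) -> Self {
        Self { items, pos: 0 }
    }
}

impl<'a, T> Iterator for EveryOther<'a, T> {
    type Item = &'a T;

    /// Yields the elements at positions 0, 2, 4, ... by reference.
    fn next(&mut self) -> Option<&'a T> {
        let item = self.items.get(self.pos)?;
        self.pos += 2;
        Some(item)
    }
}
```

Dropping an `EveryOther` leaves the underlying collection untouched, which is exactly why it borrows instead of owning.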

And to directly answer the question in the title:

Are lifetimes in structs an anti-pattern?

No. Definitely not:

  1. They are sometimes necessary for implementing semi-advanced constructs,

  2. The parametrization of a UDT by a lifetime is so trivially detectable and explicit (purely syntactic) that it must clearly be something that the language deliberately supports and not something that is accidentally possible but bad.

    If it were an anti-pattern, then there should/would simply be no way to add lifetimes to structs, for example.

9 Likes

You'll certainly have a better time if you understand Rust's borrowing system before you attempt a lifetime-bearing struct. The main use case for a lifetime-bearing struct is some sort of short-term borrow. Say, for example, a borrowing iterator. There are other secondary use cases such as zero-copy parsing, arenas, and so on, but again, you'll probably be better off getting a feel for the borrow system first, instead of diving into the deep end.

If you have a more specific example of an attempt at a lifetime-bearing struct that caused problems, you'll probably get more concrete advice on some alternatives.

5 Likes

The reason structs with lifetimes are generally considered a beginner mistake is that Rust's references aren't the same as references in Python, Java, C#, TypeScript or other (garbage-collected) languages that beginners often know before starting in Rust.

In Rust, a reference represents permission to use data owned by someone else; the lifetime rules and borrow checker are about making sure that this permission never lives longer than the original object. This implies that you want a struct with a lifetime to be short-lived, since it's representing a permission to use someone else's owned data for as long as it lives.

In GC languages like JavaScript and Python, a reference represents shared ownership of an object. As a result, a lot of programmers are in the habit of thinking of a reference as "like ownership, but more efficient", because in a GC language, that's what it is.

And that's why it's normally seen as a beginner mistake; if you're thinking of references as representing efficient shared ownership (as in a GC language), you're going to use them in the wrong places for Rust, where they represent permission to access someone else's data. You then get into ever more complicated contortions trying to get lifetimes right, because you're trying to fit into the wrong mental model.

If you actually want to do what you did with GC references, you're looking at Rc<T> or Arc<T> from std, or using the gc crate, or even something more complicated like crossbeam_epoch. But, while the name is the same, Rust references are not GC language references, and they're also different to C++ references.
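As a rough sketch of the "shared ownership" alternative mentioned above, this is what GC-style sharing can look like with `Rc<T>`. The `Texture` and `Sprite` types are hypothetical, chosen only for illustration.

```rust
use std::rc::Rc;

/// Hypothetical shared resource.
pub struct Texture {
    pub name: String,
}

/// Each sprite co-owns its texture, so no lifetime parameter is needed.
pub struct Sprite {
    pub texture: Rc<Texture>,
}

/// Builds `count` sprites that all share a single Texture allocation.
pub fn make_sprites(count: usize) -> Vec<Sprite> {
    let texture = Rc::new(Texture {
        name: "grass".to_string(),
    });
    // Rc::clone only bumps the reference count; the Texture is not copied.
    // The local `texture` handle is dropped when this function returns,
    // but the Texture stays alive for as long as any Sprite still holds it.
    (0..count)
        .map(|_| Sprite {
            texture: Rc::clone(&texture),
        })
        .collect()
}
```

Unlike a struct holding `&'a Texture`, the returned sprites can be moved around and stored freely, because nothing outside them has to keep the texture alive.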

11 Likes

They are a trap.

There are valid uses for putting temporary loans in structs, but almost every new Rust user does it for the wrong reasons, and creates a borrow checking nightmare for themselves.

People confuse Rust's references with storing data "by reference" – but they can't store anything, by definition! They're the exact opposite of storing data.

Or people confuse Rust's references with reference types in object-oriented languages — but they're a temporary scope-limited access permission to an object, not the object itself.

People coming from C see Rust's references as pointers. C uses pointers to avoid copying, but Rust uses references to avoid owning, which is a different thing. Rust has other types and design patterns to avoid copying.

Some users have heard that references are "faster" or somehow better, and use them where it's not appropriate. This is like hearing that addition is faster than division, and trying to divide numbers with +.

So even though references in structs are not an anti-pattern by themselves, they are almost always misunderstood by novice users, and are actually rarely needed in structs in practice.

So my advice is: don't put references in structs. You'll be right 99% of the time, not very wrong 1% of the time, and that's better than impossible-to-compile code full of invalid <'a>.

23 Likes

You make it sound like it's two completely unrelated things. How come? Certainly, when I want to avoid unnecessarily cloning a read-only String when passing it to a function, I'll make my function accept a &str. Boom, I just used a reference to avoid cloning!

I also avoided ownership, but that wasn't the point. Passing ownership would have required making a clone (assuming I wanted to keep using the value after the function call). So they are clearly not unrelated, they are very much entangled.

1 Like

You've conflated references in struct fields with references in function arguments. They have different purposes, and their limitations hit differently in these cases.

This is part of the trap, because temporary references limited to the scope of a call are almost always the right thing in function arguments, but the same restriction is almost never appropriate in structs.

2 Likes

I have not; it was you who made a sloppy assertion using inadequately precise language. It was you who wrote:

First off, there's no qualification in this sentence that would narrow "references" to "references in structs".

Second, it's completely irrelevant anyway. References in structs are not different from references in function arguments. There isn't a separate kind of reference that you declare inside a struct definition and another, separate kind that you put in a function signature. It's just not a thing.

You make it sound like references in Rust are not being partly used for the same purposes as pointers in C – and that is false, because they absolutely are.

I stand by what I said. Rust's references do avoid owning.

Not taking ownership of function arguments is usually the desirable thing, not just from a performance perspective, but semantically from a memory management perspective. Functions usually just need to "view" arguments and may not even have a place to store them. Plus, returning a reference to an argument passed by reference works fine, because the scope of the loan is larger than the function's own scope.
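A small sketch of that last point, with a made-up helper `longest_word`: the function only views its argument, and the reference it returns is valid for as long as the caller's data is, which is longer than the call itself.

```rust
/// Returns the longest whitespace-separated word of `text`, or "" if there
/// is none. The returned &str borrows from the caller's data, not from
/// anything local to this function, so returning it is fine.
pub fn longest_word(text: &str) -> &str {
    text.split_whitespace()
        .max_by_key(|word| word.len())
        .unwrap_or("")
}
```

The caller keeps ownership of the String and can keep using both it and the returned slice after the call.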

However, avoiding owning (storing) the data in structs is usually highly problematic, because something has to own the data. The data has to live somewhere. If it doesn't get stored in the struct, then it's typically borrowed from a local variable. This makes the struct a temporary view of the variable. Because Rust can't make variables live longer and doesn't allow self-referential types, this is usually a dead end for novice users who don't understand the borrowing relationship and just wanted "not copying" without consciously choosing whether to do that via moves or loans (or Cow or another pattern).

3 Likes

Rust has more than one type which is a pointer. If you want a pointer, it doesn't mean you need a temporary reference. You may need a Box, or Arc, or a type that manages the pointer itself like Vec.

C pointers are (from a type system perspective) ownership-agnostic. References are not. Memory management of data behind C pointers can be designed any way you like. Rust is much stricter about it, and references have a predefined meaning.

There are also different conventions. In C APIs pointers are used to create opaque types (privacy, ABI) and uniqueness. Rust has unique ownership even "by value", has field privacy, and is rarely concerned with ABIs.

4 Likes

I'm not saying that they don't.

Here's what I'm saying:

  • References avoid copying
  • References also avoid owning, which is actually how they avoid copying.

What you are literally saying:

  • References avoid owning
  • References do not avoid copying

It is the second part that I have a problem with, because it's very obviously false in practice.

Not a bit of what followed in your essay is new to me, and you don't need to teach me how references work, how they differ from GC'd pointers in other languages, nor do you have to convince me to avoid over-using references when ownership would be the right abstraction.

My reaction was specifically only to the part where you asserted that references aren't used for avoiding cloning, only other mechanisms are. I cannot possibly agree with that under any circumstances.

Could you give (or point to) an example when they are really needed or at least appropriate to use?

Yes, and then what? This is not something that I disputed, this is completely orthogonal to the discussion. I literally wrote that references are partly used in the way C pointers are used, which is true. Rust references are non-owning pointers, and sometimes C pointers are also used as a non-owning construct.

There's nothing wrong or false in what I wrote.

By-reference iterators have been brought up twice in this thread already; when iterating over references to the elements of a collection, you usually need to store a reference to the collection itself. You don't want to own the collection, because dropping the iterator would then destroy it, meaning that the iterator couldn't hand out references to the items in the first place.

Another example I gave above is a transaction object that needs a mutable reference to a DB connection to manipulate it (commit or rollback) when the transaction object goes out of scope, since you don't want the DB connection to be destroyed together with the transaction (hence the latter can't own the former).
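A minimal sketch of the transaction example, assuming a toy `Connection` type that just records statements. Neither type is a real database API; they are hypothetical stand-ins to show the shape of such a guard.

```rust
/// Hypothetical stand-in for a database connection.
pub struct Connection {
    pub committed: Vec<String>,
    pub rollbacks: usize,
}

/// RAII guard: borrows the connection mutably for as long as it lives.
/// It cannot own the connection, because the connection must survive it.
pub struct Transaction<'conn> {
    conn: &'conn mut Connection,
    pending: Vec<String>,
    done: bool,
}

impl Connection {
    pub fn new() -> Self {
        Connection {
            committed: Vec::new(),
            rollbacks: 0,
        }
    }

    /// Starts a transaction; the connection is unusable until it is dropped.
    pub fn begin(&mut self) -> Transaction<'_> {
        Transaction {
            conn: self,
            pending: Vec::new(),
            done: false,
        }
    }
}

impl Transaction<'_> {
    pub fn execute(&mut self, stmt: &str) {
        self.pending.push(stmt.to_string());
    }

    /// Consumes the guard and applies the pending statements.
    pub fn commit(mut self) {
        self.conn.committed.append(&mut self.pending);
        self.done = true;
    }
}

impl Drop for Transaction<'_> {
    /// Rolls back if the guard goes out of scope without a commit.
    fn drop(&mut self) {
        if !self.done {
            self.pending.clear();
            self.conn.rollbacks += 1;
        }
    }
}
```

The `'conn` lifetime is what lets the compiler reject any attempt to use or drop the connection while a transaction on it is still open.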

2 Likes

That's not what I meant to say.

When I say their purpose is to avoid owning, I mean that's the primary reason to use/not use them.

If you want not-copying, I'm not saying references will copy. I'm saying this is not an automatic and unique justification to use references. There are other choices, and you need to be aware which is right in what context. Box is also a pointer, and can also be passed around without copying the data (but it's owning, and ownership dictates what you can and can't do). If you must create a String anyway, then moving it into a struct will avoid copying the heap data. You don't, and often can't, use &str instead. If you don't always create a String, you may need Cow<'static, str> instead.
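Here is a rough sketch of the Cow case, assuming a made-up `Greeting` struct: the field holds a string literal without allocating when one is available, and an owned String otherwise, and the struct needs no lifetime parameter either way.

```rust
use std::borrow::Cow;

/// Hypothetical struct whose text is sometimes a literal, sometimes built
/// at runtime. `Cow<'static, str>` covers both without a <'a> on the struct.
pub struct Greeting {
    text: Cow<'static, str>,
}

pub fn greeting_for(name: Option<&str>) -> Greeting {
    match name {
        // A String has to be created anyway, so it is moved in (no extra copy).
        Some(n) => Greeting {
            text: Cow::Owned(format!("Hello, {}!", n)),
        },
        // A literal lives for 'static, so it is borrowed (no allocation).
        None => Greeting {
            text: Cow::Borrowed("Hello, stranger!"),
        },
    }
}

impl Greeting {
    pub fn as_str(&self) -> &str {
        &self.text
    }

    /// True when no String was allocated for this greeting.
    pub fn is_borrowed(&self) -> bool {
        matches!(self.text, Cow::Borrowed(_))
    }
}
```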

There's a whole other axis of ownership that users must be aware of. Novice users often don't realize that, and reduce the semantics of references to only "a pointer type that doesn't copy", which is technically true, but misses the other important aspect that, if ignored, causes an explosion of <'a> everywhere. Users may need "a pointer type that doesn't copy" but one that also owns the data, and then it's not a Rust reference.

4 Likes
  • MutexGuard, Ref which behave like a reference, but need to run Drop.

  • Iterator that returns items by reference can't own the items, and it borrows from the collection it iterates.

  • Temporary indexes into types like Vec (when you run an algorithm that needs fast lookup, but don't need to keep data in a HashMap later).

  • sorting or picking top N items from multiple data sources. You can make a Vec<View<'tmp>>, sort that, and then read the data in the order you want.

  • getter methods that return references to multiple fields or objects (often done with tuples, but a struct can be used too)

The common theme here is that you create and destroy temporary objects that are only a view into some other storage location.
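As an illustration of the "sort a Vec<View<'tmp>>" item, here is a small sketch. The `Score` type, the source labels and the `top_n` function are assumptions made up for this example.

```rust
/// Hypothetical record type owned by one of the data sources.
pub struct Score {
    pub player: String,
    pub points: u32,
}

/// Temporary view: remembers where a score came from and borrows it.
pub struct View<'tmp> {
    pub source: &'static str,
    pub score: &'tmp Score,
}

/// Picks the `n` highest scores across two sources without cloning any Score.
pub fn top_n<'tmp>(a: &'tmp [Score], b: &'tmp [Score], n: usize) -> Vec<View<'tmp>> {
    let mut views: Vec<View<'tmp>> = a
        .iter()
        .map(|score| View { source: "a", score })
        .chain(b.iter().map(|score| View { source: "b", score }))
        .collect();
    // Sort descending by points; only the small View structs get shuffled.
    views.sort_by(|x, y| y.score.points.cmp(&x.score.points));
    views.truncate(n);
    views
}
```

The views are meant to be read and then thrown away; the two source collections stay where they are and remain the owners of the data.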

7 Likes

It's a long, long read :slight_smile:

This website has a lot to offer when delving deeper into some Rust constructs (after reading the corresponding chapter in "The Book").

Cheers

3 Likes

This is exactly what I was hoping to find, thank you! Long is appropriate when trying to gain a deeper understanding of a topic :slight_smile:

Thanks to everyone who took the time to share their thoughts.
It seems there is not 100% agreement on when to use or avoid references but reading the discussion has given me (and hopefully other beginners) some good ideas on how to think about Rust's type/ownership system and how its semantics differ from other languages.

My question was originally prompted by something like this. After some time playing with my original code, I finally got an error message telling me that I cannot return references to values that are owned by the function itself. In theory I knew that (the Book is pretty clear on that), but I hadn't understood the full ramifications.
Even though I return the foods vector within the same data structure as the references to foods, that still seems to violate that rule.

struct Food {
    description: String,
}
struct Recipe<'a> {
    foods: Vec<&'a Food>,
}
struct Config<'a> {
    foods: Vec<Food>,
    recipes: Vec<Recipe<'a>>,
}

// dummy implementation
fn create_config<'a>() -> Config<'a> {
    let foods = vec![
        Food {
            description: "Water".to_string(),
        },
        Food {
            description: "Flour".to_string(),
        },
    ];

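    // Each reference borrows from the local variable `foods`.
    let bread = Recipe {
        foods: vec![&foods[0], &foods[1]],
    };

    // Rejected by the compiler: `foods` is moved into Config here while
    // `bread` still borrows from it.
    Config {
        foods,
        recipes: vec![bread],
    }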
}

The challenge is that when you have a reference to something, the referred-to item cannot be moved. But, when you return Config, you move it. The borrow checker is not sophisticated enough to understand that &foods[0] is in any way different to &foods, and thus cannot see that the heap allocated part of the Vec does not move.

Teaching the borrow checker to handle this sort of thing safely is not simple; there's plenty of people who'd like it to be possible, but to do so requires defining the semantics of such a self-reference so that the borrow checker can check it in sensible time.

If you really wanted to make your design work, you'd pass foods into the function that returns config:

struct Food {
    description: String,
}
struct Recipe<'a> {
    foods: Vec<&'a Food>,
}
struct Config<'a> {
    foods: &'a [Food],
    recipes: Vec<Recipe<'a>>,
}

fn create_foods() -> Vec<Food> {
    vec![
        Food {
            description: "Water".to_string(),
        },
        Food {
            description: "Flour".to_string(),
        },
    ]
}

fn create_config(foods: &[Food]) -> Config<'_> {
    let bread = Recipe {
        foods: vec![&foods[0], &foods[1]],
    };

    Config {
        foods,
        recipes: vec![bread],
    }
}

This then requires your caller to keep the Vec<Food> somewhere sensible, and to separately handle the Config. A more common way to resolve this problem in Rust is to use indexes:

struct Food {
    description: String,
}
struct Recipe {
    foods: Vec<usize>,
}
struct RecipeView<'a> {
    foods: Vec<&'a Food>,
}
struct Config {
    foods: Vec<Food>,
    recipes: Vec<Recipe>,
}
fn create_config() -> Config {
    let foods = vec![
        Food {
            description: "Water".to_string(),
        },
        Food {
            description: "Flour".to_string(),
        },
    ];
    let bread = Recipe { foods: vec![0, 1] };
    Config {
        foods,
        recipes: vec![bread]
    }
}
impl Config {
    fn get_recipe(&self, idx: usize) -> Option<RecipeView<'_>> {
        let recipe = self.recipes.get(idx)?;
        let foods = recipe.foods.iter().map(|idx| &self.foods[*idx]).collect();
        Some(RecipeView { foods })
    }
}
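For completeness, here is a sketch of how the index-based version might be used. The type definitions are repeated so the block compiles on its own. Two things are my own additions rather than part of the code above: `get_recipe` uses `get` instead of indexing, so that a stale index yields None instead of panicking, and `recipe_descriptions` is a hypothetical helper.

```rust
struct Food {
    description: String,
}
struct Recipe {
    foods: Vec<usize>, // indexes into Config::foods
}
struct RecipeView<'a> {
    foods: Vec<&'a Food>,
}
struct Config {
    foods: Vec<Food>,
    recipes: Vec<Recipe>,
}

fn create_config() -> Config {
    let foods = vec![
        Food { description: "Water".to_string() },
        Food { description: "Flour".to_string() },
    ];
    let bread = Recipe { foods: vec![0, 1] };
    Config { foods, recipes: vec![bread] }
}

impl Config {
    /// Resolves the indexes of one recipe into a temporary view.
    /// Returns None if the recipe or any food index does not exist.
    fn get_recipe(&self, idx: usize) -> Option<RecipeView<'_>> {
        let recipe = self.recipes.get(idx)?;
        let foods = recipe
            .foods
            .iter()
            .map(|&i| self.foods.get(i))
            .collect::<Option<Vec<_>>>()?;
        Some(RecipeView { foods })
    }
}

/// Hypothetical helper: the view is created, read, and dropped right away.
fn recipe_descriptions(config: &Config, idx: usize) -> Vec<String> {
    match config.get_recipe(idx) {
        Some(view) => view.foods.iter().map(|f| f.description.clone()).collect(),
        None => Vec::new(),
    }
}
```

Because Config holds no references, it can be returned, moved and stored without any lifetime annotations; only the short-lived RecipeView carries one.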