Lifetime & borrowing resources for an intermediate

I hope this is not too well-trodden ground, but I couldn't find another question with the same intent, so I figured I'd open it up anyway. I'm looking for resources on deepening my understanding of lifetimes and how to work with the borrow checker.

Some context: I'd call myself a Rust developer on the shy end of intermediate. I'm reasonably efficient in Rust and have actually contributed to a few projects. I understand the idea behind the borrow checker and usually don't spend too long fighting it. I know what a lifetime annotation is and how to add one. However, my code is still filled with what feel like unnecessary .clone()s and uses Vec<T> instead of &[T] basically everywhere, and that's something I'd like to work on.

Topics I'm mostly looking for help with:

  1. When is the work of borrowing instead of cloning likely to be worth the effort?
  2. When is borrowing instead of cloning probably going to work, and when isn't it?
  3. Are there code patterns that help make code more borrow friendly?

Links, talks, blog posts, tutorials, explanations, everything is welcome! Thanks everyone!

Let's start with a few general, abstract, perhaps even philosophical observations:

  1. Lifetimes and the borrow checker are tools in your arsenal as a programmer, just like Rust itself is a toolkit that combines these tools with a whole bunch of other wonderful features to make your life easier. They are not there to bully you into developing an inferiority complex as a programmer; they are there to assist you. You are not supposed to work with the borrow checker and lifetimes as much as they are supposed to work for you. If the compiler "screams" at you, it's because you've made a mistake - not because you haven't learnt the art of "working" with it well enough yet.

  2. Rust relies on your understanding and appreciation of the underlying implementation of your code, at least at the most basic level. I'm not here to promote my own writing, but I do wish someone had explained to me a bit earlier the essential basics that any language takes care of - implicitly or explicitly - which I had to slowly grasp myself. So if you haven't had a chance to scroll past it yet, take another look at this; perhaps you'll find it useful as well.

  3. Although this is merely an extension of the previous point, I feel the need to emphasize it separately. The concepts of ownership and borrowing are powerful abstractions, but make sure you don't get lost in them, trying to solve a problem that doesn't exist.

    To "own" a piece of data, in Rust's world, means to "be responsible for keeping track of its usage" and for freeing the underlying memory when it's no longer needed (as defined by the scope of the task at hand). To "borrow" means either to "have an exclusive right to change it however you like", in the case of an exclusive / mutable reference, or to "take a look at the data without changing it", in the case of a shared one. That's it. It's not that complicated. Don't make it more difficult than it is.

    The work of lifetimes and the borrow checker boils down to tracking these responsibilities and references, and making sure they make sense. No looking at data after it has been dropped, which happens when the variable that owns it goes out of scope. No mutating from several places at once. No looking at data that is currently being used through an exclusive mutable reference, because by the time you decide to look at it, the data in question might already have been reallocated somewhere else. Period. That's literally all you need to know.

    Keep these things in mind and figure out where your data flow introduces hidden problems the next time you have an unpleasant DoC (denial of compilation) attack by the compiler.
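To make those three rules concrete, here is a minimal sketch. The function names (consume, peek, shout, rules_in_action) are made up for illustration, and the commented-out lines are the ones the compiler would reject.

```rust
// Ownership: `consume` becomes responsible for the String and frees it on return.
fn consume(text: String) -> usize {
    text.len()
}

// Shared borrow: only looks at the data.
fn peek(text: &str) -> usize {
    text.len()
}

// Exclusive borrow: the only party allowed to touch the data while it runs.
fn shout(text: &mut String) {
    text.push('!');
}

fn rules_in_action() -> (usize, usize, String) {
    let mut greeting = String::from("hello");

    // Any number of shared borrows may coexist.
    let a = &greeting;
    let b = &greeting;
    let looked = peek(a) + peek(b);

    // `a` and `b` are not used past this point, so an exclusive borrow is allowed.
    shout(&mut greeting);
    // let c = &greeting; shout(&mut greeting); peek(c); // rejected: shared and exclusive overlap

    let kept = greeting.clone(); // a deliberate copy that survives the move below
    let consumed = consume(greeting); // ownership moves into `consume`
    // peek(&greeting); // rejected: `greeting` was moved

    (looked, consumed, kept)
}
```

Uncommenting either rejected line and reading the resulting error is a quick way to see which of the rules a given piece of code breaks.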

With that out of the way, let's tackle your more concrete questions:

  1. You can borrow, avoiding cloning, when you know for sure that the data you're referencing will remain valid for the whole scope of whatever is borrowing it. Taking a reference to data you've just declared, in order to process it in a separate function that returns right away, is valid. Sending this reference over to another thread, whose scope can outlive the scope in which your data was declared, is not. This applies to both shared and mutable references.

    If you can't know that for sure, you'll need to either clone (allocate a new region of memory and fill it with the same data) or to move your data into the new scope (make the new scope responsible for freeing your data's memory afterwards), where it can be processed, as needed.

    You can also allocate your data on the heap (a separate region of memory, not linked to any particular scope or function) via an Rc or an Arc and clone only the pointer to your data, avoiding a copy of the data itself. If you need to mutate it at the same time, you'll likely need a Mutex as well, to make sure several operations on it don't interfere with each other.

    A lesser-known option is leaking the data from the heap (for example with Box::leak), which gives you a reference to the data you've placed there earlier that can last for as long as you need it to - including the 'static lifetime (which makes it valid for the rest of the program). That's useful, for instance, when you have some shared configuration you want to make accessible across your program in different places at the same time. You'll likely find once_cell a bit more useful for that, though - and it's going to be a bit easier to reason about. Still, know your options.

  2. There shouldn't be any "probably" in your mind by this point, when it comes to this question.
    Do you have to "take a look" at the data or get "an exclusive mutable right" to do something with it across a region that is smaller than the scope of the data itself? Just use your regular shared and mutable references. Do you need to send this data to several other threads for processing, knowing that they're not going to rely on the results of each other's work? Just clone and move it.

    Do you need to access the same data over and over again, while processing it for arbitrarily long periods of time? Allocating it on the heap will do the trick. Do you also need to mutate it, while processing it from several places at the same time? Wrap it in a Mutex inside an Arc for sharing across threads, or inside an Rc for strictly synchronous, single-threaded execution (where a RefCell is the usual lighter-weight counterpart of a Mutex). That's it, really.

  3. If by "borrow friendly" you mean "less allocation heavy", then once again - use the heap with Rc / Arc a bit more often. It's slightly less performant in terms of access, but will save you from calling clone on the data all over the place. If your functions can avoid consuming and throwing away your data, effectively becoming a "one-way ticket" for it, make sure they only accept references to it (& or &mut, depending on what you need to do with it). The same applies to Vec<T> vs &[T] as well - if you can process any arbitrary slice of data, restricting your function's argument to a Vec makes no sense. In general, avoid referring to the types that own the data if you can refer to the borrowed forms behind them instead. This thread mentioned many other useful tips, along with that.
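The options from the three answers above can be sketched side by side using only the standard library. All function names here are invented for the example, and std::sync::OnceLock is my stand-in for the once_cell crate mentioned earlier (it has been part of std since Rust 1.70), so treat this as one possible shape rather than the definitive one.

```rust
use std::sync::{Arc, Mutex, OnceLock};
use std::thread;

// 1. Short-lived access: the callee returns before `words` goes away, so borrow.
fn total_len(words: &[String]) -> usize {
    words.iter().map(|w| w.len()).sum()
}

// 2. Independent work on other threads: clone, then move each copy into its thread.
fn lens_in_threads(words: &[String]) -> Vec<usize> {
    let handles: Vec<_> = words
        .iter()
        .cloned() // one owned copy per thread
        .map(|word| thread::spawn(move || word.len()))
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}

// 3. Shared data mutated from several threads: Arc shares it, Mutex guards it.
fn sum_lens_shared(words: Vec<String>) -> usize {
    let words = Arc::new(words);
    let counter = Arc::new(Mutex::new(0usize));
    let handles: Vec<_> = (0..words.len())
        .map(|i| {
            let words = Arc::clone(&words); // copies the pointer, not the Vec
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                *counter.lock().unwrap() += words[i].len();
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    let total = *counter.lock().unwrap();
    total
}

// 4a. Program-wide read-only data: initialised once, then borrowed as `&'static`.
static CONFIG: OnceLock<Vec<String>> = OnceLock::new();

fn config() -> &'static [String] {
    CONFIG.get_or_init(|| vec!["verbose".to_string()])
}

// 4b. Leaking: every call leaks a fresh allocation, so do it once at startup at most.
fn leaked_name() -> &'static str {
    Box::leak(String::from("demo").into_boxed_str())
}
```

Only option 2 copies the strings themselves; options 1, 3 and 4 hand out references or reference-counted pointers to a single allocation.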

Another useful link: Common Rust Lifetime Misconceptions


This may be the curse of knowledge talking, but the vast majority of the time you shouldn't need to be cloning data. Just pass things into functions by reference instead of value, and try to write code where you move a value instead of needing two copies.
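As a rough illustration of "by reference instead of value", here is a sketch with three hypothetical helpers (largest, initials, with_total). The first two borrow, so they accept far more than a Vec or a String; the third moves its argument, which avoids needing a second copy.

```rust
// Accepts any contiguous run of i32s: a Vec, an array, or part of either.
fn largest(values: &[i32]) -> Option<i32> {
    values.iter().copied().max()
}

// Accepts a String, a literal, or a substring without taking ownership.
fn initials(name: &str) -> String {
    name.split_whitespace()
        .filter_map(|word| word.chars().next())
        .collect()
}

// Moves instead of cloning: the caller hands the Vec over and gets it back extended.
fn with_total(mut values: Vec<i32>) -> Vec<i32> {
    let total: i32 = values.iter().sum();
    values.push(total);
    values
}
```

A caller holding a Vec can pass &v or a sub-slice such as &v[..1] to largest, and an array literal works just as well. None of those calls allocates.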

You can also use things like impl Trait to lazily return references instead of owned collections.

For example, say you wanted to find all items in a list of strings which match a particular pattern. Instead of writing something like this:

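use regex::Regex; // from the external `regex` crate

struct MyType {
    strings: Vec<String>,
}

impl MyType {
    // Returns an owned copy of every string that matches `pattern`.
    fn matches(&self, pattern: &str) -> Vec<String> {
        let pattern = Regex::new(pattern).unwrap();

        let mut matches = Vec::new();

        for string in &self.strings {
            if pattern.is_match(string) {
                // Each hit is cloned just so it can be handed back to the caller.
                matches.push(string.clone());
            }
        }

        matches
    }
}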

You might write something like this:

impl MyType {
    fn matches<'a>(&'a self, pattern: &str) -> impl Iterator<Item = &'a str> + 'a {
        let pattern = Regex::new(pattern).unwrap();
        self.strings
            .iter()
            .filter(move |s| pattern.is_match(s))
            .map(|s| s.as_str())
    }
}
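To show what the lazy version buys the caller, here is a dependency-free sketch of the same shape. It swaps the regex for str::contains purely so that it compiles without external crates, and the names containing and usage are made up for the example.

```rust
struct MyType {
    strings: Vec<String>,
}

impl MyType {
    // Same shape as `matches` above, with `str::contains` standing in for the regex.
    // The needle is captured by the closure, so it has to live as long as the iterator.
    fn containing<'a>(&'a self, needle: &'a str) -> impl Iterator<Item = &'a str> + 'a {
        self.strings
            .iter()
            .map(|s| s.as_str())
            .filter(move |s| s.contains(needle))
    }
}

fn usage() -> (usize, Option<String>) {
    let data = MyType {
        strings: vec!["apple".into(), "banana".into(), "grape".into()],
    };

    // Counting the hits needs no allocation at all.
    let count = data.containing("ap").count();

    // Only the single item the caller decides to keep is copied.
    let first = data.containing("an").next().map(str::to_owned);

    (count, first)
}
```

The caller, not the method, decides whether anything gets collected or cloned, which is usually where that decision belongs.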

An exercise that might be useful is taking one of these places where you feel a Vec or clone() is unnecessary and consciously trying to get rid of it in a way that doesn't impact readability/maintainability.

After doing this a couple of times you'll start seeing patterns and know when copies can be avoided. Or more importantly, you might find places where a copy is actually needed, and trying to use references will take you down a long painful path that doesn't go anywhere.
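One situation where the copy is genuinely needed, sketched with hypothetical types (BorrowedEntry, OwnedEntry) and a format! call standing in for reading a file: the parsed value has to outlive the buffer it was parsed from.

```rust
// Borrowing version: every entry is tied to the buffer it was parsed from.
struct BorrowedEntry<'a> {
    key: &'a str,
    value: &'a str,
}

// Owning version: costs two small allocations, but can be stored or returned freely.
struct OwnedEntry {
    key: String,
    value: String,
}

// Zero-copy parsing is fine while the caller keeps `line` alive.
fn parse_borrowed(line: &str) -> Option<BorrowedEntry<'_>> {
    let (key, value) = line.split_once('=')?;
    Some(BorrowedEntry {
        key: key.trim(),
        value: value.trim(),
    })
}

// The line is created inside this function, so nothing borrowed from it may escape.
fn load_entry(name: &str) -> Option<OwnedEntry> {
    let line = format!("{name} = default"); // stand-in for reading a file
    let entry = parse_borrowed(&line)?;
    // Returning `entry` itself would be rejected: it borrows from `line`,
    // which is dropped when this function returns.
    Some(OwnedEntry {
        key: entry.key.to_owned(),
        value: entry.value.to_owned(),
    })
}
```

Trying to make load_entry return a BorrowedEntry is the kind of path that leads nowhere. Copying at that boundary keeps the lifetime parameter out of every type that stores the result.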

Once you can look at a situation and say "I don't want to use a lifetime annotation/reference here because I'll end up fighting the borrow checker 2 or 3 steps down the road" you can confidently say you have mastered Rust's lifetime rules.

Of course, feel free to ask for help on these forums if you feel like you are banging your head against a wall without making progress. Troubleshooting those sorts of frustrating lifetime issues is what really helps you understand the system, and often people will be able to nudge you in the right direction ("oh, you need to add a 'a here as well", or "well, if you restructure your code to look like X, we won't get into this situation in the first place").

