How to Understand NLL vs. Polonius Borrow Checking? (with example)

Hello, I am a beginner trying to understand the difference between stable borrow checking and Polonius. Why does NLL reject this code, but Polonius allow it? I would also appreciate feedback about style issues in the code itself; likely I am doing many things wrong.

Briefly, NLL does not "realize" that a particular borrow is never used again after my loop scope ends, but Polonius does. Maybe using break to end this loop is the problem? But I really don't understand why: my code appears similar to Walking Down a Data Structure (Problem Case 4) of the NLL RFC, so I expected it to be admitted by the current borrow checker. I did search for visualization tools for borrow checking, but could not find anything easy to use. I tried to find something that would call promising-sounding functions like explain_borrow from rustc_borrowck and build a graph, but couldn't. Any references appreciated.

Anyway, I am doing the Stanford Linked Lists problems with the safe types from Rust With Too Many Linked Lists. I'm working on "insert in sorted order," and I know this solution is wrong --- but it still summarizes my confusion about borrow checking, so I'd like some help with that before I fix all the edge cases. Here are the types:

pub struct List {
    head: Link,
}

type Link = Option<Box<Node>>;

struct Node {
    elem: i32,
    next: Link,
}

Here is the implementation that NLL allows:

    pub fn sorted_insert_nll(&mut self, data: i32){
        let mut cur_link = &mut self.head;

        while let Some(node) = cur_link {
            // "peek" at next item: rather do it with a reference?
            let next_item = node.next.as_ref().unwrap().elem;
            if data >= node.elem && data <= next_item {
                println!("Place new element between {} and {}",
                         node.elem, next_item);
                let new_node = Box::new(Node {
                    elem: data,
                    next: node.next.take(),
                });
                node.next = Some(new_node);
                break;
            }
            cur_link = &mut node.next;
        }
    }

Here is the implementation that NLL rejects, but Polonius accepts. As far as I can tell the only difference is moving a section use of cur_link outside the loop that "walks" down the data structure.

    pub fn sorted_insert_pol(&mut self, data: i32){
        let mut cur_link = &mut self.head;

        while let Some(node) = cur_link {
            // "peek" at next item
            let next_item = node.next.as_ref().unwrap().elem;
            if data >= node.elem && data <= next_item {
                println!("Place new element between {} and {}",
                         node.elem, next_item);
                break;
            }
            cur_link = &mut node.next;
        }

        if let Some(node) = cur_link {
            let new_node = Box::new(Node {
                elem: data,
                next: node.next.take(),
            });
            node.next = Some(new_node);
        }

    }

If I run:

cargo test -- --show-output

I get borrow checking errors in the _pol implementation:

error[E0503]: cannot use `*cur_link` because it was mutably borrowed
  --> src/question.rs:54:16

error[E0499]: cannot borrow `cur_link.0` as mutable more than once at a time
  --> src/question.rs:54:21

If I run:

RUSTFLAGS="-Z polonius" cargo test -- --show-output

both sorted inserts compile, my tests run, and they behave as expected (incorrect, but working for non-edge-cases). Please help me improve my mental model of the current borrow checker vs. Polonius vs. the "platonic ideal" of borrow checking.

1. A not so simple study of a simple case

So, for starters let's study the following simplified snippet:

// init
let mut r: &'_ mut i32 = &mut 42;

// the key statement!
let r2: &'_ mut i32 = &mut *r;

// and now let's start branching
if some_condition() {
    r = r2;
} else {
    stuff(r);
}
Click to see why

Indeed, it all stems from the let r2 = &mut *r; statement, and what the '_ in it is:

  1. does it match that of r (as in, was it a maximal reborrow / a move);

  2. or can it be smaller than r, so that r may be used afterwards?

And it turns out that on the branch leading to r = r2, for that assignment to be valid, the lifetime of r2 in that case has to be equal to that in r, so we are in case 1.

But for stuff(r); to pass the borrow checker, r2 must have ended in that branch, so we must be in case .2 in that case.

And it turns out that, even though NLL does have the notion of flow/branch-dependent lifetimes,

  • Click to see

    Consider:

    let (s1, s2) = (String::from("..."), String::from("..."));
    let (r1, r2) = (&s1, &s2);
    if some_condition() {
        stuff(r1);
        drop(s1);
        stuff(r2);
        drop(s2);
    } else {
        stuff(r2);
        drop(s2);
        stuff(r1);
        drop(s1);
    }
    

with NLL, the regions over which a lifetime spans, they encompass all branches. It is as if a single lifetime parameter "encodes" the regions in all its branches ("the regions of each branch are 'type-unified' under the lifetime parameter appearing in the type").

When doing the right-to-left r = r2 assignment, the lifetime inside r2 must be : w.r.t. the one inside r (⊇, i.e., "at least as big as") .

This means that NLL ends up considering that (the lifetime inside) r2 must match (that inside) r in all the branches, so any time r is used, even if in another branch, it is as if r2 was used too / r2 had to be live too, and since r2 borrows r it means that it is as if r were borrowed even in other branches, hence the complaint about the stuff(r); line.

A way to illustrate this is by coloring in red the property "the lifetime of r2 must be equal to that of r":

  1. image

  2. But the lifetime inside r2 stems from its definition, when borrowing r:

    image

  3. We know the lifetime '_ of that r2 declaration, so we can propagate it forwards:

    image

  4. And so we have "the lifetime of r2 must be equal to that of r" 'reaching' the stuff(r); statement which requires the opposite. Hence the error.

For Polonius, I'm not 100% sure of how it works, but I'd say that rather than having branched regions inside lifetimes (NLL: lifetimes as graphs), and thus having branches be part of lifetimes, with Polonius lifetimes are just "flows" / lines alongside these graphs (lifetimes as paths), and we deal with branches outside of lifetime parameters, outside of the type system (inside the borrow-checker itself).

We can have kind of an intuition of the difference with Polonius here (at least it's the one I have):

  • rather than saying "the lifetime of r2 must contain that of r in all branches", it is able to express the more fine-grained "the lifetime of r2 must contain that of r in this branch onwards".

  • it is as if Polonius considered all the possible flows/code-paths, and separated them, getting rid of the branches:

    1. And that's it, the "'r2 = 'r" infection is contained within the branch where it is relevant:

      • In "Code Path 1", the reborrow is maximal / "it is a move";

      • In "Code Path 2", it is a genuine (non-maximal) reborrow.


2. A tangent: generic lifetime parameters

This is basically what this slide (https://nikomatsakis.github.io/rust-belt-rust-2019/#84) from Niko's user-friendly presentation about Polonius says:

  • similar to these "multi-branch lifetimes" and the overly general r = r2 equality in lifetimes, there is the case of generic lifetime( parameter)s.

Indeed, an outer / a generic lifetime parameter is treated, by NLL, as a graph spanning through all branches until reaching the return:

fn example<'r>(r: &'r mut i32)
  -> &'r mut i32
{
    let r2: &mut i32 = r;
    if some_condition() {
        r2
    } else {
        r
    }
}

yields:

error[E0499]: cannot borrow `*r` as mutable more than once at a time
  --> src/lib.rs:9:9
   |
2  |   fn example<'r>(r: &'r mut i32)
   |              -- lifetime `'r` defined here
...
5  |       let r2: &mut i32 = r;
   |                          - first mutable borrow occurs here
6  | /     if some_condition() {
7  | |         r2
8  | |     } else {
9  | |         r
   | |         ^ second mutable borrow occurs here
10 | |     }
   | |_____- returning this value requires that `*r` is borrowed for `'r`

image

Indeed if we consider what the lifetime of the borrow inside r2 is, it must contain that outer / generic lifetime parameter 'r, which is viewed as a special "through-all-branches exit path". Given that r2 is returned so that its own lifetime has to match this special one, it means the type of r2 is thus one involving this "through-all-branches exit path" region, so starting from the let r2 node, its lifetime spreads across all subsequent nodes, including the return r; one, so that r is considered borrowed at that point.


3. Your case

is thus similar to 1.: when you do cur_link = &mut node.next;, cur_link plays the role of r, and node that of r2.

The NLL-friendly code works because it never refers to r / cur_link again, all it does is use the already-in-scope node (r2 in my examples).

  • with one interesting technicality: it is not an if let but a while let, and yet it still works. How is that?

    The only way to re-run into another loop iteration and thus another attempt to borrow cur_link (r) is if we do not break off the loop. And in the branch where we do not break it, even if cur_link has been "maximally borrowed / moved from", we end up populating it with a new cur_link / we move another value back in (when doing cur_link = &mut node.next;).

    let mut r = &mut 42;
    let r2 = r; // moves out of `r` due to the `r = r2` below yadda yadda.
    if some_condition() {
        r = r2; // however, this line, itself, is moving something (`r2`) back into `r`.
        stuff(r); // so this is fine
    } else {
        // but in this branch, `r` is still moved out and does not witness the "move back in"
        // so we cannot `stuff(r);`
        diverge!(); // in your code, this was a `break`.
    }
    // here, either `r` has been moved back in, or we diverge and never reach this.
    stuff(r); // so this is fine
    

    It is actually quite ironic that the reason we "move back in" so as to get a fully usable value is what triggers this NLL bug/limitation that makes it be unusable in other branches :smile:

The non-NLL-friendly one does not use that "maximally borrowing node", but instead tries to reöbtain a new node binding by borrowing from cur_link again, but cur_link has been, due to the aforementioned limitations, been maximally borrowed.

  • Back to my simplifying example, it is as if we replaced diverge!(); with something non-diverging (since the break only breaks out of the first loop, so we still reach the second borrow attempt). The stuff(r) (attempt to borrow cur_link) outside the branches (of the first loop) is thus conservatively denied since Rust worries about the "moved out"/maximally borrowed r.
8 Likes