What is the exact region of a borrowing?

Consider such two contrast examples:

fn get_value<'a>(list:&'a mut Vec<String>)->&'a mut String{
    if let Some(s) = list.first_mut(){  // #1
        return s;
    }
    list.push("hello".to_string());  // #2
    list.first_mut().unwrap()
}
fn get_value<'a>(list:&'a mut Vec<String>)->&'a mut String{
    if !list.is_empty(){
        let s = list.first_mut().unwrap();  // #3
        return s;
    }
    list.push("hello".to_string());  // #4
    list.first_mut().unwrap()
}

The first one won't be compiled while the second one is ok. In such two examples, both s should have the lifetime 'a. Don't they include #2 and #4, respectively? Why #1 conflict with the action at #2, however, #3 does not conflict with #4? Desugar them, we can get the following pseudo-code:

// Case 1:
'a:{
   let s:&'a mut String = XXX; // ('a, mut, *list) caused by reborrow
   list.push("hello".to_string()); // P1, & mut *list , deep write
   ...
}

// Case 2: 
'a: {
   {  // block of if 
      let s:&'a mut String = XXX; // ('a, mut, *list) caused by reborrow
   }
   list.push("hello".to_string()); // P2, & mut *list , deep write
   ...
}

In both cases, region 'a includes P1 and P2. According to 2094-nll - The Rust RFC Book, if the regions of the loans include the P1 and P2, they will conflict with the action at P1 and P2, respectively. However, the result of the second case seems to say the region of the loan does not include P2, I suppose the key point why case 1 is an error but case 2 is ok might rely on this rule

For a statement at point P in the graph, we define the "transfer function" -- that is, which loans it brings into or out of scope -- as follows:

  • any loans whose region does not include P are killed;

So, I wonder why the region of the loan at #3 does not include #4? How to correctly consider the region of a loan? It seems that the "region of a loan" is not determined by the lifetime value? I'm not sure.

A point upfront: The first example should compile, but doesn't in the current compiler (nll problem case 3, conditional return of a borrow). It only doesn't because the analysis was too costly with the current borrow checker. The plan is still to allow it.

Next, recall that lifetimes are control flow and code point sensitive. There is no single lexical set of lines for a lifetime necessarily.

In the examples, there's no way for s to reach lines 2 or 4, so the borrow can be allowed. The lifetime not including them at the code point of the borrow is one way to think of it. A function need not be exited at the bottom is another way.

The nll rfc has a section about this case specifically where it talks about end regions.

https://rust-lang.github.io/rfcs/2094-nll.html#layer-4-named-lifetimes

(Sorry for the brevity; on mobile.)

4 Likes

However, the #2 in the first example will cause the error. Could you please interpret what the region of a loan is? How does it rely on control flow?

The lifetime not including them at the code point of the borrow is one way to think of it.

IIUC, in both case, 'a does alive at #2 and #4. Hence, 'a does include that two code points of the borrow.

See here

So, then the skolemized region 'r would be {START,SOME,NONE,END,End('r)}, more or less, and hence 'b would wind up as {START,SOME,END,End('r)}. The key point is that 'b (the borrow) still doesn't have to include NONE.

You know r is active before entering the function body, and after, but b can outlive r (be valid everywhere r is from here on) and still not cover the entire function body due to control flow / the compiler being able to prove b can't reach NONE.

Another borrow c in START that could reach NONE would instead be a borrow error.

Incidentally reviewing the problem case

https://rust-lang.github.io/rfcs/2094-nll.html#problem-case-3-conditional-control-flow-across-functions

Control flow was considered even pre-nll, just not to the extent desired.

1 Like

Sorry, I don't understand how the point in your link issue interpret my examples. Is there a simple way to construe the region of a loan?

There are some projects which try to visualize them, etc. But keep in mind that we are at Rice's theorem mercy: there would always be valid programs rejected by the compiler or invalid programs accepted by compiler.

Rust's decision was to do a split:

  1. Safe Rust picks the option “reject some valid constructs” (and make user of the compiler resposible for fixing them and making them compileable).
  2. Unsafe Rust picks the option “accept some invalid constructs” (and make user of the compiler resposible for fixing them in the absence of any help from the compiler).

And, well… the situation where correct program is rejected… it's immensely frustrating.

That's why compiler rules are regularly relaxed but with the idea to keep them still in these bounds: invalid programs are all rejected, [some] valid programs are rejected, too.

2 Likes

I don't think I'd call borrow checking simple, no.

As for how it relates, your first example is something like

START
|
('b: 'r)
goto NONE or SOME -----+
     |                 |
   NONE              SOME
   (mutate list)     |
   ('a: 'r)          END (returns 'b)
   |            
   END     
   (returns 'c)

Which is, well, pretty much exactly the example in Niko's comment.

The second example is similar except 'b starts in the if body that takes the place of SOME.

Perhaps think of it like so:

      | 'r
      |
+-----+-----+\
|    body   | \
|     of    |  } 'r doesn't really care what goes on inside the function
|  function | /     so long as it is still valid after any return
+--+--+--+--+/     
   |__|__|   
      |
      | 'r

If you return a 'b from the body of the function, you have to have 'b: 'r -- it has to be valid after the function ends. But it need not cover every code point in ever flow through the function. It only needs to cover the code points reachable between its creation and being returned out of the function.

2 Likes

My understanding is, in order to determine the region of a loan, we first calculate the maximum region the lifetime in the loan could represent, which is called region R, then exclude these regions for which the control flow won't reach if the control flow once undergoes the creation of the loan, the resulting un-continuous region is the region of a loan. For example:

// ----------------------------------------------------- 'a
fn show<'a>(list:&'a mut Vec<String>){
      'd:{
          if true{ // ---------------------------------------- Control flow goes into here
             let s:&'a mut String =  list.first_mut().unwrap();  // Creation
             break 'd; //  ------------------------------------- Control flow jumps to the end, which skips `P`
         }
         list.push("hello".to_owned());  // P
     } // ---------- end
}
// ----------------------------------------------------- 'a

First, the lifetime 'a can cover all the function body, i.e. it is valid everywhere in the function body, I want to say it the region of the borrowing with lifetime 'a can potentially include the point P, however, the control flow that goes into the creation of the borrowing won't flow the point P anymore, hence the point P will be excluded from the region of the borrowing.

I think the first example can also be interpreted in this way.

// ------------------------------------------------------------ 'a
fn get_value<'a>(list:&'a mut Vec<String>)->&'a mut String{
    if let Some(s) = list.first_mut()  //control flow `C` flows this point, which creates ('a, mut, *list), 
    { 
        return s;
    }
    list.push("hello".to_string());  // P, `C` can reach here if list.first_mut() return None
    list.first_mut().unwrap()
}
// ------------------------------------------------------------ 'a

In this example, as well, 'a covers everywhere in the function body, moreover, the control flow that creates the borrowing can also undergo P if list.first_mut() return None, hence the region of ('a, mut, *list) includes point P

Is this a correct understanding?

I find it a bit easier to reason about if, despite the explicit annotations, you consider reborrows within the function to be distinct from the input lifetimes. For example,

let s: &'a mut String = list.first_mut().unwrap();

For the sake of reasoning and discussion, let's call this 'a1 instead of 'a. Due to being returned it must satisfy 'a1: 'a and due to being a reborrow it must satisfy 'a: 'a1, so it "is" 'a, [1] but again I find it easier to talk and reason about it if we keep the name distinct.

For example, we can talk about the region of 'a1 distinctly from the region of 'a. The region of 'a1 doesn't include P, and thus the push causes no error. However the region of 'a must include P, or the push would be an error.

When I say lifetimes are control-flow and code-point specific, this is the sort of thing I'm referring to. The way the RFC puts it is

However, in order to successfully type the full range of examples that we would like, we have to go a bit further than just changing lifetimes to a portion of the control-flow graph. We also have to take location into account when doing subtyping checks. This is in contrast to how the compiler works today, where subtyping relations are "absolute". That is, in the current compiler, the type &'a () is a subtype of the type &'b () whenever 'a outlives 'b ('a: 'b), which means that 'a corresponds to a bigger portion of the function. Under this proposal, subtyping can instead be established at a particular point P. In that case, the lifetime 'a must only outlive those portions of 'b that are reachable from P.

(And this is why I wouldn't say there's simple definition of a region.)

I'm not sure if you're arguing that the current error is correct and this is why?

(All my responses in this thread are based around that pattern being sound and eventually accepted by the compiler, and following how the lifetimes work if that could be stabilized, as the RFC originally intended.)


  1. and you even explicitly annotated it as such ↩︎

1 Like

I know your meaning, however, such two conclusions are a bit contradictory. We say 'a includes P, we also say 'a:'a1 and 'a1: 'a, this means, 'a1 is 'a, however, we instead say 'a1 does not include P but 'a does. The symbol : is weak to express the relationship. Could we say 'a1 is identical to 'a except for excluding P?

I'm not sure if you're arguing that the current error is correct and this is why?

I'm trying to understand why the first example is an error in the current model. I find that the region of a loan is not only specified by the region represented by its lifetime but also should exclude these points/regions that would be flowed by the corresponding mutually excluded control flow, which means:

// ----  'a
if b { 
  // control flow C1
  let b:&'a1 mut () = XXX; // 'a: 'a1 and 'a1: 'a
}else{
// control flow C2
}
// ---- 'a

C1 and C2 are mutually exclusive. In this example, 'a1 does not include the region in else

For the show example it's identical except for P and everything before the creation of the borrow. In general as you've seen I say things like "control-flow and code-point specific".

Yeah, ok. Then what you said seems reasonable to me. But given that wasn't supposed to be an error in the RFC that provides the best documentation we have, I doubt exactly what's going on has been formally written down anywhere.

@xmh0511 The reason why the first example is an error in the current model is quite simple, and has already basically been asserted:

The current borrow checker is not flow-sensitive. That means that relations like 'a: 'a1 and 'a1: 'a hold "globally", and imply that 'a1 and 'a are the same region. Therefore since 'a1 includes P (to make the return valid), 'a also contains P and we get a double-mut error.

The way which has been proposed to solve this issue is to make region containment flow-sensitive. More precisely, that means that regions start out unconstrained but as you move forward along the control flow relations between regions can be asserted (and must hold from then on), and so two regions can be equal in one part of the control flow and not equal elsewhere.

With the example, we get the following:

fn get_value<'a>(list:&'a mut Vec<String>) -> &'a mut String {
    if let Some(s) = (&'b mut list).first_mut() { // A
        return s; // B
    }
    (&'c mut list).push("hello".to_string());  // C
    (&'d mut list).first_mut().unwrap() // D
}

I've annotated the regions created in this program. We have the following inferences:

  • 'a: 'b at A, 'a: 'c at C, 'a: 'd at D because it's a reborrow
  • 'b: 'a at point B because it must be assignable to the return
  • 'd: 'a at point D because it must be assignable to the return
  • The control flow is A -> B -> exit, A -> C -> D -> exit
  • 'a is live for the whole function because it is a parameter
  • 'b is live at A, 'c is live at C, 'd is live at D because that is where they are created
  • If a lifetime is live at one point it is also live anywhere between than that point and the point of creation, although this has no effect on this example
  • If 'x: 'y and 'y is live then 'x is also live; from this we deduce that 'b is live at B
  • 'b, 'c, 'd must be disjoint because they are sibling mut borrows

From this, we get a flow-dependent picture of the region inequalities:

  • A: 'a: 'b, 'a live, 'b live
  • B: 'a = 'b, 'a live, 'b live
  • C: 'a: 'b, 'a: 'c, 'a live, 'c live
  • D: 'a: 'b, 'a: 'c, 'a = 'd, 'a live, 'd live

And now we ask: is it ever the case that two borrows that must be disjoint are also simultaneously live? And the answer is no. The main point of concern is C, but we can see that only 'c is live there.

The flow-insensitive analysis is equivalent to taking the above picture and merging all the region inequalities without regard to where they are asserted. So we get 'a = 'b, 'a: 'c, 'a = 'd, and we can use this to infer that 'b is live at point C because 'a is, and hence 'b and 'c are both live at the same time, error.

1 Like

For the region inequalities

  • C: 'a: 'b, 'a: 'c, 'a live, 'c live

I don't understand why 'b should satisfy 'a: 'b at Point C. The same issue is in point D.

Once a region inequality is established, it has to hold "forevermore" forward along control flow. It's kind of a weird concept that set containment would be time-dependent in this way, but the idea is that once we learn a fact about how regions relate, that fact cannot later be contravened, although it may become irrelevant (which is the case here: at point C 'a: 'b is still true, this is just a structural fact about how 'b was reborrowed from 'a, but 'b is dead so it doesn't matter anymore).

If it were that, how do you interpret the NLL example?

fn show<'a>(v:&'a mut i32){
  let mrf: &/*'b*/ mut i32 = & /*'b*/ mut *v;  // A
  let mrf2:&/*'c*/ mut i32 = & /*'c*/ mut *v; // B
}

As you said, once the borrow with lifetime 'b is created, it has to hold "forevermore" forward along control flow. that means &'b mut i32 must be alive at point B, which would be an error, however, this example is ok.

It is not that the borrow continues to hold, the region inequality continues to hold. That is, if we think about the region as a set of source lines, then the source lines for 'b is a subset of the lines for 'a, and this is true no matter from which "perspective" you look at the code. In your example both 'b and 'c are outlived by 'a, and this is true at all program points.

The operative question is whether two sibling borrows are both live simultaneously, and the region inequalities are just used to infer liveness. In this example both borrows are only live for their own line ('b is only live at A and 'c is only live at B), so there is no conflict. To make that example an error, you could for example return mrf from the function; this would cause 'b to be live at point B and cause a double-mut error.

In your example, Do you mean

  • C: 'a: 'b, 'a: 'c, 'a live, 'c live

The reason why we get the conclusion 'a:'b at point C is because of B: 'a = 'b, 'a live, 'b live ?

We have 'a: 'b at point C because we had 'a: 'b at point A and C follows A. (It is relevant here that the borrow that made 'b happens before the branch. If there was a borrow made properly inside the if statement, say (&'e mut list).push(...) at point B, then there would be 'a: 'e at B but 'a vs 'e would be unconstrained at point C.)

We have 'a: 'b at point C because we had 'a: 'b at point A and C follows A.

I don't think we can conclude this inference without the conclusion B: 'a = 'b, 'a live, 'b live, if we do not have the conclusion B: 'a = 'b, 'a live, 'b live, this example is just like the above example

fn show<'a>(v:&'a mut i32){
  let mrf: &/*'b*/ mut i32 = & /*'b*/ mut *v;  // A
  let mrf2:&/*'c*/ mut i32 = & /*'c*/ mut *v; // C
}

A: 'a:'b, 'a live, 'b live
C: 'a : 'c, 'a live, 'c live

If we do not have another conclusion like B: 'a = 'b, 'a live, 'b live, they are two identical examples.

It's more like this:

  • A: 'a: 'b, 'a live, 'b live
  • C: 'a: 'b, 'a: 'c, 'a live, 'c live

At point C we do have 'a: 'b, but it's not an error because 'b is dead. The error condition would be if both 'b and 'c were simultaneously live at some program point.

It is important to realize that this flow sensitivity is not required for the vast majority of Rust programs, and in particular NLL as currently implemented does not do flow sensitive region containments. So it would instead just infer something like:

  • 'a: 'b, 'a: 'c (location independent facts)
  • A: 'a live, 'b live
  • C: 'a live, 'c live

This overestimates the region containments (e.g. 'a: 'c at point A), but we still get to the same conclusion - 'b and 'c are not both live at any program point.