Lifetimes annotation

rusty-kj · April 18, 2024, 12:19pm

Hello all,
I'm trying to understand how the compiler deals with lifetime annotations. To that end, I'm working through two problems.

Problem 1: Consider the following functions:

fn foo() {
    let _x: i32 = 123;
    let _y: &i32 = &_x;
}

fn bar<'a>() {
    let _x: i32 = 123;
    let _y: &'a i32 = &_x;
}

The reference _y has an implicit lifetime in foo, so I tried to express it explicitly in bar. However, that doesn't compile
because "_x does not live long enough". Trying to understand why the borrow checker complains, I ended up with the following MIR output:

// MIR for `foo` 0 renumber

| User Type Annotations
| 0: user_ty: Canonical { value: Ty(i32), max_universe: U0, variables: [], defining_opaque_types: [] }, span: src/main.rs:41:13: 41:16, inferred_ty: i32
| 1: user_ty: Canonical { value: Ty(i32), max_universe: U0, variables: [], defining_opaque_types: [] }, span: src/main.rs:41:13: 41:16, inferred_ty: i32
| 2: user_ty: Canonical { value: Ty(&i32), max_universe: U0, variables: [CanonicalVarInfo { kind: Region(U0) }], defining_opaque_types: [] }, span: src/main.rs:42:13: 42:17, inferred_ty: &i32
| 3: user_ty: Canonical { value: Ty(&i32), max_universe: U0, variables: [CanonicalVarInfo { kind: Region(U0) }], defining_opaque_types: [] }, span: src/main.rs:42:13: 42:17, inferred_ty: &i32
|
fn foo() -> () {
    let mut _0: ();
    let _1: i32 as UserTypeProjection { base: UserType(0), projs: [] };
    let _3: &i32;
    scope 1 {
        debug _x => _1;
        let _2: &i32 as UserTypeProjection { base: UserType(2), projs: [] };
        scope 2 {
            debug _y => _2;
        }
    }

    bb0: {
        StorageLive(_1);
        _1 = const 123_i32;
        FakeRead(ForLet(None), _1);
        AscribeUserType(_1, o, UserTypeProjection { base: UserType(1), projs: [] });
        StorageLive(_2);
        StorageLive(_3);
        _3 = &_1;
        _2 = &(*_3);
        FakeRead(ForLet(None), _2);
        AscribeUserType(_2, o, UserTypeProjection { base: UserType(3), projs: [] });
        StorageDead(_3);
        _0 = const ();
        StorageDead(_2);
        StorageDead(_1);
        return;
    }
}

// MIR for `bar` 0 renumber

| User Type Annotations
| 0: user_ty: Canonical { value: Ty(i32), max_universe: U0, variables: [], defining_opaque_types: [] }, span: src/main.rs:46:13: 46:16, inferred_ty: i32
| 1: user_ty: Canonical { value: Ty(i32), max_universe: U0, variables: [], defining_opaque_types: [] }, span: src/main.rs:46:13: 46:16, inferred_ty: i32
| 2: user_ty: Canonical { value: Ty(&'a i32), max_universe: U0, variables: [], defining_opaque_types: [] }, span: src/main.rs:47:13: 47:20, inferred_ty: &i32
| 3: user_ty: Canonical { value: Ty(&'a i32), max_universe: U0, variables: [], defining_opaque_types: [] }, span: src/main.rs:47:13: 47:20, inferred_ty: &i32
|
fn bar() -> () {
    let mut _0: ();
    let _1: i32 as UserTypeProjection { base: UserType(0), projs: [] };
    let _3: &i32;
    scope 1 {
        debug _x => _1;
        let _2: &i32 as UserTypeProjection { base: UserType(2), projs: [] };
        scope 2 {
            debug _y => _2;
        }
    }

    bb0: {
        StorageLive(_1);
        _1 = const 123_i32;
        FakeRead(ForLet(None), _1);
        AscribeUserType(_1, o, UserTypeProjection { base: UserType(1), projs: [] });
        StorageLive(_2);
        StorageLive(_3);
        _3 = &_1;
        _2 = &(*_3);
        FakeRead(ForLet(None), _2);
        AscribeUserType(_2, o, UserTypeProjection { base: UserType(3), projs: [] });
        StorageDead(_3);
        _0 = const ();
        StorageDead(_2);
        StorageDead(_1);
        return;
    }
}

The bb0 blocks are the same for foo and bar. The only difference is in user type annotations 2 and 3, where Ty(&'a i32) causes the compilation error. Can someone explain this behaviour, or point me to an explanation?

Problem 2: Let's take the Rust Book lifetime example:

a)

fn longest<'a>(x: &'a str, y: &'a str) -> &'a str {
    if x.len() > y.len() {
        x
    } else {
        y
    }
}

On the other hand, we can write it the following way:

b)

fn longest<'a, 'b, 'c>(x: &'a str, y: &'b str) -> &'c str
where 'a: 'c, 'b: 'c
{
    if x.len() > y.len() {
        x
    } else {
        y
    }
}

which means that 'a and 'b must both outlive 'c. Here we are free to choose 'a and 'b independently. If we assume 'a = 'b, that implies 'c = 'a and we end up with example (a). Why does the compiler narrow the lifetimes down in this way? In the MIR

// MIR for `longest` 0 renumber

fn longest(_1: &str, _2: &str) -> &str {
    debug x => _1;
    debug y => _2;
    let mut _0: &str;
    let mut _3: bool;
    let mut _4: usize;
    let mut _5: &str;
    let mut _6: usize;
    let mut _7: &str;

    bb0: {
        StorageLive(_3);
        StorageLive(_4);
        StorageLive(_5);
        _5 = &(*_1);
        _4 = core::str::<impl str>::len(move _5) -> [return: bb1, unwind: bb6];
    }

    bb1: {
        StorageDead(_5);
        StorageLive(_6);
        StorageLive(_7);
        _7 = &(*_2);
        _6 = core::str::<impl str>::len(move _7) -> [return: bb2, unwind: bb6];
    }

    bb2: {
        StorageDead(_7);
        _3 = Gt(move _4, move _6);
        switchInt(move _3) -> [0: bb4, otherwise: bb3];
    }

    bb3: {
        StorageDead(_6);
        StorageDead(_4);
        _0 = &(*_1);
        goto -> bb5;
    }

    bb4: {
        StorageDead(_6);
        StorageDead(_4);
        _0 = &(*_2);
        goto -> bb5;
    }

    bb5: {
        StorageDead(_3);
        return;
    }

    bb6 (cleanup): {
        resume;
    }
}

I would expect something like Ty(&'a str), as in Problem 1, but there is nothing like that. I still have no clue why.

Besides those problems, is there a good tool or way to inspect the borrow checker's logic during compilation? MIR analysis doesn't always give a straight answer.

jw013 · April 18, 2024, 1:17pm

How are you managing to get MIR for code that doesn't compile?

Any lifetime parameter on a function outlives everything inside the function body. Even if there is no need for it, the language requires it. Since _x is inside the function body it cannot possibly live for as long as 'a.

paramagnetic · April 18, 2024, 2:34pm

Actually, even without the unconditional "every lifetime outlives the function" restriction, your code doesn't make any sense.

Lifetimes are generic parameters. This means that the caller chooses them. If you write

fn bar<'a>() {
    let _x: i32 = 123;
    let _y: &'a i32 = &_x;
}

that's asking the compiler to make a reference to a local variable and make its lifetime last as long as the caller pleases. This is very obviously impossible. The caller has no power to decide how long the local variable lives, because the local variable will cease to exist as soon as the function returns. If the caller specified a lifetime that is longer than the function's scope, the types wouldn't check out.

Your mistake is exactly the same as the following typical example, but with lifetimes instead of type parameters:

fn foo<T: ToString>() -> T {
    String::from("foo") // seemingly "OK": String is ToString
}

// if the above code compiled, then what should this do?
foo::<u32>();
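
For contrast, here is a minimal sketch of two variants that do compile, because the referent is guaranteed to outlive whatever 'a the caller picks. The names bar_promoted and bar_borrowed are made up for illustration and are not from the original post.

```rust
// Compiles: the literal `123` is promoted to a `'static` constant,
// and `'static` outlives any `'a` the caller can choose.
fn bar_promoted<'a>() -> &'a i32 {
    let y: &'a i32 = &123;
    y
}

// Compiles: the referent belongs to the caller, who guarantees
// that it stays alive for all of `'a`.
fn bar_borrowed<'a>(x: &'a i32) -> &'a i32 {
    let y: &'a i32 = x;
    y
}
```

In both cases the function never has to stretch the life of one of its own locals to fit the caller's choice, which is exactly what the original bar asks for.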

rusty-kj · April 18, 2024, 4:23pm

Regarding MIR, I used rustc -Z mir-opt-level=0 -Z dump-mir-dataflow=yes -Z dump-mir=<function-name> src/main.rs.

That's good to know; however, it seems like a hidden requirement that cannot be obtained directly from MIR analysis.

rusty-kj · April 18, 2024, 5:01pm

The generic type example makes total sense. So when we deal with lifetime annotations, besides how long something should live, we need to take the generic nature of the annotation into consideration.

So if I understand correctly, calling a function like

fn foo<'a, 'b, 'c, ..., 'z>(_a: &'a str, _b: &'b str, ...) {}

implies that 'a outlives 'b and 'b outlives 'a for each pair, which is equivalent to 'a = 'b = 'c = ... = 'z, and then we can use lifetime elision and simplify to

fn foo(_a: &str, _b: &str, ... ) {}

That would explain my second problem.

jw013 · April 18, 2024, 5:31pm

I tried this on a file that looks like

fn bar<'a>() {
    let _x: i32 = 123;
    let _y: &'a i32 = &_x;
}

fn main() {}

and I am unable to get it to spit out any MIR, only a compile error, so I am still curious how you managed to get MIR output for bar.

$ rustc +nightly -Z mir-opt-level=0 -Z dump-mir-dataflow=yes -Z dump-mir=bar bar.rs
error[E0597]: `_x` does not live long enough
 --> bar.rs:3:23
  |
1 | fn bar<'a>() {
  |        -- lifetime `'a` defined here
2 |     let _x: i32 = 123;
  |         -- binding `_x` declared here
3 |     let _y: &'a i32 = &_x;
  |             -------   ^^^ borrowed value does not live long enough
  |             |
  |             type annotation requires that `_x` is borrowed for `'a`
4 | }
  | - `_x` dropped here while still borrowed

error: aborting due to 1 previous error

For more information about this error, try `rustc --explain E0597`.

rusty-kj · April 18, 2024, 6:13pm

I followed the instructions from a Stack Overflow question. You should have the MIR files in the mir_dump directory. I copied the data from the main.bar.-------.renumber.0.mir file.

quinedot · April 18, 2024, 6:47pm

Lifetimes within a function are inferred and checked by the borrow checker. If the borrow checker can prove that there's some solution to all the constraints demanded by annotations, trait bounds, and soundness, the program compiles. If it cannot prove a solution exists, you get a borrow check error.

In foo there's no need for the borrow of _x or lifetime of the type of _y to exist longer than the assignment itself.
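
A small sketch of what "no longer than needed" means in practice, assuming a current compiler with non-lexical lifetimes. The function name is made up for illustration.

```rust
fn borrow_ends_after_last_use() -> i32 {
    let mut x: i32 = 123;
    let y: &i32 = &x; // shared borrow of `x` starts here
    let seen = *y;    // last use of `y`, so the inferred lifetime can end here
    x += 1;           // accepted: no borrow of `x` is live any more
    seen + x
}
```

With an unannotated &i32 the borrow checker is free to pick a lifetime that ends right after the last use of y, so the later mutation of x is fine.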

In bar you've added an annotation that requires the type of _y to be &'a i32, which requires the borrow of _x to last at least as long as 'a. The problem is that generic lifetimes on a function are chosen by the caller, and must be at least just longer than the function body. But _x goes out of scope at the end of the function body. That's incompatible with still being borrowed, and you get a borrow check error.

Nit (on "'a = 'b implies 'c = 'a"): not necessarily.

fn longest<'a: 'c, 'c>(x: &'a str, y: &'a str) -> &'c str { /* ... */ }

If you're asking about the MIR specifically, I don't know (unless it's simply smart enough to take into account what follows).

From a coding perspective, when everything is covariant, adding as many lifetimes as possible often doesn't actually matter. Consider that either function can be used to implement the other:

// The apparently more flexible one can utilize the single lifetime version
// by coercing everything to the shortest lifetime
fn a1<'a>(x: &'a str, y: &'a str) -> &'a str { "" }
fn ab1<'a: 'c, 'b: 'c, 'c>(x: &'a str, y: &'b str) -> &'c str { a1(x, y) }

// The multiple lifetime version is compatible with unified lifetimes
fn a2<'a>(x: &'a str, y: &'a str) -> &'a str { ab2(x, y) }
fn ab2<'a: 'c, 'b: 'c, 'c>(x: &'a str, y: &'b str) -> &'c str { "" }
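
The same point as a sketch with real bodies: both signatures accept one long-lived and one short-lived argument, because the longer lifetime can shrink at the call site. The names longest_one, longest_three and compare are made up for illustration.

```rust
// Single-lifetime version, as in the Rust Book.
fn longest_one<'a>(x: &'a str, y: &'a str) -> &'a str {
    if x.len() > y.len() { x } else { y }
}

// Three-lifetime version with explicit outlives bounds.
fn longest_three<'a: 'c, 'b: 'c, 'c>(x: &'a str, y: &'b str) -> &'c str {
    if x.len() > y.len() { x } else { y }
}

// Calls both with a `'static` string and a shorter-lived borrow.
// For `longest_one`, the `'static` reference is coerced down to the
// lifetime of `short_lived`, so the call type-checks.
fn compare(short_lived: &str) -> (&str, &str) {
    let long_lived: &'static str = "static string";
    (
        longest_one(long_lived, short_lived),
        longest_three(long_lived, short_lived),
    )
}
```

Neither call is rejected, and both return the same string, so for plain shared references the extra lifetime parameters buy nothing here.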

Note that lifetimes are erased during compilation, so if things compile, you generally shouldn't have to care. That said, things don't always compile, and even when they do I recognize it can sometimes be useful to know why.

But I don't know of a good way to understand the particulars about borrow checking, in particular borrow checking within a function body, without a lot of reading and practice.

Disclaimer: I've only rarely tried to figure out what's going on by looking at MIR.
