Trying to grok subtyping/variance video

I'm watching this Crust of Rust video: "Crust of Rust: Subtyping and Variance" (YouTube)

At around 23:30 someone asks why uncommenting assert_eq confuses the compiler. That's also my question and I don't understand his answer.

  1. He says that without it the compiler doesn't infer a static lifetime for x and instead infers a scope lifetime. This is confusing because I would expect assert_eq to be generic on the lifetime of both arguments, so I don't see why it would affect inference at all. Why would it require anything be static?

  2. Even if I accept inferring the lifetime to be static, why does passing a reference to a static to a function seemingly permanently disable use of the original? I'm guessing it's because that static reference could get stashed in a global, so we have no way of knowing it has gone away by the time strtok returns? But then you'd never be able to reuse any static object once a reference to it has been passed into one function, which doesn't seem like it can be right either.

  1. The trick here is in the signature of strtok:

    pub fn strtok<'a>(s: &'a mut &'a str, delimiter: char);
    

    When the program takes a &x or &mut x reference, the lifetime of the reference can at most be the lifetime of the x variable. Since we create it in its own statement strtok(&mut x, ' ');, the &mut x reference can have a shorter lifetime which ends after the function call, which I'll call 'strtok.

    So without the assert_eq!, the compiler knows &mut x must be a &'strtok mut &'_ str. Since strtok specifies that the two lifetimes must be the same, the compiler resolves this by retroactively making x a &'strtok str; this way, &mut x can be a &'strtok mut &'strtok str, and the program compiles.

    But with the assert_eq!, the compiler must make a new &x borrow, with a lifetime that I'll call 'assert. For this to work, x cannot be a &'strtok str, since the lifetime would have already ended. Therefore, the compiler makes x an &'assert str, so that &x becomes an &'assert &'assert str. Similarly, it turns the &mut x in strtok into an &'assert mut &'assert str, to match the strtok signature.

    But notice: now both &x and &mut x are active at the same time, since both have lifetime 'assert. This breaks the aliasing rules, so the compiler prints out the relevant cannot borrow error. Here, variance in assert_eq! is irrelevant; the issue is in the signature of strtok, which forcibly extends the lifetime of the &mut borrow all the way out to the lifetime of the &str.

  2. I believe this is a misunderstanding. When check_is_static is called on x, x is forced to become a &'static str. But to call strtok on it, the &mut x borrow would need to be a &'static mut &'static str, which isn't allowed since x doesn't outlive the scope. Without the weird signature of strtok, we can copy around references to x until the scope ends, and copies of x for the rest of the program.
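    To make "without the weird signature" concrete, here is a minimal sketch of strtok with two separate lifetimes. The function body, the return value, and the helper split_once_demo are assumptions for illustration (the signature quoted above shows neither a body nor a return type); only the shape of the lifetimes matters.

```rust
/// Hypothetical strtok with decoupled lifetimes: the outer `&mut` borrow
/// ('a) may be much shorter than the string data it points at ('b).
pub fn strtok<'a, 'b>(s: &'a mut &'b str, delimiter: char) -> &'b str {
    // Copy the inner shared reference out; `&'b str` is `Copy`.
    let whole: &'b str = *s;
    match whole.find(delimiter) {
        Some(i) => {
            // Advance the caller's slice past the delimiter.
            *s = &whole[i + delimiter.len_utf8()..];
            &whole[..i]
        }
        None => {
            // No delimiter left: hand back everything and leave "" behind.
            *s = "";
            whole
        }
    }
}

/// Hypothetical helper: returns the first token and what is left in `x`.
pub fn split_once_demo() -> (&'static str, &'static str) {
    let mut x = "hello world";
    let token = strtok(&mut x, ' ');
    // The `&mut x` borrow ended with the call, so reading `x` is fine here.
    (token, x)
}
```

    With this shape, x can stay a &'static str while the &mut x borrow lasts only for the call, so a later assert_eq!(x, "world") has nothing to conflict with.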

2 Likes

Here is what I would expect to happen:

  • "hello world" has type &'static mut str. This is just hardcoded into the language as the type of string literals.

  • x therefore has type &'static mut str

  • x itself is a local that goes away at end of scope.

  • strtok takes a &'a mut &'a str, which has the same lifetime appear twice. However I would not expect this to mean you have to pass in a reference to reference where both lifetimes are exactly the same, just that the function body has to conservatively assume that's true. This is like if you define min as taking and returning references all with the same lifetime, the borrow checker has to conservatively assume the returned reference will only last as long as the shortest lifetime passed in, which works for all 3 lifetimes. Since things that are referred to have to outlive their references, given &'a &'b T we can assume 'b is greater than or equal to 'a. So if you see &'a &'a T we assume the object might not outlive the reference (but still could), but that's just the borrow checker shortening 'b to make it match 'a in order to be conservative, not a statement about how long the T will really be alive.

  • The borrow checker should infer shorter lifetimes where possible to avoid unnecessary conflicts.

  • &mut x is a &'it_works &'static str. This is its "true" lifetime, separate from what the borrow checker may "cast" it to for passing to strtok.

  • In passing to strtok it should "cast" it to be the more conservative &'it_works &'it_works str. This still satisfies the lifetime bounds for strtok because the lifetimes match, and is safe because we only shortened a static lifetime (we gave the function a less strong guarantee).

  • After strtok finishes x is no longer considered borrowed, so assert_eq should be free to borrow.

I'm confused because it seems like somehow inference is causing some undesirable flow of information backwards, but that also seems unnecessary here, since x and &mut x should have unambiguous types and lifetimes.

This is the inaccurate part: &'_ T is covariant in T, but &'_ mut T is invariant in T. This means that we can always convert an &'a &'b T into an &'a &'a T, but in general we cannot convert an &'a mut &'b T into an &'a mut &'a T. For the importance of this, consider this example program:

fn cast_str_ref<'a, 'b>(s: &'a mut &'b str) -> &'a mut &'a str {
    // assume that this function works
    unimplemented!()
}

fn main() {
    let mut static_str: &'static str = "Hello, world!";
    let static_str_ref: &mut &'static str = &mut static_str;
    {
        let local_string: String = "Hello, Rust!".to_string();
        let local_str: &str = &local_string;
        let local_str_ref: &mut &str = cast_str_ref(static_str_ref);
        *local_str_ref = local_str;
    }
    // now we're reading from a deallocated `String`!
    println!("{:?}", *static_str_ref);
}

When we declare static_str_ref, we tell the compiler that it holds a 'static string slice. However, by casting it from a &'main mut &'static str into a &'main mut &'block str, we are able to write a 'block string slice into it. After the block's scope ends, the String it refers to is deallocated. But static_str_ref and static_str hold onto the slice, since they believe it to be 'static. This results in a use-after-free once we try to read from either variable.

This is why strtok's signature creates such idiosyncratic behavior: since the s parameter is an &'a mut &'a str, the &mut x borrow must always last exactly as long as x's own lifetime. In particular, since the borrow cannot last for longer than 'it_works, x itself cannot be any longer-lived than an &'it_works str. This is the "backwards" information flow you refer to.
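For contrast, here is a small sketch of the direction that is allowed. The names shorten_shared, shorten_mut and read_through_shortened are made up for this example. The shared-reference version compiles because &'a T is covariant in T; the &mut version is left commented out because the compiler rejects it.

```rust
/// Compiles: `&'a T` is covariant in `T`, so the inner lifetime may shrink
/// from 'b down to 'a.
pub fn shorten_shared<'a, 'b>(r: &'a &'b str) -> &'a &'a str {
    r
}

// The `&mut` analogue does not compile, because `&'a mut T` is invariant
// in `T` and the inner lifetime therefore has to stay exactly 'b:
//
// pub fn shorten_mut<'a, 'b>(r: &'a mut &'b str) -> &'a mut &'a str {
//     r // rejected by the borrow checker
// }

/// Hypothetical usage: read a 'static string through the shortened reference.
pub fn read_through_shortened() -> usize {
    let text: &'static str = "Hello, world!";
    let short: &&str = shorten_shared(&text);
    short.len()
}
```

Reading through the shortened shared reference is harmless, since nothing can be written through it. That is exactly the capability the &mut version would hand out.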

2 Likes

It's &'static str, not &'static mut str.

@LegionMammal978 showed why this is unsound; another example I like to give is shortening the lifetime of references in a Vec. (Try running with Miri under Tools.)

Also, it would be &'it_works mut &'it_works str -- as is probably becoming clear, the distinction between &mut and & is very crucial. Unlike variable bindings, exclusive (&mut) and shared (&, "immutable") references are different types, and they have very different semantics.

The API contract works both ways: The function body can assume it, but callers have to satisfy it. So you do actually have to pass in references whose lifetimes are exactly the same. In places where reborrowing or variance apply (e.g. all shared references), this is often transparent.

Because all lifetimes have to be the same in order to call the function, the borrow checker doesn't really need to care about which was shortest before reborrowing for the call (e.g. when considering the return in the function body). The call site has to be able to make them all the same.
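A sketch of the "transparent" case with shared references, in the spirit of the min example mentioned earlier in the thread. The function names shorter and pick_shorter are invented for this illustration.

```rust
/// One lifetime for both inputs and the output.
pub fn shorter<'a>(a: &'a str, b: &'a str) -> &'a str {
    if a.len() <= b.len() { a } else { b }
}

/// Hypothetical caller mixing a 'static string with a local one.
pub fn pick_shorter() -> String {
    let long_lived: &'static str = "static text";
    let local = String::from("local");
    // At the call site `long_lived` is implicitly shortened from 'static to
    // the lifetime of the `&local` borrow, so both arguments share one 'a.
    let picked = shorter(long_lived, &local);
    picked.to_owned()
}
```

The caller never writes a cast: covariance of shared references lets the call site make the lifetimes equal on its own. With &mut around the inner reference, that adjustment is not available.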

Well, I don't know if I would phrase it like that. It's shortening 'b in order to match the signature, not to be conservative. There's still a significant difference between the two in this case, as you can copy the inner (non-mut) reference out from behind the outer reference.

If there's no particular reason to force lifetimes to be the same, it's generally better not to -- especially when &mut is involved.


Perhaps it will be useful to walk through a few variations of this setup.

    let mut x = "hello world";
    // check_is_static(x);
    strtok(&mut x, ' ');
    // assert_eq!(x, "world");

This compiles. Some arbitrary lifetime is chosen for the type of x -- call it 'x -- and the call to strtok uses &'x mut &'x str. 'x practically consists of the two statements. We'll revisit this shortly.

    check_is_static(x);
    strtok(&mut x, ' ');

This fails -- x is forced to type &'static str, so the call to strtok would require a &'static mut to a local variable. This is true whether or not the assert_eq! is uncommented.

    // check_is_static(x);
    strtok(&mut x, ' ');
    assert_eq!(x, "world");

This also fails, even though the lifetime in the type of x no longer needs to be 'static. 'x must at a minimum consist of the three statements (declaration, strtok, assert_eq!) [1]. But the call to strtok still uses &'x mut &'x str, which makes the outer mutable borrow active at the same time as the underlying &str in the assert_eq! statement.

More generally, &'a mut Thing<'a> is an anti-pattern because the Thing is exclusively borrowed for the rest of its lifetime -- you cannot use Thing<'a> ever again, except through the &'a mut borrow somehow. (E.g. if strtok had a return value, you could use that.)

The unmodified playground worked because it didn't try to use x after the &'x mut x was created and passed to strtok. x became completely unusable at that point, but since we didn't attempt to, it was accepted.
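Here is a sketch of the "if strtok had a return value" remark. It keeps the single-lifetime signature but adds a body and a return type, both of which are assumptions for illustration, as is the helper first_token.

```rust
/// Single-lifetime signature as discussed in the thread, with an assumed
/// body and return value added for the sake of the example.
pub fn strtok<'a>(s: &'a mut &'a str, delimiter: char) -> &'a str {
    // Copy the inner shared reference out; it is `Copy`.
    let whole: &'a str = *s;
    match whole.find(delimiter) {
        Some(i) => {
            *s = &whole[i + delimiter.len_utf8()..];
            &whole[..i]
        }
        None => {
            *s = "";
            whole
        }
    }
}

/// Hypothetical caller: `x` is locked after the call, the token is not.
pub fn first_token() -> String {
    let mut x = "hello world";
    let token = strtok(&mut x, ' ');
    // assert_eq!(x, "world"); // uncommenting this is rejected: `x` is
    //                         // still mutably borrowed for all of 'x
    token.to_owned()
}
```

The returned token remains usable because it is derived from the &'a mut borrow, while x itself stays off limits.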


Does it make sense to "manually reborrow" then? This compiles:

    // check_is_static(x);
    strtok(&mut &*x, ' ');
    assert_eq!(x, "world");

No, it doesn't really make sense, because now you're modifying a temporary -- the assert_eq will fail.
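A sketch that shows the temporary being modified instead of x. The strtok body is an assumption (the thread only quotes its signature), after_manual_reborrow is a made-up helper, and x is declared without mut here only because nothing mutates it anymore.

```rust
/// Single-lifetime signature from the thread with an assumed body.
pub fn strtok<'a>(s: &'a mut &'a str, delimiter: char) {
    let whole: &'a str = *s;
    // Advance the slice that `s` points at past the first delimiter.
    *s = match whole.find(delimiter) {
        Some(i) => &whole[i + delimiter.len_utf8()..],
        None => "",
    };
}

/// Hypothetical caller: returns `x` as it looks after the call.
pub fn after_manual_reborrow() -> &'static str {
    let x = "hello world";
    // `&*x` creates a temporary `&str`; strtok advances that temporary,
    // which is dropped at the end of the statement. `x` is never touched.
    strtok(&mut &*x, ' ');
    x
}
```

The function returns "hello world", so the borrow checker is satisfied but the program no longer does what it was meant to do.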

The proper fix is removing the anti-pattern from strtok.


  1. @LegionMammal978 called this lifetime 'assert

2 Likes