Lost with lifetimes

The following is reduced from the use of types from the tar crate:

use std::marker::PhantomData;
use std::io::{self, Read};
use std::cell::RefCell;

struct Entries<'a>(PhantomData<&'a ()>);

struct Entry<'a>(PhantomData<&'a RefCell<dyn Read + 'a>>);

impl<'a> Iterator for Entries<'a> {
    type Item = io::Result<Entry<'a>>;

    fn next(&mut self) -> Option<Self::Item> {
        todo!()
    }
}

struct Foo<'a>(Entries<'a>);

impl<'a> Foo<'a> {
    fn foo(&mut self) -> Option<Entry> {
        self.0.next().transpose().unwrap()
    }
}

It fails to compile with:

error: lifetime may not live long enough
  --> src/lib.rs:21:9
   |
19 | impl<'a> Foo<'a> {
   |      -- lifetime `'a` defined here
20 |     fn foo(&mut self) -> Option<Entry> {
   |            - let's call the lifetime of this reference `'1`
21 |         self.0.next().transpose().unwrap()
   |         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ associated function was supposed to return data with lifetime `'a` but it is returning data with lifetime `'1`
   |
   = note: requirement occurs because of the type `Entry<'_>`, which makes the generic argument `'_` invariant
   = note: the struct `Entry<'a>` is invariant over the parameter `'a`
   = help: see <https://doc.rust-lang.org/nomicon/subtyping.html> for more information about variance

I don't quite understand what this all means. I swear I had found a page that made things about variance click, but it didn't stay in my brain and I can't find it anymore... it definitely was not the nomicon.

The one thing that makes a difference is the RefCell. Removing it makes it compile. And I don't get what kind of difference RefCell makes as far as the lifetimes are concerned (and I can't make that change because it's really tar that uses RefCell).

I feel the root cause of the problem is the Iterator trait, but I can't do anything about that. It seems to me a transmute to force the lifetime would be sound as long as I only touch the Entry while it's alive, but I'm not entirely sure.

You're returning Option<Entry> (with the elided/implied lifetime of self) rather than Option<Entry<'a>> with the lifetime of its source.

Well, the lifetime of the source is "forced" by the use of the Iterator trait, but is not desirable... I do want the elided lifetime.

Just to be clear and to avoid misunderstandings: the elided lifetime in

impl<'a> Foo<'a> {
    fn foo(&mut self) -> Option<Entry> {
        self.0.next().transpose().unwrap()
    }
}

is sugar for something like

impl<'a> Foo<'a> {
    fn foo<'b>(self: &'b mut Foo<'a>) -> Option<Entry<'b>> {
        self.0.next().transpose().unwrap()
    }
}

The compiler returns an error, because the value that the implementation generates, i.e. the expression self.0.next().transpose().unwrap(), has type Option<Entry<'a>>, not Option<Entry<'b>>.

“I do want the elided lifetime” would mean you do want a different type than the type that your implementation gives you. This either means your implementation is not what you want, or you’re mistaken about what type you actually want (or you might be running into contradictory requirements that are only symptoms of bigger underlying design problems elsewhere in your program).

To help you figure out which one it is, I can give a quick overview of how the type Option<Entry<'a>> comes to be, and what variance and the error message have to do with everything:


The type Option<Entry<'a>> is actually really straightforward: You have &'b mut Foo<'a>, and the Foo<'a> contains Entries<'a>, whose next method will return Option<Result<Entry<'a>>>. Transposing and unwrapping does not further change the lifetime.


Variance determines what happens when the type is wrong here. Obviously, you have an Option<Entry<'a>> and want to return an Option<Entry<'b>>, but the Rust compiler doesn’t immediately say nah, those types are different! Instead, it tries to figure out whether they are perhaps the same type, or whether one can be coerced into the other. Variance is the system for figuring out whether subtyping coercions apply, typically to change the lifetime of a type, which is exactly what we want here. RefCell is an invariant type constructor, and this is why it becomes important here: Entry<'_> is invariant in its lifetime argument because of it, so you cannot convert between Entry<'a> and Entry<'b> unless they have exactly the same lifetime.
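
To make the covariant/invariant distinction concrete, here is a minimal sketch. The types Covariant and Invariant and the functions shorten and keep are made up for illustration; they are not from tar or from the reduced code in the question.

```rust
use std::cell::RefCell;

// Hypothetical type: a plain shared reference, covariant in 'a.
struct Covariant<'a>(&'a str);

// Hypothetical type: 'a also appears inside a RefCell, which makes the
// whole struct invariant in 'a (the same shape as Entry<'a>).
struct Invariant<'a>(&'a RefCell<&'a str>);

// Compiles: a longer lifetime can be shortened by a subtyping coercion.
fn shorten<'long: 'short, 'short>(x: Covariant<'long>) -> Covariant<'short> {
    x
}

// Compiles only because input and output use exactly the same lifetime.
fn keep<'a>(x: Invariant<'a>) -> Invariant<'a> {
    x
}

// Rejected with "lifetime may not live long enough" if uncommented:
// fn shorten_invariant<'long: 'short, 'short>(x: Invariant<'long>) -> Invariant<'short> {
//     x
// }
```

The commented-out function is the same situation as Foo::foo, just without the Option and the iterator in the way.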

On the other hand, with RefCell out of the equation, the coercion in question, Entry<'a> to Entry<'b> (inside an Option), would be possible if 'a outlives 'b. Which it does, as an implied bound that comes from the self type &'b mut Foo<'a>. I.e. it’s (almost) as if there’s an implicit 'a: 'b bound on the function foo. That also explains the error message: it’s “lifetime may not live long enough”, nothing about a type mismatch, and if you desugar it as described above, it becomes even clearer, because the compiler then says: consider adding the following bound: `'b: 'a`. I.e. the compiler feels that a 'b: 'a requirement is missing in order to make the code compile. Which is true, technically, since together with the implicit 'a: 'b, such a bound would ensure that 'a and 'b are restricted to be the same lifetime (which is a possibility), and in this case, the two types Entry<'a> and Entry<'b> would actually be the same.
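
The effect of that bound can be tried out directly. The sketch below reuses the reduced types from the question, with next returning None in place of todo!() so that it can run; the method name foo_same_lifetime is made up here. It compiles, but at a cost: 'b: 'a together with the implied 'a: 'b forces the &mut self borrow to last for all of 'a, so this is rarely what you want in practice.

```rust
use std::cell::RefCell;
use std::io::{self, Read};
use std::marker::PhantomData;

struct Entries<'a>(PhantomData<&'a ()>);

struct Entry<'a>(PhantomData<&'a RefCell<dyn Read + 'a>>);

impl<'a> Iterator for Entries<'a> {
    type Item = io::Result<Entry<'a>>;

    fn next(&mut self) -> Option<Self::Item> {
        None // placeholder so the sketch can run; the original has todo!()
    }
}

struct Foo<'a>(Entries<'a>);

impl<'a> Foo<'a> {
    // Hypothetical variant of foo with the elided lifetime written out
    // and the extra bound added. 'b: 'a plus the implied 'a: 'b means
    // 'a and 'b are the same lifetime, so Entry<'a> is Entry<'b>.
    fn foo_same_lifetime<'b>(&'b mut self) -> Option<Entry<'b>>
    where
        'b: 'a,
    {
        self.0.next().transpose().unwrap()
    }
}
```

With this signature, a Foo value stays mutably borrowed after the first call, so a second call on the same value is rejected by the borrow checker.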

IMO there’s room for improvement with this error message. Instead of figuring out the “fix” for this function would be that different lifetime arguments ought to be restricted to be exactly the same, I would probably appreciate an explicit mention about the types Option<Entry<'a>> and Option<Entry<'b>> (or Option<Entry<'1>> with some definition of what elided lifetime '1 is supposed to refer to in case of the original code). It’s conceptually easier to speak about “type mismatch” for invariant lifetimes IMO, whilst the borrow checker likes to identify and complain about “lifetime does not live long enough”-style problems only. But at least it points to Entry being invariant being relevant, which is probably – together with the associated function was supposed to return … line – already enough for decently experienced Rust users to figure out what’s going on anyways.

The Iterator trait by design requires that this code is valid:

let a = iter.next();
drop(iter);
read(a);

which means you will never be able to return any data stored in the iterator itself.
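
A runnable version of that pattern, as a small sketch (the function name first_after_drop is made up): the item borrows from the underlying slice, not from the iterator, so it stays usable after the iterator is gone.

```rust
// Hypothetical helper: take one item, drop the iterator, keep using the item.
fn first_after_drop(data: &[i32]) -> Option<&i32> {
    let mut iter = data.iter();
    let a = iter.next();
    drop(iter); // the iterator is gone, but `a` borrows from `data`, not from `iter`
    a
}
```

This works because the signature of next is fn next(&mut self) -> Option<Self::Item>, and Self::Item has no way to mention the lifetime of that &mut self borrow.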

I think I'm starting to get a feel about what's going on, which is essentially that the lifetime for what's in the RefCell and for the ref to the RefCell are tied, and &'a RefCell<dyn Read + 'a> can't be shortened to &'b RefCell<dyn Read + 'b>. I'm not entirely sure why the latter is happening, though.

Untying the lifetimes does make it compile:

use std::marker::PhantomData;
use std::io::{self, Read};
use std::cell::RefCell;

struct Entries<'a, 'b: 'a>(PhantomData<(&'a (), &'b ())>);

struct Entry<'a, 'b: 'a>(PhantomData<&'a RefCell<dyn Read + 'b>>);

impl<'a, 'b: 'a> Iterator for Entries<'a, 'b> {
    type Item = io::Result<Entry<'a, 'b>>;

    fn next(&mut self) -> Option<Self::Item> {
        todo!()
    }
}

struct Foo<'a, 'b: 'a>(Entries<'a, 'b>);

impl<'a , 'b: 'a> Foo<'a, 'b> {
    fn foo(&mut self) -> Option<Entry<'_, 'b>> {
        self.0.next().transpose().unwrap()
    }
}

... but I don't have control over that.

What could go wrong if I transmute in Foo::foo?

Instant UB if &mut is involved and probable UB / memory corruption otherwise. The compiler is stopping you from doing something unsound here. Transmuting to get rid of lifetime errors is almost never a correct solution.

Why doesn't this work for you?

impl<'a> Foo<'a> {
    fn foo(&mut self) -> Option<Entry<'a>> {
        self.0.next().transpose().unwrap()
    }
}

I wanted to avoid the lifetime being tied to the lifetime in the type because the actual type I return is an enum where another variant is already tied to the lifetime of self, so I wanted to avoid the enum having 2 lifetimes associated with it. I just bit the bullet... it ended up being not as bad as I thought it would be.
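
For anyone wondering what an enum with two lifetimes might look like, here is a rough sketch. Everything in it beyond Entries, Entry and the transpose().unwrap() call is invented for illustration (the Item enum, its Header variant, the Reader struct and its fields, and next_item); it is not the actual code.

```rust
use std::cell::RefCell;
use std::io::{self, Read};
use std::marker::PhantomData;

struct Entries<'a>(PhantomData<&'a ()>);

struct Entry<'a>(PhantomData<&'a RefCell<dyn Read + 'a>>);

impl<'a> Iterator for Entries<'a> {
    type Item = io::Result<Entry<'a>>;

    fn next(&mut self) -> Option<Self::Item> {
        None // placeholder so the sketch can run
    }
}

// Hypothetical enum: one variant borrows from self ('s), the other
// carries the lifetime that comes from the iterator's type ('a).
enum Item<'s, 'a> {
    Header(&'s [u8]),
    Entry(Entry<'a>),
}

// Hypothetical wrapper standing in for Foo.
struct Reader<'a> {
    header: Vec<u8>,
    sent_header: bool,
    entries: Entries<'a>,
}

impl<'a> Reader<'a> {
    fn next_item(&mut self) -> Option<Item<'_, 'a>> {
        if !self.sent_header {
            self.sent_header = true;
            // Tied to the borrow of self.
            return Some(Item::Header(&self.header));
        }
        // Tied to 'a, exactly as the Iterator impl dictates.
        let entry = self.entries.next().transpose().unwrap()?;
        Some(Item::Entry(entry))
    }
}
```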

FWIW, though, the only thing that could have been done with the Entry was to give it buffer slices that it doesn't keep around, so practically speaking shortening the lifetime would probably have been fine...

In general, transmuting a &'a RefCell<SomethingThatContains<'a>> to a &'b RefCell<SomethingThatContains<'b>> (where 'a: 'b) is UB because you can .borrow_mut() the RefCell and then write a SomethingThatContains<'b> where a SomethingThatContains<'a> is expected. @quinedot's playground link shows why.

The weird part arises with dyn Trait + 'a. Technically speaking it is of the form SomethingThatContains<'a>; however, it doesn't seem to have the properties that make the above unsound. In particular you can't write a dyn Trait + 'b through it because it's unsized. So is it sound to transmute &'a RefCell<dyn Trait + 'a> to &'b RefCell<dyn Trait + 'b> where 'a: 'b? AFAIK it hasn't been decided yet, which means it may not be UB now, but it also may become UB in the future. So don't rely on it for now.
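
The "you can't write through it" part can be seen in safe code. In this sketch (the function read_all is made up for illustration), a &RefCell<dyn Read> still lets you borrow mutably and call Read methods, but there is no way to assign a whole new value, because an unsized dyn Read cannot be moved or passed by value.

```rust
use std::cell::RefCell;
use std::io::{self, Read};

// Hypothetical helper: drain a reader that sits behind a RefCell.
fn read_all(cell: &RefCell<dyn Read + '_>) -> io::Result<Vec<u8>> {
    let mut buf = Vec::new();
    // Calling methods through the RefMut works fine.
    cell.borrow_mut().read_to_end(&mut buf)?;
    // An assignment of the form `*cell.borrow_mut() = some_value;` cannot
    // be written, because there is no value of the unsized type `dyn Read`
    // to put on the right-hand side.
    Ok(buf)
}
```

This only illustrates what safe code can and cannot do; it says nothing about whether the transmute itself is allowed.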

The very point of a RefCell is that it provides mutability. It's the exact same reason why you can't convert a &mut T<'long> into a &mut T<'short>. A mutable borrow is a two-way channel: you are allowed to read and write through it (writing is kind of the point). Now if you are allowed to read and write through it, then you could do the following:

let long_living_ref: &mut T<'long> = get_long_living_ref();
'short: {
    let destroyed_outside_scope = ...;
    let short_living_value: T<'short> = get_short_living_value(&destroyed_outside_scope);
    let short_living_ref: &mut T<'short> = &mut *long_living_ref;
    *short_living_ref = short_living_value;
}
// `destroyed_outside_scope` is gone here, but `*long_living_ref` still borrows from it
println!("*long_living_ref = {}", *long_living_ref); // use-after-free

which is obviously UB. Now if the same were allowed with RefCell, then you could do the exact same dance, just by inserting explicit borrow()s and borrow_mut()s at the appropriate places:

let long_living_ref: &RefCell<T<'long>> = get_long_living_ref();
'short: {
    let destroyed_outside_scope = ...;
    let short_living_value: T<'short> = get_short_living_value(&destroyed_outside_scope);
    let mut short_living_ref: RefMut<T<'short>> = (long_living_ref as &RefCell<T<'short>>).borrow_mut();
    *short_living_ref = short_living_value;
}
// `destroyed_outside_scope` is gone here, but the RefCell still holds a value borrowing from it
println!("*long_living_ref = {}", *long_living_ref.borrow()); // use-after-free
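
For comparison, here is a version of the same dance that actually compiles, using &str in place of T<'_> (the functions overwrite and demo are made up for illustration). Because RefCell<&'a str> is invariant in 'a, whatever is written into the cell must live as long as the cell's own lifetime parameter says, so the short-lived write is rejected.

```rust
use std::cell::RefCell;

// Hypothetical helper: the cell and the new value must agree on 'a exactly.
fn overwrite<'a>(cell: &RefCell<&'a str>, value: &'a str) {
    *cell.borrow_mut() = value;
}

fn demo() -> String {
    let long_lived = String::from("long");
    let replacement = String::from("also long");
    let cell = RefCell::new(long_lived.as_str());

    // Fine: `replacement` outlives every later use of `cell`.
    overwrite(&cell, replacement.as_str());

    // Rejected by the borrow checker if uncommented, because `short`
    // is dropped while `cell` is still used afterwards:
    // {
    //     let short = String::from("short");
    //     overwrite(&cell, short.as_str());
    // }

    let result = cell.borrow().to_string();
    result
}
```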

Strictly speaking, any behavior that hasn't been defined by some relevant specification is UB, even if there is not currently any compiler optimization that takes advantage of it. The ongoing discussion is about whether the current behavior should be codified, making it non-UB and restricting the sorts of language changes that could be implemented in the future.

How would lifetime transmutes be considered UB? I'd say that they aren't immediate UB, since transmute() & co. only care about layout validity, and lifetimes cannot affect layout. Instead, lifetime transmutes just generate objects that are unsound to release to arbitrary safe code. The UB only occurs if a program performs an operation on a transmuted object that it would not have been able to perform on the original object (e.g., using a 'static-transmuted reference past its original lifetime). For instance, it is sound to transmute an object to 'static, take it across some boundary, and transmute it back, as long as you respect the object's initial lifetime (as well as the rules of the boundary).

Currently it's considered instant UB whenever a &mut is aliased (e.g. the distinctions made at the top of this issue), and you can use transmute to create such an aliasing. Aside from just being considered something fundamental in Rust, I'd assume noalias-based optimizations are allowed even in unsafe; that is, it's considered a validity invariant (must be upheld in unsafe) not just a safety invariant.

Or perhaps rephrased, aliasing a &mut is an operation you can't perform.

Sure, it's UB to alias a &mut that you've transmuted the lifetime of, but the transmute() itself is not UB: you just have to be very careful not to perform the illegal aliasing (or any other illegal operations). If you rely on safe code outside your interface not to alias the lifetime-transmuted value, then your interface is very unsound. But creating the &mut still doesn't trigger the UB; it's the outside code aliasing it that triggers the UB. For instance, in your earlier program, calling replace() doesn't trigger UB, but the next line that touches rc suddenly runs into the validity invariant on the reference. (Curiously, Miri doesn't signal UB until the reference is actually formatted, probably because the invalid reference is inside an UnsafeCell while rc still owns it. In principle, the &rc autoref could be considered UB.)

If you have two (non-reborrowed) &muts pointing to the same value, even their co-existence is absolutely, positively UB.

As I said in an earlier post, "co-existence" isn't really a meaningful concept, at least under our current aliasing model. A reference only "exists" at the moment that the memory containing it is read from, written to, or reborrowed, i.e., when an operation forces the memory to act as a given type. In your example of two &muts, it's interleaving access that is UB: creating the second invalidates the first, and the UB only occurs once we try to access the first again.

In the RefCell example, first, Some(&*short) is stored into the &RefCell. Then, short is dropped at the end of the block, invalidating the reference inside the RefCell. However, the RefCell does not exist at this point: it's just some random (isize, *mut (), *mut ()) sitting on main()'s stack. Only once rc is accessed as a RefCell in main() is the reference forced to be valid, resulting in the UB.