Lifetime hell... what to do?

I wrote a toy scripting language while I was learning Rust. It had been quite a while since I first wrote it, and I had used a lot of clones, so when I revisited the code, I decided it would be a good idea to refactor it to get rid of them. Doing so required a lot of refactoring and a lot of wondering why on earth I had written it this way before, and I must thank the compiler for helping me.

One of the things I decided to do was switch to an arena allocator. This required converting everything to references, which meant many, many lifetime annotations. At first it was okay, but as more parts of the code were refactored, they quickly got out of hand.

use std::slice;

pub struct Augment<'a, 'env, 'iter> {
    iter: slice::Iter<'iter, Content<'a>>,
    env: &'env mut Environment<'a>,
}

impl<'a, 'env, 'iter> Augment<'a, 'env, 'iter> {
    pub fn new(iter: slice::Iter<'iter, Content<'a>>, env: &'env mut Environment<'a>) -> Self {
        Self { iter, env }
    }
}

These were the lifetimes used for one of the types. If you're wondering how I managed to work through all these lifetimes... I didn't.

Despite not wanting to use AI to write code for me, I had to consult it so I could figure out the proper lifetime annotations. The thing was: I knew that the code was safe, but I still had to fight the borrow checker because of its strictness. Even now, upon seeing this many lifetimes, my brain doesn't want to work out exactly why each annotation was used.

On the other hand, I also knew there were times when I wasn't paying attention and the compiler caught a bug that would have left a dangling pointer.

Should I have approached it a bit differently (perhaps there was a better way to write it such that I didn't need this many lifetimes) or even have used some other language with more relaxed rules?

clone() is not intrinsically undesirable, and there is a third option besides “clone many things expensively” or “use references”: use Rc/Arc. For the kind of work “scripting languages” need to do, Rc/Arc is often a good solution.
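A rough sketch of the Rc approach for an AST, using a hypothetical `Expr` type (not the author's): cloning an `Rc` copies a pointer and bumps a reference count rather than duplicating the subtree, and no lifetime parameters appear anywhere.

```rust
use std::rc::Rc;

// Hypothetical AST node type. `Rc` lets several parents share one subtree
// without any lifetime parameters.
#[derive(Debug)]
pub enum Expr {
    Num(f64),
    Add(Rc<Expr>, Rc<Expr>),
}

// Recursively evaluates an expression tree.
pub fn eval(expr: &Expr) -> f64 {
    match expr {
        Expr::Num(n) => *n,
        Expr::Add(lhs, rhs) => eval(lhs) + eval(rhs),
    }
}

// Builds `x + x` where both operands point at the same node.
pub fn double(x: Rc<Expr>) -> Expr {
    // `Rc::clone` copies a pointer and increments the reference count;
    // the subtree itself is not duplicated.
    Expr::Add(Rc::clone(&x), x)
}
```

The trade-off is a small runtime cost per clone and drop (the count update) in exchange for not threading lifetimes through every type.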

Well, given that the correct annotations worked, that's a matter of perspective. If the code had compiled with the wrong annotations, it presumably would have been unsound: it would have allowed undefined behavior elsewhere, even if you happened not to hit UB immediately. Is such code really safe?

At least with the example, you probably would have been fine with two lifetimes instead of three. Though perhaps that's not much of a consolation.

Hard to answer. If you want to use arenas or write zero-copy parsers etc., it's probably unavoidable to have to deal with lifetime issues, whether you do that with AI or by learning how they work. If that's not acceptable, there are other approaches within Rust, such as reference counting or liberal cloning (which is what some other languages do under the hood, but Rust makes it explicit). If you even need them... it's unclear to me why you switched to arenas.

Why did you choose Rust in the first place? And/or what are your priorities?

I'm not sure how that would interact with arenas. I don't think there's any point in using Rc for arena-allocated values.

I tried hard to make it work with two lifetimes... perhaps there was a way, but I couldn't find it.

I chose Rust because it was kind of my 'everything' language: for anything beyond a simple script, I used Rust. Probably not a great idea, but programming is mainly a hobby for me.

And I was switching to arenas because the generated AST had Boxes inside Boxes, and I didn't want to be allocating in a loop. Cache locality was also in the back of my mind when I decided to switch. Of course, I didn't have to do this, since toy languages aren't exactly known for their speed, but I still wanted to try to optimise the code because I already had a prototype I wanted to extend.

In your code, slice::Iter<'iter, Content<'a>> makes no sense. Content<'a> has a reference inside it with lifetime 'a, so Content must carry that lifetime in its type definition. Next, if you have an iterator over [Content<'a>], the iterator must have the same lifetime, 'a, because of the definition of slice::Iter:

pub struct Iter<'a, T>
where
    T: 'a,
{ /* private fields */ }

So, your slice::Iter doesn't need an extra lifetime 'iter. It needs just 'a.

Same goes for Environment<'a>. If it's 'a, then the &mut to it should also have the 'a lifetime. Only in very exotic cases would you want a different one, and then you'll have to write a bound on it: 'b: 'a. But I can't even remember a case when I really needed it.

So, you'll have &'a mut Environment<'a>. Now, if you want to put both in a struct, check what Environment references. If it references the same data as Content, then you can get away with just one lifetime, 'a. [edit] I think this is unlikely, because technically you can make Content and Environment point to different variables, and they'll have different lifetimes.

But if they point to different structures, then you'll need two:

pub struct Augment<'a, 'b> {
    iter: slice::Iter<'a, Content<'a>>,
    env: &'b mut Environment<'b>,
}

But if you're a novice, I think these structs are too complicated for your level, and the easiest way would be not to combine them in Augment at all. If Augment only has a couple of methods, you'll be fine without it; just write two functions.
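A sketch of the "just write functions" suggestion, with hypothetical stand-ins for Content and Environment (the real types are the author's own): the slice and the map are ordinary parameters, borrowed only for the duration of the call, so only the arena lifetime 'a needs a name and the other two borrows can be elided.

```rust
use std::collections::HashMap;

// Hypothetical stand-ins for the author's types.
pub struct Content<'a> {
    pub name: &'a str,
}
pub type Environment<'a> = HashMap<&'a str, usize>;

// No struct holds the iterator and the environment: both are plain
// parameters, borrowed only while the call runs. Only 'a is named; the
// lifetimes of the slice borrow and the map borrow are elided.
pub fn augment<'a>(contents: &[Content<'a>], env: &mut Environment<'a>) {
    for (index, content) in contents.iter().enumerate() {
        env.insert(content.name, index);
    }
}
```

Because each borrow ends when the call returns, the same environment can be passed to the function again without any extra annotations.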

Also, structs with lifetimes are mostly used for temporary objects like iterators. Even file readers and format readers don't need lifetimes. So I'd check whether Content and Environment really need them.

I programmed in Python for a decade before, and in my first months of Rust I made the same kinds of errors, with lifetimes infecting everything. The solution is to minimize the use of structs that hold references.

Well, that example doesn't look bad. I'm writing embedded software without allocations, and there the lifetimes can be even more complex. In general you split them to separate the invariant lifetimes (inside &mut T) from the covariant ones (the familiar, permissive kind), so you might have been able to remove one.
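To make the covariant/invariant distinction concrete, a small sketch (the function names are made up for illustration): shared references let a longer lifetime shrink to a shorter one, while the lifetime inside a `&mut` is fixed.

```rust
// Covariance: a `&'long str` can be used where a `&'short str` is expected,
// so the lifetime simply shrinks to fit.
pub fn shorten<'short, 'long: 'short>(s: &'long str) -> &'short str {
    s
}

// Both parameters share one lifetime 'a. Thanks to covariance, callers can
// pass references with different lifetimes; 'a becomes the shorter of them.
pub fn longest<'a>(x: &'a str, y: &'a str) -> &'a str {
    if x.len() >= y.len() { x } else { y }
}

// Invariance: behind `&mut`, the inner lifetime cannot shrink. The function
// below is rejected by the compiler ("lifetime may not live long enough"),
// because it would let a caller write a `&'short str` into a slot that
// must hold a `&'long str`:
//
// fn shorten_mut<'m, 'short, 'long: 'short>(r: &'m mut &'long str) -> &'m mut &'short str {
//     r
// }
```

This is why one shared lifetime is usually fine for read-only data, while anything behind `&mut` tends to need its own.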

From the examples you shared, I just don't see what is wrong. It is your personal toy project for learning Rust, so play and learn. Yes, you should understand lifetimes instead of asking ChatGPT: construct minimal examples, guess how they would behave in different situations, and check. That is learning. If you took up bowling, acquiring that skill would be just as hard 🤷‍♀️. So, if your goal is to learn and have fun... learn, and try to have fun 😊

The file readers etc. that you are referring to just use allocations inside them, while "temporary objects" tend to be designed without them (or hold just a file descriptor), as they may be used in a hot loop. It is just rare to create and fully drop a file reader in a hot loop.

[added] The way you should read lifetimes is this:

struct MyStruct<'a> {
    inner: &'a OtherStruct,
}

This reads from the outside in: MyStruct's lifetime is limited by OtherStruct's lifetime. In other words, MyStruct can't live longer than the OtherStruct it borrows.

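#[derive(Debug)]
struct OtherStruct { value: u32 }

fn function_that_takes_ownership(other: OtherStruct) {
    // do something with `other`; it is dropped when this function returns
}

fn main() {
    let os = OtherStruct { value: 1 };
    let my_struct = MyStruct { inner: &os };
    function_that_takes_ownership(os); // ERROR: cannot move out of `os` because it is borrowed
    println!("{:?}", my_struct.inner); // ...because the borrow is still used here
}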

This won't compile: os is moved into the function and destroyed there while my_struct still borrows it, and my_struct can't outlive it.

With this in mind, iterators should have the same expiration date as the data they iterate over, i.e. the same lifetime.

In case of 2 references in a struct, you'll have two lifetimes:

struct MyStruct<'a, 'b> {
    ref_a: &'a usize,
    ref_b: &'b OtherStruct
}

This just means that MyStruct can't outlive either 'a (the value ref_a points to) or 'b (the value ref_b points to).
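A short sketch of why two separate lifetimes can be useful, reusing the MyStruct shape above with a hypothetical `name` field on OtherStruct and a made-up `get_b` helper: because the helper's return type mentions only 'b, the returned reference can outlive the value that ref_a points to.

```rust
pub struct OtherStruct {
    pub name: String,
}

// Two independent lifetimes: 'a for the borrowed usize, 'b for the OtherStruct.
pub struct MyStruct<'a, 'b> {
    pub ref_a: &'a usize,
    pub ref_b: &'b OtherStruct,
}

// Returns the 'b reference. The signature mentions only 'b, so the result
// is not tied to the MyStruct itself or to whatever ref_a points to.
pub fn get_b<'b>(s: &MyStruct<'_, 'b>) -> &'b OtherStruct {
    s.ref_b
}
```

With a single shared lifetime for both fields, the returned reference would also be limited by the shorter-lived `usize`, so it could not be used after that value goes out of scope.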

You accidentally sent your post again instead of editing

oops, will fix

My spider sense tells me this looks like an anti-pattern?

Unfortunately this does not work. I apologise, as I omitted some information and may have caused some confusion. Environment is actually just a type alias for a hash map, and both iter and env contain references into the arena allocator, so giving them separate lifetimes doesn't make sense.

I revisited this code now and the lifetimes correspond to:

  • 'a is the lifetime of the arena
  • 'env is the lifetime of the borrow of the environment hash map
  • 'iter is the lifetime of the AST I was iterating over... in hindsight it would have been better named 'ast
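A minimal sketch of how those three roles can fit together, assuming hypothetical stand-ins: a `Content<'a>` holding two `&'a str`s, an `Environment<'a>` that is a `HashMap` alias, and a plain `String` playing the part of the arena. Keeping 'a separate from 'ast is what lets data borrowed from the arena be copied out of a short-lived AST borrow and stored in the environment.

```rust
use std::collections::HashMap;
use std::slice;

// Hypothetical stand-ins for the author's types. `&'a str` plays the role of
// data allocated in the arena, which lives for 'a.
pub struct Content<'a> {
    pub name: &'a str,
    pub value: &'a str,
}

// Environment is just a type alias for a hash map whose keys and values
// borrow from the arena.
pub type Environment<'a> = HashMap<&'a str, &'a str>;

pub struct Augment<'a, 'env, 'ast> {
    iter: slice::Iter<'ast, Content<'a>>, // borrow of the AST slice
    env: &'env mut Environment<'a>,       // exclusive borrow of the map
}

impl<'a, 'env, 'ast> Augment<'a, 'env, 'ast> {
    pub fn new(iter: slice::Iter<'ast, Content<'a>>, env: &'env mut Environment<'a>) -> Self {
        Self { iter, env }
    }

    // Copies the `&'a str`s out of each node and stores them in the
    // environment, returning how many nodes were processed. This only
    // type-checks because the nodes are `Content<'a>` rather than
    // `Content<'ast>`: the copied references keep the arena lifetime.
    pub fn run(&mut self) -> usize {
        let mut count = 0;
        for content in self.iter.by_ref() {
            self.env.insert(content.name, content.value);
            count += 1;
        }
        count
    }
}
```

Because the environment only stores references into the arena stand-in, the AST vector can be dropped after the Augment is gone while the environment stays usable.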

That's very interesting. I have never written embedded code, so this is a surprise to me. Are there any projects or examples with these kinds of lifetimes that I can read?

Yes, and at the end of the post I wrote that a struct with such a field is too complicated and not recommended.

Unfortunately, the code I have published has quite simple lifetimes. In our production codebases we assume that futures are !Forget, which is not technically sound, and we follow the "structured concurrency" pattern with async, which means a lot of borrows from different parts of the code. As a rule of thumb, you have a separate lifetime for the borrow of each mutable thing (&Mutex<Foo<'a>>, &mut Bar<'b>, etc.) and a single lifetime for the read-only, "referencing" parts, like &'a Foo<'a>. You could of course give them different names, but why?

By the way, you mentioned a hash map. Does it allocate? 👀

I believe they meant something else. Having &'a mut Foo<'a>, the same lifetime used twice where one of the positions is invariant, is an anti-pattern. Not because it is inherently wrong, but because people usually don't intend what it actually means and assume something else; in other words, it is easy to misuse.

Yeah, it's heap-allocated. Everything in the hash map is a reference to something in the arena, though.

There is a cool crate: heapless.

You can also use a linked list of those smaller hash maps (a linked list is arena-friendly) if the number of elements is too large.

Other great crates:

The first part is correct, but the second part is incorrect. You can use the same lifetime in multiple parts of a type when no mutability is involved. When you have mutability (whether &mut or interior mutability), you must use distinct lifetimes for the borrow of the mutable thing and for the borrows inside the mutable thing. If you use the same lifetime, the constraints on that lifetime end up making it “borrowed forever”, which is useless in most cases.
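A minimal sketch of the "distinct lifetimes" rule, with a hypothetical `Env` type and `define` function: 'a is the short mutable borrow, 'b the lifetime of the data stored inside, so the environment can be borrowed mutably again after each call.

```rust
use std::collections::HashMap;

// Hypothetical environment whose entries borrow string data for 'b.
pub struct Env<'b> {
    pub vars: HashMap<&'b str, i64>,
}

// Distinct lifetimes: 'a is the (short) mutable borrow of the environment,
// 'b is the lifetime of the data stored inside it.
pub fn define<'a, 'b>(env: &'a mut Env<'b>, name: &'b str, value: i64) {
    env.vars.insert(name, value);
}

// With a single lifetime instead,
//
//     fn define_bad<'a>(env: &'a mut Env<'a>, name: &'a str, value: i64)
//
// `&mut` is invariant in `Env<'a>`, so 'a must equal the lifetime inside the
// caller's `Env`, and the mutable borrow would last for all of it. Any later
// use of that `Env`, even a second `define_bad` call, would then fail to
// compile: it stays "borrowed forever".
```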

The minimum number of lifetimes that avoids this problem in the given struct definition is two, and they could be distributed in several ways depending on where the borrows come from and what is needed:

// All borrows maximally short-lived; distinguish only `Environment`.
pub struct Augment<'a, 'b: 'a> {
    iter: slice::Iter<'a, Content<'a>>,
    env: &'a mut Environment<'b>,
}
// All borrows maximally long-lived; distinguish only `&mut`.
// You might use this if the slice lives in the arena.
pub struct Augment<'a, 'b: 'a> {
    iter: slice::Iter<'b, Content<'b>>,
    env: &'a mut Environment<'b>,
}
// The slice borrow is shorter-lived than the contents.
// Use this if the slice is temporary and the Content borrows from the arena.
pub struct Augment<'a, 'b: 'a> {
    iter: slice::Iter<'a, Content<'b>>,
    env: &'a mut Environment<'b>,
}

The first part is only correct if Content is covariant over 'a, which is not something we know for certain. It probably is, but it might not be.
And even if it is covariant, this strongly and unnecessarily restricts what can be extracted from the Content. Before, we could extract values that live as long as 'a, the whole lifetime of the arena, but if we merged the lifetimes we could only get values with lifetime 'iter, which could then not be inserted into the Environment<'a>.

I think this simplification has the most potential, as it is unclear whether anything is actually gained by separating 'env and 'iter, but it may still not be enough.

If you can get it down to two lifetimes, that would be nice, but I think three lifetimes is very reasonable.

It's a bit hard to say what is or isn't necessary without seeing how it is actually used, except for the fact that &'a mut Foo<'a> is wrong >99.9% of the time and you should avoid it like the plague.

One trick in "arena-oriented" programming is to put as many things as possible into the arena, which lets you use a single lifetime everywhere. &'a mut Foo<'a> isn't really an anti-pattern when 'a is the lifetime of the arena allocator.

In the above I used placeholders for Environment and Content, but this will work with a hash map as well as long as the hash map itself is allocated in the arena alongside the data it references.
