Solving more complex borrowing/ownership issues?

Does anyone know of a good guide to solving more complicated ownership/borrowing issues? Most of the resources I've been able to find cover the basics, and haven't been helpful to me in figuring out how to address issues in more complicated code.

In my case, I'm trying to implement the lox interpreter from Robert Nystrom's Crafting Interpreters book (https://www.craftinginterpreters.com) as a learning exercise. I'm trying to keep all references in the parser/compiler to original strings in the source code as &str references; the lifetimes seem to be getting complicated as the data structures get deeper, particularly with methods on a struct which want to use values from references in vectors owned by the struct.

The fundamental requirement that &str requires is that the owner of the string data must remain immutable for the full duration of the &str. Thinking about whether your code violates this can get you very far.

Alice, I have that set up; I read in a String at the start of the program and keep it around for the duration, passing &strs as references down into the other objects.

That's not the problem I'm having. The problem I'm having is that I have a number of structs which interact with each other; I'm pretty much fully surprised when the compiler tells me that I've violated a borrowing rule, and don't have a good sense of how to tease the relationships apart in a way which will allow me to resolve the issue. That's what I'm looking for help with. Maybe a more fully worked example?

Unfortunately I don't think I can give advice without seeing the specific problems you've run in to.

3 Likes

One very general suggestion: when you encounter borrowing errors that you don't understand, annotate all the lifetimes explicitly, and give every distinct lifetime parameter a unique name. Say you have something like the following:

struct Keyword<'a> {
    buf: &'a str,
}

enum Value<'a> {
    Int(i64),
    Str(Cow<'a, str>),  // string literals may reference the input data without copying I guess
}

impl<'a> Interpreter<'a> {
    fn whatever(&mut self, arg: &Keyword) -> Value {
        // ...
    }
}

If you're getting errors inside or around the call site of whatever, un-elide all those implicit lifetimes and give meaningful names to all the ones that are the same:

struct Keyword<'interp> {
    buf: &'interp str,
}

enum Value<'interp> {
    Int(i64),
    Str(Cow<'interp, str>),  // string literals may reference the input data without copying I guess
}

impl<'interp> Interpreter<'interp> {
    fn whatever<'a, 'b, 'c, 'd>(&'a mut self, arg: &'b Keyword<'c>) -> Value<'d> {
        // ...
    }
}

Use warn(rust_2018_idioms) to be warned about omitting the parameters from types like Keyword and Value.

(Don't forget that dyn Trait and impl Trait have implicit lifetimes which are sometimes implicitly 'static and sometimes something else, depending on context.)

Doing this sometimes gives you more useful error messages, and the exercise of assigning identifiers may give you a hint as to what's wrong (in this example, it seems plausible that 'c and 'd should actually be 'interp, although of course that depends on the desired semantics).

If you've only got one thing you're borrowing all the strs from, the lifetimes shouldn't get more complex as you nest data structures: they all just borrow from the same place with the same lifetime, regardless of how deep the & is. That this isn't working for you suggests to me that you're accidentally doing some re-borrowing somewhere instead of copying references. But that's just a guess.

4 Likes

Thanks, trentj! Going down that path now...

One thing I'm seeing as I look through the code is that I tend to do things like taking a reference to something in a collection owned by the current struct and modifying it, rather than putting the modification in a method on something. I think that's extending the borrowing lifetimes when I don't expect it to...

While rust is certainly capable of writing a zero-copy parser that keeps references to everything, actually writing one is no small investment, and you'll be dropping yourself into the deep end very fast.

The benefits of zero-copy parsing tend to be small, as I see them:

  • String literals typically need a preprocessing pass to handle escape sequences
  • You will often want to intern identifiers for faster hashing and comparison
  • If you want the original strings for printing nice error messages, you can do that better by passing around some Span { start: u32, end: u32 } type instead, describing byte ranges into some sort of "code map". (see e.g. the codespan crate)

I would remark that you should generally try to place as few lifetimes as possible on any given type. Take a look at rustc, for instance. In rustc, a great deal of types have a single lifetime parameter 'tcx—a (typically invariant) lifetime that represents both (a) borrows of data constructed during an extremely early stage of compilation, and (b) arenas for allocation. Something like a visitor which needs to borrow a piece of this context will additionally have a second lifetime representing that borrow. Like this guy:

https://github.com/rust-lang/rust/blob/ad02dc46badee510bd3a2c093edf80fcaade91b1/compiler/rustc_mir/src/transform/instcombine.rs#L32-L35

But you will seldom find more than two lifetimes on anything.

There's even a little trick that you can see a tiny hint of here: This TyCtxt<'tcx> you see here is one of the most central structs of the entire compiler (I would naively argue that you could probably very well say that 'tcx is defined as "the lifetime argument of TyCtxt"). Yet here it is being placed into a struct by value, without needing to be borrowed! How could this be?

The answer is simple: They incorporated a borrow directly into TyCtxt, so that it implements Copy and thus can be freely passed around anywhere:

pub struct TyCtxt<'tcx> {
    gcx: &'tcx GlobalCtxt<'tcx>,
}

Though notably, this means that any mutation of data stored in TyCtxt must occur through some layer of interior mutability like RefCell. And this makes sense; there are many similar cases where lifetimes can be made simpler at the expense of giving up some of their power, by deferring some borrow checks to runtime rather than compile time.

1 Like

Thank you both for your help. I'm still struggling a bit, but the advice about copying strings rather than trying to reference the original str has helped.

The bit about storing ranges as two u32 values feels a bit of a cheat :slight_smile: but I'll look into that as well.

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.