Return slices into owned string

I'm writing an on-demand parser for a learning project and have boiled down the problem to the following short snippet.

struct Token<'a> {
    original: &'a str,
}

struct Parser {
    source: String,
}

impl Parser {
    pub fn parse_token(&mut self) -> Token {
        return Token {
            original: &self.source[0..3],
        };
    }
}

fn main() {
    let mut parser = Parser {
        source: String::from("var x = 5;"),
    };

    let mut tokens = Vec::new();

    let t1 = parser.parse_token();
    tokens.push(t1);

    let t2 = parser.parse_token();
    tokens.push(t2);
}

I can't quite reason about the lifetimes here. I know that parser has to outlive every Token, since it owns the data that each Token references. The problem, I think, is that I'm over-constraining the lifetime somewhere, because the compiler emits this error:

error[E0499]: cannot borrow `parser` as mutable more than once at a time
  --> src\main.rs:27:14
   |
24 |     let t1 = parser.parse_token();
   |              -------------------- first mutable borrow occurs here
...
27 |     let t2 = parser.parse_token();
   |              ^^^^^^^^^^^^^^^^^^^^ second mutable borrow occurs here
28 |     tokens.push(t2);
   |     --------------- first borrow later used here

Which makes sense, the real signature of parse_token is:

pub fn parse_token<'a>(&'a mut self) -> Token<'a>

Which (I think) is saying that the lifetime of the reference to self (parser) must be the same as the lifetime of the returned Token. Well, that creates the exact problem the compiler complains about, as it is extending the lifetime of the mutable borrow to that of the Token, essentially making it impossible to call the function twice. It is a little confusing that the push of t2 counts as a borrow of parser, though I guess that is just the lifetime of the vec being extended, and therefore t1's lifetime being extended.

Those are my thoughts. Some help with how to correctly annotate the lifetimes would be appreciated. Also, I know I could not have Token actually maintain the original text and do something cleverer, but I feel like this should be possible, I just don't know how to "spell" it per se.

When you take a &mut self, as with parse_token, you're allowed to mutate any part of self. This means that you could replace the contained String, for example, causing any outstanding references to dangle. Therefore, calling parse_token invalidates any previous return value.

More generally, only one &mut to the same memory can be active at the same time.

  • &mut would have been better named &unique or the like
  • When you return something with the same lifetime as the &mut self, the uniqueness must persist even if the returned value is not a &mut

Looking at the code:

    let mut tokens = Vec::new();   // Call this Vec<Token<'v>>

    let t1 = parser.parse_token(); // '_1 --+
    tokens.push(t1);               //       |
    // '_1 must end before the next stmt.   |
    // for the reasons explained.         __:__

    // It must be the case that '_1 outlives 'v ('_1: 'v)
    // since we're storing a Token<'_1> in our Vec<Token<'v>>

    let t2 = parser.parse_token(); // '_2

    // We "use '_1" here as it's contained in `tokens` and we're
    // again borrowing the whole of `tokens` with a `&mut self`.
    //
    // In other words, '_1 must still be alive here, but that is
    // a contradiction.
    tokens.push(t2);

In your minimized example, you can instead take a &self. You can have many shared references at the same time.

However, this may not be applicable to your larger use case.
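For reference, a minimal sketch of the &self variant, assuming parse_token really does not need to mutate anything (as in the minimized example). The types are the ones from the question; only the receiver changes.

```rust
struct Token<'a> {
    original: &'a str,
}

struct Parser {
    source: String,
}

impl Parser {
    // A shared borrow: any number of Tokens may coexist, because nothing
    // can mutate `source` while they are alive.
    pub fn parse_token(&self) -> Token<'_> {
        Token {
            original: &self.source[0..3],
        }
    }
}

fn main() {
    let parser = Parser {
        source: String::from("var x = 5;"),
    };

    let mut tokens = Vec::new();
    tokens.push(parser.parse_token());
    tokens.push(parser.parse_token()); // second shared borrow is accepted

    for t in &tokens {
        println!("{}", t.original);
    }
}
```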


I see what's happening, thanks. In my larger code parse_token does mutate the state of Parser, so it must take &mut self. Any ideas for that case?

You could use a RefCell.
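A rough sketch of what that might look like, assuming the only thing that changes during parsing is a cursor. The State struct, its pos field, and the whitespace-splitting logic are made up for illustration; they are not from the original code.

```rust
use std::cell::RefCell;

struct Token<'a> {
    original: &'a str,
}

// Hypothetical mutable parser state, kept behind a RefCell.
struct State {
    pos: usize,
}

struct Parser {
    source: String,
    state: RefCell<State>,
}

impl Parser {
    fn new(text: &str) -> Self {
        Parser {
            source: String::from(text),
            state: RefCell::new(State { pos: 0 }),
        }
    }

    // Takes &self, so returned Tokens only hold a shared borrow of the parser.
    // The cursor is updated through the RefCell (checked at runtime).
    fn parse_token(&self) -> Option<Token<'_>> {
        let mut state = self.state.borrow_mut();
        let trimmed = self.source[state.pos..].trim_start();
        if trimmed.is_empty() {
            return None;
        }
        let start = self.source.len() - trimmed.len();
        let len = trimmed.find(char::is_whitespace).unwrap_or(trimmed.len());
        state.pos = start + len;
        Some(Token {
            original: &self.source[start..start + len],
        })
    }
}

fn main() {
    let parser = Parser::new("var x = 5;");
    let mut tokens = Vec::new();
    while let Some(t) = parser.parse_token() {
        tokens.push(t);
    }
    for t in &tokens {
        println!("{}", t.original);
    }
}
```

The trade-off is that the borrow rules for the state are enforced at runtime: calling borrow_mut while another borrow of the state is active panics instead of failing to compile.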

If your source is borrowed within Parser itself, you can return (copied) references to it while mutating the rest of Self. Note how the returned Token<'p> is no longer tied to the borrowed &mut self.

However, this also burdens Parser<'p> with a lifetime, and you'll have to keep the backing String somewhere.
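A sketch of the borrowing approach, under the assumption that the caller can keep the String alive for as long as the parser and tokens are needed. The pos field and the whitespace splitting are placeholders for whatever state the real parser mutates.

```rust
struct Token<'p> {
    original: &'p str,
}

struct Parser<'p> {
    source: &'p str, // borrowed, not owned
    pos: usize,      // hypothetical mutable state
}

impl<'p> Parser<'p> {
    fn new(source: &'p str) -> Self {
        Parser { source, pos: 0 }
    }

    // The returned Token<'p> borrows from the backing text, not from
    // the short-lived `&mut self`.
    fn parse_token(&mut self) -> Option<Token<'p>> {
        let source: &'p str = self.source; // copy the shared reference out
        let trimmed = source[self.pos..].trim_start();
        if trimmed.is_empty() {
            return None;
        }
        let start = source.len() - trimmed.len();
        let len = trimmed.find(char::is_whitespace).unwrap_or(trimmed.len());
        self.pos = start + len;
        Some(Token {
            original: &source[start..start + len],
        })
    }
}

fn main() {
    // The backing String has to live somewhere outside the parser.
    let text = String::from("var x = 5;");
    let mut parser = Parser::new(&text);

    let mut tokens = Vec::new();
    while let Some(t) = parser.parse_token() {
        tokens.push(t); // each call's &mut borrow has already ended
    }
    for t in &tokens {
        println!("{}", t.original);
    }
}
```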

(There are other approaches... perhaps incoming from fellow forum members now.)


One alternative is to use shared ownership between the tokens and the text you're parsing, so that keeping any token alive will also keep its backing store around:

(untested)

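use std::ops::Range;
use std::sync::Arc;

#[derive(Clone)]
struct Token {
    buf: Arc<str>,
    pos: Range<usize>,
}

impl std::ops::Deref for Token {
    type Target = str;
    // Range<usize> is not Copy, so clone it before indexing,
    // and return a reference to the resulting slice.
    fn deref(&self) -> &str { &self.buf[self.pos.clone()] }
}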

// Implementations of Eq, Ord, Hash, Display, Borrow, etc.
// which defer to str's implementation

struct Parser {
    buf: Arc<str>,
    // ...
}

impl Parser {
    fn new(text: &str) -> Self {
        Parser { buf: text.into() }
    }

    fn parse_token(&mut self) -> Token { todo!() }
}
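For anyone who wants something that compiles, here is one way the outline above could be filled in. The cursor field, the Option return type, and the whitespace splitting are assumptions added for the sake of a runnable example.

```rust
use std::ops::{Deref, Range};
use std::sync::Arc;

#[derive(Clone)]
struct Token {
    buf: Arc<str>,
    pos: Range<usize>,
}

impl Deref for Token {
    type Target = str;
    fn deref(&self) -> &str {
        &self.buf[self.pos.clone()]
    }
}

struct Parser {
    buf: Arc<str>,
    cursor: usize, // hypothetical parser state
}

impl Parser {
    fn new(text: &str) -> Self {
        Parser { buf: text.into(), cursor: 0 }
    }

    // Token has no lifetime parameter, so the `&mut self` borrow ends
    // as soon as the call returns.
    fn parse_token(&mut self) -> Option<Token> {
        let trimmed = self.buf[self.cursor..].trim_start();
        if trimmed.is_empty() {
            return None;
        }
        let start = self.buf.len() - trimmed.len();
        let end = start + trimmed.find(char::is_whitespace).unwrap_or(trimmed.len());
        self.cursor = end;
        Some(Token {
            buf: Arc::clone(&self.buf), // bumps the refcount, no text copy
            pos: start..end,
        })
    }
}

fn main() {
    let mut parser = Parser::new("var x = 5;");
    let mut tokens = Vec::new();
    while let Some(t) = parser.parse_token() {
        tokens.push(t);
    }
    drop(parser); // the tokens keep the text alive on their own
    for t in &tokens {
        println!("{}", &**t);
    }
}
```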

Oh, that's interesting. The compiler might need some hinting to understand that the lifetimes are distinct, right?

This could just be Rc, right?

Sure, if you don't ever need to send a token between threads.


Gotcha, just checking.

The signature in the borrowing case is enough to understand the lifetimes are distinct:

    pub fn parse_token(&mut self) -> Token<'p> {
    // approximately sugar for...
    pub fn parse_token<'s>(&'s mut self) -> Token<'p> {

See also lifetime elision:

Each elided lifetime in input position becomes a distinct lifetime parameter.

But you can't get an unrelated lifetime out if you own the String, because again, there's nothing in the API signature (that the compiler treats like a contract) to prevent you from doing something like:

    pub fn parse_token(&mut self) -> Token<'_> {
        self.source = String::new(); // drop existing storage...
        Token { original: "yolo" }
    }

I see. I hadn't understood that, with a mutable reference to Parser as in my original example, I could do anything to the String, since in my own implementation it is never mutated.

To elaborate just a little more, in the owned case you can return a lifetime which is technically distinct, but it doesn't help:

pub fn parse_token<'long: 'short, 'short>(&'long mut self) -> Token<'short> {
    Token {
        original: &self.source[0..3],
    }
}

And the main reasons it doesn't help are:

  • self remains mutably (exclusively) borrowed for 'long, as that's the signature
  • Token<'_> is covariant, which means its lifetime can be shrunk automatically anyway
  • Moreover, the lifetimes are still linked to each other due to the 'long: 'short bound (the compiler understands that the long borrow may "flow into" the short borrow)

So it's functionally the same as the signature that uses elision.


The more general limitation is sometimes called an interprocedural conflict, where you're basically wishing for the ability to say "this method is mutable over some fields but immutable over others". There's no built-in feature to do this in Rust so far; &mut covers the entire struct.
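A small illustration of the difference, using a made-up pos field and a made-up bump helper. Within a single function body the borrow checker tracks fields separately, so borrowing source while mutating pos is accepted. Going through a &mut self method is not, because the signature claims the whole struct.

```rust
struct Parser {
    source: String,
    pos: usize,
}

impl Parser {
    // Borrows all of `self` mutably, even though it only touches `pos`.
    #[allow(dead_code)]
    fn advance(&mut self, by: usize) {
        self.pos += by;
    }
}

// Hypothetical workaround: take only the field that is mutated.
fn bump(pos: &mut usize, by: usize) {
    *pos += by;
}

fn main() {
    let mut parser = Parser {
        source: String::from("var x = 5;"),
        pos: 0,
    };

    // Accepted: `source` and `pos` are disjoint fields, and the compiler
    // can see both borrows in this one function body.
    let text: &str = &parser.source;
    bump(&mut parser.pos, 3);
    println!("{}", &text[..parser.pos]);

    // Rejected if uncommented (error[E0502]): `advance` needs `&mut parser`
    // as a whole while `text` still borrows `parser.source`.
    //
    // let text: &str = &parser.source;
    // parser.advance(3);
    // println!("{}", text);
}
```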


Yes, exactly my thoughts after playing with it some more. I ended up working up the same lifetimes signature as you put there, and as you say, it doesn't help for the reasons that the distinct lifetimes are still related, and the fact that &mut borrows the whole struct.

It would be neat to be able to communicate a partial mutable borrow either by:

  1. Some special receiver adapter, like a type you invent and implement PartialBorrow on.
  2. The type of string being akin to final in Java, so that even though &mut self borrows the whole struct, it would understand that source is immutable.

At any rate, I ended up just storing a std::ops::Range in Token and using that to extract the original text in the event of an error, since that's the only time I need it. The alternative solutions presented by you helpful folks are slick as well.
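For anyone landing here later, a sketch of roughly that shape. The span and pos field names, the text_of helper, and the whitespace splitting are invented for this example and are not the actual project code.

```rust
use std::ops::Range;

// No lifetime parameter: the token only remembers where it came from.
struct Token {
    span: Range<usize>,
}

struct Parser {
    source: String,
    pos: usize,
}

impl Parser {
    fn parse_token(&mut self) -> Option<Token> {
        let trimmed = self.source[self.pos..].trim_start();
        if trimmed.is_empty() {
            return None;
        }
        let start = self.source.len() - trimmed.len();
        let end = start + trimmed.find(char::is_whitespace).unwrap_or(trimmed.len());
        self.pos = end;
        Some(Token { span: start..end })
    }

    // Look the text up on demand, e.g. when building an error message.
    fn text_of(&self, token: &Token) -> &str {
        &self.source[token.span.clone()]
    }
}

fn main() {
    let mut parser = Parser {
        source: String::from("var x = 5;"),
        pos: 0,
    };

    let mut tokens = Vec::new();
    while let Some(t) = parser.parse_token() {
        tokens.push(t);
    }
    for t in &tokens {
        println!("{:?} {}", t.span, parser.text_of(t));
    }
}
```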


In fact, this is already (somewhat) possible with closures: they now capture only the fields they need, right?

The closure use case does indeed work in edition 2021 (but doesn't help with methods).

Yeah, still neat though. I had read a couple of that person's blog posts before, good stuff.