Return slices into owned string

DriedUrchin · April 10, 2022, 6:11am

I'm writing an on-demand parser for a learning project and have boiled the problem down to the following short snippet.

struct Token<'a> {
    original: &'a str,
}

struct Parser {
    source: String,
}

impl Parser {
    pub fn parse_token(&mut self) -> Token {
        return Token {
            original: &self.source[0..3],
        };
    }
}

fn main() {
    let mut parser = Parser {
        source: String::from("var x = 5;"),
    };

    let mut tokens = Vec::new();

    let t1 = parser.parse_token();
    tokens.push(t1);

    let t2 = parser.parse_token();
    tokens.push(t2);
}

I can't quite reason about the lifetimes here. I know that all the lifetimes of Token have to meet or exceed that of parser, since it owns the data that each Token references. The problem, I think, is that I'm over-constraining the lifetime somewhere, because the compiler emits this error:

error[E0499]: cannot borrow `parser` as mutable more than once at a time
  --> src\main.rs:27:14
   |
24 |     let t1 = parser.parse_token();
   |              -------------------- first mutable borrow occurs here
...
27 |     let t2 = parser.parse_token();
   |              ^^^^^^^^^^^^^^^^^^^^ second mutable borrow occurs here
28 |     tokens.push(t2);
   |     --------------- first borrow later used here

Which makes sense, since the real signature of parse_token (with the elided lifetime written out) is:

pub fn parse_token<'a>(&'a mut self) -> Token<'a>

Which (I think) is saying that the lifetime of the reference to self (parser) must be the same as the lifetime of the returned Token. That creates the exact problem the compiler complains about: it extends the mutable borrow to the lifetime of the Token, which essentially makes it impossible to call the function twice. It is a little confusing that the push of t2 counts as a use of the first borrow of parser, though I guess that is just the lifetime of the vec being extended, and therefore t1's lifetime being extended.

Those are my thoughts. Some help with how to correctly annotate the lifetimes would be appreciated. Also, I know I could not have Token actually maintain the original text and do something cleverer, but I feel like this should be possible, I just don't know how to "spell" it per se.

quinedot · April 10, 2022, 6:41am

When you take a &mut self, as with parse_token, you're allowed to mutate any part of self. This means that you could replace the contained String, for example, causing any outstanding references to dangle. Therefore, calling parse_token invalidates any previous return value.

More generally, only one &mut to the same memory can be active at the same time.

&mut would have been better named &unique or the like.
When you return something with the same lifetime as the &mut self, the uniqueness must persist even if the returned value is not a &mut.

Looking at the code:

    let mut tokens = Vec::new();   // Call this Vec<Token<'v>>

    let t1 = parser.parse_token(); // '_1 --+
    tokens.push(t1);               //       |
    // '_1 must end before the next stmt.   |
    // for the reasons explained.         __:__

    // It must be the case that '_1 outlives 'v ('_1: 'v)
    // since we're storing a Token<'_1> in our Vec<Token<'v>>

    let t2 = parser.parse_token(); // '_2

    // We "use '_1" here as it's contained in `tokens` and we're
    // again borrowing the whole of `tokens` with a `&mut self`.
    //
    // In other words, '_1 must still be alive here, but that is
    // a contradiction.
    tokens.push(t2);

In your minimized example, you can instead take a &self. You can have many shared references at the same time.

However, this may not be applicable to your larger use case.
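
For the minimized example only, the &self variant could look roughly like the sketch below. The helper collect_two and the derives are my additions for illustration, and the fixed [0..3] slice is kept from the original snippet (it panics on a source shorter than three bytes).

```rust
#[derive(Debug, PartialEq)]
struct Token<'a> {
    original: &'a str,
}

struct Parser {
    source: String,
}

impl Parser {
    // `&self` is a shared borrow, so tokens returned by earlier calls
    // remain valid while later calls are made.
    fn parse_token(&self) -> Token<'_> {
        Token {
            // Same fixed slice as the original snippet; panics if the
            // source is shorter than three bytes.
            original: &self.source[0..3],
        }
    }
}

// Hypothetical helper: calls parse_token twice and keeps both results.
fn collect_two(parser: &Parser) -> Vec<Token<'_>> {
    let mut tokens = Vec::new();
    tokens.push(parser.parse_token());
    tokens.push(parser.parse_token());
    tokens
}
```

Both tokens can sit in the Vec at once because any number of shared borrows of parser may overlap.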

DriedUrchin · April 10, 2022, 6:44am

I see what's happening, thanks. In my larger code parse_token does mutate the state of Parser, so it must take &mut self. Any ideas for that case?

RedDocMD · April 10, 2022, 6:51am

You could use a RefCell.

quinedot · April 10, 2022, 6:52am

If your source is borrowed within Parser itself, you can return (copied) references to it while mutating the rest of Self. Note how the returned Token<'p> is no longer tied to the borrowed &mut self.

However, this also burdens Parser<'p> with a lifetime, and you'll have to keep the backing String somewhere.
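
A minimal sketch of that borrowing approach, under my own assumptions: the pos field, the fixed three-byte token width, and the helper first_two_tokens are invented for illustration and are not from the original code.

```rust
#[derive(Debug, PartialEq)]
struct Token<'p> {
    original: &'p str,
}

struct Parser<'p> {
    source: &'p str, // shared reference to text owned elsewhere
    pos: usize,      // hypothetical mutable parser state
}

impl<'p> Parser<'p> {
    // The returned token borrows from the source text ('p), not from
    // `&mut self`, so the exclusive borrow ends when the call returns.
    fn parse_token(&mut self) -> Option<Token<'p>> {
        let source: &'p str = self.source; // `&str` is Copy
        if self.pos >= source.len() {
            return None;
        }
        let end = (self.pos + 3).min(source.len());
        // `get` yields None instead of panicking if the range would
        // split a UTF-8 character.
        let original = source.get(self.pos..end)?;
        self.pos = end;
        Some(Token { original })
    }
}

// Hypothetical helper: the tokens outlive the parser itself, because
// they only borrow from `text`.
fn first_two_tokens<'a>(text: &'a str) -> Vec<Token<'a>> {
    let mut parser = Parser { source: text, pos: 0 };
    let mut tokens = Vec::new();
    tokens.extend(parser.parse_token());
    tokens.extend(parser.parse_token());
    tokens
}
```

The caller has to keep the backing String alive for as long as any token is in use, which is the burden mentioned above.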

(There are other approaches... perhaps incoming from fellow forum members now.)

2e71828 · April 10, 2022, 7:02am

One alternative is to use shared ownership between the tokens and the text you're parsing, so that keeping any token alive will also keep its backing store around:

(untested)

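use std::ops::Range;
use std::sync::Arc;

#[derive(Clone)]
struct Token {
    buf: Arc<str>,
    pos: Range<usize>,
}

impl std::ops::Deref for Token {
    type Target = str;
    // Range<usize> is not Copy, so clone it before indexing.
    fn deref(&self) -> &str { &self.buf[self.pos.clone()] }
}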

// Implementations of Eq, Ord, Hash, Display, Borrow, etc.
// which defer to str's implementation

struct Parser {
    buf: Arc<str>,
    // ...
}

impl Parser {
    fn new(text: &str) -> Self {
        Parser { buf: text.into() }
    }

    fn parse_token(&mut self) -> Token { todo!() }
}

DriedUrchin · April 10, 2022, 7:02am

Oh, that's interesting. The compiler might need some hinting to understand that the lifetimes are distinct, right?

DriedUrchin · April 10, 2022, 7:04am

This could just be Rc, right?

2e71828 · April 10, 2022, 7:04am

Sure, if you don't ever need to send a token between threads.

DriedUrchin · April 10, 2022, 7:05am

Gotcha, just checking.

quinedot · April 10, 2022, 7:09am

The signature in the borrowing case is enough to understand the lifetimes are distinct:

    pub fn parse_token(&mut self) -> Token<'p> {
    // approximately sugar for...
    pub fn parse_token<'s>(&'s mut self) -> Token<'p> {

See also lifetime elision:

Each elided lifetime in input position becomes a distinct lifetime parameter.

But you can't get an unrelated lifetime out if you own the String, because again, there's nothing in the API signature (that the compiler treats like a contract) to prevent you from doing something like:

    pub fn parse_token(&mut self) -> Token<'_> {
        self.source = String::new(); // drop existing storage...
        Token { original: "yolo" }
    }

DriedUrchin · April 10, 2022, 7:12am

I see. I hadn't appreciated that with a mutable reference to Parser, as in my original example, I could do anything to the String, since in my own implementation it is never mutated.

quinedot · April 10, 2022, 7:20am

To elaborate just a little more, in the owned case you can return a lifetime which is technically distinct, but it doesn't help:

pub fn parse_token<'long: 'short, 'short>(&'long mut self) -> Token<'short> {
    Token {
        original: &self.source[0..3],
    }
}

And the main reasons it doesn't help are:

self remains mutably (exclusively) borrowed for 'long, as that's the signature
Token<'_> is covariant, which means its lifetime can be shrunk automatically anyway
Moreover, the lifetimes are still linked to each other due to the 'long: 'short bound (the compiler understands that the long borrow may "flow into" the short borrow)

So it's functionally the same as the signature that uses elision.

The more general limitation is sometimes called an interprocedural conflict, where you're basically wishing for the ability to say "this method is mutable over some fields but immutable over others". There's no built-in feature to do this in Rust so far; &mut covers the entire struct.
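
One common workaround, not something proposed earlier in this thread, is to move the mutable state into its own struct so the borrows land on disjoint fields. A rough sketch, where Cursor, pos, demo_split and the three-byte token width are all hypothetical names and choices:

```rust
#[derive(Debug, PartialEq)]
struct Token<'s> {
    original: &'s str,
}

// Hypothetical split: everything that mutates lives in `Cursor`.
struct Cursor {
    pos: usize,
}

struct Parser {
    source: String,
    cursor: Cursor,
}

impl Cursor {
    // Mutably borrows only the cursor. The text arrives as a separate
    // shared borrow, and the token's lifetime is tied to that borrow.
    fn parse_token<'s>(&mut self, source: &'s str) -> Option<Token<'s>> {
        if self.pos >= source.len() {
            return None;
        }
        let end = (self.pos + 3).min(source.len());
        let original = source.get(self.pos..end)?;
        self.pos = end;
        Some(Token { original })
    }
}

fn demo_split() -> Vec<String> {
    let mut parser = Parser {
        source: String::from("var x = 5;"),
        cursor: Cursor { pos: 0 },
    };
    let mut tokens = Vec::new();
    // Disjoint field borrows: `parser.cursor` mutably, `parser.source` shared.
    tokens.extend(parser.cursor.parse_token(&parser.source));
    tokens.extend(parser.cursor.parse_token(&parser.source));
    tokens.iter().map(|t| t.original.to_string()).collect()
}
```

This works inside a single function because the borrow checker tracks fields separately there; a method taking &mut self on Parser as a whole would still hit the original conflict.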

DriedUrchin · April 10, 2022, 7:26am

Yes, exactly my thoughts after playing with it some more. I ended up working up the same lifetimes signature as you put there, and as you say, it doesn't help for the reasons that the distinct lifetimes are still related, and the fact that &mut borrows the whole struct.

It would be neat to be able to communicate a partial mutable borrow either by:

Some special receiver adapter, like a type you invent and implement PartialBorrow on.
The type of string being akin to final in Java, so that even though &mut self borrows the whole struct, it would understand that source is immutable.

At any rate, I ended up just storing a std::ops::Range in Token and using that to extract the original text in the event of an error, since that's the only time I need it. The alternative solutions presented by you helpful folks are slick as well, though.
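
For anyone landing here later, the Range approach looks roughly like this. It is a sketch only: the span and pos fields, the text method, and the three-byte token width are placeholders, not my actual parser.

```rust
use std::ops::Range;

// The token stores byte offsets only, so it needs no lifetime parameter.
#[derive(Debug, Clone, PartialEq)]
struct Token {
    span: Range<usize>,
}

struct Parser {
    source: String,
    pos: usize,
}

impl Parser {
    // Returns an owned token, so the `&mut self` borrow ends with the call.
    fn parse_token(&mut self) -> Option<Token> {
        if self.pos >= self.source.len() {
            return None;
        }
        let end = (self.pos + 3).min(self.source.len());
        let span = self.pos..end;
        self.pos = end;
        Some(Token { span })
    }

    // Looks up the text on demand, e.g. when reporting an error.
    // Returns None if the span is out of range or splits a character.
    fn text(&self, token: &Token) -> Option<&str> {
        self.source.get(token.span.clone())
    }
}
```

The trade-off is that a Token is meaningless without the Parser (or at least the source) it came from.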

DriedUrchin · April 10, 2022, 7:32am

In fact, this is already (somewhat) possible with closures: they now capture only the fields they need, right?

quinedot · April 10, 2022, 7:35am

The closure use case does indeed work in edition 2021 (but doesn't help with methods).

DriedUrchin · April 10, 2022, 7:36am

Yeah, still neat though. I had read a couple of that person's blog posts before, good stuff.

