Storing string in a mutable struct with same lifetime

Hi all,

I'm currently working on a parser, and have gotten a bit stuck on reading in other files. In particular, I want a function to read in a file, operate on the contents, and return something with &str references into the file. Since the file names aren't known ahead of time (meaning it can't be read outside the function), my thinking is to store the contents of the file in an external struct, so that the data can be kept alive longer than the function lifetime; something like this:

pub struct MyStruct<'a> {
    // ...
}

fn my_function<'a>(
    storage: &mut MyStruct<'a>
) -> &'a str {
    let contents: String = read_file();
    let content_ref: &'a str = storage.retain(contents);
    content_ref // References the data from `contents`, stored in `storage`
}

At a high level, I think something like this should be possible; with the 'a lifetime on MyStruct, it should be able to keep data around for at least that long, and provide a reference to it for that lifetime. However, I haven't been able to figure out how to implement this in practice. Using Vec<String> (or elsa::FrozenVec<Box<String>>, which I thought might work) seems to require &'a mut MyStruct<'a>, which isn't doable since I need the mutable reference elsewhere as well. Not sure if this is doable without something like Box::leak?

(Still consider myself a Rust newbie, so pardon any obvious errors :) )


The obvious error is here:

Lifetimes, in Rust, don't track the validity of data. They track the validity of the place where the data lives.

Which means that self-referential data can't be returned from a function. It can be constructed, but not returned.

At least not in safe Rust.
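To make "constructed, but not returned" concrete, here is a minimal sketch. The names SelfRef and first_word_len are hypothetical, made up for illustration. The struct borrows from its own text field, which should compile as long as the value stays where it is; the commented-out line, which would move it, is the part the borrow checker is expected to reject.

```rust
// Hypothetical struct whose `first_word` field borrows from its own `text` field.
struct SelfRef<'a> {
    text: String,
    first_word: Option<&'a str>,
}

fn first_word_len() -> usize {
    let mut s = SelfRef {
        text: String::from("hello world"),
        first_word: None,
    };

    // Building the self-reference in place is accepted: the two fields are
    // disjoint, so `s.text` can be borrowed while `s.first_word` is assigned.
    s.first_word = Some(&s.text[..5]);

    // Moving `s` (which is what returning it would do) conflicts with the
    // borrow above, so this line is expected to fail to compile if uncommented:
    // let moved = s;

    s.first_word.map_or(0, |word| word.len())
}
```

The function can hand back a plain usize computed from the borrow, but not the SelfRef value itself.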

If you want to [ab]use unsafe Rust, then the current go-to crate for self-referential structs is ouroboros… but 99% of the time the best solution is simply not to do that.


Unless you have very specific requirements, return a u8 vector which handles allocation of memory for you.

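use std::fs::File;
use std::io::{self, Read};

fn my_function(filename: &str) -> io::Result<Vec<u8>> {
    // Read the whole file into an owned buffer that the caller can keep.
    let mut file = File::open(filename)?;
    let mut buffer = Vec::new();
    file.read_to_end(&mut buffer)?;
    Ok(buffer)
}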

Later, you can view the buffer as a &str using std::str::from_utf8, which checks that the bytes are valid UTF-8 and returns an error if they are not.
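As a small sketch of that last step (view_as_str is a hypothetical helper name, not something from the standard library), borrowing the bytes as text could look like this:

```rust
// Hypothetical helper: borrows the bytes as text without copying them.
// `std::str::from_utf8` validates the bytes and fails on invalid UTF-8.
fn view_as_str(buffer: &[u8]) -> Result<&str, std::str::Utf8Error> {
    std::str::from_utf8(buffer)
}
```

The returned &str borrows from the buffer, so the Vec<u8> has to stay alive (and unmodified) for as long as the string slice is in use.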


That's the other solution I was imagining; the only annoyance is that this particular function is deep within a recursive call stack, so passing/accumulating the result all the way up seemed like it might not be the best way to go (though maybe the only way - thanks for the suggestion!)

You can store the buffer inside the storage and have a method that lets you borrow from it, like fn get_str(&self) -> &str
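A minimal sketch of that idea, assuming a hypothetical Storage type with a single String field (the names Storage, buffer and set are made up for illustration; only get_str comes from the suggestion above):

```rust
// Hypothetical storage type that owns the file contents.
struct Storage {
    buffer: String,
}

impl Storage {
    // Takes ownership of the contents so they outlive the function that read them.
    fn set(&mut self, contents: String) {
        self.buffer = contents;
    }

    // The returned `&str` borrows from `self`, so it stays valid for as long
    // as the shared borrow of the storage does.
    fn get_str(&self) -> &str {
        &self.buffer
    }
}
```

Note that while a &str obtained from get_str is still in use, the storage cannot be mutated, so calling set again has to wait until that borrow ends.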


Despite the name, Rust lifetimes -- those 'a things -- are generally about the duration of borrows. They are not (directly) about how long the owned data is around. A MyStruct<'a> indicates that MyStruct<'a> contains a borrow of data that something else owns. Otherwise, as others have said, MyStruct<'a> would be self-referential -- borrowing itself -- and that's not a pattern that safe Rust supports in a useful way. The references we have are the wrong tool for such a job.

Since MyStruct owns the data, it shouldn't have a lifetime -- a borrow duration -- as part of its type. Instead do something like:

pub struct MyStruct {
    // ...
}

impl MyStruct {
    // With no explicit lifetime, the returned &str borrows from `self`.
    fn retain(&mut self, more_data: String) -> &str {
        // ...
    }
}

fn my_function(storage: &mut MyStruct) -> &str {
    let contents: String = read_file();
    storage.retain(contents)
}

If the above doesn't work because you need a &mut while holding the returned reference, consider what the mutable reference allows you to do: append more data or replace the data, potentially causing reallocation; completely replace MyStruct by overwriting it with a new one; etc. These are all actions that would invalidate the returned reference. (Beyond that, &mut _ is an exclusive reference, so even providing a way to create a &mut _ to the same data as the returned reference is unsound.)

So if that's the case, you'll need something with shared mutability, allowing you to append data and do other operations with only a shared reference (&MyStruct) instead. Or refactor things so that you don't need to hold on to the returned reference over exclusive (&mut) operations.
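One way to picture the shared-mutability option using only the standard library is sketched below. Be aware that it swaps in a different technique from plain references: retain hands back a reference-counted Rc<str> handle instead of a &str, and the names SharedStorage, items and len are hypothetical.

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Hypothetical storage that can grow through a shared reference.
struct SharedStorage {
    items: RefCell<Vec<Rc<str>>>,
}

impl SharedStorage {
    fn new() -> Self {
        SharedStorage {
            items: RefCell::new(Vec::new()),
        }
    }

    // Only needs `&self`: the `RefCell` supplies the mutability at runtime.
    // The returned `Rc<str>` shares ownership of the text with the storage,
    // so it does not borrow from `self` at all.
    fn retain(&self, contents: String) -> Rc<str> {
        let shared: Rc<str> = Rc::from(contents);
        self.items.borrow_mut().push(Rc::clone(&shared));
        shared
    }

    fn len(&self) -> usize {
        self.items.borrow().len()
    }
}
```

Because the handles are owned, earlier results stay usable across later retain calls. The cost is a reference count per string and single-threaded use only (Rc and RefCell are not thread-safe).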

Rust has no language-level garbage collection, and references can't dynamically keep data alive -- Rust lifetimes are erased during compilation and the borrow checker cannot change the semantics of any compiling program, e.g. by making a destructor run at a different place.

So if you create owned data deep in the call stack that you need to last until much higher in the stack, you do need to pass it up the stack. Leaking isn't really any better (you'll just be passing a reference to the leaked data up the stack instead). An alternative is to thread down a &mut Owner or Arc<Mutex<Owner>> or whatever so that something else up the stack owns the data. Which probably doesn't look too different than your imagined &mut MyStruct<'a> would have, if it was actually a viable option.
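Here is a rough sketch of threading a &mut Owner down a recursive call stack. Everything in it is an assumption for illustration: the Owner struct, the visit function, the stand-in read_file, and the choice to report indices rather than references so that the owner stays free for further &mut use during recursion.

```rust
// Hypothetical owner that lives near the top of the call stack.
struct Owner {
    files: Vec<String>,
}

// Stand-in for reading a file; a real parser would do I/O here.
fn read_file(name: &str) -> String {
    format!("contents of {}", name)
}

// Recursive worker: stores what it reads in `owner` and records the index
// of each entry instead of returning a reference into `owner`.
fn visit(depth: u32, owner: &mut Owner, handles: &mut Vec<usize>) {
    let contents = read_file(&format!("file{}.txt", depth));
    owner.files.push(contents);
    handles.push(owner.files.len() - 1);
    if depth > 0 {
        visit(depth - 1, owner, handles);
    }
}
```

Once the recursion has finished, the caller that owns the Owner can turn the indices into &str borrows, for example with &owner.files[handles[0]], without fighting the borrow checker along the way.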


The short form: Rust's lifetimes are descriptions, and not prescriptions.

IOW: you are not “directing the compiler” with these sigils to create a correct program. It's the other way around: your program has to be valid before you start on the lifetime markup, and that markup has to give a convincing answer to the question “why is it valid?”… convincing enough for the compiler to be happy.

If you have a hard time understanding why your program is valid, then you will certainly have a hard time convincing the compiler!


In a signature like fn retain(&mut self, more_data: String) -> &'a str, the 'a lifetime has nothing to borrow from, so there's no way to implement the function without cheats like Box::leak.
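For reference, the Box::leak cheat looks roughly like the sketch below (leak_contents is a hypothetical name). It works for any lifetime because the result is 'static, but only because the allocation is deliberately never freed.

```rust
// Leaks the allocation on purpose: the memory is never freed, which is
// exactly why handing out a `'static` reference to it is sound.
fn leak_contents(contents: String) -> &'static str {
    Box::leak(contents.into_boxed_str())
}
```

That may be tolerable for a short-lived tool that reads a handful of files, but every call permanently grows the memory footprint of the process.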

Perhaps you intended to have retain take a &'a mut self, in which case it would work. However, I suspect OP wants to call this method multiple times and still hold onto the references returned by earlier calls. If that's the actual goal, then they will have to make it take a &'a self and use interior mutability internally (since they mentioned elsa::vec::FrozenVec, that should work if they use push_get to get back the reference to return).


Apart from the excellent in-depth explanations from everyone on the thread, I would just get rid of the lifetimes as suggested by @quinedot. I know that something like the following with elsa::FrozenVec should work for your use case, because I wrote something similar while writing an interpreter last year :'D

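use elsa::FrozenVec;

pub struct MyStruct {
    // Append-only: pushing never moves the boxed strings already stored.
    contents: FrozenVec<Box<str>>,
}

fn my_function(
    storage: &MyStruct
) -> &str {
    let contents = read_file();
    // push_get stores the box and returns a reference to the stored str.
    storage.contents.push_get(contents.into_boxed_str())
}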

EDIT: Also, I think you should consider rewriting using bumpalo/arena allocation + memmap2 if you are trying to write something high performance (which was also what I eventually tried to do later on)
