Hi there,
I'm new to Rust. I'm currently working on one of my first few projects to get used to the language.
For my current project I'm trying to implement a struct that contains a String and a list of fat pointers into that string (Vec<&str>). Intuitively, this seems reasonable and possible since the struct should own both the original String and the pointers to it, therefore making sure that the referenced values live as long as the entire struct does.
Is it actually possible? If so: How do I get the Rust compiler to accept this structure?
My current attempt fails because I can't move the String after borrowing from it:
I'll just throw out a couple of notes that learning material tends to gloss over, in case it helps you understand why self-referential structs are problematic and/or helps you avoid having to relearn some things with a clearer version later in your Rust journey.
Most learning material tries to give the general gist of borrow checking and lifetimes, and doesn't go into the details. But one high-level fact that is useful for everyone to know is: Rust lifetimes (those '_ things) are about the duration of borrows, and not directly about the liveness scope of values. Those two are related -- it's an error for something that is still borrowed to get destructed or be moved, etc -- but they are not the same thing.
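To make the borrow-duration point concrete, here is a minimal sketch (the function names are made up for illustration). The String stays alive for the whole function, but the borrow of it ends at the last use of the reference, so mutating the String afterwards is accepted:

```rust
/// Returns the first whitespace-separated word, or "" if there is none.
fn first_word(s: &str) -> &str {
    s.split_whitespace().next().unwrap_or("")
}

/// Hypothetical demo: the value outlives the borrow taken from it.
fn borrow_ends_before_value() -> (usize, String) {
    let mut text = String::from("hello world");
    let first = first_word(&text); // shared borrow of `text` starts here
    let first_len = first.len();   // last use of `first`: the borrow ends here
    text.push('!');                // accepted: `text` is still alive, but no longer borrowed
    (first_len, text)
}
```

If `first` were used again after the `push`, the borrow would have to extend past the mutation and the compiler would reject it, even though the liveness scope of `text` is exactly the same in both versions.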
Borrow checking is about more than preventing dangling references. One of the other things it prevents is problematic aliasing -- that is, making sure that active &mut _s are exclusive -- no other outstanding references to the same data allowed. This is another reason why self-referential structs are problematic: even if you manage to safely create one, the borrow checker cannot permit you to take a &mut _ to it (e.g. call a &mut self method).
Oh, I see. Thank you! That does indeed help my understanding of lifetimes.
I actually did manage to implement this kind of struct in my code and aside from &mut _s being impossible, it does work. I simply opted to populate the struct in-place, making my intended data structure work:
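A minimal sketch of what populating in place can look like, assuming a hypothetical Document struct and function name (this is an illustration, not the code from my project):

```rust
// Hypothetical struct: owns the text plus slices borrowed from that same text.
struct Document<'a> {
    text: String,
    words: Vec<&'a str>,
}

fn word_count_in_place() -> usize {
    // Create the struct first, with an empty list of slices.
    let mut doc = Document {
        text: String::from("borrow checker says no"),
        words: Vec::new(),
    };

    // Populate in place: `doc.text` is now borrowed (shared) for as long as
    // `doc` is in use, while `doc.words` is a separate field that can still
    // be written to.
    doc.words = doc.text.split_whitespace().collect();

    // Shared access to the whole struct keeps working.
    let view = &doc;

    // Both of these are rejected by the borrow checker if uncommented,
    // because `doc.text` is still borrowed:
    // let exclusive = &mut doc;
    // let moved = doc;

    view.words.len()
}
```

The price is that the struct is pinned to the stack frame it was created in: it can't be returned from a function, moved into a collection, or handed to anything that wants a &mut to it.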
Whether it's actually a good solution or even remotely practical is another question, but I gained understanding about how Rust works and that's definitely worth something.
So if the compiler suggests it, it's probably steering you towards making some piece of self-referential code compile, even though it's not practical overall.
Another bit of a consideration I don't see brought up that often, be it with regards to self-references or not, is about (1) the overall set of tools that are at your disposal, and (2) your general (in)ability and/or (un)willingness to go up and down the ladder of abstractions used.
Just because you - as a programmer / software engineer / solution architect / data model designer - happen to find a given approach (A) at the given level of abstraction (X) the most simple, intuitive, or "reasonable and possible" doesn't mean there is no alternative approach (B), involving a different and possibly lower level of abstraction (Y), lying in wait for you to think of.
In your particular case:
A = a String + a list of fat &str pointers in a struct
X = text as a source, with a bunch of &str references pointing to it
B = a list of UTF-8 byte ranges, trivial to turn into Vec<&str>
Y = text as a source still, with a list of byte offsets - i.e. (usize, usize)
The A + X makes the most sense on the "higher" level - since your problem is about figuring out a way to reference the data in the String itself, not about individual bytes; yet it's only by stepping down to the "lower" nuts and bolts of B + Y that you can find a way that wouldn't come with quite as many issues and warnings and problems and pitfalls and Pin<Box<T>>s and whatever else.
The higher the level of abstraction, furthermore - the higher the chance of you missing some detail upon which the whole "higher level" stands, to begin with. The & is not just a "reference". It's a pointer + lifetime + provenance + whatever else rustc attaches to it nowadays. Your source text's String is not just "text". It's a Vec<u8> which is itself a pointer + length + capacity, with the pointer that'll change/reallocate whenever you push() into it more than it can handle.
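A small sketch of why that reallocation detail matters, with a made-up function name. A byte range recorded before the String grows is still valid afterwards, whereas a raw pointer into the old buffer would not be guaranteed to be:

```rust
/// Hypothetical demo: record a byte range, force the String to grow well past
/// its capacity, then read the same range back.
fn offsets_survive_growth() -> (String, bool) {
    let mut text = String::from("abcd");
    let span = (0usize, 4usize); // plain offsets, no borrow of `text`
    let cap_before = text.capacity();

    // Pushing far more than the current capacity forces a new, larger
    // allocation. The allocator is free to move the buffer while doing so,
    // which is why the borrow checker won't let a &str into `text` be held
    // across this call.
    text.push_str(&"x".repeat(1 << 16));
    let grew = text.capacity() > cap_before;

    // The offsets still name the same bytes, wherever the buffer lives now.
    (text[span.0..span.1].to_string(), grew)
}
```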
At times, the hardest part is noticing the simplest way forward. Something like this, perhaps.
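A minimal sketch of the B + Y idea, assuming a hypothetical Indexed struct with made-up method names. It stores (usize, usize) byte offsets next to the String and only builds the &strs on demand:

```rust
/// Hypothetical struct: owns the text and remembers each word as a
/// (start, end) byte range instead of a `&str`.
struct Indexed {
    text: String,
    spans: Vec<(usize, usize)>,
}

impl Indexed {
    /// Records the byte range of every whitespace-separated word.
    fn new(text: String) -> Self {
        let mut spans = Vec::new();
        let mut start = None;
        for (i, c) in text.char_indices() {
            match (c.is_whitespace(), start) {
                (false, None) => start = Some(i), // a word begins
                (true, Some(s)) => {
                    spans.push((s, i)); // a word ends
                    start = None;
                }
                _ => {}
            }
        }
        if let Some(s) = start {
            spans.push((s, text.len())); // last word runs to the end
        }
        // No borrow of `text` is outstanding, so moving it is fine.
        Indexed { text, spans }
    }

    /// Turns the stored ranges back into string slices on demand.
    fn words(&self) -> Vec<&str> {
        self.spans.iter().map(|&(s, e)| &self.text[s..e]).collect()
    }

    /// A `&mut self` method: appends a word and records its range.
    /// Assumes `word` itself contains no whitespace.
    fn push_word(&mut self, word: &str) {
        if !self.text.is_empty() {
            self.text.push(' ');
        }
        let start = self.text.len();
        self.text.push_str(word);
        self.spans.push((start, self.text.len()));
    }
}
```

Because the struct holds no references, it has no lifetime parameter: it can be returned from functions, moved around, and mutated freely. The offsets come from char_indices, so they always fall on UTF-8 character boundaries and the slicing in words() won't panic.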