Struct owns String and slices of said String

So, the idea is simple.
I want a struct that owns a String and some fields/collections that point to specific parts of that String.

But somehow I'm not able to find a working solution.

struct Container<'a> {
    data: String,
    toc: Vec<&'a str>,
}

So how do I now implement a new() that takes ownership of a given String and saves some references to different parts of it, like a table of contents for example?
Preferably computed in the new fn.
This shouldn't be a lifetime violation because both are part of the same struct.
Mutability might be an issue but the String is supposed to be unchanging.

But all I get is the compiler yelling that either the lifetimes are wrong or that it can't move out because I borrowed stuff... which is the whole point of it.

I'm missing something. Help, please?

You basically tried to create something called a self-referential struct, which is very hard to do in Rust. The TL;DR of why is that Rust allows you to move values around, which memcpys them to a new location, after which pointers into self would be invalidated. And since Rust does not have move constructors like C++, you cannot really do that in safe code.

So the short answer is that it is impossible (unless you really know your stuff) to create such a struct. If you want to understand more deeply why that is, and how to work around it, look up the phrase "self-referential structs in Rust".
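To make the "pointers are invalidated by a move" point concrete, here is a small sketch. The `Inline` type and both function names are made up for illustration. It only compares addresses as integers and never dereferences a stale pointer, so it stays in safe Rust.

```rust
/// Hypothetical type whose bytes live inline, inside the struct itself.
struct Inline {
    buf: [u8; 4],
}

/// Address of `buf` before and after moving the struct into a Box.
fn inline_addresses() -> (usize, usize) {
    let value = Inline { buf: *b"toc!" };
    let before = value.buf.as_ptr() as usize;
    let boxed = Box::new(value); // the move copies the bytes to the heap
    let after = boxed.buf.as_ptr() as usize;
    (before, after)
}

/// Address of a String's text before and after moving the String.
fn string_addresses() -> (usize, usize) {
    let text = String::from("toc!");
    let before = text.as_ptr() as usize;
    let boxed = Box::new(text); // only pointer, length and capacity are copied
    let after = boxed.as_ptr() as usize;
    (before, after)
}
```

For `Inline` the two addresses differ, so a pointer saved before the move would now point at the old location. For `String` they are equal, because the text sits in a separate heap buffer that the move does not touch. The borrow checker does not reason about that, though: as far as it is concerned, a borrow of `data` is a borrow of the field, and the field moved.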

Thanks for the quick answer.
But in that case wouldn't it suffice to Pin the String?
Honestly I was trying to avoid that can of worms but I might have to.

Which is, of course, a blatant violation of everything, because self-referential structs are not supported in [safe] Rust.

You are depending on intimate implementation details, namely the fact that String keeps its data on the heap.

And your implementation is actually unsafe, because data would be destroyed before the Vec, which means that in Container's destructor you would always have a safety violation.

If you swapped data and toc… and used unsafe… then it could probably even work.

But it is better to use something like yoke than to write a pile of unsafe code by hand.

The whole point of Pin is to ensure that no one can take a reference to your data structure and move it anywhere.

That's the total opposite from what you want.

You could Pin your data structure instead, but then you wouldn't be able to pass it around, return it, or take it as an argument, etc.

You are literally doing something that safe Rust was designed to prevent. Since you are skirting between something that safe Rust allows and something that would crash your program at runtime (and incredibly close to the edge), you are firmly in the realm of unsafe.

The only question is who would write that unsafe code: you, or someone else.


Sort of, yes. You can "solve" the self-referential problem with pinning, but this in turn makes Container immovable, which has its own set of problems. Without knowing the greater context, my recommendations would be as follows:

  1. Try to remodel your data so it is no longer self-referential.
  2. If that would be too hard or inconvenient, and your real use case still involves String and &str, try storing pairs of indexes (usize, usize) instead of string slices. A pair has the same size as a &str, and you can construct and return string slices from it on demand.
  3. Lastly, if you really want to store references, then probably use some crate for it.

Here's how this can be changed to use ranges/indexes, which is a very common way in Rust (and in C/C++) of avoiding the problem that pointers are invalidated when data is moved.

use core::ops::Range;

struct Container {
    data: String,
    toc: Vec<Range<usize>>,
}

A Range can easily be turned into a &str slice when needed by indexing the String with it. Doing this on demand is idiomatic in Rust, because references are generally meant to be short-lived rather than stored long-term in structs.

For example, a function can return an iterator of the toc entries. The iterator borrows from &self, which ensures that the data cannot be modified while the iterator is in use.

impl Container {
    fn toc_entries(&self) -> impl Iterator<Item = &str> {
        self.toc.iter().map(|range| &self.data[range.start..range.end])
    }
}
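Since the question asked for the toc to be computed in new(), here is a minimal sketch of that. It assumes, purely for illustration, that a "toc entry" is any line starting with `#`; the real rule depends on your data. The struct and `toc_entries` repeat the definitions above so the block compiles on its own.

```rust
use core::ops::Range;

struct Container {
    data: String,
    toc: Vec<Range<usize>>,
}

impl Container {
    /// Takes ownership of `data` and records the byte range of every
    /// heading line (assumed here: a line that starts with '#').
    fn new(data: String) -> Self {
        let mut toc = Vec::new();
        let mut offset = 0; // byte offset of the current line within `data`
        for line in data.split_inclusive('\n') {
            // Drop the line terminator so the range covers only the text.
            let text = line.trim_end_matches(|c: char| c == '\n' || c == '\r');
            if text.starts_with('#') {
                toc.push(offset..offset + text.len());
            }
            offset += line.len();
        }
        // The ranges are plain numbers, so nothing borrows `data` here
        // and it can be moved into the struct.
        Container { data, toc }
    }

    /// Turns the stored ranges back into string slices on demand.
    fn toc_entries(&self) -> impl Iterator<Item = &str> + '_ {
        self.toc.iter().map(move |range| &self.data[range.clone()])
    }
}
```

With this, `Container::new(String::from("# Intro\ntext\n# Usage\nmore"))` yields the entries "# Intro" and "# Usage", and the container can be moved or returned freely because it holds no references.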

Rust requires that if something is unchanging then you must express that to the compiler in a way it can understand and prove is safe. For example:

  1. Store the String separately from toc. Then toc will borrow from the String, and the compiler can prevent any changes to the String while this borrowed variable exists.
  2. If the String exists for the duration of the program you can leak it and then its lifetime becomes 'static. Then your struct becomes:
struct Container {
    data: &'static str,
    toc: Vec<&'static str>,
}
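Both options can be sketched roughly as follows. The names `Toc` and `StaticContainer` are made up for this example, and "a line starting with `#`" is only an assumed stand-in for whatever marks a toc entry in your data.

```rust
/// Option 1: the toc borrows from a String that is owned somewhere else.
struct Toc<'a> {
    entries: Vec<&'a str>,
}

impl<'a> Toc<'a> {
    fn new(data: &'a str) -> Self {
        let entries = data.lines().filter(|line| line.starts_with('#')).collect();
        Toc { entries }
    }
}

/// Option 2: the String is leaked, so every slice of it is 'static.
struct StaticContainer {
    data: &'static str,
    toc: Vec<&'static str>,
}

impl StaticContainer {
    fn new(data: String) -> Self {
        // The buffer is never freed; only reasonable if it is meant to
        // live until the program exits.
        let data: &'static str = Box::leak(data.into_boxed_str());
        let toc = data.lines().filter(|line| line.starts_with('#')).collect();
        StaticContainer { data, toc }
    }
}
```

With option 1 the owning String has to outlive the `Toc` and cannot be modified or moved while the `Toc` exists. With option 2 the struct can be moved around freely, at the cost of never getting the memory back.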

Thx, the Range method is what I'm looking for.
I don't want to iterate over the stuff again and again when its content is more or less static.
It has the potential to change, but that's seldom the case, and the "toc" can easily be rebuilt then.
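The "rebuild the toc when the data changes" idea could look roughly like this. It is only a sketch: `index_headings`, `replace_data` and `entry` are hypothetical names, and headings are again assumed to be lines starting with `#`. Keeping both fields private and changing them only through these methods is what keeps the ranges in sync with the text.

```rust
use core::ops::Range;

/// Scans `data` once and returns the byte range of every heading line.
fn index_headings(data: &str) -> Vec<Range<usize>> {
    data.lines()
        .filter(|line| line.starts_with('#'))
        .map(|line| {
            // `line` is a subslice of `data`, so the pointer difference
            // is its byte offset within `data`.
            let start = line.as_ptr() as usize - data.as_ptr() as usize;
            start..start + line.len()
        })
        .collect()
}

struct Container {
    data: String,
    toc: Vec<Range<usize>>,
}

impl Container {
    fn new(data: String) -> Self {
        let toc = index_headings(&data);
        Container { data, toc }
    }

    /// Swaps in new text and rebuilds the toc in the same step.
    fn replace_data(&mut self, data: String) {
        self.toc = index_headings(&data);
        self.data = data;
    }

    /// Looks up one toc entry without rescanning the text.
    fn entry(&self, index: usize) -> Option<&str> {
        let range = self.toc.get(index)?;
        Some(&self.data[range.clone()])
    }
}
```

Lookups through `entry` only index into the String, so the scan happens once per change rather than once per access.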

Honestly, it didn't occur to me that I could simply store a Range. Kind of obvious, but not without a change of perspective.
All I need is a location after all.

I was actually looking into leaking before I decided to ask here, and it works, but it is a very rigid approach :rofl:


I believe you've found a solution, but I'll try to clear up a misconception.

Despite the unfortunate overlap in terminology, Rust lifetimes (those 'a things) are not[1] about how long a value is alive. Rust lifetimes are a compile-time type-level property about the duration of borrows.

struct Container<'a> {
    data: String,
    toc: Vec<&'a str>,
}

Here, you want the toc field to borrow the data field. Things which are borrowed (with Rust lifetimes) cannot run non-trivial destructors, cannot have a &mut taken to them, and cannot be moved. But you still want to be able to move your container (including the data field) around. So as it turns out, references -- with their compile-time checked borrowing mechanism -- are not the right tool for the job.

The language can technically create the self-referential struct, but to do so you have to borrow it forever, which is not practical -- you can never move it for example (as things which are borrowed cannot be moved).
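A minimal sketch of what "borrow it forever" means in practice. The method name `build_toc` and the `demo` function are made up for the example, and splitting on lines is just a placeholder for real toc logic.

```rust
struct Container<'a> {
    data: String,
    toc: Vec<&'a str>,
}

impl<'a> Container<'a> {
    /// `&'a mut self` makes the exclusive borrow of the container last as
    /// long as the lifetime stored inside it, i.e. for the rest of its life.
    fn build_toc(&'a mut self) -> usize {
        self.toc = self.data.lines().collect();
        self.toc.len()
    }
}

fn demo() -> usize {
    let mut c = Container {
        data: String::from("a\nb"),
        toc: Vec::new(),
    };
    let count = c.build_toc();
    // From here on `c` is still exclusively borrowed. Uncommenting either
    // line below is rejected by the borrow checker:
    // let moved = c;
    // println!("{}", c.data);
    count
}
```

This compiles, and `toc` really does point into `data`, but after the single call to `build_toc` the container cannot be read, moved, or returned, which is why this is not a usable design.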


  1. directly ↩︎


Thx, this actually cleared some confusion for me.



If you (or anyone) want to get a deeper understanding of, and some practice with, self-referential-like data structures, work through this excellent tutorial: Introduction - Learning Rust With Entirely Too Many Linked Lists


If the starting note is an indicator for the overall tone of it, it looks like it might be a very entertaining read.
Thx for bringing it to my attention.

Yes, indeed! It is very nice to read and very educational as well. It has been years since I worked through it, but it was the first “learning Rust” tutorial I came across that went beneath the usual surface.