Struct containing a PathBuf and a &Path referencing the PathBuf

Hello!

I want to return a struct with two paths, the original one, and the original one processed to have a prefix stripped. Here is some non-functional code:

pub struct FileData<'a>
{
    orig_path : PathBuf,
    path      : &'a Path,
}

fn f(path: &Path) -> FileData
{
    return FileData {
        orig_path : path.to_path_buf(),

        // Apparently the following does not work
        path   : this_struct.orig_path.strip_prefix(whatever).unwrap(),
    };
}

I would rather not allocate twice and store two PathBuf fields. But I would also like to avoid other overhead like reference counting. So:

  • How can I reference the PathBuf in the same struct (i.e. what's the right way to do what I have as this_struct in the code above)?
  • How can I tell the compiler that the lifetime of path is the same as the lifetime of orig_path? This should solve any ownership problems.

Thank you in advance!

It's not possible to have a struct that references anything inside of itself.

This is unsafe, because the borrow checker tracks one field at a time. If someone replaced the orig_path field with another object, path would be left dangling. The borrow checker doesn't have the power to do such cross-field checks.

Usually this is worked around by storing a numeric index/offset into the other field. For PathBuf that's a bit tricky, because it won't let you do direct indexing.

  • The easiest workaround is to take the efficiency hit and store two PathBufs.

  • You could also store orig_path outside (e.g. in a memory pool/arena), and hold both by reference, although this would make the object tied to a scope, and could be annoying.

  • If you can convert the path to String/str (or use the Unix-only as_bytes()), then you can store the length of the stripped prefix, and get the shorter path with Path::new(orig_path.get(self.path_prefix_len..).unwrap()).

  • There are the owning_ref and rental crates (both listed on Lib.rs), which add some support for self-referential structs.
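To make the prefix-length idea concrete, here is a minimal sketch. It assumes the path is valid UTF-8 and can be kept as a String. The constructor and accessor names are made up for illustration; only the path_prefix_len field name comes from the snippet above.

```rust
use std::path::Path;

/// Owns the full path once and remembers where the stripped part starts.
pub struct FileData {
    orig_path: String,
    // Byte offset into `orig_path` where the stripped path begins.
    path_prefix_len: usize,
}

impl FileData {
    /// Hypothetical constructor: returns `None` if `orig` does not start
    /// with `prefix`. Note this is a plain string prefix check, not a
    /// component-wise one like `Path::strip_prefix`.
    pub fn new(orig: &str, prefix: &str) -> Option<FileData> {
        let rest = orig.strip_prefix(prefix)?;
        Some(FileData {
            path_prefix_len: orig.len() - rest.len(),
            orig_path: orig.to_owned(), // the only allocation
        })
    }

    /// The full original path, borrowed from the owned string.
    pub fn orig_path(&self) -> &Path {
        Path::new(&self.orig_path)
    }

    /// The path with the prefix stripped, computed on demand by slicing.
    pub fn path(&self) -> &Path {
        Path::new(&self.orig_path[self.path_prefix_len..])
    }
}
```

Both accessors borrow from the single owned String, so the struct can be moved freely: the stored offset stays valid wherever the struct ends up.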


Thanks @kornel! I've been banging my head against the wall to figure out how to do something that's fully natural in my mind, and it seems there is no way. I'll take the hit of the double PathBuf for now, but I'll keep a TODO note since it's part of my main loop, and may happen thousands of times per second.

If someone replaced orig_path field with another object, the path would be left invalid.

Hmmm. What if this part of the struct was defined as const, wouldn't that provide assurance to the borrow checker?

Additionally, what if I could give the PathBuf a lifetime equal to that of the Path reference? Wouldn't that also provide the same assurance?

The problems with self-referential structs go way deeper than mutability. The biggest issue is that moving the struct invalidates the reference, even if nobody mutates the field being referenced. So a const field (all on its own) wouldn't help at all.

The short answer is that lifetimes aren't the only issue. Lifetimes mostly refer to how long a value will continue to exist without getting drop()ed and deallocated, but again, even moving is enough to break these types.

So what you really need is a guarantee that you won't move (or drop) the struct. At a very high level, std::pin and (as @kornel said) the owning_ref and rental crates could be considered different ways of providing some sort of "I won't move it again" guarantee.

However, the broader answer remains: any direct solution to this introduces a lot of complexity and subtlety, so for the vast majority of use cases you're better off just not doing self-references and using one of the alternatives @kornel suggested earlier.
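For reference, the "store orig_path outside and hold both by reference" alternative might look roughly like this. It is only a sketch: the function name file_data and its prefix parameter are invented for the example, and the caller is assumed to keep the owning PathBuf alive somewhere else.

```rust
use std::path::Path;

/// Both fields borrow from a path owned outside the struct,
/// so there is no self-reference and no second allocation.
pub struct FileData<'a> {
    pub orig_path: &'a Path,
    pub path: &'a Path,
}

/// Hypothetical helper: returns `None` if `orig` does not start with `prefix`.
pub fn file_data<'a>(orig: &'a Path, prefix: &Path) -> Option<FileData<'a>> {
    // `strip_prefix` returns a sub-slice of `orig`, so it shares lifetime 'a.
    let path = orig.strip_prefix(prefix).ok()?;
    Some(FileData { orig_path: orig, path })
}
```

The cost is the one mentioned earlier: FileData<'a> cannot outlive whatever owns the original path, so it is tied to that scope.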


Even with const there would be a case of structs that depend on their own address:

struct Foo {
   const v: i32,
   r: &i32,
}

The reference would still be invalid if the struct were moved to another address. Solving this would require unmovable types. You can search the graveyard of Rust proposals for this feature.

PathBuf has a separate heap allocation, so if the difference could be explained to the borrow checker, it could allow this. Pin tries to do something like this, but if you read the docs, you'll see that handling the edge cases makes it complex.
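A small experiment can show the difference between the two kinds of address. This is an illustrative sketch only: Holder, addresses and move_and_compare are made-up names, and addresses are compared as plain integers so that no dangling pointer is ever dereferenced.

```rust
use std::ffi::OsStr;
use std::path::PathBuf;

pub struct Holder {
    pub orig_path: PathBuf,
}

/// Returns (address of the `orig_path` field, address of its heap buffer).
pub fn addresses(h: &Holder) -> (usize, usize) {
    let field_addr = &h.orig_path as *const PathBuf as usize;
    // Casting the fat `*const OsStr` to `*const u8` keeps only the data pointer.
    let heap_addr = h.orig_path.as_os_str() as *const OsStr as *const u8 as usize;
    (field_addr, heap_addr)
}

/// Moves a `Holder` from the stack into a `Box` and reports
/// (field address changed, heap buffer address unchanged).
pub fn move_and_compare() -> (bool, bool) {
    let h = Holder { orig_path: PathBuf::from("/home/user/notes.txt") };
    let (field_before, heap_before) = addresses(&h);
    let boxed = Box::new(h); // the move: the struct now lives on the heap
    let (field_after, heap_after) = addresses(&boxed);
    (field_before != field_after, heap_before == heap_after)
}
```

A reference to the field itself would dangle after the move, while a reference into the PathBuf's buffer would in practice still point at live data. The borrow checker has no way to express that distinction, which is the gap the self-referential crates try to fill.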

Thank you both. I'm still trying to understand the issues. In the meantime, I wonder about the solution that @kornel mentioned for strings, storing the index.

Wouldn't this be inefficient, because every time we call get() it has to parse the UTF-8 string to find the specific character index? (EDIT: apparently only if I avoid as_bytes(), since paths can be in all kinds of languages.)

It doesn't parse the whole string, just checks the boundary (UTF-8 is clever like that). You can also use get_unchecked.

Path::new() doesn't check anything.

Ah, I see now: slicing a string uses byte indices, not character indices. But s.get(3..) still has to verify that the 4th byte is not in the middle of a multi-byte character sequence, so it should parse the string. get_unchecked() seems overhead-free, indeed.

Yeah, the check only needs to look at 1 byte.
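To illustrate the one-byte check: in UTF-8, every continuation byte has the bit pattern 10xxxxxx, so looking at the single byte at the index is enough to tell whether it starts a character. The sketch below uses made-up helper names (tail, is_boundary); the standard library already provides the same check as str::is_char_boundary.

```rust
/// Returns the part of `s` starting at byte offset `start`, or `None`
/// if `start` is out of range or falls inside a multi-byte character.
pub fn tail(s: &str, start: usize) -> Option<&str> {
    s.get(start..)
}

/// Hand-written equivalent of `str::is_char_boundary`, for illustration:
/// it inspects at most one byte, never the whole string.
pub fn is_boundary(s: &str, index: usize) -> bool {
    match s.as_bytes().get(index) {
        // Continuation bytes look like 0b10xx_xxxx; anything else starts a char.
        Some(&b) => (b & 0xC0) != 0x80,
        // Past the last byte: only the end of the string is a boundary.
        None => index == s.len(),
    }
}
```

For example, with s = "\u{e9}/dir" (the first character takes two bytes), tail(s, 1) is None while tail(s, 2) is Some("/dir").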

Interesting, I was not aware of it!

Thanks guys. I faced the same problem once again when trying to reference a Vec in the same struct; this time the solution was easier: I just stored the index. For the PathBuf specifically, I took the performance hit and stored two of them, since the other solutions all had overhead too or were too complicated.
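For anyone finding this later, the Vec case looks roughly like the following. The struct and its names (Inventory, selected) are invented for the example, not taken from my actual code.

```rust
pub struct Inventory {
    items: Vec<String>,
    // Index into `items`, stored instead of a `&String` pointing into it.
    selected: usize,
}

impl Inventory {
    /// Returns `None` if `selected` is not a valid index into `items`.
    pub fn new(items: Vec<String>, selected: usize) -> Option<Inventory> {
        if selected < items.len() {
            Some(Inventory { items, selected })
        } else {
            None
        }
    }

    /// Resolves the index to a reference only when it is needed.
    pub fn selected(&self) -> &str {
        &self.items[self.selected]
    }

    /// Pushing may reallocate the Vec, which would invalidate a stored
    /// reference but leaves a stored index untouched.
    pub fn push(&mut self, item: String) {
        self.items.push(item);
    }
}
```

The index survives both moves of the struct and reallocation of the Vec. The only thing to watch is removing or reordering elements, which would require updating the index.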

I understand why it's so difficult to implement: it needs unmovable values (hence overcomplicated solutions like Pin or owning_ref). I hope a no-overhead solution gets supported in the language at some point, since this is a very natural thing to do (like splitting an existing PathBuf).

I'll mark the topic as solved, thanks for all the help!
