Implement a stack of path component with a shared buffer

Hello,

I'm currently trying to implement a weird data structure : a stack of path components (that can be popped and pushed from/at the end). What's special is that instead of storing the components in a vector-like structure, I'd like to use the call stack. Basically, I'd have a backing String somewhere on the call stack, and each call frame on top of it would have a reference to it (inside a struct) with the number of characters of the current path component of the stack frame, so the String can be truncated of its last component when the struct is dropped.

Just for reference, these are not file system paths, so I can use Strings without feeling bad about compatibility.

pub struct Path<'a> {
    string: &'a mut String,
    last_component_size: usize,
}

impl<'a> Path<'a> {
    pub fn new(backing: &'a mut String) -> Self {
        Self {
            string: backing,
            last_component_size: 0,
        }
    }

    pub fn join<'b: 'a, 'c>(&'b mut self, component: &'c str) -> Path<'b> {
        const SEP: char = '/';

        self.string.push(SEP);
        self.string.push_str(component);

        Self {
            string: self.string,
            last_component_size: SEP.len_utf8() + component.len(),
        }
    }
}

impl<'a> Drop for Path<'a> {
    fn drop(&mut self) {
        let new_len = self.string.len() - self.last_component_size;
        self.string.truncate(new_len);
    }
}

However, I cannot use this piece of code as I would like to. For instance, this doesn't compile:

#[test]
fn test_call_rec() {
    let mut backing = String::with_capacity(4);
    rec(Path::new(&mut backing), 0);
}

fn rec(mut path: Path, n: u8) {
    match n {
        0 => {
            rec(path.join("1"), 1);
            rec(path.join("2"), 2);
        }
        1 => {
            rec(path.join("2"), 2);
            rec(path.join("2"), 2);
            rec(path.join("2"), 2);
        }
        _ => {
            dbg!(path.string);
        }
    }
}

for these reasons (only one error shown):

error[E0597]: `path` does not live long enough
  --> back/src/form_parser/path.rs:56:21
   |
53 |     fn rec(mut path: Path, n: u8) {
   |            -------- has type `path::Path<'1>`
...
56 |                 rec(path.join("1"), 1);
   |                     ^^^^^^^^^^^^^^
   |                     |
   |                     borrowed value does not live long enough
   |                     argument requires that `path` is borrowed for `'1`
...
68 |     }
   |     - `path` dropped here while still borrowed

I would have guessed that Path::join would create a new path for the duration of the function call (capturing exclusive mutability to the self path) and drop it just after the function call. But it doesn't seem to be the case. Why is path still borrowed ?

I have quite a bit of experience in Rust but I'm really lost here. I have no idea of which safety guarantees I'm unknowingly trying to break.

I'm not an expert on the topic myself so if someone has more to add or something to correct, please do!

'b: 'a means 'b outlives 'a, or 'b >= 'a, so when you borrow path for 'b, it's borrowed for at least as long as 'a. This also makes &'b mut self essentially the same as &'a mut Self<'a> which is an antipattern that borrows self for its entire existence.

If we write out the signature to rec it'd look like fn rec<'x>(mut path: Path<'x>, n: u8). When we call path.join inside rec we promise we're giving join access to path for at least 'b: 'x, where 'x is a lifetime parameter coming from outside rec and therefore lasting longer than the function body. But at the end of rec path is dropped before 'x is over and and thus borrowed value does not live long enough.

The "fix" to the specific compiler error posted is to remove the bound, but that causes another error (and the compiler's suggested fix is to put the bound back...). The real root of the issue is variance. In short, you're not allowed to treat a &'a mut T as some shorter &'b mut T, like you could with &'a T. In this case, you're trying to turn the &'a mut String from self into a &'b mut String of the return value.

The issue is it would allow something like

fn main() {
    // 'a
    let mut string = String::from("foo");
    let mut path = Path::new(&mut string);
    {
        // 'b
        let b = String::from("bar");
        let path_b = path.join("component");
        *path_b.string = b; // this should not be valid because `path_b.string` is really &'a mut String and not &'b mut String
    }
    println!("{} points to b, which has been dropped...", path.string);
}

Here's a playground where I removed the bound and scammed the compiler into accepting the lifetime-shortening with string: unsafe { std::mem::transmute(&mut self.string) }, resulting in a crash when you try to run it: Rust Playground

I think your key mistake was here:

        Self {
            string: self.string,
            last_component_size: SEP.len_utf8() + component.len(),
        }

As I'm guessing the errors you get when you have no lifetimes on join led you to add the lifetime constraints that were on join. And the problem is that Self here is an alias for Path<'a> -- but what you really want is to return Path<'something_shorter_than_a>.

This works:

    pub fn join(&mut self, component: &str) -> Path<'_> {
        // ...
        Path {  // <-- *not* Self
            string: self.string,
            last_component_size: SEP.len_utf8() + component.len(),
        }
    }

Here, the anonymous lifetime on &mut self is also the lifetime of the returned path, and it's allowed to be shorter than 'a.


As for an explanation of what was going wrong, consider the lifetime constraints:

    pub fn join<'b: 'a, 'c>(&'b mut self, component: &'c str) -> Path<'b> {

Here, Self is Path<'a>, and the &'b mut self means that 'a: 'b. Then you have the added constraint that 'b: 'a. Together, this means that 'b == 'a -- and indeed, this is necessary to return Self (but again, that's not what you actually wanted to do). 'c has no constraints on it, so we can rewrite the above as

    pub fn join(&'a mut self /* Path<'a>*/, component: &str) -> Path<'a> {

And this is going to be problematic because &'a mut Thing<'a> is an anti-pattern -- it means "borrow the Thing for the entirety of its remaining valid lifetime". Once you do this, you can never use the Thing again, except maybe via the return value of the function.

You can't even drop it -- and since Path has a drop implementation, this means you can never successfully call a method with that signature at all.

3 Likes

You are allowed to treat a &'a mut T as some shorter &'b mut T like you can with &'a T -- &'a mut T is invariant in T, but still covariant in 'a.

So you can reborrow &'a mut Thing<'a> as a shorter &'b mut Thing<'a>, but you can't turn it into a &mut Thing<'b>.

2 Likes

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.