Disagreement with the borrow check about `Cow`

Here's some code that I think the compiler "should" accept:

use std::borrow::Cow;

pub struct StreamDecoder {}

impl StreamDecoder {
    pub fn read(&mut self, _n: usize) -> &[u8] {
        unimplemented!()
    }

    pub fn read_exact(&mut self, n: usize) -> Cow<'_, [u8]> {
        let mut out = Cow::Borrowed(self.read(n));
        while out.len() < n {
            let rem = n - out.len();
            out.to_mut().extend_from_slice(self.read(rem));
        }
        out
    }
}

The idea is that read returns a slice of an internal buffer, which is refilled only if it is empty at the time of the call. The length of the returned slice will be at most n, and will be less than n if the buffer (after possible refilling) has fewer than n bytes remaining. The point of read_exact is that it calls read internally until it has exactly n bytes, returning a Cow::Borrowed if this can be done with one call to read (the common case, hopefully) or a Cow::Owned if more than one call is required.

The compiler doesn't allow this implementation for read_exact:

error[E0499]: cannot borrow `*self` as mutable more than once at a time
  --> src/lib.rs:14:44
   |
10 |     pub fn read_exact(&mut self, n: usize) -> Cow<'_, [u8]> {
   |                       - let's call the lifetime of this reference `'1`
11 |         let mut out = Cow::Borrowed(self.read(n));
   |                                     ---- first mutable borrow occurs here
...
14 |             out.to_mut().extend_from_slice(self.read(rem));
   |                                            ^^^^ second mutable borrow occurs here
15 |         }
16 |         out
   |         --- returning this value requires that `*self` is borrowed for `'1`

(playground)

I think I understand what it's getting at, but the error seems like a false positive: the first mutable borrow of *self can never overlap with the second, because out.to_mut() turns out into a Cow::Owned, which ends the first borrow. Is that reasoning incorrect? If not, what's a good way to work around the issue?

I'm not sure if this works with any other code you may have, but one solution is to give a separate lifetime to the underlying data so that you're not tying the lifetime of the borrow of the decoder to the resulting data.

use std::borrow::Cow;
use std::marker::PhantomData;

pub struct StreamDecoder<'data> {
    _data: PhantomData<&'data ()>
}
impl <'data> StreamDecoder<'data> {
    pub fn read(&mut self, _n: usize) -> &'data [u8] {
        unimplemented!()
    }

    pub fn read_exact(&mut self, n: usize) -> Cow<'data, [u8]> {
        let mut out = Cow::Borrowed(self.read(n));
        while out.len() < n {
            let rem = n - out.len();
            out.to_mut().extend_from_slice(self.read(rem));
        }
        out
    }
}

Playground

Yeah, your lifteimes are too constrained. This is enough to fix the OP:

    pub fn read<'short>(&mut self, _n: usize) -> &'short [u8] {
        unimplemented!()
    }

But you may benefit elsewhere from doing the similar to read_exact.

1 Like

The internal buffer is a Vec<u8> that's owned by StreamDecoder, so I don't think this works -- the borrow returned from read really is tied to the borrow of self (and I'd like to keep that design if possible).

I'm far from an expert but I think you'd need unsafe then to effectively return the first borrow as far as the compiler is concerned.

    pub fn read_exact(&mut self, n: usize) -> Cow<'_, [u8]> {
        let out = self.read(n);
        let out = unsafe {std::slice::from_raw_parts(out.as_ptr(),out.len())};
        let mut out = Cow::Borrowed(out);
        while out.len() < n {
            let rem = n - out.len();
            out.to_mut().extend_from_slice(self.read(rem));
        }
        out
    }

Edit:
Actually nevermind, what @quinedot had will do the same thing for read_exact since it has the same type signature. If you want to keep the read type signature the same, then just make a non public read method that has the signature that @quinedot has
Playground

1 Like

Well, here's a step by step of what I understand is happening (I expect to be corrected if I get this wrong...):

  • You call read_exact(&'1 mut self, _) -> Cow<'1, [u8]>
  • You assign to out which is (always) returned, and thus, it is a Cow<'1, [u8]>
    • To get the value to assign you call read(&'1 mut self, _) -> &'1 [u8]
    • The exclusive borrow of self must thus last for '1, past the body of read_exact
  • So it's already game over, you can't create any more borrows of self

The compiler would need to look inside to_mut to know that it creates a "fresh lifetime". But even if the compiler was allowed to look inside of functions, as I understand it, lifetimes of bindings cannot change once resolved. (There are no "fresh lifetimes". Modulo a new binding or reborrow or similar.)

Here's a version that maintains your signatures and splits the borrowed-or-owned paths into separate branches. It... still doesn't compile. But it does compile with Polonius (and the OP does not).

At this point I must admit my mental model doesn't quite capture what is going on with Polonius. My guess is that Polonius can see that the two branches can have different lifetimes and is fine with that, because they both resolve over the entirely of the function. (Note that different outs are returned in the different branches.) But it could also be my mental model is just off. Hopefully someone else in the forum can break it down further.

3 Likes

Just to add one point to @quinedot, the reason his works I think is that the read signature becomes more flexible in that it can borrow the data for shorter than the borrow of self if it wants (which is what is happening when the owned path is taken in your code) but I don't think its required to be shorter in this case (which is what is happening when it just returns the original borrow). I think just leaving @quinedot's original signature is always more flexible than hiding it behind _read like I did since it allows the data to potentially be borrowed for a shorter lifetime than self if that is useful for the caller. At least that's my understanding of that part of it.

Edit:
This is wrong see below

Thanks! Your explanation makes sense and is (a more precise version of) what I had in mind when I said I thought I could understood what the compiler was getting at.

Indeed I had something like this in my original code -- returning a Cow::Borrowed from one branch and a Cow::Owned from the other -- until I realized I could write it more concisely as in the OP. I thought I'd tried compiling both versions with -Z polonius to no avail, but apparently I wasn't operating Compiler Explorer correctly.

Anyway, since polonius accepts it I'm satisfied that there isn't a lurking soundness issue, so I guess the solution is to use a bit of unsafe as in @drewkett's post.

If you look at my playground link you can accomplish it without unsafe. I should have made that more clear when I did that edit. But I think adding the lifetime directly to read though is just a more flexible version of your original definition and saves needing to create the private _read function.

Edit:
This is wrong see below

I circled back to this because I couldn't understand why it works, and... it doesn't work. Unless you return a &'static. The real less-constrained version for self-borrowing is:

pub fn read<'a, 'b: 'a>(&'b mut self, _n: usize) -> &'a [u8] {
    todo!()
}

And since the output lifetime of read ('a) is still tied to the lifetime of out which is the lifetime you return from read_exact ("'1"), and because 'b must outlive that, the situation is really no different.

(So yeah, Polonius or carefully fake it until then I guess...)

1 Like

But I couldn't implement your _read or @quinedot's read without unsafe, because the returned value really does borrow from self and hence safe code cannot put an arbitrary/unconstrained lifetime in that position.

My bad. It compiled with unimplemented and I assumed it would be fine. I had thought it shouldn’t work so that’s good to know.

1 Like

I probably should've written

            let out = out.to_mut();
            out.extend_from_slice(self.read(rem));

instead of

            out.to_mut().extend_from_slice(self.read(rem));

because the evaluation order of the original line is not completely obvious, and if self.read(rem) were evaluated before out.to_mut() then there really would be illegal aliasing.

1 Like

Thought of one other way to do this without unsafe. If you can make the determination whether you'll be able to return borrowed data ahead of time, you can do something like this. Basically what @quinedot had with the split paths but using separate calls to read to avoid lifetime issues. Obviously this only works if can_read_exact is definable.

impl StreamDecoder {
    pub fn can_read_exact(&mut self, _n: usize) -> bool {
        unimplemented!()
    }
    pub fn read(&mut self, _n: usize) -> &[u8] {
        unimplemented!()
    }

    pub fn read_exact(&mut self, n: usize) -> Cow<'_, [u8]> {
        if self.can_read_exact(n) {
            Cow::Borrowed(self.read(n))
        } else {
            let mut out = self.read(n).to_owned();
            while out.len() < n {
                let rem = n - out.len();
                out.extend_from_slice(self.read(rem));
            }
            Cow::Owned(out)
        }
    }
}

Playground

3 Likes

And for the sake of minor optimizations

    pub fn read_exact(&mut self, n: usize) -> Cow<'_, [u8]> {
        if self.can_read_exact(n) {
            Cow::Borrowed(self.read(n))
        } else {
            let mut out = Vec::with_capacity(n);
            while out.len() < n {
                let rem = n - out.len();
                out.extend_from_slice(self.read(rem));
            }
            Cow::Owned(out)
        }
    }
1 Like

Oh, nice! I have to check whether something like this will work in context but it looks promising.

Can you show us how you want to use this function? That usually gives away the intention way better w.r.t. lifetimes.

@drewkett this did the trick! My working implementation looks like this:

        pub fn read_exact(&mut self, n: usize) -> Cow<'_, [u8]> {
            self.refill_if_empty();

            if self.have(n) {
                Cow::Borrowed(self.read(n))
            } else {
                let mut out = Vec::with_capacity(n);
                while out.len() < n {
                    let rem = n - out.len();
                    out.extend_from_slice(self.read(rem));
                }
                Cow::Owned(out)
            }
        }

Where self.have(n) is the analogue of your self.can_read_exact(n). And no unsafe in sight :‌)

1 Like

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.