Polymorphically own vs borrow for ergonomics?

Once, in C++, I was working on a library that let you chain various byte readers together, like FileReader -> GzipReader -> UserFormatReader. I noticed a tension between minimizing the setup work for the user and giving them maximum flexibility. Users wanted to write something like:

UserFormatReader::new(GzipReader::new(FileReader::new("foo.user.gz")))

However, this assumes that each layer owns the layer feeding it, which may not be desirable -- existing code may already give us access to a &mut FileReader for the file we want but not value access, or maybe the file has a part that is gzip and another part that is bz2, or maybe these readers need to be backed by different allocators. So other users may want to do something like:

let g = GzipReader::new(&mut existing_file_reader);
let u = UserFormatReader::new(&mut g);

In C++ I ended up writing a wrapper for readers that kept a pointer to a reader and a flag indicating whether the reader was "owned". The wrapper's destructor deleted wrapped_reader only if the flag was true. This made both styles work. I was trying to think about what this would look like in Rust and came up with this:

enum ReferOrOwn<'a, T: ?Sized + 'a> {
    Refer(&'a mut T),
    Own(Box<T>),
}

The box is just to make ReferOrOwn<dyn Trait> work. Is this an existing pattern? I tried searching for "polymorphically owning or borrowing" but didn't come up with anything. Is it considered good practice in Rust? :sweat_smile:
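For illustration, here is a minimal sketch of how such an enum could forward Read to whichever variant it holds. It assumes a `?Sized` bound on T so that `ReferOrOwn<dyn Read>` is expressible; the `slurp` helper is hypothetical and only there to exercise the impl.

```rust
use std::io::{self, Read};

// `?Sized` lets `T` be a trait object such as `dyn Read`.
enum ReferOrOwn<'a, T: ?Sized + 'a> {
    Refer(&'a mut T),
    Own(Box<T>),
}

// Forward `Read` to whichever variant is held.
impl<'a, T: Read + ?Sized + 'a> Read for ReferOrOwn<'a, T> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        match self {
            ReferOrOwn::Refer(r) => r.read(buf),
            ReferOrOwn::Own(b) => b.read(buf),
        }
    }
}

// Hypothetical helper: drain any reader into a String.
fn slurp<R: Read>(mut reader: R) -> io::Result<String> {
    let mut out = String::new();
    reader.read_to_string(&mut out)?;
    Ok(out)
}
```

With this, both `slurp(ReferOrOwn::Refer(&mut some_reader))` and `slurp(ReferOrOwn::Own(Box::new(some_reader)))` type-check, and the owned variant also accepts a `Box<dyn Read>`.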

In Rust, io::Read is automatically implemented for &mut R where R: Read, so by requiring just Read you support both.

If in your code you want to unify owned and borrowed to definitely borrowed, use Read::by_ref().

Your enum is similar to std's std::borrow::Cow, although Cow holds a shared & borrow rather than &mut and requires ToOwned (usually satisfied via Clone), which may be problematic for readers.
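A small sketch of what "requiring just Read" looks like for a layered reader. `UpperReader` is a hypothetical stand-in for something like GzipReader (it only upper-cases ASCII bytes), and Cursor stands in for a file.

```rust
use std::io::{self, Cursor, Read};

// Hypothetical layer: upper-cases whatever the inner reader produces.
struct UpperReader<R> {
    inner: R,
}

impl<R: Read> Read for UpperReader<R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        let n = self.inner.read(buf)?;
        buf[..n].make_ascii_uppercase();
        Ok(n)
    }
}

// Owned style: the layer takes the inner reader by value.
fn owned_chain() -> io::Result<String> {
    let mut out = String::new();
    UpperReader { inner: Cursor::new("owned") }.read_to_string(&mut out)?;
    Ok(out)
}

// Borrowed style: the layer holds `&mut source`, which is itself a reader.
fn borrowed_chain() -> io::Result<(String, String)> {
    let mut source = Cursor::new("borrowed then reused");
    let mut head = [0u8; 8];
    UpperReader { inner: &mut source }.read_exact(&mut head)?;
    // The borrow has ended, so `source` can be read directly again.
    let mut rest = String::new();
    source.read_to_string(&mut rest)?;
    Ok((String::from_utf8_lossy(&head).into_owned(), rest))
}
```

The same `UpperReader<R>` definition serves both styles; only the type substituted for R changes.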


You linked to by_ref, so I assume you mean that? I'm confused by this function... it's identity. Why would calling identity be useful? Why wouldn't I just use the &mut self I must already have in order to call this function? It's a trait function, so hypothetically it could have a more complex implementation, but the signature requires returning &mut Self so I have no idea what a diff implementation would practically be? Secretly return a reference to a separate static instance for funsies? :sweat_smile:

It's there because of method resolution: you could have a mut binding of type Self and not a &mut Self...

- (&mut file).take(5).read_to_end(&mut buffer)?;
+ file.by_ref().take(5).read_to_end(&mut buffer)?;

...or something that DerefMuts into a Read-able type. So the main reason is ergonomic.


Still not sure I understand. If I have a variable x: Foo directly bound to a value (not x: &Foo) I should still be able to call methods that take &mut self. At least I've never run into the compiler objecting to this. Also, I thought the nice aspect of Deref is that if something is Deref<Target = Read> then it should find take anyway? It seems like I should just be able to write:

file.take(5).read_to_end(&mut buffer)?;

???

You want a &mut my_reader here because take consumes self, so unless your reader implements Copy, you can no longer use it afterwards.

use std::io::{self, Read};

fn by_value_fails<R: Read>(mut reader: R) -> io::Result<()> {
    let mut buffer = Vec::new();
    let mut other_buffer = Vec::new();

    // Here we lose ownership of `reader`
    reader.take(5).read_to_end(&mut buffer)?;
    // And this fails
    // error[E0382]: borrow of moved value: `reader`
    reader.read_to_end(&mut other_buffer)?;

    Ok(())
}

Instead, you can take a temporary exclusive borrow:

-    reader.take(5).read_to_end(&mut buffer)?;
+    (&mut reader).take(5).read_to_end(&mut buffer)?;

Or, more ergonomically:

-    reader.take(5).read_to_end(&mut buffer)?;
+    reader.by_ref().take(5).read_to_end(&mut buffer)?;

This can be useful for any Read-type wrapper; sometimes a method is provided to return the inner reader, but sometimes one is not provided.
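Putting the by_ref version together as a complete function (a sketch; `split_read` is a made-up name, and returning both buffers is only there to make the behaviour visible):

```rust
use std::io::{self, Read};

fn split_read<R: Read>(mut reader: R) -> io::Result<(Vec<u8>, Vec<u8>)> {
    let mut head = Vec::new();
    let mut rest = Vec::new();

    // `by_ref()` yields `&mut R`, so `take` consumes the borrow, not `reader`.
    reader.by_ref().take(5).read_to_end(&mut head)?;
    // `reader` is still usable and continues where the `Take` stopped.
    reader.read_to_end(&mut rest)?;

    Ok((head, rest))
}
```

Called with a Cursor over b"hello world", this should return b"hello" and b" world".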

There are similar patterns for iterators, where again

  • Wrapping combinators are common
  • But they don't always provide a method to get the original back out
  • However, &mut iterator acts like iterator, so you can instead use a combinator on a temporary borrow
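The iterator version of the same trick, as a small sketch (`head_and_rest` is a hypothetical name):

```rust
fn head_and_rest() -> (Vec<u32>, Vec<u32>) {
    let mut numbers = 1..=6;

    // `Iterator::take` consumes its receiver; with `by_ref()` that receiver
    // is only a `&mut` borrow of `numbers`.
    let head: Vec<u32> = numbers.by_ref().take(2).collect();
    // `numbers` is still ours and resumes after the two items taken above.
    let rest: Vec<u32> = numbers.collect();

    (head, rest)
}
```

Here `head` ends up as [1, 2] and `rest` as [3, 4, 5, 6].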

I'm getting more confused :sweat_smile:

  • The language provides Deref and DerefMut so that smart pointers to T can boil down to &T without explicit dereferencing.
  • Given a t: &T, you can call a method t.foo() that takes &self
  • Given a t: T you can call a method t.foo(), because impl<T> Deref<T=T> for T {} is implemented for us by the compiler (a type trivially derefs to itself)
  • But for Read this is not flexible enough for storing nested readers -- we want to be able to store a nested reader directly or by reference, so Read has an impl for any &T where T: Read. This way a generic reader can store some U: Read and U could be a reference or a direct value. If we didn't have the impl of Read for references, then a generic type storing a &T where T: Read would get an error trying to call Read methods...? But it wouldn't, that's what Deref gave us already so I'm not sure what the Read impl for references is helping with.
  • People want to destructively consume their IO reading/writing objects as they iterate/stream through them. 99% of the time I want my container to persist after iterating over it so going out of the way to support this seems super weird coming from C++ but I guess it helps with moving things directly out of the stream without cloning... except that Read is operating on bytes so there is no difference between shallow and deep copying so why do this?

Yes.

No; for example, u32 does not implement Deref, and Vec<T> implements Deref<Target = [T]> (and you can only have one implementation of Deref).

Given t: T and T::foo(&self), you can call t.foo() due to method call resolution. In particular, note that before attempting Deref coercion, all of T, &T, and &mut T are considered.

For the purposes of this discussion: If you have a t: T and a T::foo(&self), but no T::foo(self), then t.foo() will call T::foo(&self).
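A sketch of those resolution rules in action. `Counter` and its methods are hypothetical; the Vec line shows the separate case where a method is only found through Deref.

```rust
struct Counter {
    hits: u32,
}

impl Counter {
    fn peek(&self) -> u32 { self.hits }
    fn bump(&mut self) { self.hits += 1; }
    fn finish(self) -> u32 { self.hits }
}

fn autoref_demo() -> (u32, u32, bool) {
    let mut c = Counter { hits: 0 };
    c.bump();               // resolved as Counter::bump(&mut c)
    let seen = c.peek();    // resolved as Counter::peek(&c)
    c.bump();
    let total = c.finish(); // by value: `c` is moved here

    let v = vec![1u8, 2, 3];
    // `contains` is a slice method, reached through Vec's Deref<Target = [u8]>.
    let has_two = v.contains(&2);

    (seen, total, has_two)
}
```

None of the Counter calls involve Deref; the compiler simply tries the receiver as Counter, &Counter, and &mut Counter.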

(Read is implemented for &mut R where R: Read, not &R.)

The implementation for &mut R is just overall more flexible, and not only for method resolution. Let's try another example:

use std::io::{self, Read};

struct WithReader<R> {
    reader: R,
}

impl<R: Read> WithReader<R> {
    fn read_some(&mut self, limit: usize) -> io::Result<Vec<u8>> {
        let mut buffer = Vec::with_capacity(limit);
        self.reader.take(10).read_to_end(&mut buffer)?;
        Ok(buffer)
    }
}

This fails:

error[E0507]: cannot move out of `self.reader` which is behind a mutable reference
  --> src/lib.rs:10:9
   |
10 |         self.reader.take(10).read_to_end(&mut buffer)?;
   |         ^^^^^^^^^^^ move occurs because `self.reader` has type `R`, which does not implement the `Copy` trait

Ah, okay -- take is trying to take ownership of our reader, leaving our struct in an invalid state. Let's pretend &mut R didn't implement Read. What would our choices be? A couple are...

But &mut R does implement Read, so instead we can do:

-        self.reader.take(10).read_to_end(&mut buffer)?;
+        self.reader.by_ref().take(10).read_to_end(&mut buffer)?;
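For reference, a compiling sketch of the struct with that change applied. One assumption on my part: `limit` is passed to take (as a u64, which is what take expects) instead of the hard-coded 10.

```rust
use std::io::{self, Read};

struct WithReader<R> {
    reader: R,
}

impl<R: Read> WithReader<R> {
    fn read_some(&mut self, limit: u64) -> io::Result<Vec<u8>> {
        let mut buffer = Vec::new();
        // Borrow the field for the duration of the `Take`; nothing is moved
        // out of `self`.
        self.reader.by_ref().take(limit).read_to_end(&mut buffer)?;
        Ok(buffer)
    }
}
```

Because the reader stays inside the struct, read_some can be called repeatedly and each call picks up where the previous one stopped.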

So why not just store a &mut R yourself; why does take take ownership? For example.

The problem is that the struct can now only contain some sort of borrow; the thing it is borrowing must live elsewhere, and this struct is now always burdened with a lifetime. It's not possible to [1] send this borrowing structure to another [2] thread, for example, or use it anywhere else you need to satisfy a 'static bound. You also couldn't return the reader up the call stack, or move the borrowed inner reader at all before the nesting reader is dropped.

However, if you just store the R, you can handle both the borrowed and owned cases (due to &mut R also implementing Read). That is, a consumer of our struct with a U: Read can use a WithReader<U> or a WithReader<&mut U>, according to their needs.
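A sketch of both uses side by side. The `read_all` method and the two function names are made up for the example, and Cursor stands in for a real reader.

```rust
use std::io::{self, Cursor, Read};
use std::thread;

struct WithReader<R> {
    reader: R,
}

impl<R: Read> WithReader<R> {
    fn read_all(&mut self) -> io::Result<Vec<u8>> {
        let mut buffer = Vec::new();
        self.reader.read_to_end(&mut buffer)?;
        Ok(buffer)
    }
}

// Owned: WithReader<Cursor<Vec<u8>>> borrows nothing, so it satisfies the
// 'static bound on thread::spawn and can move into the new thread.
fn owned_on_thread() -> Vec<u8> {
    let mut owned = WithReader { reader: Cursor::new(b"owned".to_vec()) };
    thread::spawn(move || owned.read_all().unwrap()).join().unwrap()
}

// Borrowed: WithReader<&mut Cursor<_>> leaves the reader with the caller,
// who can inspect or reuse it once the wrapper is gone.
fn borrowed_then_reused() -> io::Result<(Vec<u8>, u64)> {
    let mut source = Cursor::new(b"borrowed".to_vec());
    let bytes = WithReader { reader: &mut source }.read_all()?;
    Ok((bytes, source.position()))
}
```

Swapping the body of owned_on_thread to use `&mut source` from a local variable would be rejected by the compiler, which is the 'static limitation described above.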

I'm not sure I understand this part, but perhaps the last section addresses it -- storing an R can cover both the owned case and the borrowed case, while storing a &mut R can only cover the borrowed case. If you give up ownership and don't get it back somehow, you won't be able to use it again (presumably it is dropped), sure -- but there are many more differences between ownership and borrowing than that, like lifetime considerations and the inability to move a borrowed structure.


  1. (safely)

  2. non-scoped
