Mocking File, confusion with generic

I'm trying to mock a std::fs::File for a data structure so I can either use it in memory or write it to disk. I ran into a strange problem. I can read or write through a &File held in a mut binding, but I can't do the same thing if the function uses a generic. The motivation here is that the "file" itself is read only, stored in a struct, and I don't want to take a &mut self reference.

Here's the simplest demonstration. As far as I can tell, these two functions should behave exactly the same. Why does func1 compile while func2 doesn't?

use std::fs::File;
use std::io::Read;

fn func1(f: File) {
    let mut ref_f = &f;
    let mut contents = vec![];
    ref_f.read_to_end(&mut contents);
}

trait FileLike: Read {}
impl<T> FileLike for T where T: Read {}

fn func2<F: FileLike>(f: F) {
    let mut ref_f = &f;
    let mut contents = vec![];
    ref_f.read_to_end(&mut contents);
}

outputs:

   Compiling playground v0.0.1 (/playground)
warning: variable does not need to be mutable
  --> src/lib.rs:15:9
   |
15 |     let mut ref_f = &f;
   |         ----^^^^^
   |         |
   |         help: remove this `mut`
   |
   = note: `#[warn(unused_mut)]` on by default

error[E0596]: cannot borrow `*ref_f` as mutable, as it is behind a `&` reference
  --> src/lib.rs:17:5
   |
15 |     let mut ref_f = &f;
   |                     -- help: consider changing this to be a mutable reference: `&mut f`
16 |     let mut contents = vec![];
17 |     ref_f.read_to_end(&mut contents);
   |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ `ref_f` is a `&` reference, so the data it refers to cannot be borrowed as mutable

For more information about this error, try `rustc --explain E0596`.
warning: `playground` (lib) generated 1 warning
error: could not compile `playground` due to previous error; 1 warning emitted

The Read trait's methods take &mut self, but the standard library implements Read for both File and &File. That makes it possible to use Read methods even if you only have a shared reference: the actual method call receives a &mut &File, which works.

In your generic version, the compiler only knows that T implements Read; it doesn't know whether &T (the type of ref_f) implements Read as well. Instead, I would probably implement it like:

fn func2<F: FileLike>(mut f: F) {
    let ref_f = &mut f;
    let mut contents = vec![];
    ref_f.read_to_end(&mut contents);
}
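
If reading through a shared reference is the actual goal, another option is to put the bound on &F instead of on F. The following is only a sketch: read_all_shared is a hypothetical name, and the higher-ranked bound for<'a> &'a F: Read is an assumption about what the caller's type can provide (it holds for File and for byte slices, but not for every Read type).

```rust
use std::io::{self, Read};

// Hypothetical helper: reads everything through a shared reference.
// The bound says "a shared reference to F implements Read", which is
// exactly the fact the compiler was missing in the generic func2.
fn read_all_shared<F>(f: &F) -> io::Result<Vec<u8>>
where
    F: ?Sized,
    for<'a> &'a F: Read,
{
    // The binding is mutable, the referent is not: the call below
    // borrows `ref_f` itself as `&mut &F`.
    let mut ref_f = f;
    let mut contents = Vec::new();
    ref_f.read_to_end(&mut contents)?;
    Ok(contents)
}
```

With this bound the function accepts a &File as well as an in-memory &[u8], so the caller never needs a &mut to the stored value.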

Thanks for your response. That makes sense. The FileLike object is stored in a struct. How can I do this without running into the "move out of shared reference" error?

You might need to provide a more complete example, but in that case it sounds like func2() should take a &mut reference to the FileLike in the first place rather than taking ownership.


What I'm trying to build is a concurrently-readable (read only) data structure backed either by a file or by memory. To make that work, I don't want to take a &mut self reference, as that would require locking and make concurrent reads impossible. The file-backed structure needs to be lazy-loading, as the files are potentially huge. I wanted to let the OS handle paging, which led me to the File-based interface. I'm also using serde to deserialize_from variably-sized objects from different points in the file, so I need the Read support.

I suppose I could do this with something like Arc<Mmap>, but then to maintain the serde integration, I would need to implement a mutable cursor and Read+Seek on top of it, which seems silly. Alternatively I could open up a new File for every concurrent reader, but that doesn't feel right either.

You won't be able to use a single File (or any Read + Seek) for concurrent reads anyway, because the readers would share one cursor, and you'd need locking to avoid conflicts over setting the position for the next read. So, you'll need an explicit cursor (with the File-backed implementation having a mutex which protects each {seek(); read();} pair) or one File per reading thread.
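
To make the mutex variant concrete, here is a minimal sketch. LockedReader and read_at are hypothetical names chosen for illustration; the only idea taken from the paragraph above is that each seek-then-read pair runs under one lock.

```rust
use std::io::{self, Read, Seek, SeekFrom};
use std::sync::Mutex;

// Hypothetical wrapper: the mutex guards the shared cursor of `R`.
struct LockedReader<R> {
    inner: Mutex<R>,
}

impl<R: Read + Seek> LockedReader<R> {
    fn new(inner: R) -> Self {
        LockedReader { inner: Mutex::new(inner) }
    }

    // Takes &self, so many threads can call it; the lock is held for
    // the whole seek + read pair so no other reader can move the cursor
    // in between.
    fn read_at(&self, position: u64, buf: &mut [u8]) -> io::Result<()> {
        let mut guard = self
            .inner
            .lock()
            .map_err(|_| io::Error::new(io::ErrorKind::Other, "reader mutex poisoned"))?;
        guard.seek(SeekFrom::Start(position))?;
        guard.read_exact(buf)
    }
}
```

Readers are serialized by the lock, so this keeps the API free of &mut self but does not make the reads themselves run in parallel.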

I would suggest defining your own trait that doesn't try to be File-like, something like:

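use std::io;

trait ConcurrentRandomAccess {
    // Intended to fill `buf` with the bytes starting at `position`.
    // Takes &self so that several readers can share one value.
    fn read(&self, position: u64, buf: &mut [u8]) -> io::Result<()>;
}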

and then do whatever is necessary inside to implement it for a File and for an in-memory buffer.
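
As a rough sketch of what "whatever is necessary" could look like: the in-memory version below copies out of a Vec<u8>, and the File version assumes a Unix target, where std::os::unix::fs::FileExt offers positioned reads that don't touch the shared cursor. Other platforms would need a different implementation (for example the mutex approach), and the eof helper is a hypothetical name.

```rust
use std::io;

trait ConcurrentRandomAccess {
    fn read(&self, position: u64, buf: &mut [u8]) -> io::Result<()>;
}

// Hypothetical helper for "requested range is out of bounds".
fn eof() -> io::Error {
    io::Error::from(io::ErrorKind::UnexpectedEof)
}

// In-memory backing: copy the requested range out of the buffer.
impl ConcurrentRandomAccess for Vec<u8> {
    fn read(&self, position: u64, buf: &mut [u8]) -> io::Result<()> {
        let end = position.checked_add(buf.len() as u64).ok_or_else(eof)?;
        if end > self.len() as u64 {
            return Err(eof());
        }
        buf.copy_from_slice(&self[position as usize..end as usize]);
        Ok(())
    }
}

// File backing, Unix only (assumption): a positioned read takes the
// offset as an argument instead of using the file's cursor.
#[cfg(unix)]
impl ConcurrentRandomAccess for std::fs::File {
    fn read(&self, position: u64, buf: &mut [u8]) -> io::Result<()> {
        use std::os::unix::fs::FileExt;
        self.read_exact_at(buf, position)
    }
}
```

Both implementations only need &self, so either backing can sit behind an Arc and be read from several threads.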


If it weren't concurrent I'd say RefCell, but with concurrency, welcome to pain. As kpreid said, Rust is doing its job in preventing you from potentially shooting yourself in the foot. It's an anti-pattern in ANY language to have concurrent access to an OS file handle (without a mutex): something can muck with invariants between calls. My most hated is Seek(-1); Tell(); to determine the file length. That's the only cross-platform way to do it, yet it is obviously a massive race condition. Linux has a lot of thread-safe ways to do it at a lower level, but Linux isn't the only OS.

Option 1 (as in another thread) is to dup the file handle.

Option 2 is to just open/close the File as needed (no sharing necessary; the OS will properly cache everything). This is actually better on Linux anyway (without root), because you only have about 800 available file handles for 99.9% of processes, and code that assumes it can cache file handles will produce nasty crashes (ask me how I know this). On Windows this isn't a problem.

Option 3 is to guard access to the file in a clever way (like a database), either through read-write locks or a dedicated actor thread. This is the most effort on your part and may not be measurably faster compared to option 2. YMMV. Think of it this way: either YOU do the synchronization or the OS does.

Option 4: If you're on Linux, look into glommio or the nix library, and leave the std library behind. There are definitely high-performance IO capabilities (async IO events), but again Rust is least-common-denominator here.

Option 4a: I don't think you can do byte-range reads with mio / tokio, but you can look there if you want. When I last looked, it was no different than just reading the whole file in at once, so I didn't bother.
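
For reference, Option 2 can be as small as the sketch below. read_range is a hypothetical helper name; it opens the file, reads one byte range, and lets the handle close when the function returns, so nothing is shared between threads.

```rust
use std::fs::File;
use std::io::{self, Read, Seek, SeekFrom};
use std::path::Path;

// Hypothetical helper: open, seek, read `len` bytes, close.
// Each call gets its own handle and therefore its own cursor.
fn read_range(path: &Path, position: u64, len: usize) -> io::Result<Vec<u8>> {
    let mut file = File::open(path)?;
    file.seek(SeekFrom::Start(position))?;
    let mut buf = vec![0u8; len];
    file.read_exact(&mut buf)?;
    Ok(buf)
    // `file` is dropped here, which closes the handle.
}
```

Whether the repeated open/close cost matters depends on the workload, so it is worth measuring before reaching for Option 3.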

Thanks for the detail, this is really helpful. I wasn't aware of the limits on open file handles. I might've run smack into that one. Option 3 with RwLocks or a manager thread was the direction I was headed, but if there's limited benefit over just open/read/close, I don't know if it's worth it.

You can raise the file handle count with limit/ulimit, but by default each process gets 1024 file handles (including shared libraries and sockets). There is a soft and a hard limit. You have to be root to raise the hard limit, which is generally considered appropriate only for databases / NAS, configured system-wide in an /etc/limits.d file.