Writing a filesystem crate without regrets

Hi all,

After my Windows registry parsing crate, I'm currently writing an NTFS filesystem crate - read-only for now but open for write support later.
It centers around a FileSystem struct that holds readable and seekable partition bytes like this:

use std::io;

pub struct FileSystem<T: io::Read + io::Seek> {
    partition: T,
}

Among other things, this struct provides a method to iterate over all files of the root directory. The iterator is called Files, and each file is represented by a File struct. File should itself be readable and seekable to make its content accessible.
All these properties together roughly lead to the following design:

pub struct Files<'a, T: io::Read + io::Seek> {
    fs: &'a mut FileSystem<T>,
    current_offset: u64,
    last_offset: u64,
}

impl<'a, T: io::Read + io::Seek> Iterator for Files<'a, T> {
    type Item = io::Result<File<'a, T>>;

    fn next(&mut self) -> Option<Self::Item> {
        ...
    }
}


pub struct File<'a, T: io::Read + io::Seek> {
    fs: &'a mut FileSystem<T>,
    name: [u8; 255],
}

impl<'a, T: io::Read + io::Seek> io::Read for File<'a, T> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        ...
    }
}

impl<'a, T: io::Read + io::Seek> io::Seek for File<'a, T> {
    fn seek(&mut self, pos: io::SeekFrom) -> io::Result<u64> {
        ...
    }
}

Now Rust's borrow checker and the design of std::io::Read are making it hard to actually implement this plan.
fn read needs a &mut self, so I need to carry mutable references to FileSystem in Files and File just to be able to read.
However, once I do that, I cannot return a File from Files::next: Iterator::next only gets a short-lived &mut self, so it cannot hand out the &'a mut FileSystem stored in Files unless the returned item borrows from the iterator itself, and the Iterator trait cannot express that.
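To make the limitation concrete, here is a minimal sketch of the shape that does compile: an inherent next method whose returned File borrows from &mut self. The fixed 4-byte "files", the offset/len fields, and the read_all and collect_contents helpers are assumptions made up for this illustration; they are not NTFS logic or part of the actual crate.

```rust
use std::io::{self, Read, Seek, SeekFrom};

/// Simplified stand-in for the `FileSystem` struct from the post.
pub struct FileSystem<T: Read + Seek> {
    partition: T,
}

/// A file handle that mutably borrows the filesystem (hypothetical layout).
pub struct File<'a, T: Read + Seek> {
    fs: &'a mut FileSystem<T>,
    offset: u64,
    len: u64,
}

impl<'a, T: Read + Seek> File<'a, T> {
    /// Reads the whole content of this toy file.
    pub fn read_all(&mut self) -> io::Result<Vec<u8>> {
        let mut buf = vec![0u8; self.len as usize];
        self.fs.partition.seek(SeekFrom::Start(self.offset))?;
        self.fs.partition.read_exact(&mut buf)?;
        Ok(buf)
    }
}

pub struct Files<'a, T: Read + Seek> {
    fs: &'a mut FileSystem<T>,
    current_offset: u64,
    last_offset: u64,
}

impl<'a, T: Read + Seek> Files<'a, T> {
    /// Not `Iterator::next`: the returned `File` borrows from `self`,
    /// so only one `File` can be alive at a time.
    pub fn next(&mut self) -> Option<File<'_, T>> {
        if self.current_offset >= self.last_offset {
            return None;
        }
        let offset = self.current_offset;
        self.current_offset += 4; // toy format: every "file" is 4 bytes long
        Some(File { fs: &mut *self.fs, offset, len: 4 })
    }
}

/// Walks all toy files of an in-memory image and collects their contents.
pub fn collect_contents(image: Vec<u8>) -> io::Result<Vec<Vec<u8>>> {
    let last_offset = image.len() as u64;
    let mut fs = FileSystem { partition: io::Cursor::new(image) };
    let mut files = Files { fs: &mut fs, current_offset: 0, last_offset };
    let mut out = Vec::new();
    // `while let` works, but `for` loops and iterator adapters do not.
    while let Some(mut file) = files.next() {
        out.push(file.read_all()?);
    }
    Ok(out)
}
```

The signature fn next(&mut self) -> Option<File<'_, T>> ties the item to the borrow of self. Iterator::Item has no way to name that lifetime, which is why the same body is rejected inside an impl Iterator.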

I have a few ideas around this, but none that really convinces me. It feels a lot like "pick your poison":

  1. FileSystem::partition becomes a RefCell<T> (like the fatfs crate does)

    • Advantages: All references to FileSystem can be shared (immutable) references. Iterator, io::Read/io::Seek, and even io::Write could be implemented without any hassle.
    • Disadvantages: Borrow-checking happens at runtime. I lose fundamental guarantees of the Rust compiler, and contributors need to be extra careful not to cause any panics.
  2. Using the positioned-io crate (like the ext4 crate does)

    • Advantages: Reading is possible with a shared reference to FileSystem.
    • Disadvantages: This does not help with adding write support later, and it is incompatible with std::io.
  3. Not implementing Iterator, but just fn next (like the streaming-iterator crate does)

    • Advantages: io::Read, io::Seek, and io::Write can be implemented.
    • Disadvantages: I lose all advantages of Iterator, like for loops or combining my iterator with others (map, zip, etc.).
      By holding mutable references, I'm also subject to very rigid lifetime restrictions. Two simultaneous File instances aren't possible, and this is a likely case.
  4. Only store small frequently accessed data (like the name) inside File, pass a temporary mutable reference to FileSystem every time we access the file content

    • Advantages: I can implement Iterator and have two simultaneous File instances.
    • Disadvantages: I cannot implement io::Read/io::Seek/io::Write on File, as their function signatures don't take a &mut FileSystem. A caller may also pass me a different &mut FileSystem than I expect.
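As a point of comparison, option 1 could look roughly like the sketch below. It is only an illustration under simplifying assumptions: the "directory" is a run of fixed-size files, the files helper and the start/len/pos fields are hypothetical, and the iterator yields File directly instead of io::Result<File>.

```rust
use std::cell::RefCell;
use std::io::{self, Read, Seek, SeekFrom};

pub struct FileSystem<T: Read + Seek> {
    partition: RefCell<T>,
}

impl<T: Read + Seek> FileSystem<T> {
    pub fn new(partition: T) -> Self {
        Self { partition: RefCell::new(partition) }
    }

    /// Toy directory: consecutive files of `file_len` bytes in `0..end`.
    pub fn files(&self, file_len: u64, end: u64) -> Files<'_, T> {
        Files { fs: self, current_offset: 0, last_offset: end, file_len }
    }
}

pub struct Files<'a, T: Read + Seek> {
    fs: &'a FileSystem<T>,
    current_offset: u64,
    last_offset: u64,
    file_len: u64,
}

impl<'a, T: Read + Seek> Iterator for Files<'a, T> {
    type Item = File<'a, T>;

    fn next(&mut self) -> Option<Self::Item> {
        if self.current_offset + self.file_len > self.last_offset {
            return None;
        }
        let start = self.current_offset;
        self.current_offset += self.file_len;
        // A shared reference is `Copy`, so it can be handed out with lifetime 'a.
        Some(File { fs: self.fs, start, len: self.file_len, pos: 0 })
    }
}

pub struct File<'a, T: Read + Seek> {
    fs: &'a FileSystem<T>,
    start: u64,
    len: u64,
    pos: u64,
}

impl<'a, T: Read + Seek> Read for File<'a, T> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        let n = (self.len - self.pos).min(buf.len() as u64) as usize;
        // The mutable borrow lasts only for this call. `borrow_mut` panics
        // if another borrow of `partition` is active at the same time.
        let mut partition = self.fs.partition.borrow_mut();
        partition.seek(SeekFrom::Start(self.start + self.pos))?;
        let read = partition.read(&mut buf[..n])?;
        self.pos += read as u64;
        Ok(read)
    }
}
```

Because every File keeps its own pos and re-seeks before each read, several File values can coexist and be read in any order. The runtime borrow is never held across calls in this sketch, which keeps the panic risk local to the read implementation.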

Am I missing anything? Do you have any other ideas?
I'd be grateful for any guidance out of this jungle.

Cheers,

Colin


I'm not clear what your crate is intended to do. Is the idea to parse a disk image and provide an API to read it as if it were mounted by the OS? If so, is your goal to provide similar semantics to those provided by the OS?

You mention possibly adding write functionality. That is going to hugely impact your API, so I'd either commit to doing it or rule it out. The OS allows reading a directory while it's being written to, which means you need locking or some other form of synchronization.

If you remain read only, then why does anything need a mut reference to FileSystem?


An entire filesystem image is too large to be read in one go into a giant u8 buffer (like I could do with registry hives in my nt-hive crate).
Therefore, FileSystem requires the std::io::Read trait for partition to read the desired bytes on demand. However, fn read of std::io::Read needs a &mut self to update the inner cursor.
Hence, I need to propagate &mut up to all FileSystem references.
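A small sketch of why the mutable borrow is unavoidable with std::io: both seek and read move the source's internal cursor. The helper name read_at_offset is made up for this illustration.

```rust
use std::io::{self, Read, Seek, SeekFrom};

/// Reads `len` bytes starting at `offset` from any `Read + Seek` source.
/// Both calls below change the source's cursor, hence `&mut T`.
pub fn read_at_offset<T: Read + Seek>(
    partition: &mut T,
    offset: u64,
    len: usize,
) -> io::Result<Vec<u8>> {
    partition.seek(SeekFrom::Start(offset))?; // Seek::seek takes &mut self
    let mut buf = vec![0u8; len];
    partition.read_exact(&mut buf)?; // Read::read_exact takes &mut self
    Ok(buf)
}
```

After the call, the cursor of the source sits at offset + len, so the side effect is visible to every other user of the same partition.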

As I said, the positioned-io crate wouldn't suffer from this problem. But if I used it, I couldn't benefit from the large number of crates that rely on std::io semantics (such as binread, which is very useful for parsing filesystem structures).
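One possible way to soften that trade-off is a thin adapter that keeps a private position over a shared source and exposes it as io::Read + io::Seek. The sketch below defines its own hypothetical ReadAt trait and PosCursor type purely for illustration; it does not use the positioned-io API, and whether such an adapter is good enough for a given parsing crate is an open question.

```rust
use std::io::{self, Read, Seek, SeekFrom};

/// Hypothetical positioned-read trait: reading does not need `&mut self`.
pub trait ReadAt {
    fn read_at(&self, pos: u64, buf: &mut [u8]) -> io::Result<usize>;
}

/// In-memory implementation, used here in place of a real partition.
impl ReadAt for Vec<u8> {
    fn read_at(&self, pos: u64, buf: &mut [u8]) -> io::Result<usize> {
        let start = pos.min(self.len() as u64) as usize;
        let n = buf.len().min(self.len() - start);
        buf[..n].copy_from_slice(&self[start..start + n]);
        Ok(n)
    }
}

/// Adapter with its own cursor position over a shared `&R`.
pub struct PosCursor<'a, R: ReadAt> {
    source: &'a R,
    pos: u64,
}

impl<'a, R: ReadAt> PosCursor<'a, R> {
    pub fn new(source: &'a R, pos: u64) -> Self {
        Self { source, pos }
    }
}

impl<'a, R: ReadAt> Read for PosCursor<'a, R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        // Only the adapter's own position is mutated, never the source.
        let n = self.source.read_at(self.pos, buf)?;
        self.pos += n as u64;
        Ok(n)
    }
}

impl<'a, R: ReadAt> Seek for PosCursor<'a, R> {
    fn seek(&mut self, pos: SeekFrom) -> io::Result<u64> {
        let new_pos = match pos {
            SeekFrom::Start(p) => Some(p),
            SeekFrom::Current(delta) => self.pos.checked_add_signed(delta),
            // Would need a size query on the source; left out of this sketch.
            SeekFrom::End(_) => None,
        };
        match new_pos {
            Some(p) => {
                self.pos = p;
                Ok(p)
            }
            None => Err(io::Error::new(
                io::ErrorKind::InvalidInput,
                "unsupported or out-of-range seek",
            )),
        }
    }
}
```

Two PosCursor values over the same source can be read independently, since each one only mutates its own pos. This still says nothing about write support, which remains the weak spot of the positioned approach.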

This indeed needs to be considered once write support is tackled.
But the basic requirement for write support is mutable borrows, and as I illustrated above, this already applies to reading.

No idea how to help you, other than to link to tfs, which might give you some inspiration.

I guess I'm confused because you haven't given any code examples involving FileSystem, or any API involving FileSystem, so it's not at all clear what it is or what it does or how it might be used.
