VirtualFileSystem to async or not

Hi,

I want to implement a VirtualFileSystem (vfs). The idea is that one has a vfs object where one can write like

vfs.mount("/temp", "/some/temp/path");
...
let async_handle = vfs.open("/temp/test.txt");

And I also want to support different drivers, such as zip files, so it's possible to do

vfs.mount("/foo", "/some/path/file.zip");
vfs.mount("/bar", "ftp.foobar.com");
...
let async_handle = vfs.read("/foo/file_in_zip.txt");
let async_handle_2 = vfs.read("/bar/index.txt");

I want read/open/etc to happen on separate thread(s) so decompression and loading of larger files can be done in the background. I also want to be able to track the progress so I can update the UI on loading progress as well.
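The progress-tracking part could be done with an atomic counter shared between the worker thread and the returned handle, which the UI thread polls each frame. A rough, self-contained sketch (all names here are hypothetical, and the "load" is just a chunked copy standing in for real I/O):

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::{mpsc, Arc};
use std::thread;

// Hypothetical handle: the UI thread polls `progress` each frame
// while a worker thread performs the actual load.
struct LoadHandle {
    bytes_done: Arc<AtomicU64>,
    bytes_total: u64,
    result: mpsc::Receiver<Vec<u8>>,
}

impl LoadHandle {
    // Fraction of the load that has completed, in 0.0..=1.0.
    fn progress(&self) -> f64 {
        self.bytes_done.load(Ordering::Relaxed) as f64 / self.bytes_total as f64
    }

    // Non-blocking check; returns the data once the worker is done.
    fn try_take(&self) -> Option<Vec<u8>> {
        self.result.try_recv().ok()
    }
}

// Stand-in for a real load: copies `data` in chunks, bumping the counter.
fn start_load(data: Vec<u8>) -> LoadHandle {
    let bytes_total = data.len() as u64;
    let bytes_done = Arc::new(AtomicU64::new(0));
    let counter = bytes_done.clone();
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        let mut out = Vec::with_capacity(data.len());
        for chunk in data.chunks(4) {
            out.extend_from_slice(chunk);
            counter.fetch_add(chunk.len() as u64, Ordering::Relaxed);
        }
        let _ = tx.send(out);
    });
    LoadHandle { bytes_done, bytes_total, result: rx }
}
```

A real driver would bump the counter as bytes come off disk, the network, or the decompressor, but the shape of the handle stays the same.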

My question is whether I should use async for this, or whether it makes sense to make something more custom (dealing with my own threads or something like that).

Thoughts?

It doesn't sound suitable for async, no. You aren't actually doing any IO where you are just waiting for the OS.

Yeah, so in my case I have a "main/UI thread" that goes off and does other things, and the way I see these loads/opens is as requests that get finished later, so having something separate may work better.

This is a design question I am always struggling with myself. First, I don't think that API and implementation need to match -- you can have a blocking, thread-pool based implementation that exposes an async API. Similarly, you can have an async io_uring/IOCP based implementation that exposes a blocking API.

From what I know about file systems, the blocking implementation would make more sense.

The question of API is interesting. If your main (UI) thread is already async, then it makes sense to provide an async API.

If it is not async, you need some kind of blocking, evented, selectable API. That is, you should be able to do the old PHP trick:

// Schedule two reads in parallel, do not block
let foo = vfs.read("/foo.rs");
let bar = vfs.read("/bar.rs");

// `.contents` is a blocking call that returns `&String`
let foobar = foo.contents().clone() + bar.contents();
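One way to back such an API with plain threads is to have `read` spawn the work immediately and hand back a handle that only blocks when the contents are first asked for. A minimal sketch, assuming a hypothetical `Vfs` type (the `OnceCell` caching is what lets `contents` take `&self` and return `&String`):

```rust
use std::cell::OnceCell;
use std::sync::mpsc;
use std::thread;

struct ReadHandle {
    rx: mpsc::Receiver<String>,
    cached: OnceCell<String>,
}

impl ReadHandle {
    // Blocking call: waits for the worker on first use, then returns
    // the cached contents on every later call.
    fn contents(&self) -> &String {
        self.cached.get_or_init(|| self.rx.recv().expect("worker died"))
    }
}

struct Vfs;

impl Vfs {
    // Schedules the read on a background thread and returns immediately.
    fn read(&self, path: &str) -> ReadHandle {
        let (tx, rx) = mpsc::channel();
        let path = path.to_string();
        thread::spawn(move || {
            // Stand-in for real (possibly decompressing) I/O.
            let data = format!("contents of {path}");
            let _ = tx.send(data);
        });
        ReadHandle { rx, cached: OnceCell::new() }
    }
}
```

Both reads run concurrently; the caller only pays the blocking cost at the first `contents()` call on each handle.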

And you'd also want to select between two blocking calls (and cancel them as well)

let foo = vfs.read("/foo.rs");
let bar = vfs.read("/bar.rs");

// Some made-up syntax
select! {
    foo.contents() => ...,
    bar.contents() => ...,
}

The above APIs are blocking, but still allow for concurrency tricks associated with async.

The catch is, I don't know what the vocabulary type of choice is for expressing this concurrency. One choice is using the Future trait, but then sync callers would have to call block_on everywhere, which gets verbose. Another choice is channels, but the existing channel APIs feel too general: oneshot computations are awkward, there's no map/filter/etc, select is macro driven. For rust-analyzer, I went with channels, which worked ok, but only because the sink is an event loop anyway. I am not sure if that is an accident, or if all concurrency is just better expressed as a game loop.

Thanks for the reply!

In my case the UI thread/main thread isn't async (it's actually called via FFI from a Qt application, where the Rust code deals with all the "business logic").

This thread should minimize the time it blocks, so blocking until file loads are done (which could happen across a network) is a big no-no.

The idea is that I would be able (via the returned handle) to cancel requests, check if they are finished, etc., but the actual work is done somewhere else.

So yeah, I'm unsure if using Futures would be the best choice here, I think it likely makes sense to do something custom instead and have some worker thread(s) that can do the heavy lifting and update the states of the handles.

Nothing says that an object that implements Future can’t also have other useful methods. It might make sense for the returned handle to provide blocking methods for non-async users and also implement Future so that it can also be .awaited by async code.
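As a sketch of that idea: a handle can hold shared state behind a mutex, offer blocking/polling methods for sync callers, and also implement Future by stashing the waker. Everything here (`FileHandle`, `spawn_load`, the one-shot semantics) is hypothetical illustration, not an existing API:

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::{Arc, Condvar, Mutex};
use std::task::{Context, Poll, Waker};
use std::thread;

// State shared between the worker and the handle:
// the slot for the value, plus a parked async waker if any.
struct Shared<T> {
    slot: Mutex<(Option<T>, Option<Waker>)>,
    done: Condvar,
}

pub struct FileHandle<T> {
    shared: Arc<Shared<T>>,
}

impl<T> FileHandle<T> {
    // Non-blocking check, e.g. once per UI frame.
    // One-shot: the value can only be taken once.
    pub fn try_get(&self) -> Option<T> {
        self.shared.slot.lock().unwrap().0.take()
    }

    // Blocking wait for sync callers that do want to block.
    pub fn wait(&self) -> T {
        let mut guard = self.shared.slot.lock().unwrap();
        loop {
            if let Some(v) = guard.0.take() {
                return v;
            }
            guard = self.shared.done.wait(guard).unwrap();
        }
    }
}

// The same handle can be `.await`ed from async code.
impl<T> Future for FileHandle<T> {
    type Output = T;
    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<T> {
        let mut guard = self.shared.slot.lock().unwrap();
        match guard.0.take() {
            Some(v) => Poll::Ready(v),
            None => {
                guard.1 = Some(cx.waker().clone());
                Poll::Pending
            }
        }
    }
}

// Run `work` on a worker thread and fulfill the handle when it finishes.
pub fn spawn_load<T: Send + 'static>(
    work: impl FnOnce() -> T + Send + 'static,
) -> FileHandle<T> {
    let shared = Arc::new(Shared {
        slot: Mutex::new((None, None)),
        done: Condvar::new(),
    });
    let worker_shared = shared.clone();
    thread::spawn(move || {
        let value = work();
        let mut guard = worker_shared.slot.lock().unwrap();
        guard.0 = Some(value);
        if let Some(waker) = guard.1.take() {
            waker.wake();
        }
        worker_shared.done.notify_all();
    });
    FileHandle { shared }
}
```

Sync callers use `try_get`/`wait`; an async caller just writes `handle.await`, since `FileHandle` is itself a Future.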

Yeah that is a good point.

I would advise building at least the initial impl on top of channels, as that would handle all the tricky synchronization and cancellation for you. I.e., wrap a channel in a type which enforces one-shotness:

struct Handle<T> {
  chan: Receiver<T>,
  cancellation_token: Sender<!>,
  slot: OnceCell<Option<T>>,
}

impl<T> Handle<T> {
    pub fn get(&self) -> Option<&T> {
        self.slot.get_or_init(|| self.chan.recv().ok()).as_ref()
    }
}

struct Promise<T> {
    chan: Sender<T>,
    is_cancelled: Receiver<!>,
}
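A compilable, end-to-end version of the sketch above might look like the following (using `Infallible` as a stable stand-in for the never type `!`, `std::cell::OnceCell` for the slot, and a hypothetical `spawn` helper to wire the two halves together):

```rust
use std::cell::OnceCell;
use std::convert::Infallible; // stand-in for the unstable never type `!`
use std::sync::mpsc::{channel, Receiver, Sender, TryRecvError};
use std::thread;

struct Handle<T> {
    chan: Receiver<T>,
    // Never sent on; dropping the Handle drops this sender,
    // which the worker observes as cancellation.
    cancellation_token: Sender<Infallible>,
    slot: OnceCell<Option<T>>,
}

impl<T> Handle<T> {
    // Blocks on first call, caches the result for later calls.
    fn get(&self) -> Option<&T> {
        self.slot.get_or_init(|| self.chan.recv().ok()).as_ref()
    }
}

struct Promise<T> {
    chan: Sender<T>,
    is_cancelled: Receiver<Infallible>,
}

impl<T> Promise<T> {
    fn is_cancelled(&self) -> bool {
        // No values are ever sent on this channel, so `Disconnected`
        // is the only interesting outcome: the Handle was dropped.
        matches!(self.is_cancelled.try_recv(), Err(TryRecvError::Disconnected))
    }
}

// Run `work` on a worker thread; it can poll the Promise for cancellation.
fn spawn<T: Send + 'static>(
    work: impl FnOnce(&Promise<T>) -> T + Send + 'static,
) -> Handle<T> {
    let (value_tx, value_rx) = channel();
    let (cancel_tx, cancel_rx) = channel();
    thread::spawn(move || {
        let promise = Promise { chan: value_tx, is_cancelled: cancel_rx };
        let value = work(&promise);
        let _ = promise.chan.send(value);
    });
    Handle { chan: value_rx, cancellation_token: cancel_tx, slot: OnceCell::new() }
}
```

A long-running driver would check `promise.is_cancelled()` between chunks of work and bail out early, which is as close to cancellation as a blocking worker can get.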

Sounds like a good idea for sure. I have used std::sync::mpsc::channel a bit before, but I only made something very simple quite a while ago, so I will try to experiment a bit.

Interesting discussion! I think there is some value to both the sync and the async approach.

One thing I would look out for is how much concurrency you actually need. Are we talking about thousands of concurrent file reads, or a dozen? The less concurrency you need, the smaller the benefit of Futures and async code. I think in typical desktop applications, which open maybe a dozen sockets and files, there is not really a lot to gain from any async IO. For files it might be even worse than synchronous IO, since the async implementations just dispatch the work to threadpools and thereby cause additional context switches.

The semantic question around async vs sync APIs is interesting. With asynchronous APIs you can more naturally "race" two operations and cancel them. However, some of the patterns one would use this kind of race for in other languages (like JavaScript) are not that applicable in Rust anyway. E.g. there you might fire a callback after each async operation that directly updates a UI - which works just fine since the callback already runs on the event loop. With Rust's solution there are no callbacks, and you might want a channel or something like that anyway to perform the update.

And the cancellation aspect might be deceptive: although a select! block would seemingly allow you to cancel any async operation, the underlying FS operation might still continue to run in the background, since it would most likely be handled by a threadpool and cannot be synchronously cancelled. Actually, even if you used io_uring or IOCP for async file IO, an async file operation could not be synchronously cancelled. So any use of synchronous cancellation with the intent of directly reusing that file, instead of just closing it, might run into undefined behavior.

Another thing that you might want to take into account is the complexity of interfaces. Async interfaces/traits are unfortunately not as straightforward as their synchronous counterparts. E.g. there exist a variety of challenges around lifetimes, since those interfaces return Futures whose lifetimes are coupled to the lifetimes of the function arguments. Trait objects to avoid type complexity are also harder to use, unless you are ok with Box::pin-ing everything.

Due to these things I'm not convinced about the benefit of async operations for that use-case.

In my case it's a dozen files for sure. (In general I would say it's fewer than 10.)

Indeed. This is why I think it's better that I can just check/update the state of a file handle from my UI thread, and not rely on callbacks, as you say.

Yeah, that is what I'm thinking as well. Thanks for the insights :)

I did some testing on how this could work.

Update 2: Fixed

I decided to go with crossbeam::crossbeam_channel, as it has the ability to both send and receive messages. As the message type can only be the same for both send and recv it makes things a bit ugly, but I can hide all of those details internally, so it may not be that bad.
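For reference, the same request/response shape can be sketched with only std::sync::mpsc by shipping a one-shot reply sender inside each request, which sidesteps the "same message type in both directions" ugliness (the `Request` type and worker here are hypothetical; crossbeam-channel adds select! and cloneable receivers on top of this pattern):

```rust
use std::sync::mpsc::{channel, Receiver, Sender};
use std::thread;

// Each request carries the sender half of its own reply channel,
// so requests and replies can have different types.
struct Request {
    path: String,
    reply: Sender<Vec<u8>>,
}

// Spawn a worker that serves requests until all senders are dropped.
fn spawn_worker() -> Sender<Request> {
    let (tx, rx): (Sender<Request>, Receiver<Request>) = channel();
    thread::spawn(move || {
        for req in rx {
            // Stand-in for the real load/decompress work.
            let data = req.path.into_bytes();
            let _ = req.reply.send(data);
        }
    });
    tx
}
```

The caller keeps the `Receiver` half of the reply channel as its handle and can `try_recv` it from the UI thread without blocking.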

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.