I have a case in which I fill a vector with about 200 filenames. For each file, I load it and do some manipulation on it, which then exports a file. These 200 files are in no way connected, i.e. they can be processed independently of each other. My question is:
Should I use threads or async (tokio, for example)?
I think using something like tokio would be smartest, but is there something I am not considering? If it is indeed tokio, does anyone have a small code snippet that does this, or how should I go about learning it?
True asynchronous filesystem operations aren't available on many platforms, but I'll focus on Linux. glibc's POSIX AIO implementation (the aio_* functions), a C API for "asynchronous filesystem operations," literally runs a pool of threads in user space to perform the "asynchronous" operations on Linux. The kernel's native AIO interface (wrapped by libaio) is only reliably asynchronous for files opened with O_DIRECT. Tokio and async-std IIRC also hand filesystem work to a thread pool. Linux has had trouble supporting true async filesystem I/O because whether an operation can complete asynchronously depends on the filesystem implementation and other kernel details.
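To make the thread-backed approach concrete, here is a minimal sketch using only the standard library. The helper name `read_in_background` is made up for illustration, and it spawns one thread per call rather than maintaining a real pool, so treat it as the idea in miniature rather than what any particular runtime does internally.

```rust
use std::fs;
use std::io;
use std::path::PathBuf;
use std::sync::mpsc;
use std::thread;

/// Hypothetical helper: start a blocking read on a worker thread and
/// return immediately with a receiver for the eventual result.
fn read_in_background(path: PathBuf) -> mpsc::Receiver<io::Result<Vec<u8>>> {
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // The blocking read happens here, off the caller's thread.
        let result = fs::read(&path);
        // The receiver may already have been dropped; ignore that error.
        let _ = tx.send(result);
    });
    rx
}

fn main() -> io::Result<()> {
    let path = std::env::temp_dir().join("background_read_demo.txt");
    fs::write(&path, b"hello")?;

    let pending = read_in_background(path.clone());
    // The caller is free to do other work here while the read is in flight.
    let bytes = pending
        .recv()
        .expect("worker thread exited without sending")?;
    println!("read {} bytes", bytes.len());

    fs::remove_file(&path)?;
    Ok(())
}
```

From the caller's point of view the read looks asynchronous, but the work is still a blocking call sitting on an OS thread.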
However, withoutboats' new library ringbahn uses a new kernel API, io_uring, which does seem to support true asynchronous filesystem operations, though it works differently from most other async systems. Most async systems work like this:
1. I ask the kernel to read/write a socket.
2. The kernel returns me an ID.
3. I poll that ID until the kernel says that the operation is ready.
4. If it's a read operation, I can grab that info from the kernel.
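The steps above can be approximated with a non-blocking socket from the standard library. This is a simplified sketch, not how an executor actually does it: the function name `read_when_ready` is hypothetical, and the loop retries on a timer where a real reactor would wait for a readiness notification (for example via epoll).

```rust
use std::io::{self, Read, Write};
use std::net::{TcpListener, TcpStream};
use std::thread;
use std::time::Duration;

/// Read once from a non-blocking socket, retrying while the kernel
/// reports that no data is ready yet.
fn read_when_ready(stream: &mut TcpStream, buf: &mut [u8]) -> io::Result<usize> {
    stream.set_nonblocking(true)?;
    loop {
        match stream.read(buf) {
            // Data was ready: the kernel copied it into our buffer.
            Ok(n) => return Ok(n),
            // Not ready yet: back off briefly, then ask again.
            Err(e) if e.kind() == io::ErrorKind::WouldBlock => {
                thread::sleep(Duration::from_millis(1));
            }
            Err(e) => return Err(e),
        }
    }
}

fn main() -> io::Result<()> {
    // Local listener so the example is self-contained.
    let listener = TcpListener::bind("127.0.0.1:0")?;
    let addr = listener.local_addr()?;
    let writer = thread::spawn(move || -> io::Result<()> {
        let (mut peer, _) = listener.accept()?;
        // Delay the write so the reader sees "not ready" at least once.
        thread::sleep(Duration::from_millis(20));
        peer.write_all(b"ping")
    });

    let mut stream = TcpStream::connect(addr)?;
    let mut buf = [0u8; 16];
    let n = read_when_ready(&mut stream, &mut buf)?;
    println!("got {:?}", &buf[..n]);

    writer.join().expect("writer thread panicked")?;
    Ok(())
}
```

This readiness model works for sockets. A read on a regular file never reports "not ready" this way, which is why runtimes fall back to threads for files.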
io_uring works differently in that the kernel actually writes data directly into a buffer you provide, which has proved somewhat difficult to work with safely in Rust, but ringbahn does some cool Rust things to make this safe. io_uring was introduced in Linux kernel version 5.1, with more operations added in later 5.x releases, so it is fairly new. io_uring should generally provide true asynchronous I/O for filesystem objects in addition to sockets and the like.
In summary, it really depends on which OS and which asynchronous system you're using, but at least on Linux, epoll doesn't support asynchronous filesystem operations, so threads are used under the hood. Tokio/async-std use actual OS threads to do filesystem reads/writes.
TL;DR whether you use an async executor or not, you're probably going to end up using OS threads under the hood for filesystem access, unless you use something based on io_uring.
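Since threads end up doing the work either way, plain threads are a reasonable fit for the 200-file case. Below is a minimal sketch using only the standard library (scoped threads need Rust 1.63 or later). `process_file` and `process_all` are hypothetical names, and the uppercase step is only a stand-in for whatever manipulation the real program performs.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

/// Placeholder for the real per-file work: load, transform, export.
/// The uppercase step is an assumption made for this demo.
fn process_file(input: &Path, out_dir: &Path) -> io::Result<PathBuf> {
    let text = fs::read_to_string(input)?;
    let name = input
        .file_name()
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidInput, "path has no file name"))?;
    let output = out_dir.join(name);
    fs::write(&output, text.to_uppercase())?;
    Ok(output)
}

/// Process every file on a fixed number of OS threads.
/// Results come back in the same order as `files`.
fn process_all(files: &[PathBuf], out_dir: &Path, workers: usize) -> Vec<io::Result<PathBuf>> {
    // Shared counter: each worker claims the next unprocessed index.
    let next = AtomicUsize::new(0);
    let mut results: Vec<Option<io::Result<PathBuf>>> =
        (0..files.len()).map(|_| None).collect();

    thread::scope(|s| {
        let mut handles = Vec::new();
        for _ in 0..workers.max(1) {
            handles.push(s.spawn(|| {
                let mut done = Vec::new();
                loop {
                    let i = next.fetch_add(1, Ordering::Relaxed);
                    if i >= files.len() {
                        break;
                    }
                    done.push((i, process_file(&files[i], out_dir)));
                }
                done
            }));
        }
        // Collect each worker's results back into input order.
        for handle in handles {
            for (i, result) in handle.join().expect("worker thread panicked") {
                results[i] = Some(result);
            }
        }
    });

    results
        .into_iter()
        .map(|r| r.expect("every index is claimed exactly once"))
        .collect()
}

fn main() -> io::Result<()> {
    // Build a small throwaway input set in the temp directory.
    let base = std::env::temp_dir().join("thread_batch_demo");
    let in_dir = base.join("in");
    let out_dir = base.join("out");
    fs::create_dir_all(&in_dir)?;
    fs::create_dir_all(&out_dir)?;
    let files: Vec<PathBuf> = (0..8)
        .map(|i| in_dir.join(format!("file{}.txt", i)))
        .collect();
    for f in &files {
        fs::write(f, "hello")?;
    }

    let workers = thread::available_parallelism().map(|n| n.get()).unwrap_or(4);
    for result in process_all(&files, &out_dir, workers) {
        println!("wrote {}", result?.display());
    }
    fs::remove_dir_all(&base)
}
```

Pulling indices from a shared counter keeps all workers busy even when some files take much longer than others. If an extra dependency is acceptable, a data-parallelism crate such as rayon would likely express the same thing in fewer lines.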
It is true that there are some new experimental APIs, available only on recent Linux kernels, that would make file I/O more usable in the async world. However, as no libraries in serious use provide this kind of functionality yet, I do not think it will help the OP of this thread.