I’m aiming to build an intermediate layer between an application and the file system. This layer has to read existing files and write data into files. Most of the time the intermediate layer makes decisions based on (file) names without reading or writing the files themselves.
So what’s the best way to store these files?
From my beginner’s point of view there are at least two ways to do that, and I’d like to hear some full-fledged Rustaceans’ opinions.
The first option is to store filenames as Strings. Whenever I need to read from a particular file, I have to create the std::fs::File from the stored String(s). I bet that takes some runtime, but it should release the handles nicely once the content of a file has been read.
The second option is to store std::fs::File right from the start. I’d think that the price for the better performance might be some blocking in the underlying file system. And I imagine the ownership of these handles could become challenging, especially when this intermediate layer has to operate within a multi-threaded environment.
What’s the recommended way to store a number of names / files?
Can you clarify what you mean by storing? std::fs::File is what is known as a file handle; that is, a wrapper around a file descriptor on Unix systems, which guarantees that the file exists as long as you keep the handle alive (and no external process messes with the file system).
Your case is different mainly in that you have many files, so you are more likely to run into the limit on number of open files, but reading that thread should give you some useful background.
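For the name-based decisions, here is a minimal sketch of the "store names, open on demand" option. FileRegistry and its methods are hypothetical names made up for illustration, and using PathBuf instead of String is my assumption, not something from the question.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical registry that stores only paths, never open handles.
pub struct FileRegistry {
    paths: Vec<PathBuf>,
}

impl FileRegistry {
    pub fn new() -> Self {
        Self { paths: Vec::new() }
    }

    pub fn add(&mut self, path: impl Into<PathBuf>) {
        self.paths.push(path.into());
    }

    /// A decision based purely on names: no file is opened here.
    pub fn with_extension(&self, ext: &str) -> Vec<&Path> {
        self.paths
            .iter()
            .map(PathBuf::as_path)
            .filter(|p| p.extension().and_then(|e| e.to_str()) == Some(ext))
            .collect()
    }

    /// Opens the file, reads it completely and closes it again before
    /// returning, so no handle stays open between calls.
    pub fn read(path: &Path) -> io::Result<String> {
        fs::read_to_string(path)
    }
}
```

With this shape the number of stored names is not bounded by the open-file limit, at the cost of one open per read.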
What do you mean by “blocking in the underlying file system”?
The primary effect on the file system of having the files open is that their data is kept even if deleted from the directory, until you close them. But this does not cause any operation to block.
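A small sketch of that effect, assuming a Unix-like system (other platforms may behave differently when a file is removed while open). The function name read_after_unlink is hypothetical.

```rust
use std::fs::{self, File};
use std::io::{self, Read, Write};
use std::path::Path;

/// Writes `data` to `path`, opens it, removes the directory entry and then
/// reads the content back through the handle that is still open.
pub fn read_after_unlink(path: &Path, data: &[u8]) -> io::Result<Vec<u8>> {
    // The temporary File from `create` is closed at the end of this statement.
    File::create(path)?.write_all(data)?;

    let mut handle = File::open(path)?;
    fs::remove_file(path)?; // the name is gone from the directory

    // Assumption: Unix semantics, where the data lives on until the last
    // open handle is closed.
    let mut out = Vec::new();
    handle.read_to_end(&mut out)?;
    Ok(out)
}
```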
It is in fact possible to read and write through &File from multiple threads at once, so the ownership is not a problem, but the actual behavior of the file is: if you write to the same File from multiple threads, that will probably not produce useful results except in very specific situations like an append-only log file. And if you intend to read the file, then that can be done from multiple threads but you need separate Files (that is, open the file twice), in order to have separate cursor positions. Multiple threads reading from the same File would each get an arbitrary subset of the data in the file.
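To make the cursor point concrete, here is a single-threaded sketch (so the result is deterministic) that compares two separately opened handles with two reads through a shared &File. The function names are hypothetical.

```rust
use std::fs::File;
use std::io::{self, Read};
use std::path::Path;

/// Reads `n` bytes through each of two independently opened handles.
/// Each handle has its own cursor, so both reads start at offset 0.
pub fn read_with_separate_handles(path: &Path, n: usize) -> io::Result<(Vec<u8>, Vec<u8>)> {
    let mut a = File::open(path)?;
    let mut b = File::open(path)?;
    let mut buf_a = vec![0u8; n];
    let mut buf_b = vec![0u8; n];
    a.read_exact(&mut buf_a)?;
    b.read_exact(&mut buf_b)?;
    Ok((buf_a, buf_b))
}

/// Reads `n` bytes twice through shared references to one handle.
/// `Read` is implemented for `&File`, so no `&mut File` is required, but
/// both reads advance the same cursor.
pub fn read_with_shared_handle(path: &Path, n: usize) -> io::Result<(Vec<u8>, Vec<u8>)> {
    let file = File::open(path)?;
    let mut first = vec![0u8; n];
    let mut second = vec![0u8; n];
    (&file).read_exact(&mut first)?;
    (&file).read_exact(&mut second)?; // continues where `first` stopped
    Ok((first, second))
}
```

With threads in place of the two sequential reads, which chunk each reader receives from the shared handle is no longer predictable.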
On Unix we have read_at and write_at (from std::os::unix::fs::FileExt) to do positioned I/O. For reading this works quite well. For writing it requires proper coordination between writer threads and some knowledge about content lengths.
But if multiple Seek offsets are needed then it's better to open a file multiple times.
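A rough, Unix-only sketch of positioned reads from several threads sharing one handle. read_chunks_in_parallel is a hypothetical helper, and one thread per chunk is only for illustration, not a recommendation.

```rust
use std::fs::File;
use std::io;
use std::os::unix::fs::FileExt; // Unix-only: read_at, read_exact_at, write_at
use std::path::Path;
use std::sync::Arc;
use std::thread;

/// Reads the whole file in `chunk`-sized pieces, one thread per piece,
/// all going through the same shared handle.
pub fn read_chunks_in_parallel(path: &Path, chunk: usize) -> io::Result<Vec<u8>> {
    assert!(chunk > 0, "chunk size must be non-zero");
    let file = Arc::new(File::open(path)?);
    let len = file.metadata()?.len() as usize;

    let handles: Vec<_> = (0..len)
        .step_by(chunk)
        .map(|start| {
            let file = Arc::clone(&file);
            let size = chunk.min(len - start);
            thread::spawn(move || -> io::Result<Vec<u8>> {
                let mut buf = vec![0u8; size];
                // The offset is passed explicitly, so the shared cursor
                // is not involved at all.
                file.read_exact_at(&mut buf, start as u64)?;
                Ok(buf)
            })
        })
        .collect();

    // Join in spawn order so the chunks are reassembled in file order.
    let mut out = Vec::with_capacity(len);
    for handle in handles {
        out.extend(handle.join().expect("reader thread panicked")?);
    }
    Ok(out)
}
```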
It always bugs me that Windows has a nearly identical API that, together with the Unix one, covers everything you're likely to run on.
It feels like std could have a PositionedFile (bike shed welcome) that doesn't support the io traits and has a common interface across platforms, but I suppose it's easy enough to write.
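Something along these lines, as a sketch only. PositionedFile and its methods are hypothetical; the calls underneath are read_at / write_at from std::os::unix::fs::FileExt and seek_read / seek_write from std::os::windows::fs::FileExt.

```rust
use std::fs::File;
use std::io;
use std::path::Path;

/// Hypothetical wrapper that only offers positioned I/O and deliberately
/// does not implement `Read`, `Write` or `Seek`.
pub struct PositionedFile {
    inner: File,
}

impl PositionedFile {
    /// Opens `path` read-only.
    pub fn open(path: impl AsRef<Path>) -> io::Result<Self> {
        Ok(Self { inner: File::open(path)? })
    }

    /// Wraps an already opened file, e.g. one opened for writing.
    pub fn from_file(file: File) -> Self {
        Self { inner: file }
    }

    #[cfg(unix)]
    pub fn read_at(&self, buf: &mut [u8], offset: u64) -> io::Result<usize> {
        use std::os::unix::fs::FileExt;
        self.inner.read_at(buf, offset)
    }

    #[cfg(unix)]
    pub fn write_at(&self, buf: &[u8], offset: u64) -> io::Result<usize> {
        use std::os::unix::fs::FileExt;
        self.inner.write_at(buf, offset)
    }

    // On Windows the underlying calls also move the file cursor; that is
    // not observable here because the wrapper never exposes the cursor.
    #[cfg(windows)]
    pub fn read_at(&self, buf: &mut [u8], offset: u64) -> io::Result<usize> {
        use std::os::windows::fs::FileExt;
        self.inner.seek_read(buf, offset)
    }

    #[cfg(windows)]
    pub fn write_at(&self, buf: &[u8], offset: u64) -> io::Result<usize> {
        use std::os::windows::fs::FileExt;
        self.inner.seek_write(buf, offset)
    }
}
```

Both methods take &self, so a PositionedFile can be shared between threads behind an Arc without a lock. Coordinating overlapping writes is still up to the caller.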