Library for getting the physical storage type that a file is located on

I am making a program that reads the hashes of many files and copies many files, and I'm going to set concurrency limits based on the storage type of each file (only copy 1 file at a time for HDDs, but copy many files at a time for SSDs: 32 for SATA SSDs and 256 for NVMe SSDs). How can I figure out whether a file is stored on a SATA HDD, a SATA SSD, an NVMe SSD, an SD card, a USB flash drive, or an HDD/SSD/NVMe connected through USB? I will be running this program on Windows and Linux.

I tried looking for a crate that does this, but I didn't find one. I might make a crate for it myself, if it isn't too hard and there really isn't one already.

Also, if I am not able to determine the storage type, what would be a good default assumption for the storage type, or a good default limit on concurrent reads/writes?

Also, if you have advice on the limits on concurrent reads/writes for my program, that would be great too.
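To make the plan concrete, here is roughly the shape I have in mind: one concurrency limit per storage class, enforced with a Tokio semaphore. This is only a sketch; the `storage_class_for_path` helper is hypothetical (figuring out how to write it is this whole question), and the limits are just my placeholder numbers.

```rust
use std::path::{Path, PathBuf};
use std::sync::Arc;
use tokio::sync::Semaphore;

// The storage classes I care about (simplified).
#[allow(dead_code)]
enum StorageClass {
    Hdd,
    SataSsd,
    NvmeSsd,
    Unknown,
}

// Hypothetical: this is exactly the part I don't know how to implement
// on Windows and Linux.
fn storage_class_for_path(_path: &Path) -> StorageClass {
    StorageClass::Unknown
}

// Placeholder limits from my plan above; Unknown gets a conservative guess.
fn limit_for(class: &StorageClass) -> usize {
    match class {
        StorageClass::Hdd => 1,
        StorageClass::SataSsd => 32,
        StorageClass::NvmeSsd => 256,
        StorageClass::Unknown => 4,
    }
}

async fn copy_all(files: Vec<(PathBuf, PathBuf)>) {
    // A real version would keep one semaphore per source device; this sketch
    // sizes a single semaphore from the first file's device just to show the idea.
    let class = files
        .first()
        .map(|(src, _)| storage_class_for_path(src))
        .unwrap_or(StorageClass::Unknown);
    let permits = Arc::new(Semaphore::new(limit_for(&class)));

    let mut handles = Vec::new();
    for (src, dst) in files {
        let permits = Arc::clone(&permits);
        handles.push(tokio::spawn(async move {
            // Hold a permit for the duration of the copy.
            let _permit = permits.acquire_owned().await.expect("semaphore closed");
            tokio::fs::copy(&src, &dst).await
        }));
    }
    for handle in handles {
        let _ = handle.await;
    }
}

#[tokio::main]
async fn main() {
    let files = vec![(
        PathBuf::from("source/Birthday.txt"),
        PathBuf::from("photos/2024/Birthday.txt"),
    )];
    copy_all(files).await;
}
```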

The storage abstraction does not concern itself with whether the bytes live on a magnetic platter, in transistors, or on an optical medium. It also does not care whether the wire is entirely on the motherboard, as with NVMe, or 100 km of copper to the nearest data center.

Your OS is aware of some of this information, but not all of it. (How can it possibly know the physical layout of a network-mounted storage device?)

On Linux, you can glean some information about mounts from procfs. For Windows, who knows! Maybe there is a weird storage subsystem kernel function you can use.
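On Linux, for example, the block layer exposes a per-device rotational flag in sysfs, which at least separates spinning platters from flash. Mapping a file path to the right device name (via its mount point) is the fiddly part, so this sketch just takes a device name directly; and note that USB bridges and virtual devices sometimes report nonsense here:

```rust
use std::fs;
use std::io;

/// Best-effort guess for a Linux block device, e.g. "sda" or "nvme0n1".
/// Reads /sys/block/<dev>/queue/rotational: "1" usually means a spinning disk,
/// "0" usually means flash (SSD/NVMe), but some USB bridges and virtual devices lie.
fn is_rotational(device: &str) -> io::Result<bool> {
    let path = format!("/sys/block/{device}/queue/rotational");
    let contents = fs::read_to_string(path)?;
    Ok(contents.trim() == "1")
}

fn main() {
    for dev in ["sda", "nvme0n1"] {
        match is_rotational(dev) {
            Ok(rot) => println!("{dev}: rotational = {rot}"),
            Err(e) => println!("{dev}: could not read sysfs ({e})"),
        }
    }
}
```

Going from a file to its device usually means stat-ing the file and walking /proc/self/mountinfo, and even then the answer is only as honest as the device firmware.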

If finding out the underlying storage type is too difficult, what would you suggest I set as the limit for concurrent reads and copy operations? All of my code is async, so I could copy every file simultaneously, but I'm worried that might decrease performance compared to copying 1 file at a time in some cases.

It probably depends more on specific details of the CPU than on details of the physical IO devices. The thing to be concerned about with IO is usually latency. If your IO operations generally take 10 ms, then you can spend that 10 ms queueing up additional operations to amortize the latency. Most IO subsystems are still synchronous (blocking) by design [1], so your operations are likely going to be blocking a thread. That means the CPU's context-switching performance will matter if you need 100,000 threads for 100,000 concurrent IO operations.

Tokio’s fs APIs use its blocking thread pool, which by default I believe is sized to the CPU’s logical core count. With an IO-bound workload, you probably want more system threads than you have logical cores, maybe 10x or 100x more, depending on the exact workload [2]. If you have more tasks than threads, the limited thread pool becomes a queue of waiting async tasks, which in turn adds more latency. And it’s the worst kind of latency: the kind you cannot amortize.
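To put that concretely: a tokio::fs call is (roughly) the std version handed to spawn_blocking, so every in-flight operation occupies one blocking-pool thread for its entire duration. This is a simplified sketch of the idea, not Tokio's actual implementation:

```rust
use std::path::PathBuf;

// Roughly what tokio::fs::copy does under the hood: hand the blocking
// std::fs::copy to the blocking thread pool and await the result.
async fn copy_like_tokio_fs(src: PathBuf, dst: PathBuf) -> std::io::Result<u64> {
    tokio::task::spawn_blocking(move || std::fs::copy(&src, &dst))
        .await
        .expect("blocking copy task panicked")
}

#[tokio::main]
async fn main() -> std::io::Result<()> {
    let bytes = copy_like_tokio_fs("a.txt".into(), "b.txt".into()).await?;
    println!("copied {bytes} bytes");
    Ok(())
}
```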

I think you really want the bottleneck to be the IO devices themselves, not the task scheduler. More threads is probably better, bearing in mind that the CPU has an upper bound on the number of threads that can perform useful work. And more threads is definitely better if you are copying between multiple physical storage devices: that’s actual parallelism, not just concurrency.


  1. io_uring and IOCP are the outliers. ↩︎

  2. I don’t have realistic numbers, this is just a rule of thumb. You would have to profile the exact environment to determine what the best ratio actually is. ↩︎

Here’s what looks like a pretty good resource on the topic of concurrent disk IO operations: Performance Impact of Parallel Disk Access | Piotr Kołaczkowski


Tokio’s blocking thread pool is different from its main thread pool. The blocking pool spawns threads up to the configured blocking-thread maximum, which is 512[1]. That’s probably more threads than necessary (again, depending on the specific workload).


  1. by default, can be configured on the runtime::Builder ↩︎
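For reference, here is where that knob lives if the default doesn't suit the workload (the 64 below is an arbitrary example, not a recommendation):

```rust
fn main() {
    // Build a runtime with an explicit cap on blocking threads instead of the
    // default 512.
    let runtime = tokio::runtime::Builder::new_multi_thread()
        .max_blocking_threads(64)
        .enable_all()
        .build()
        .expect("failed to build Tokio runtime");

    runtime.block_on(async {
        // spawn copy tasks here
    });
}
```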

Thanks, I didn’t know the default was 512. That’s a much better general-purpose default than the logical core count (or the oft-cited n+1, which never made sense to me). But yeah, it needs real analysis to determine whether there is a better number to set. Or perhaps it could scale dynamically.

It's so nice that you sent that article, because while starting to read it I found out about fclones and realized that I can just use fclones to do my task instead of writing my own program!

I was making a program to copy files from one folder to another, skipping any files whose exact contents already exist in an existing folder. For example, if there were files like this:

├── photos
│   ├── 2022
│   │   └── IMG0.txt
│   ├── 2023
│   │   ├── First Day of School.txt
│   │   └── some random photo.txt
│   └── 2024
└── source
    ├── Birthday.txt
    ├── First Day of School.txt
    └── IMG0.txt

And I wanted to copy from source to photos/2024 without copying any files that are already included in photos. I realized I can just use fclones to find the duplicate files in photos and source (with --cache, since I will be doing this multiple times on hundreds of gigabytes of files), move the duplicates in source to a different folder (in case I don't want to delete them), and copy the non-duplicate files from source to photos/2024.