What's a fast way to always get the oldest file from a directory?

I have a directory which may contain hundreds of thousands of files.

The filenames are prefixed with a priority number
Each file has a modification date.

for example:

100_randomid.bin modification date 10 seconds ago <--- the file we want
100_randomid.bin modification date 6 seconds ago
100_randomid.bin modification date 7 seconds ago
100_randomid.bin modification date 1 seconds ago
099_randomid.bin modification date 3 seconds ago
099_randomid.bin modification date 2 seconds ago
099_randomid.bin modification date 1 seconds ago
099_randomid.bin modification date 0 seconds ago
098_randomid.bin modification date 10 seconds ago
098_randomid.bin modification date 8 seconds ago
098_randomid.bin modification date 2 seconds ago
098_randomid.bin modification date 1 seconds ago

I have multiple async tasks running in multiple threads "the consumers".
Each consumer task needs to remove the file at the top of the sorted list as shown in the example.

The list of files must be sorted first on name (larger priority comes first), and then on modification date. The example above shows the sort order.

Files are being added thousands of time per second - with random priority and random modification date.

The naive solution would be for each consumer task to sort the directory into the order shown above and get the top file, but this might be happening tens of thousands of times per second so would be way too slow.

Then I thought perhaps I could have a separate task with an inotifywait watching the directory. Each time it is notified of a new file being added to or removed from the directory, it updates an index that it maintains in memory. Consumers could then ask this function for the filename at the top of the sorted list.

The question that I have is - what sort of code allows creation of such an index that would be suitable for maintaining a sort list whilst constantly adding new filenames that have random priorities and random modification dates? Is there a specialised index of some sort suited to this? Or would I have something like a Vector of Structs and simply re-sort the Vec based on the values in the struct every time a new file is added to or removed from the directory (still seems inefficient)? Is there a better way to do it?

thanks!

If all you need to pop is the top, that sounds like a priority queue.

Thanks for the reply although unless I have misunderstood, it doesn't answer the question. You are correct though in that the outcome of this process is a priority queue. My specific question is in the final paragraph. Thanks for taking an interest.

If you follow the link in @quinedot's post, you can find the relevant documentation for the standard Rust priority queue, std::collections::BinaryHeap.

Note: If you also need to remove items that are not at the front of the queue, then it could sometimes be more efficient to use a sorted collection like BTreeSet or BTreeMap. Also, depending on the details of your code, it might make sense to keep a secondary HashMap as an index from filenames to keys in your BTreeMap.

1 Like

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.