File IO is not really possible without blocking IO in any case, so async does not gain you much here. For this reason, reading the files synchronously in a spawn_blocking call is a fine way to do it. As for the RAM disk, it will speed up either approach, simply because a RAM disk gives faster access to the files.
@cuviper As I understood the purpose, the set is created once and then used, in which case I would not recommend a concurrent set. To be fair, this assessment was aided by having also answered this thread.
For now I am using a Python script to process the files, which is really, really slow (~30 minutes). I am trying to replicate the code in Rust so I can reduce the time and process the files efficiently.
Is the procedure above better, or should I go with extend?
Basically, all this server does is send all the actions that occur during a period of time to another server. Every file contains a list of the actions that happened during a period of time (one action per line). My goal is to get a unique list of the actions that happened and send it to another server.
The files are sent to my server using scp (I don't have control over that) and are placed in a directory (which I can change).
What I have now is a cron job that runs every hour and launches the Python script that processes the files sent to me. The script iterates through the files inside the directory, processes them one by one, creates a set of unique actions, saves it to a file, and sends it to another server.
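For reference, the per-file loop described above could be sketched in Rust roughly like this. It is a minimal sketch using only the standard library, assuming each line of a file is one action and that only regular files directly inside the directory matter; the function names `add_actions_from_file` and `unique_actions_in_dir` are made up for illustration.

```rust
use std::collections::HashSet;
use std::fs::{self, File};
use std::io::{self, BufRead, BufReader};
use std::path::Path;

/// Inserts every line of the file at `path` into `set` (hypothetical helper).
/// Duplicate lines are dropped by the set itself.
fn add_actions_from_file(path: &Path, set: &mut HashSet<String>) -> io::Result<()> {
    // BufReader avoids one syscall per line, which matters for files up to 1G.
    let reader = BufReader::new(File::open(path)?);
    for line in reader.lines() {
        set.insert(line?);
    }
    Ok(())
}

/// Processes the regular files directly inside `dir` one by one and
/// returns the set of unique lines found across all of them.
fn unique_actions_in_dir(dir: &Path) -> io::Result<HashSet<String>> {
    let mut set = HashSet::new();
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        if path.is_file() {
            add_actions_from_file(&path, &mut set)?;
        }
    }
    Ok(set)
}
```

Calling `unique_actions_in_dir(Path::new("/some/dir"))` would return the merged set, which could then be written to a file or sent on. Saving and sending are left out because the thread does not say how they are done.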
This process takes a lot of time to finish, and sometimes, when the files are too big, it takes more than 1 hour, which causes two scripts to run at once (and this causes issues). The file sizes vary from 10M to 1G, and the number of files is not constant.
In my current executable (Rust), I want to remove the cron job and let Rust wait for files to arrive and process them.
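One simple way to "wait for files" without any extra crates is to poll the directory and remember which paths have already been seen. The sketch below does that; it is plain polling, not inotify, and the function name `new_files` is made up for illustration. It also assumes that a file is complete once it shows up, which may not hold for files still being copied by scp.

```rust
use std::collections::HashSet;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Returns the regular files in `dir` that were not returned by a previous
/// call, and records them in `seen` (hypothetical helper).
fn new_files(dir: &Path, seen: &mut HashSet<PathBuf>) -> io::Result<Vec<PathBuf>> {
    let mut fresh = Vec::new();
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        // `insert` returns true only the first time a path is added.
        if path.is_file() && seen.insert(path.clone()) {
            fresh.push(path);
        }
    }
    // read_dir gives no ordering guarantee, so sort for predictable results.
    fresh.sort();
    Ok(fresh)
}
```

A long-running process could call `new_files` in a loop with a `std::thread::sleep` between iterations and hand each returned path to the processing code. Since there is only one process, two runs can no longer overlap the way two cron-started scripts can.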
Sure, that makes sense. It sounds like the number of files is reasonably small, in which case I'd probably go for just spawning a thread per file, since the bulk of your work appears to be either file IO or CPU bound. If you need to process at most one file at a time, you can even avoid the whole thread business.
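To make the thread suggestion concrete, here is a minimal sketch that spawns one thread per file, builds a set per file, and merges the results with extend. It assumes the number of files is small enough that one thread each is acceptable; the function names `unique_lines` and `unique_lines_parallel` are made up for illustration.

```rust
use std::collections::HashSet;
use std::fs::File;
use std::io::{self, BufRead, BufReader};
use std::path::{Path, PathBuf};
use std::thread;

/// Reads one file and returns its unique lines (hypothetical helper).
fn unique_lines(path: &Path) -> io::Result<HashSet<String>> {
    // Collecting an iterator of io::Result<String> stops at the first error.
    BufReader::new(File::open(path)?).lines().collect()
}

/// Spawns one thread per file, then merges the per-file sets.
fn unique_lines_parallel(paths: Vec<PathBuf>) -> io::Result<HashSet<String>> {
    // Start all workers first so the files are read concurrently.
    let handles: Vec<_> = paths
        .into_iter()
        .map(|path| thread::spawn(move || unique_lines(&path)))
        .collect();

    let mut merged = HashSet::new();
    for handle in handles {
        // `join` fails only if the worker panicked; the `?` forwards IO errors.
        let set = handle.join().expect("worker thread panicked")?;
        merged.extend(set);
    }
    Ok(merged)
}
```

Each thread owns its own set, so no locking or concurrent set is needed. The merge happens on the calling thread after the workers finish.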
Because, if I have understood correctly, you need to do some "merge" over the data spread across the multiple files. If that isn't the case, i.e. if you can handle each file separately and send the right data accordingly, then the inotify layer will indeed suffice.