Dear Rustaceans,
I have just started to teach myself Rust and am at the stage of slowly transitioning from the Rust book examples to writing my first full program, but I seem to have bitten off more than I can chew with the task I need to solve.
Therefore, I would kindly ask for some guidance on a suitable architectural pattern and helpful crates for the following task:
I would like to process a bioinformatics file format called FastQ, essentially plain text files containing millions of separate records to iterate over. Thanks to the noodles crate, asynchronously iterating over the records of a single file is a breeze, and several synchronous implementations are available, too. However, FastQ files may come in pairs, where each record in File A has a matching record in File B, and both share the same ID.
So I need to read the records from both files in such a way that I can process the information jointly in subsequent steps. In most cases the files are ordered, so that the nth records of the two files form the nth pair, but I can't bank on this. Therefore, I need a small buffer for, say, 1000 records, to which new records from both files are added. Once a pair is complete, it is removed from the buffer and passed on to another stream. It would actually be desirable for the program to exit should the buffer fill up too much (e.g. because, by human error, files that don't belong together have been submitted and no pairs can be formed).
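To make the buffer idea concrete, here is a rough sketch of what I have in mind, using only the standard library. The names `Record`, `Side`, `PairBuffer` and `BufferOverflow` are made up for illustration (a real version would hold the record type of whichever FastQ crate is used), and I am assuming that both mates carry exactly the same ID string:

```rust
use std::collections::HashMap;

/// Which of the two input files a record came from.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Side {
    A,
    B,
}

/// Hypothetical stand-in for a parsed FastQ record.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Record {
    pub id: String,
    pub seq: String,
}

/// Returned when too many records are waiting for a mate.
#[derive(Debug, PartialEq, Eq)]
pub struct BufferOverflow {
    pub pending: usize,
}

/// Holds records whose mate has not been seen yet, keyed by ID.
pub struct PairBuffer {
    pending: HashMap<String, (Side, Record)>,
    capacity: usize,
}

impl PairBuffer {
    pub fn new(capacity: usize) -> Self {
        Self { pending: HashMap::new(), capacity }
    }

    /// Adds one record. Returns `Ok(Some((a, b)))` if this completes a pair,
    /// `Ok(None)` if the record has to wait, and an error if the buffer is full.
    pub fn push(
        &mut self,
        side: Side,
        rec: Record,
    ) -> Result<Option<(Record, Record)>, BufferOverflow> {
        // Is a record with the same ID from the *other* file already waiting?
        let mate_waiting =
            matches!(self.pending.get(&rec.id), Some((s, _)) if *s != side);
        if mate_waiting {
            let (_, mate) = self.pending.remove(&rec.id).expect("mate was just seen");
            // Always return the pair as (record from A, record from B).
            return Ok(Some(match side {
                Side::A => (rec, mate),
                Side::B => (mate, rec),
            }));
        }
        if self.pending.len() >= self.capacity {
            return Err(BufferOverflow { pending: self.pending.len() });
        }
        // Simplification: a duplicate ID from the same file overwrites the earlier entry.
        self.pending.insert(rec.id.clone(), (side, rec));
        Ok(None)
    }

    pub fn len(&self) -> usize {
        self.pending.len()
    }

    pub fn is_empty(&self) -> bool {
        self.pending.is_empty()
    }
}
```

The caller would feed records from both files into `push` in whatever order they arrive, forward every completed pair downstream, and exit on `BufferOverflow`. Anything still in the buffer when both files are exhausted is an unpaired record.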
Could someone kindly suggest crates and names of functions I should familiarize myself with to solve this problem? To put it mildly, I am overwhelmed by the documentation for those concurrency crates. The Tokio / Futures async approach seems very tempting, because there is no CPU-heavy work involved that would justify using real threads with e.g. rayon, but the async patterns are even harder to wrap my head around. So I am also open to, e.g., constructing a crossbeam queue synchronously that rayon threads then consume. Ultimately, I am open to anything, just overwhelmed by the many options, so suggestions from somebody with experience in similar tasks would be very welcome!
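For comparison, this is roughly how I picture a purely synchronous, thread-based variant. It is only a sketch under several assumptions: it uses the standard library's `std::sync::mpsc::sync_channel` as a stand-in for a crossbeam queue, the two "files" are plain in-memory vectors instead of real FastQ readers, and `Rec` and `pair_with_threads` are names I invented for illustration:

```rust
use std::collections::HashMap;
use std::sync::mpsc::sync_channel;
use std::thread;

/// Hypothetical stand-in for a parsed record: (id, sequence).
pub type Rec = (String, String);

/// Two reader threads push records into one bounded channel; the calling
/// thread pairs them by ID and gives up if too many records are unmatched.
pub fn pair_with_threads(
    file_a: Vec<Rec>,
    file_b: Vec<Rec>,
    max_pending: usize,
) -> Result<Vec<(Rec, Rec)>, String> {
    // The bool marks the origin: true = file A, false = file B.
    // The channel bound (64) is an arbitrary choice for this sketch.
    let (tx_a, rx) = sync_channel::<(bool, Rec)>(64);
    let tx_b = tx_a.clone();

    let reader_a = thread::spawn(move || {
        for rec in file_a {
            // send() fails once the receiver is gone, so stop reading.
            if tx_a.send((true, rec)).is_err() {
                break;
            }
        }
    });
    let reader_b = thread::spawn(move || {
        for rec in file_b {
            if tx_b.send((false, rec)).is_err() {
                break;
            }
        }
    });

    let mut pending: HashMap<String, (bool, Rec)> = HashMap::new();
    let mut pairs = Vec::new();
    let mut overflow = false;

    // recv() returns Err once both senders have been dropped.
    while let Ok((from_a, rec)) = rx.recv() {
        let mate_waiting =
            matches!(pending.get(&rec.0), Some((side, _)) if *side != from_a);
        if mate_waiting {
            let (_, mate) = pending.remove(&rec.0).expect("mate was just seen");
            pairs.push(if from_a { (rec, mate) } else { (mate, rec) });
        } else if pending.len() >= max_pending {
            overflow = true;
            break;
        } else {
            pending.insert(rec.0.clone(), (from_a, rec));
        }
    }

    // Drop the receiver before joining so that blocked readers can exit.
    drop(rx);
    reader_a.join().expect("reader A panicked");
    reader_b.join().expect("reader B panicked");

    if overflow {
        Err(format!("more than {max_pending} records are waiting for a mate"))
    } else {
        // Records left in `pending` are unpaired; this sketch ignores them.
        Ok(pairs)
    }
}
```

Since the two readers run independently, the order in which pairs come out can differ between runs when the files are not in the same order. I do not know whether this is a sensible design, which is part of my question.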
Thanks a lot for reading and have a nice day everybody!
Matthias