Greetings to the fans of the world's best programming language (yes, I really mean it).
I am in need of your expert guidance please.
I am wondering if Iterator::partition (Iterator in std::iter - Rust) is the most idiomatic and efficient way to process a stream of function calls returning Result<> and Option types.
The streams will be large (hundreds of thousands or millions of input objects, held as Rust structs and transformed into output objects). They will be read from storage, processed, and serialized back to storage in transformed form.
I need to keep both the Ok() and Err() results, and also to process both Some() and None results (as separate streams). All successes, errors, and Some/None results will be logged somehow; I don't want to crash the job on (most) errors.
Because I want to process both Ok/Err and Some/None results, I am considering the partition iterator in std library.
Is this the most idiomatic way to design the solution?
What about the performance of partition compared to a loop-based solution?
Because it is going to be an I/O-bound task, I am not considering the rayon crate.
But perhaps I need to look at tokio, assuming it offers an iterator/stream-based solution with threads?
I don't know if there is a partition equivalent in tokio. I have not used tokio before, but I am happy to learn.
I plan to start with a single-threaded, idiomatic Rust solution; if necessary, I can use threads to improve performance later.
Therefore I prefer iterators over various loops, but I am open to all suggestions.
If you really have an I/O bound task, then the difference between (equivalent) loop-based or iterator-based solutions will likely be negligible.
Iterator::partition() allocates two collections into which the elements will be placed. If you really have millions of items which could be processed independently, one-by-one (read-transform-write), then it seems unnecessary to put them all into memory at the same time.
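To make the allocation point concrete, here is a minimal sketch of what partition does with a stream of Results. The function name parse_all and the string-parsing input are hypothetical stand-ins for the real records; the point is only that both output vectors exist in full before anything downstream can run.

```rust
// Hypothetical example: parsing strings stands in for the real per-record work.
fn parse_all(inputs: &[&str]) -> (Vec<i32>, Vec<String>) {
    // partition() drains the whole iterator into two collections at once.
    let (oks, errs): (Vec<Result<i32, String>>, Vec<Result<i32, String>>) = inputs
        .iter()
        .map(|s| s.parse::<i32>().map_err(|e| format!("{s}: {e}")))
        .partition(|r| r.is_ok());

    // Only now, with every item held in memory, can the two halves be unwrapped.
    let values = oks.into_iter().map(Result::unwrap).collect();
    let errors = errs.into_iter().map(Result::unwrap_err).collect();
    (values, errors)
}
```

With millions of items, both vectors grow to hold the full input, which is the memory cost described above.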
Yes, you are correct, I do not want to put them in memory.
I do want to process both Ok() and Err() output and same goes for Some and None, by separate functions downstream.
So using partition is not the best solution then?
What other iterators should I look at to process both Ok/Err and Some/None in a single pass?
itertools crate?
No, partition will collect your values in memory all at once, so why would you use it? You could probably just write a loop, since what you are doing is mostly imperative:
    for item in stream {
        match item {
            Ok(value) => writer.write(value.transform()),
            Err(error) => log(error),
        }
    }
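A self-contained version of that loop might look like the sketch below. Record, its transform method, the process function, and the Vec used as an error log are all assumptions made for illustration; the real types and logging would differ. Each item is handled and dropped before the next one is read, so memory use stays flat.

```rust
use std::io::{self, Write};

/// Hypothetical record type standing in for the real input objects.
struct Record(u32);

impl Record {
    /// Hypothetical transformation into the output form.
    fn transform(&self) -> String {
        format!("record-{}", self.0 * 2)
    }
}

/// Writes every Ok item to `writer` and collects Err items in `error_log`.
/// Returns the number of records written; only I/O failures abort the job.
fn process<I, W>(stream: I, writer: &mut W, error_log: &mut Vec<String>) -> io::Result<usize>
where
    I: IntoIterator<Item = Result<Record, String>>,
    W: Write,
{
    let mut written = 0;
    for item in stream {
        match item {
            Ok(value) => {
                writeln!(writer, "{}", value.transform())?;
                written += 1;
            }
            // A bad record is logged and skipped instead of crashing the job.
            Err(error) => error_log.push(error),
        }
    }
    Ok(written)
}
```

Option values could be handled the same way, with a second match on Some and None inside the Ok arm.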
That uses extend, but could be tweaked to just take a different error handler closure. Then it'd still be lazy, one-pass, and non-allocating, with the "normal" iterator route dealing with the Oks and the slightly-out-of-band closure dealing with the Errs as they show up.
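One way to read the closure-based idea is sketched below. The name oks_with is hypothetical and not part of std or any crate mentioned here; it simply wraps filter_map so that Ok values continue down the iterator chain while each Err is handed to a caller-supplied closure as it appears.

```rust
/// Hypothetical adapter: yields the Ok values lazily and passes each Err
/// to `on_err` as it is encountered. Nothing is buffered.
fn oks_with<I, T, E, F>(iter: I, mut on_err: F) -> impl Iterator<Item = T>
where
    I: IntoIterator<Item = Result<T, E>>,
    F: FnMut(E),
{
    iter.into_iter().filter_map(move |item| match item {
        Ok(value) => Some(value),
        Err(error) => {
            on_err(error);
            None
        }
    })
}
```

A call such as oks_with(stream, |e| eprintln!("{e}")) could then be followed by the usual map or for_each steps, and nothing runs until the resulting iterator is consumed.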