I am constantly trying to improve my Rust skills and am looking for idiomatic, clear, and concise solutions.
I'd like the Rust experts here to comment on my code, and possibly rip it to shreds, so I can learn more about Rust.
Imagine I have a large incoming stream of "things" to process. Each "process item" function returns either a Result or an Option (Some == good, None == bad).
I want to store the Ok/Some results in one vector and the Err/None results in a separate vector.
I do not want the program to crash, stop, or panic; I want to process all items, good and bad.
In summary, I want to take all incoming results, split them into two output streams (good and bad), and preserve both the error messages and the erroneous inputs.
This is a common requirement in many data-processing pipelines.
I am just looking for the most idiomatic and efficient Rust pattern to use.
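One standard-library pattern for this kind of split is `Iterator::partition`. The sketch below is a minimal illustration, assuming each item is processed into a `Result` whose error side also carries the original input; `process` and `split_all` are hypothetical names chosen for the example, not part of any library.

```rust
use std::convert::TryFrom; // needed on edition 2015/2018; already in the 2021 prelude
use std::num::TryFromIntError;

/// Hypothetical processing step: succeeds when the value fits in a u8.
/// The original input is kept next to the error so neither is lost.
fn process(v: i64) -> Result<u8, (i64, TryFromIntError)> {
    u8::try_from(v).map_err(|e| (v, e))
}

/// Process every input and split the outcomes into good and bad vectors.
fn split_all(inputs: &[i64]) -> (Vec<u8>, Vec<(i64, TryFromIntError)>) {
    // `partition` consumes the whole iterator, so a bad item never stops
    // the processing of the items after it.
    let (oks, errs): (Vec<_>, Vec<_>) = inputs
        .iter()
        .map(|&v| process(v))
        .partition(|r| r.is_ok());
    // These unwraps cannot panic: each vector holds only the variant
    // that `partition` sorted into it.
    let good = oks.into_iter().map(Result::unwrap).collect();
    let bad = errs.into_iter().map(Result::unwrap_err).collect();
    (good, bad)
}

fn main() {
    let inputs: Vec<i64> = vec![0, 1, 2, 3, 4, 512];
    let (good, bad) = split_all(&inputs);
    println!("good {:?}", good);
    println!("bad  {:?}", bad);
}
```

If a step returns an Option instead, `opt.ok_or(v)` turns `None` into `Err(v)`, so the bad input still ends up on the error side. Note that `partition` collects everything before returning, so it suits bounded inputs rather than an endless stream.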
Below, the vector named inputs holds the incoming data.
verrors is the vector that will hold all bad inputs.
results is the vector that will hold all good inputs.
The code below simply checks whether each input is a valid u8 (good), in which case it is stored in results; every input that is not a valid u8 (bad) is stored in verrors.
    let inputs: Vec<i64> = vec![0, 1, 2, 3, 4, 512];
    let mut verrors = Vec::<i64>::new();
    let results = inputs
        .iter()
        .filter_map(|v| {
            let z = <u8>::try_from(*v);
            if let Err(e) = z {
                println!("error={:?} input={}", e, *v);
                verrors.push(*v);
                None
            } else {
                Some(z.unwrap())
            }
        })
        .collect::<Vec<u8>>();
    println!("result {:?}", results);
    println!("errors {:?}", verrors);
Please let me know whether this code is passable or whether it could (or should) be made better or faster.
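For comparison, here is a minimal sketch of the same split written as a plain `for` loop with `match`. It assumes the same `i64` inputs as the question; `split_loop` is a hypothetical helper name. Unlike the original, it stores each bad input together with its error value, and matching on the `Result` once removes the need for a separate `unwrap`.

```rust
use std::convert::TryFrom; // needed on edition 2015/2018; already in the 2021 prelude
use std::num::TryFromIntError;

/// The question's split written as a plain loop. Each bad input is stored
/// together with the error that rejected it.
fn split_loop(inputs: &[i64]) -> (Vec<u8>, Vec<(i64, TryFromIntError)>) {
    let mut results = Vec::new();
    let mut verrors = Vec::new();
    for &v in inputs {
        // `match` consumes the Result exactly once, so no `unwrap` is needed.
        match u8::try_from(v) {
            Ok(x) => results.push(x),
            Err(e) => verrors.push((v, e)),
        }
    }
    (results, verrors)
}

fn main() {
    let inputs: Vec<i64> = vec![0, 1, 2, 3, 4, 512];
    let (results, verrors) = split_loop(&inputs);
    println!("result {:?}", results);
    println!("errors {:?}", verrors);
}
```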
I had something similar in mind at first (though much less elegant than yours), but I thought an iterator-based solution would be more to Rust's liking.
I may have been wrong, of course.
Would the execution performance of the loop-based solution be similar to that of the iterator functions?
I read somewhere that iterator-based Rust code sometimes gives the compiler more room to optimise, whereas explicit loops make optimisation harder.
BTW, I used vectors in my code mostly for illustration. In practice the incoming data will be a very large data stream (from the network or external storage), closer to an unbounded data queue. Likewise, the outputs might end up being data sinks (external storage or even the network) rather than vectors.
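When the input is an unbounded stream, one way to avoid materialising vectors at all is to route each result to a sink as soon as it arrives. This is a minimal single-threaded sketch, assuming the stream can be exposed as any `IntoIterator` of `Result`s; `route`, `on_ok`, and `on_err` are hypothetical names, with the two closures standing in for real sinks such as file writers or network clients.

```rust
use std::convert::TryFrom; // needed on edition 2015/2018; already in the 2021 prelude

/// Route each item of a (possibly unbounded) stream of results to one of two
/// sinks as soon as it arrives. Nothing is collected, so memory use does not
/// grow with the length of the stream.
///
/// `on_ok` and `on_err` are hypothetical stand-ins for real sinks
/// (a file writer, a network client, another channel, ...).
fn route<I, T, E, F, G>(input: I, mut on_ok: F, mut on_err: G)
where
    I: IntoIterator<Item = Result<T, E>>,
    F: FnMut(T),
    G: FnMut(E),
{
    for item in input {
        match item {
            Ok(x) => on_ok(x),
            Err(e) => on_err(e),
        }
    }
}

fn main() {
    // A lazy iterator standing in for data arriving from the network or disk;
    // the Err side carries the original input.
    let stream = (0i64..300).map(|v| u8::try_from(v).map_err(|_| v));
    let mut good_count = 0usize;
    let mut bad_inputs = Vec::new();
    route(stream, |_x| good_count += 1, |v| bad_inputs.push(v));
    println!("{} good items, bad inputs start with {:?}", good_count, &bad_inputs[..3]);
}
```

Because `route` only requires `IntoIterator`, the same function works for a `Vec`, a lazy iterator, or an `mpsc::Receiver`.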
Thanks for the opportunity to learn more about Rust.
No, that's not true as stated. A for loop is itself desugared into code that repeatedly advances an iterator; if you think about it, it couldn't work generically any other way. So this:
    for item in collection {
        body
    }
becomes something like
    let mut iterator = collection.into_iter();
    while let Some(item) = iterator.next() {
        body
    }
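To make the desugaring concrete, here is a small sketch that sums a slice twice: once with a `for` loop and once with the hand-written `while let` form. The function names are illustrative only, and the real compiler expansion uses `loop` plus `match` rather than `while let`, but the behaviour is the same.

```rust
/// Sum with an ordinary `for` loop.
fn sum_for(v: &[u32]) -> u32 {
    let mut total: u32 = 0;
    for item in v {
        total += *item;
    }
    total
}

/// The same loop spelled out roughly the way the compiler desugars it:
/// convert to an iterator once, then call `next()` until it returns `None`.
fn sum_desugared(v: &[u32]) -> u32 {
    let mut total: u32 = 0;
    let mut iterator = IntoIterator::into_iter(v);
    while let Some(item) = iterator.next() {
        total += *item;
    }
    total
}

fn main() {
    let data: [u32; 4] = [1, 2, 3, 4];
    println!("{} {}", sum_for(&data), sum_desugared(&data));
}
```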
What you might be confusing this with is the advice to avoid indexing into arrays, slices, and vectors. Because indexing incurs a bounds check (unless the compiler can prove the index is in range), it can be slower than using an iterator. But it's not the looping itself that ends up less optimized; it's the extra bounds-checking work done on every iteration.
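The difference can be illustrated with two ways of summing a slice (hypothetical function names). Both return the same value; the point is only where a bounds check can appear. For a trivial `0..v.len()` loop like this one the optimiser is often able to remove the check anyway, so the gap tends to show up in more complex indexing patterns.

```rust
/// Indexed loop: every `v[i]` carries a bounds check unless the optimiser
/// can prove that `i < v.len()`.
fn sum_indexed(v: &[u64]) -> u64 {
    let mut total: u64 = 0;
    for i in 0..v.len() {
        total += v[i];
    }
    total
}

/// Iterator version: the iterator only ever yields in-range elements,
/// so there is no index and therefore no bounds check to begin with.
fn sum_iter(v: &[u64]) -> u64 {
    v.iter().sum()
}

fn main() {
    let data: Vec<u64> = (1..=100).collect();
    println!("{} {}", sum_indexed(&data), sum_iter(&data));
}
```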
It should be quite similar; if there's any significant processing of the items, that will easily dominate any performance difference here.
In that case, I'd probably use a background thread and channels; the receivers could then be sent to separate threads to work at the same time.
    use std::sync::mpsc::{sync_channel, Receiver};

    /// Consume `input` on a background thread, and return results on separate
    /// channels.
    ///
    /// Will queue up to `cap` results in each channel, and then pause
    /// to wait for downstream.
    ///
    /// Dropping the ok receiver makes the next `send` fail, which panics the
    /// background thread and aborts further processing; dropping the error
    /// receiver just silently discards errors.
    ///
    /// Drain the two receivers from different threads: if a single thread
    /// waits on one channel while the other fills up, the producer blocks
    /// and nothing makes progress.
    fn spawn_processor<T: Send + 'static, E: Send + 'static>(
        cap: usize,
        input: impl Send + 'static + IntoIterator<Item = Result<T, E>>,
    ) -> (Receiver<T>, Receiver<E>) {
        let (ok_send, ok_recv) = sync_channel(cap);
        let (err_send, err_recv) = sync_channel(cap);
        std::thread::spawn(move || {
            for item in input {
                match item {
                    Ok(x) => { ok_send.send(x).unwrap(); }
                    Err(e) => { let _ = err_send.send(e); }
                }
            }
        });
        (ok_recv, err_recv)
    }
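A usage sketch of `spawn_processor`, repeated here so the example compiles on its own. It assumes the question's u8 check as the processing step; `run` is a hypothetical helper. The error receiver is drained on its own thread while the calling thread drains the ok receiver, which keeps both bounded channels moving.

```rust
use std::convert::TryFrom; // needed on edition 2015/2018; already in the 2021 prelude
use std::sync::mpsc::{sync_channel, Receiver};
use std::thread;

/// Same shape as `spawn_processor` above.
fn spawn_processor<T: Send + 'static, E: Send + 'static>(
    cap: usize,
    input: impl Send + 'static + IntoIterator<Item = Result<T, E>>,
) -> (Receiver<T>, Receiver<E>) {
    let (ok_send, ok_recv) = sync_channel(cap);
    let (err_send, err_recv) = sync_channel(cap);
    thread::spawn(move || {
        for item in input {
            match item {
                Ok(x) => ok_send.send(x).unwrap(),
                Err(e) => {
                    let _ = err_send.send(e);
                }
            }
        }
        // When this closure returns, both senders are dropped,
        // which ends the receivers' iterators.
    });
    (ok_recv, err_recv)
}

/// Hypothetical demo helper: feed `inputs` through the processor and
/// drain both channels concurrently.
fn run(inputs: Vec<i64>) -> (Vec<u8>, Vec<i64>) {
    // Lazy stream of results; the Err side carries the original input.
    let stream = inputs.into_iter().map(|v| u8::try_from(v).map_err(|_| v));
    let (ok_recv, err_recv) = spawn_processor(4, stream);
    // Drain errors on a second thread. Reading both bounded channels from a
    // single thread could deadlock once the one not being read fills up.
    let err_thread = thread::spawn(move || err_recv.into_iter().collect::<Vec<i64>>());
    let good: Vec<u8> = ok_recv.into_iter().collect();
    let bad = err_thread.join().unwrap();
    (good, bad)
}

fn main() {
    let (good, bad) = run(vec![0, 1, 2, 3, 4, 512, 1000]);
    println!("good {:?}", good);
    println!("bad  {:?}", bad);
}
```

In a real pipeline the two `collect` calls would be replaced by whatever writes to the output sinks; each channel preserves the order in which the producer sent its items.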