Iterator best practice - please comment on my code

yuribudilov · June 17, 2023, 4:58am

Hello everyone.

I am constantly trying to improve my Rust skills and am looking for idiomatic, clear, and concise solutions.
I would like the Rust experts here to comment on my code and possibly rip it to shreds so I can learn more about Rust.

Imagine I have a large incoming stream of "things" to process, where the function that processes each item returns either a Result<> or an Option<> (Some == Good, None == Bad).
I want to store the Ok/Some results in one vector and the None/Err results in a separate vector.
I do not want to crash/stop/panic my program, I want to process all items, good and bad.

In summary, I want to take all incoming results and split them into two output streams (Good and Bad), and I want to preserve error messages and erroneous inputs.

This is a common requirement in a lot of data processing pipelines.
I am just looking for the best Rust pattern to use.

In the code below, the vector named inputs holds the incoming data.
verrors is the vector that will hold all bad inputs.
results is the vector that will hold all good inputs.
The code simply checks whether each input is a valid u8 (i.e. Good), in which case it is stored in the results vector; every input that is not a valid u8 is Bad and is stored in the verrors vector.

    fn main() {
        let inputs: Vec<i64> = vec![0, 1, 2, 3, 4, 512];
        // Collects every input that does not fit in a u8.
        let mut verrors = Vec::<i64>::new();
        let results = inputs
            .iter()
            .filter_map(|v| {
                let z = <u8>::try_from(*v);
                if let Err(e) = z {
                    // Side effect inside the chain: log and record the bad input, then drop it.
                    println!("error={:?} input={}", e, *v);
                    verrors.push(*v);
                    None
                } else {
                    Some(z.unwrap())
                }
            })
            .collect::<Vec<u8>>();
        println!("result {:?}", results);
        println!("errors {:?}", verrors);
    }

Please let me know whether this code is passable or whether it could (or should) be made better or faster.

Thank you very much.

steffahn · June 17, 2023, 5:25am

You could use existing solutions: itertools::Itertools::partition_result.
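If adding the itertools dependency is not an option, a rough std-only approximation is Iterator::partition followed by unwrapping each half. This is a sketch, not how partition_result is implemented; the helper name split_std is hypothetical, and the extra passes over the data mean it does more work than a single-pass split.

```rust
use std::convert::TryFrom;
use std::num::TryFromIntError;

/// Hypothetical helper: splits the u8 conversions of `inputs` into successes
/// and errors using only the standard library.
fn split_std(inputs: Vec<i64>) -> (Vec<u8>, Vec<TryFromIntError>) {
    // First pass: separate Ok and Err values; both Vecs still hold Results.
    let (oks, errs): (Vec<_>, Vec<_>) = inputs
        .into_iter()
        .map(u8::try_from)
        .partition(|r| r.is_ok());
    // Strip the Result wrappers. These unwraps cannot panic, because
    // partition already put each value into the matching Vec.
    let oks: Vec<u8> = oks.into_iter().map(Result::unwrap).collect();
    let errs: Vec<TryFromIntError> = errs.into_iter().map(Result::unwrap_err).collect();
    (oks, errs)
}

fn main() {
    let (results, verrors) = split_std(vec![0, 1, 2, 3, 4, 512]);
    println!("result {:?}", results);
    println!("errors {:?}", verrors);
}
```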

2e71828 · June 17, 2023, 5:39am

Your code looks fine to me; it might be a little more efficient to use inputs.into_iter().

That said, I'd personally write this a little bit differently for a couple of stylistic reasons:

I prefer explicit loops to iterator chains when side-effects are involved
Collecting an iterator of results into two Vecs feels like an operation that deserves its own function

So, I'd generally write something like this instead:

fn collect_results<T, E>(
    input: impl IntoIterator<Item = Result<T, E>>
) -> (Vec<T>, Vec<E>) {
    let mut ok = vec![];
    let mut err = vec![];
    for item in input {
        match item {
            Ok(x)  => { ok.push(x); }
            Err(e) => { err.push(e); }
        }
    }
    (ok, err)
}

fn main() {
    let inputs: Vec<i64> = vec![0, 1, 2, 3, 4, 512];
    let (results, verrors) = collect_results(
        inputs.into_iter().map(<u8>::try_from)
    );
    println!("result {:?}", results);
    println!("errors {:?}", verrors);
}
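The original question also asked to keep the erroneous inputs, which this version drops (verrors ends up holding only TryFromIntError values). One way to keep them, sketched here as an assumption rather than as part of the post above, is to pair each error with its input via Result::map_err before handing the iterator to the same collect_results function. The helper convert_keeping_inputs is hypothetical.

```rust
use std::convert::TryFrom;
use std::num::TryFromIntError;

/// Same splitting function as in the post above.
fn collect_results<T, E>(input: impl IntoIterator<Item = Result<T, E>>) -> (Vec<T>, Vec<E>) {
    let mut ok = vec![];
    let mut err = vec![];
    for item in input {
        match item {
            Ok(x) => ok.push(x),
            Err(e) => err.push(e),
        }
    }
    (ok, err)
}

/// Hypothetical helper: converts each input to u8, keeping (input, error)
/// pairs for the failures so neither the value nor the message is lost.
fn convert_keeping_inputs(inputs: Vec<i64>) -> (Vec<u8>, Vec<(i64, TryFromIntError)>) {
    collect_results(
        inputs
            .into_iter()
            // Attach the original value to the error.
            .map(|v| u8::try_from(v).map_err(|e| (v, e))),
    )
}

fn main() {
    let (results, verrors) = convert_keeping_inputs(vec![0, 1, 2, 3, 4, 512]);
    println!("result {:?}", results);
    for (input, e) in &verrors {
        println!("error={} input={}", e, input);
    }
}
```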

minimum · June 17, 2023, 5:43am

> and I want to preserve error messages and erroneous inputs.

One option is for_each:

fn main() {
    let inputs: Vec<i64> = vec![0, 1, 2, 3, 4, 512];

    let mut oks = Vec::new();
    let mut fails = Vec::new();

    inputs.iter().for_each(|&input| match u8::try_from(input) {
        Ok(val) => oks.push(val),
        Err(err) => {
            println!("error: {err}, input: {input}");
            fails.push(input);
        }
    });

    println!("{:?}", oks);
    println!("{:?}", fails);
}

Or you can use Iterator::fold: https://doc.rust-lang.org/std/iter/trait.Iterator.html#method.fold
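A fold-based version of the same split might look like the sketch below, threading both vectors through as the accumulator. The helper name split_with_fold is hypothetical; whether this reads better than for_each or a plain loop is largely a matter of taste.

```rust
use std::convert::TryFrom;

/// Hypothetical helper: splits inputs into valid u8 values and rejected
/// inputs with a single `fold`, logging each failure as it goes.
fn split_with_fold(inputs: &[i64]) -> (Vec<u8>, Vec<i64>) {
    inputs
        .iter()
        .fold((Vec::new(), Vec::new()), |(mut oks, mut fails), &input| {
            match u8::try_from(input) {
                Ok(val) => oks.push(val),
                Err(err) => {
                    println!("error: {}, input: {}", err, input);
                    fails.push(input);
                }
            }
            // The accumulator is moved into the closure and handed back each step.
            (oks, fails)
        })
}

fn main() {
    let inputs: Vec<i64> = vec![0, 1, 2, 3, 4, 512];
    let (oks, fails) = split_with_fold(&inputs);
    println!("{:?}", oks);
    println!("{:?}", fails);
}
```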

If you don't need that println!, @steffahn's suggestion works well for this.

yuribudilov · June 17, 2023, 6:06am

Much obliged, very elegant!

I had something similar in mind at first (though much less elegant than yours), but I thought the iterator-based solution would be more to Rust's liking.
I may have been wrong, of course.

Would the execution performance of the loop-based solution be similar to that of the iterator functions?

I read somewhere that iterator-based Rust code sometimes gives the compiler more room to optimise, whereas explicit loops can make optimisation more difficult.

BTW, I used vectors in my code mostly for illustration. In actual practice the incoming data will be a very large data stream (from the network or external storage), resembling an unbounded data queue. Likewise, the outputs might end up being data sinks (writes to external storage or even to the network) rather than vectors.

Thanks for the opportunity to learn more about Rust.
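Given the note above that the outputs may be sinks rather than vectors, one option (an assumption, not something proposed in this thread) is to make the splitting function generic over std's Extend trait, so any collection-like sink can receive the items. The helper split_into is hypothetical, and a sink that can fail, such as an io::Write, would need its own error handling, which this sketch does not cover.

```rust
use std::collections::VecDeque;
use std::convert::TryFrom;
use std::iter;

/// Hypothetical helper: routes each Result into one of two sinks. Any type
/// implementing `Extend` works, so the sinks need not be Vecs.
fn split_into<T, E, A, B>(input: impl IntoIterator<Item = Result<T, E>>, oks: &mut A, errs: &mut B)
where
    A: Extend<T>,
    B: Extend<E>,
{
    for item in input {
        match item {
            // `iter::once` wraps a single value as an iterator for `extend`.
            Ok(x) => oks.extend(iter::once(x)),
            Err(e) => errs.extend(iter::once(e)),
        }
    }
}

fn main() {
    let inputs: Vec<i64> = vec![0, 1, 2, 3, 4, 512];
    // Stand-ins for real sinks: a queue for good values, a Vec for bad inputs.
    let mut good: VecDeque<u8> = VecDeque::new();
    let mut bad: Vec<i64> = Vec::new();
    split_into(
        inputs.into_iter().map(|v| u8::try_from(v).map_err(|_| v)),
        &mut good,
        &mut bad,
    );
    println!("good {:?}", good);
    println!("bad {:?}", bad);
}
```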

H2CO3 · June 17, 2023, 6:28am

No, that's not true as-is. A for loop is actually itself desugared to advancing the iterator. If you think about it, it couldn't really work generically otherwise. So this:

for item in collection {
    body
}

becomes something like

let mut iterator = collection.into_iter();
while let Some(item) = iterator.next() {
    body
}

What you might be confusing this with is the advice to avoid indexing on arrays/slices/vectors. Because indexing incurs bounds checks, it can be slower than using an iterator. But it's not the looping part that can become less optimized, but the fact that there is extra work introduced by bounds checking upon every iteration.
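A minimal illustration of that distinction: both functions below use a for loop, so the looping itself is the same iterator machinery; the only difference is the indexing in the body. In a case this simple the optimizer can often prove the index is in range and remove the check, so it is worth measuring before assuming a real gap.

```rust
/// Indexed loop: each `v[i]` is bounds-checked unless the optimizer can
/// prove `i < v.len()` and remove the check.
fn sum_indexed(v: &[u64]) -> u64 {
    let mut total = 0;
    for i in 0..v.len() {
        total += v[i];
    }
    total
}

/// Iterator loop: the slice iterator only yields elements that exist,
/// so there is no per-element bounds check in the body.
fn sum_iter(v: &[u64]) -> u64 {
    let mut total = 0;
    for x in v {
        total += *x;
    }
    total
}

fn main() {
    let data: Vec<u64> = (1..=10).collect();
    println!("{} {}", sum_indexed(&data), sum_iter(&data));
}
```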

2e71828 · June 17, 2023, 6:28am

It should be quite similar; if there's any significant processing of the items, that will easily dominate any performance difference here.

In that case, I'd probably use a background thread and channels; the receivers could then be sent to separate threads to work at the same time.

use std::sync::mpsc::{sync_channel, Receiver};

/// Consume `input` on a background thread, and return results on separate
/// channels.
///
/// Will queue up to `cap` results in each channel, and then pause
/// to wait for downstream.
///
/// Closing the ok channel will abort future processing,
/// but closing the error channel will silently discard errors.
fn spawn_processor<T: Send + 'static, E: Send + 'static>(
    cap: usize,
    input: impl Send + 'static + IntoIterator<Item = Result<T, E>>,
) -> (Receiver<T>, Receiver<E>) {
    let (ok_send, ok_recv) = sync_channel(cap);
    let (err_send, err_recv) = sync_channel(cap);
    std::thread::spawn(move || {
        for item in input {
            match item {
                // Panics (ending the background thread) if the ok receiver was dropped.
                Ok(x) => { ok_send.send(x).unwrap(); }
                // Ignores the send failure if the error receiver was dropped.
                Err(e) => { let _ = err_send.send(e); }
            }
        }
    });
    (ok_recv, err_recv)
}
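A usage sketch for spawn_processor, draining the error receiver on its own thread while the calling thread drains the ok receiver. The helper run is hypothetical. Draining both receivers one after the other on a single thread could deadlock: once more than `cap` errors are pending, the producer blocks on the full error channel while the consumer waits for more oks.

```rust
use std::convert::TryFrom;
use std::sync::mpsc::{sync_channel, Receiver};
use std::thread;

/// Same function as in the post above, with imports added.
fn spawn_processor<T: Send + 'static, E: Send + 'static>(
    cap: usize,
    input: impl Send + 'static + IntoIterator<Item = Result<T, E>>,
) -> (Receiver<T>, Receiver<E>) {
    let (ok_send, ok_recv) = sync_channel(cap);
    let (err_send, err_recv) = sync_channel(cap);
    thread::spawn(move || {
        for item in input {
            match item {
                Ok(x) => ok_send.send(x).unwrap(),
                Err(e) => {
                    let _ = err_send.send(e);
                }
            }
        }
    });
    (ok_recv, err_recv)
}

/// Hypothetical helper: converts inputs to u8 in the background, keeping
/// rejected inputs, and drains each channel on a separate thread.
fn run(inputs: Vec<i64>, cap: usize) -> (Vec<u8>, Vec<i64>) {
    let (ok_recv, err_recv) = spawn_processor(
        cap,
        inputs.into_iter().map(|v| u8::try_from(v).map_err(|_| v)),
    );
    // `Receiver::iter` ends once the producer thread drops its sender.
    let err_thread = thread::spawn(move || err_recv.iter().collect::<Vec<i64>>());
    let oks: Vec<u8> = ok_recv.iter().collect();
    let errs = err_thread.join().unwrap();
    (oks, errs)
}

fn main() {
    let (oks, errs) = run(vec![0, 1, 2, 3, 4, 512], 2);
    println!("result {:?}", oks);
    println!("errors {:?}", errs);
}
```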

system · September 15, 2023, 6:29am

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.
