Is there a way to avoid interim .collect() calls?

Hi all,

I have an if / else branch where an iterator needs to be constructed in completely different ways, but the result can be processed the same way in both cases. Unfortunately, I can't figure out a way to represent the interim data structure (foos in this pseudocode), so I ended up collecting to a Vec. Is there a better way? (I suspect not, because the types are going to be used to optimise away the steps, right?)

e.g.

let foos: Vec<_>;

if scenario_a {
  foos = bars.into_iter().{lots of mapping and filtering and whatnot}.collect();
} else {
  foos = bars.into_iter().{lots of completely different processing with no commonality to the other path}.collect();
}

return foos.into_iter().map( { same thing } ).collect();

Assuming that foos sans .collect() is an Iterator<T> for the same T in both branches, you can wrap the branches in Either::Left and Either::Right from the either crate, giving you a single Iterator<T> that you can then call .map({same thing}).collect() on.
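As a rough sketch of that idea: the example below defines a small local Either enum as a stand-in for either::Either, so that it compiles without any dependency. The real crate's Either implements Iterator in the same way when both sides are iterators with the same Item. The function name process, the i32 element type, and the branch bodies are made up for illustration.

```rust
// Hand-rolled stand-in for `either::Either`, so the sketch needs no dependencies.
enum Either<L, R> {
    Left(L),
    Right(R),
}

// Forward `next` to whichever variant is active; both sides must yield the same Item.
impl<L, R> Iterator for Either<L, R>
where
    L: Iterator,
    R: Iterator<Item = L::Item>,
{
    type Item = L::Item;

    fn next(&mut self) -> Option<Self::Item> {
        match self {
            Either::Left(l) => l.next(),
            Either::Right(r) => r.next(),
        }
    }
}

// Hypothetical pipeline: the branch bodies and the final `map` are placeholders.
fn process(bars: Vec<i32>, scenario_a: bool) -> Vec<String> {
    let foos = if scenario_a {
        // Branch A: keep even numbers, then scale them.
        Either::Left(bars.into_iter().filter(|n| n % 2 == 0).map(|n| n * 10))
    } else {
        // Branch B: a completely different chain with the same Item type.
        Either::Right(bars.into_iter().rev().take(2))
    };
    // The shared step runs once, with no interim Vec.
    foos.map(|n| n.to_string()).collect()
}
```

Both branches have different concrete iterator types, but the if / else expression as a whole has the single type Either<A, B>, which stays fully static.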


Dynamic dispatch is also an option here:

let foos: Box<dyn Iterator<Item = T>> = if scenario_a {
  Box::new(bars.into_iter().{lots of mapping and filtering and whatnot})
} else {
  Box::new(bars.into_iter().{lots of completely different processing with no commonality to the other path})
};

return foos.map( { same thing } ).collect();
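For comparison, here is a compilable version of the boxed approach, as a minimal sketch. The function name process_boxed, the i32 element type, and the branch bodies are invented for illustration; only the Box<dyn Iterator<Item = ...>> shape comes from the pseudocode above.

```rust
// Hypothetical pipeline: the branch bodies and the final `map` are placeholders.
fn process_boxed(bars: Vec<i32>, scenario_a: bool) -> Vec<String> {
    // The annotation lets each `Box::new(...)` coerce to the same trait object type.
    let foos: Box<dyn Iterator<Item = i32>> = if scenario_a {
        Box::new(bars.into_iter().filter(|n| n % 2 == 0).map(|n| n * 10))
    } else {
        Box::new(bars.into_iter().rev().take(2))
    };
    // Each `next` call on `foos` goes through a vtable; the shared step is written once.
    foos.map(|n| n.to_string()).collect()
}
```

This costs one heap allocation plus an indirect call per element, in exchange for not needing an extra type or crate.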

This may be obvious and does not directly answer your question, but you can write a function (not a closure, since closure parameters can't use impl Trait) that does the

foos.into_iter().map( { same thing } ).collect()

part, and takes an impl Iterator<Item = YourIntermediateType> as a param. And then call this function from both branches. Nothing fancy.

let bars = ["1", "12", "123"].into_iter();
let scenario_a = true;

fn final_step(foos: impl Iterator<Item = String>) -> Vec<usize> {
    foos.map(|s| s.parse().unwrap()).collect()
}

let results = if scenario_a {
    let foos = bars.map(String::from);
    final_step(foos)
} else {
    let foos = bars.filter(|s| s.len() == 2).map(|s| s.to_string().to_uppercase());
    final_step(foos)
};

Optimize away the extra collect? I wouldn't count on that.


No, I did not mean that it could be used to optimise away the interim .collect(). I meant that I suspect the full type is needed to optimise each .collect(), which is why the full type of all the operations must be kept around. The Either approach above seems to solve this.
