I would like to efficiently partition a sorted collection. partition does what I need but it looks at every element of the Iterator, right? So I am looking for something like a special take_while that does not ignore the rest of the elements. Here is a small example:
fn main() {
let x = (0..5).collect::<Vec<usize>>();
let (left, right): (Vec<_>, Vec<_>) = x.into_iter().partition(|x| *x < 3);
assert_eq!(left, vec![0, 1, 2]);
assert_eq!(right, vec![3, 4]);
}
fn main() {
let x = (0..5).collect::<Vec<usize>>();
let pos = x.partition_point(|&i| i<3);
let mut left = x;
let right = left.split_off(pos);
assert_eq!(left, vec![0, 1, 2]);
assert_eq!(right, vec![3, 4]);
}
Yes, that looks good, thank you, but I'm not sure that solves my problem because I would like to continue processing the chunks before collecting them. In particular, it would like to apply a filter to right, so it would be nice if there was no allocation before the filtering.
let mut iter = x.into_iter().peekable();
let mut left = Vec::new();
while let Some(i) = iter.peek() {
if *i >= 3 {
break;
}
left.push(iter.next().unwrap());
}
let right: Vec<usize> = iter.collect();
Thank you for your answers! It seems that, at least for my use case, the most efficient approach is the one that uses partition_point and split_off. It is around a third faster than the other approaches. From peeking_take_while and partition, peeking_take_while is somewhat faster than partition. These seem like reasonable results, since partition_point uses binary search, peeking_take_while only iterates over one part of the items and partition iterates over all items. Is that correct?
Now, apparently unnecessarily, I felt somewhat uneasy about using split_off that creates a new Vec because after doing the split, I want to continue processing both chunks by applying some adapters (like filter) to them. In other words, I felt like it would be more elegant (and efficient) to take my collection x turn it into an iterator i, then turn i into two iterators (left, right) that operate on disjoint parts of the underlying collection. (The split should be efficient, i.e. it should exploit the sortedness of x).
So I think I tried to:
Work with iterators from as early as possible and do not consume them until each adapter has been applied.
Use into_inter().
Does trying to do these things make sense?
I think doing 1 is conceptually appealing but does it matter performancewise?
Doing 2 instead of calling iter() also seems to be conceptually appealing because I do not need x after its processing is over. However, using iter() might be easier because then I do not have to take ownership of disjoint parts of the underlying collection.