Partition sorted collection

I would like to efficiently partition a sorted collection. partition does what I need but it looks at every element of the Iterator, right? So I am looking for something like a special take_while that does not ignore the rest of the elements. Here is a small example:

fn main() {
    let x = (0..5).collect::<Vec<usize>>();

    let (left, right): (Vec<_>, Vec<_>) = x.into_iter().partition(|x| *x < 3);

    assert_eq!(left, vec![0, 1, 2]);
    assert_eq!(right, vec![3, 4]);
}

You could use partition_point and split_off:

fn main() {
    let x = (0..5).collect::<Vec<usize>>();

    let pos = x.partition_point(|&i| i<3);
    let mut left = x;
    let right = left.split_off(pos);

    assert_eq!(left, vec![0, 1, 2]);
    assert_eq!(right, vec![3, 4]);
}
2 Likes

Yes, that looks good, thank you, but I'm not sure that solves my problem because I would like to continue processing the chunks before collecting them. In particular, it would like to apply a filter to right, so it would be nice if there was no allocation before the filtering.

Then use partition.

You can use Itertools::peeking_take_while from the itertools crate (or the version from the standalone peeking_take_while crate).

Or you can use Iterator::peekable and a loop:

    let mut iter = x.into_iter().peekable();

    let mut left = Vec::new();
    while let Some(i) = iter.peek() {
        if *i >= 3 {
            break;
        }
        left.push(iter.next().unwrap());
    }
    let right: Vec<usize> = iter.collect();
2 Likes

Thank you for your answers! It seems that, at least for my use case, the most efficient approach is the one that uses partition_point and split_off. It is around a third faster than the other approaches. From peeking_take_while and partition, peeking_take_while is somewhat faster than partition. These seem like reasonable results, since partition_point uses binary search, peeking_take_while only iterates over one part of the items and partition iterates over all items. Is that correct?

Now, apparently unnecessarily, I felt somewhat uneasy about using split_off that creates a new Vec because after doing the split, I want to continue processing both chunks by applying some adapters (like filter) to them. In other words, I felt like it would be more elegant (and efficient) to take my collection x turn it into an iterator i, then turn i into two iterators (left, right) that operate on disjoint parts of the underlying collection. (The split should be efficient, i.e. it should exploit the sortedness of x).

So I think I tried to:

  1. Work with iterators from as early as possible and do not consume them until each adapter has been applied.
  2. Use into_inter().

Does trying to do these things make sense?

  • I think doing 1 is conceptually appealing but does it matter performancewise?
  • Doing 2 instead of calling iter() also seems to be conceptually appealing because I do not need x after its processing is over. However, using iter() might be easier because then I do not have to take ownership of disjoint parts of the underlying collection.

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.