Simple, clean O(1) space O(n) time Vec::drain_filter?

I am using Rust stable, which currently makes DrainFilter in std::vec - Rust unavailable.

Is there a nice, clean, simple, O(n) Vec::drain_filter ?

I have been trying to write my own impl, but the issue I am running into is the need for T: Clone, which I am trying to avoid.

  1. If we drop the O(n) time, it is easy to do via Vec::remove, but this has worst case O(n^2).

  2. If we drop the O(1) space, we build a new vector, then copy it back.

  3. O(n) together with O(1) -- something I can easily do in C, but not sure how to do in safe rust.

To stay in safe, you can swap elements to pack the valid ones at the start and truncate out the invalid ones at the end. This may not preserve the order of invalid ones, so they would be dropped in arbitrary order.

3 Likes

This is my fault for not being clearer in original post. I would like to preserve ordering for the elements that are kept in the Vec.

Preserving the order of the retained elements is possible with a swap-and-truncate approach, but the removed ones will get jumbled before they're dropped.

3 Likes

There is a way to preserve the order of both the drained elements and the retained elements in linear time and constant space using only swaps: Stable minimum space partitioning in linear time.

It's simpler to do this using unsafe code that leaves a gap of uninitialized memory during the process, just as drain_filter does.

1 Like

It's actually possible to have O(1) space with into_iter + filter_map + collect. Playground. Specialization does wonders here. I think this behaviours is not guaranteed, but hey, it works!

1 Like

I am not entirely convinced of this.

Is there a moment in time where into_iter()....collect() prevents the old Vec from being freed, and yet the new one has already been constructed ?

If not, can you please explain why ?

2 Likes

Actually, the old vec is freed and a new one is constructed, but the storage is (in this case) reused:

  • when you call into_iter, the ownership of vec's storage is moved to IntoIter struct and is not dropped until the whole iterator chain is dropped.
  • collect, or more precisely FromIterator is specialized for Vec when following conditions are satisfied:
    • source iterator is vec::IntoIter,
    • the source type and destination type have the same layout
    • the only operations in the chain are map/filter/filter_map/enumerate, etc. – namely only the ones that cannot result in more destination elements than in source.

In such case specialization kicks in, that basically implements the in-place-filtering algorithm with reader index + writer index.

2 Likes

Does this mean that the Vec resulting from collect() will have large capacity allocated even if the number of elements collected after filtering is small?

Edit: Tested. Indeed it does. I think that's pretty bad. A very surprising waste of space. Rust Playground

Could you point me at the source code for this? I find this claim simultaneously believable yet surprising.

Hah, actually I've tried to add some source code link in previous post, but couldn't find anything quickly enough :upside_down_face: . I believe this is a good file to look at, also this test. Seems to be powered by InPlaceIterable and SourceIter traits.

Yup, I guess that's one of the reasons this behaviour is not guaranteed to stay there.

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.