Vec::retain by index

Is there an easy alternative to Vec::retain that operates on indices rather than contents?
Concretely, I have a second Vec that indicates whether elements in the original vec should be kept or dropped, and currently my code looks something like:

let items: Vec<Thing> = myvec
    .iter()
    .enumerate()
    .filter(|(idx, _)| othervec[*idx])
    .map(|(_, val)| val)
    .collect();

Is there something that would let me write myvec._____(|i| othervec[*i]);

1 Like

It's a little cumbersome, but you can do this with your own local i that you increment manually.

Track index yourself works with retain for the task you show.

let mut index = 0;
myvec.retain(|_| { index+=1; othervec[index-1] } );

Alternative to your iterator would be to use zip so not having to use indexes directly.

2 Likes

That looks like the best option to me:

let item: Vec<&_> =
    myvec
        .iter()
        .zip(&othervec)
        .filter_map(|(x, &b)| if b { Some(x) } else { None })
        .collect();

Thanks for the suggestions. I'm worried that collect() will end up copying/allocating more than necessary, but maybe that can be avoided using into_iter instead of iter (not sure).

I like that using a separate index within retain works, but it just feels a bit off, since the closure is no longer pure, and now it relies on being called in a specific order.

I think for this specific case, writing my own loop that tracks offset and truncates the vector at the end works best. I'll probably just copy the implementation of retain from std and change it to use indices.

In general,

  • iter(&'_ self) will return impl Iterator<Item = &'_ T>
  • into_iter(self) will return impl Iterator<Item = T>
  • iter_mut(&'_ mut self) will return impl Iterator<Item = &'_ mut T>.

Therefore, into_iter will actually consume self and move the objects out of the collection, while iter will take reference to the objects in that collection (Unless you call .cloned() where Item: Clone). Depending on your situation, this may be okay, but it may also not. But, to clarify, for Item = T and size_of::<T>() <= size_of::<&()>(), into_iter may be equally or even more performant.

Note that this might wildly change due to compiler optimizations, but we don't really have much control :man_shrugging:

This is all great stuff to learn :slight_smile:

I've managed to sidestep the problem for now by cloning the iterator (FilterMap<>) instead of collecting the result in a new vec, which IIUC should be even less work for the two loops I need over the resulting items:

let view = myvec
    .iter()
    .zip(&othervec)
    .filter_map(|(item, &vis)| if vis { Some(item) } else { None });

for item in view.clone() {
	//...
}
//...
for item in view.clone() {
	//...
}
3 Likes

I like this idea, but note that the retain docs don't guarantee an iteration order -- one could imagine a world where it first removes unwanted items from the end, for example. (There's also no guarantee that it only runs the predicate once per item, since it takes &T and thus could call multiple times.)

That said, I think it might as well guarantee the order, so if you rely on it I'd suggest making a PR for the docs to do that.

4 Likes

@soumya92 given your use case clone-ing the iterator (made of 2 (fat) pointers only) is a great idea indeed!

(although you don't need to .clone() the iterator for the last iteration)
1 Like

maybe I misunderstood, but something like

let retained = othervec
  .iter()
  .map(|i| myvec[i])
  .collect()

?

Oh, I didn't elaborate enough in the original question. What I meant was I have two Vecs of the same size, with othervec being Vec<bool>. So the data looks like:

mymec = [1,2,3,4,5]; // for example, but with a much larger struct as data.
othervec = [true, false, true, true, false];

and I wanted myvec = [1,3,4].

Ah, ok. Then yes, you want (for the simple version) zip and filter or filter_map, especially if you can keep it as an iterator rather than having to actually copy or modify the source vector(s), as your solution above.

For a more 'sophisticated' version, that really does have to drop the unwanted elements, something like a loop that walks from the front of the vector(s) and:

  • truncates both vectors to trim off any trailing unwanted entries
  • does a swap_remove of the next unwanted item from the front (swapping it with the must-now-be-wanted tail element)
  • terminates when it falls off the (now shorter) end

Honestly, whether this or just collect on the first version is better or faster will depend on a lot of different factors. In other words, be wary of the 'sophisticated' solution. If the elements are really large and expensive, and there are a lot that come and go dynamically as you do many of these manipulations, a Vec<Box<T>> or Vec<Rc<T>> or even a HashMap might work out favourably - keep the simpler, clearer iterator logic and have much less overhead for memory copying.

Finally, though it may be obvious: how is the Vec<bool> created in the first place? Some other previous pass through the data? Is this something that can be changed so that it could be lazily evaluated as part of a retain closure?

To add some more background, this is basically an attempt at fitting elements into a constrained storage space (think, e.g. printing a report, where the page size is fixed), but in a priority order.

I start with vec![false; myvec.len()];, and then basically try to fit an element. If it fits, I mark it visible, subtract from available space, and move on. Otherwise I break, or I try to fit a shorter version of that information instead.

So myvec would be something like [date_short, date_long, tab, title_short, title_long, tab, page_number, "of", total_pages];. And then I would start with showing title_long. If it fits, I set othervec[4] = true and move on to date. Otherwise I try showing title_short, and set othervec[3] = true.

Once I've finished trying to fit the elements, it would be convenient to just discard the ones I didn't use, hence this question.

https://github.com/rust-lang/rust/pull/60396

3 Likes

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.