Vec::retain by index

soumya92 · April 27, 2019, 8:22pm

Is there an easy alternative to Vec::retain that operates on indices rather than contents?
Concretely, I have a second Vec that indicates whether elements in the original vec should be kept or dropped, and currently my code looks something like:

let items: Vec<&Thing> = myvec
    .iter()
    .enumerate()
    .filter(|(idx, _)| othervec[*idx])
    .map(|(_, val)| val)
    .collect();

Is there something that would let me write myvec._____(|i| othervec[*i]);

cuviper · April 27, 2019, 8:43pm

It's a little cumbersome, but you can do this with your own local i that you increment manually.

jonh · April 27, 2019, 8:48pm

Tracking the index yourself works with retain for the task you show.

let mut index = 0;
myvec.retain(|_| { index+=1; othervec[index-1] } );

An alternative to your iterator chain would be to use zip, so you don't have to use indexes directly.
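For reference, here is a minimal sketch of the retain approach wrapped in a function. The name retain_by_mask is hypothetical, and it walks an iterator over the mask instead of a manual counter. It assumes retain calls the closure exactly once per element, in order from front to back.

```rust
/// Keeps `items[i]` only where `keep[i]` is true, modifying `items` in place.
/// Hypothetical helper; assumes `retain` visits elements once each, in order.
fn retain_by_mask<T>(items: &mut Vec<T>, keep: &[bool]) {
    assert_eq!(items.len(), keep.len(), "mask must match the vector length");
    // External state: each closure call consumes the next flag from the mask.
    let mut flags = keep.iter();
    items.retain(|_| *flags.next().expect("mask is as long as items"));
}
```

With myvec = [1, 2, 3, 4, 5] and a mask of [true, false, true, true, false], this leaves myvec holding [1, 3, 4].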

Yandros · April 27, 2019, 8:56pm

That looks like the best option to me:

let item: Vec<&_> =
    myvec
        .iter()
        .zip(&othervec)
        .filter_map(|(x, &b)| if b { Some(x) } else { None })
        .collect();

soumya92 · April 27, 2019, 9:13pm

Thanks for the suggestions. I'm worried that collect() will end up copying/allocating more than necessary, but maybe that can be avoided using into_iter instead of iter (not sure).
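A rough sketch of the into_iter variant, assuming it is acceptable to consume the original vector. The function name keep_owned is made up for illustration. Elements are moved rather than cloned, though collect() may still allocate a new Vec for the result.

```rust
/// Consumes `items` and returns only the elements whose flag is true.
/// Hypothetical helper: elements are moved out, never cloned.
fn keep_owned<T>(items: Vec<T>, keep: &[bool]) -> Vec<T> {
    items
        .into_iter()
        // Pair each owned element with its flag (copied out of the slice).
        .zip(keep.iter().copied())
        .filter(|(_, flag)| *flag)
        .map(|(item, _)| item)
        .collect()
}
```

Because T has no Clone bound here, the compiler itself confirms that nothing is being copied element-wise.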

I like that using a separate index within retain works, but it just feels a bit off, since the closure is no longer pure, and now it relies on being called in a specific order.

I think for this specific case, writing my own loop that tracks offset and truncates the vector at the end works best. I'll probably just copy the implementation of retain from std and change it to use indices.
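Such a loop might look roughly like the following. This is only a sketch, not the std implementation: retain_by_index is a hypothetical name, and it uses swap to compact kept elements toward the front before truncating.

```rust
/// Keeps the element at index `i` only if `keep(i)` returns true.
/// Order of the kept elements is preserved. Hypothetical helper.
fn retain_by_index<T>(v: &mut Vec<T>, mut keep: impl FnMut(usize) -> bool) {
    let len = v.len();
    // `write` is the next slot for a kept element; `read` scans every index.
    let mut write = 0;
    for read in 0..len {
        if keep(read) {
            if read != write {
                v.swap(write, read);
            }
            write += 1;
        }
    }
    // Everything from `write` onward is unwanted; drop it in one go.
    v.truncate(write);
}
```

It would be called as retain_by_index(&mut myvec, |i| othervec[i]), which is close to the shape asked for in the original question.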

OptimisticPeach · April 27, 2019, 11:12pm

In general,

iter(&'_ self) will return impl Iterator<Item = &'_ T>
into_iter(self) will return impl Iterator<Item = T>
iter_mut(&'_ mut self) will return impl Iterator<Item = &'_ mut T>.

Therefore, into_iter will actually consume self and move the objects out of the collection, while iter will take references to the objects in that collection (unless you call .cloned(), which requires T: Clone). Depending on your situation, this may or may not be okay. But, to clarify, when size_of::<T>() <= size_of::<&()>(), into_iter may be equally or even more performant.
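A small illustration of the three flavours on a Vec<String>. The function name and the data are invented for the example.

```rust
/// Demonstrates `iter`, `iter_mut`, and `into_iter` on the same vector.
/// Returns (total length before mutation, snapshot after mutation, owned items).
fn iteration_flavours() -> (usize, Vec<String>, Vec<String>) {
    let mut words = vec![String::from("a"), String::from("bb")];

    // iter(): yields &String, so `words` is still usable afterwards.
    let total_len: usize = words.iter().map(|w: &String| w.len()).sum();

    // iter_mut(): yields &mut String, so elements can be changed in place.
    for w in words.iter_mut() {
        w.push('!');
    }
    let snapshot = words.clone();

    // into_iter(): yields String by value and consumes `words`.
    let owned: Vec<String> = words.into_iter().collect();
    (total_len, snapshot, owned)
}
```

After the into_iter call, words can no longer be used; any further access to it is a compile error.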

Note that this might change wildly due to compiler optimizations, which we don't really have much control over.

soumya92 · April 28, 2019, 12:45am

This is all great stuff to learn

I've managed to sidestep the problem for now by cloning the iterator (FilterMap<>) instead of collecting the result in a new vec, which IIUC should be even less work for the two loops I need over the resulting items:

let view = myvec
    .iter()
    .zip(&othervec)
    .filter_map(|(item, &vis)| if vis { Some(item) } else { None });

for item in view.clone() {
	//...
}
//...
for item in view.clone() {
	//...
}

scottmcm · April 28, 2019, 12:57am

I like this idea, but note that the retain docs don't guarantee an iteration order -- one could imagine a world where it first removes unwanted items from the end, for example. (There's also no guarantee that it only runs the predicate once per item, since it takes &T and thus could call multiple times.)

That said, I think it might as well guarantee the order, so if you rely on it I'd suggest making a PR for the docs to do that.

Yandros · April 29, 2019, 10:56am

@soumya92 given your use case clone-ing the iterator (made of 2 (fat) pointers only) is a great idea indeed!

(although you don't need to `.clone()` the iterator for the last iteration)
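In other words, something along these lines should work. This is a sketch with i32 items and a made-up function name; the first loop iterates over a clone, and the second consumes the original iterator.

```rust
/// Walks the visible items twice: once via a clone of the lazy view,
/// once by consuming the view itself. Hypothetical example function.
fn visible_twice(myvec: &[i32], othervec: &[bool]) -> (Vec<i32>, i32) {
    let view = myvec
        .iter()
        .zip(othervec)
        .filter_map(|(item, &vis)| if vis { Some(item) } else { None });

    // First pass: clone the iterator so `view` stays available.
    let mut first_pass = Vec::new();
    for item in view.clone() {
        first_pass.push(*item);
    }

    // Last pass: no clone needed, the iterator is simply used up.
    let mut total = 0;
    for item in view {
        total += *item;
    }
    (first_pass, total)
}
```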

dcarosone · April 29, 2019, 11:07am

maybe I misunderstood, but something like

let retained: Vec<_> = othervec
  .iter()
  .map(|&i| &myvec[i])
  .collect();

?

soumya92 · April 29, 2019, 8:38pm

Oh, I didn't elaborate enough in the original question. What I meant was I have two Vecs of the same size, with othervec being Vec<bool>. So the data looks like:

myvec = [1,2,3,4,5]; // for example, but with a much larger struct as data.
othervec = [true, false, true, true, false];

and I wanted myvec = [1,3,4].

dcarosone · April 29, 2019, 9:56pm

Ah, ok. Then yes, you want (for the simple version) zip and filter or filter_map, especially if you can keep it as an iterator rather than having to actually copy or modify the source vector(s), as your solution above.

For a more 'sophisticated' version, that really does have to drop the unwanted elements, something like a loop that walks from the front of the vector(s) and:

truncates both vectors to trim off any trailing unwanted entries
does a swap_remove of the next unwanted item from the front (swapping it with the must-now-be-wanted tail element)
terminates when it falls off the (now shorter) end
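The steps above could be sketched as follows. The function name drop_unwanted is hypothetical, and the sketch assumes both vectors have the same length. Note that swap_remove does not preserve the order of the remaining elements.

```rust
/// Removes every `items[i]` whose `keep[i]` is false, using swap_remove.
/// Order is NOT preserved. Both vectors end up with the same (shorter) length.
fn drop_unwanted<T>(items: &mut Vec<T>, keep: &mut Vec<bool>) {
    assert_eq!(items.len(), keep.len(), "vectors must be the same length");
    let mut i = 0;
    loop {
        // Trim trailing unwanted entries so the tail element is always wanted.
        while keep.last() == Some(&false) {
            keep.pop();
            items.pop();
        }
        // Stop once we fall off the (possibly shorter) end.
        if i >= items.len() {
            break;
        }
        if !keep[i] {
            // The tail is wanted, so swapping it into slot `i` is safe.
            items.swap_remove(i);
            keep.swap_remove(i);
        }
        i += 1;
    }
}
```

For [1, 2, 3, 4, 5] with the mask [true, false, true, true, false], this leaves [1, 4, 3]: the wanted elements, but in a different order than retain would give.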

Honestly, whether this or just collect on the first version is better or faster will depend on a lot of different factors. In other words, be wary of the 'sophisticated' solution. If the elements are really large and expensive, and there are a lot that come and go dynamically as you do many of these manipulations, a Vec<Box<T>> or Vec<Rc<T>> or even a HashMap might work out favourably - keep the simpler, clearer iterator logic and have much less overhead for memory copying.

Finally, though it may be obvious: how is the Vec<bool> created in the first place? Some other previous pass through the data? Is this something that can be changed so that it could be lazily evaluated as part of a retain closure?

soumya92 · April 29, 2019, 11:13pm

To add some more background, this is basically an attempt at fitting elements into a constrained storage space (think, e.g. printing a report, where the page size is fixed), but in a priority order.

I start with vec![false; myvec.len()], and then basically try to fit an element. If it fits, I mark it visible, subtract from available space, and move on. Otherwise I break, or I try to fit a shorter version of that information instead.

So myvec would be something like [date_short, date_long, tab, title_short, title_long, tab, page_number, "of", total_pages];. And then I would start with showing title_long. If it fits, I set othervec[4] = true and move on to date. Otherwise I try showing title_short, and set othervec[3] = true.

Once I've finished trying to fit the elements, it would be convenient to just discard the ones I didn't use, hence this question.
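A simplified sketch of that fitting pass, under several assumptions: each element has a known width, priority is given as groups of element indices ordered from preferred to fallback, and a group that does not fit at all is skipped rather than ending the pass. All names here (choose_visible, widths, priority) are invented for illustration.

```rust
/// Builds the visibility mask by fitting one alternative per group.
/// `priority` lists groups in priority order; within a group, indices run
/// from the preferred (longer) form to the fallback (shorter) form.
fn choose_visible(widths: &[usize], priority: &[Vec<usize>], mut available: usize) -> Vec<bool> {
    let mut visible = vec![false; widths.len()];
    for alternatives in priority {
        // Pick the first alternative in this group that still fits.
        let fit = alternatives
            .iter()
            .copied()
            .find(|&idx| widths[idx] <= available);
        if let Some(idx) = fit {
            visible[idx] = true;
            available -= widths[idx];
        }
    }
    visible
}
```

The resulting Vec<bool> plays the role of othervec, and can then be fed to any of the filtering approaches discussed in this thread.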

cuviper · April 30, 2019, 4:10am

https://github.com/rust-lang/rust/pull/60396

system · July 29, 2019, 4:18am

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.
