Is there an easy alternative to Vec::retain that operates on indices rather than contents?
Concretely, I have a second Vec that indicates whether elements in the original vec should be kept or dropped, and currently my code looks something like:
Thanks for the suggestions. I'm worried that collect() will end up copying/allocating more than necessary, but maybe that can be avoided using into_iter instead of iter (not sure).
I like that using a separate index within retain works, but it just feels a bit off, since the closure is no longer pure, and now it relies on being called in a specific order.
I think for this specific case, writing my own loop that tracks offset and truncates the vector at the end works best. I'll probably just copy the implementation of retain from std and change it to use indices.
Therefore, into_iter will actually consume self and move the objects out of the collection, while iter will take reference to the objects in that collection (Unless you call .cloned() where Item: Clone). Depending on your situation, this may be okay, but it may also not. But, to clarify, for Item = T and size_of::<T>() <= size_of::<&()>(), into_iter may be equally or even more performant.
Note that this might wildly change due to compiler optimizations, but we don't really have much control
I've managed to sidestep the problem for now by cloning the iterator (FilterMap<>) instead of collecting the result in a new vec, which IIUC should be even less work for the two loops I need over the resulting items:
let view = myvec
.iter()
.zip(&othervec)
.filter_map(|(item, &vis)| if vis { Some(item) } else { None });
for item in view.clone() {
//...
}
//...
for item in view.clone() {
//...
}
I like this idea, but note that the retain docs don't guarantee an iteration order -- one could imagine a world where it first removes unwanted items from the end, for example. (There's also no guarantee that it only runs the predicate once per item, since it takes &T and thus could call multiple times.)
That said, I think it might as well guarantee the order, so if you rely on it I'd suggest making a PR for the docs to do that.
Oh, I didn't elaborate enough in the original question. What I meant was I have two Vecs of the same size, with othervec being Vec<bool>. So the data looks like:
mymec = [1,2,3,4,5]; // for example, but with a much larger struct as data.
othervec = [true, false, true, true, false];
Ah, ok. Then yes, you want (for the simple version) zip and filter or filter_map, especially if you can keep it as an iterator rather than having to actually copy or modify the source vector(s), as your solution above.
For a more 'sophisticated' version, that really does have to drop the unwanted elements, something like a loop that walks from the front of the vector(s) and:
truncates both vectors to trim off any trailing unwanted entries
does a swap_remove of the next unwanted item from the front (swapping it with the must-now-be-wanted tail element)
terminates when it falls off the (now shorter) end
Honestly, whether this or just collect on the first version is better or faster will depend on a lot of different factors. In other words, be wary of the 'sophisticated' solution. If the elements are really large and expensive, and there are a lot that come and go dynamically as you do many of these manipulations, a Vec<Box<T>> or Vec<Rc<T>> or even a HashMap might work out favourably - keep the simpler, clearer iterator logic and have much less overhead for memory copying.
Finally, though it may be obvious: how is the Vec<bool> created in the first place? Some other previous pass through the data? Is this something that can be changed so that it could be lazily evaluated as part of a retain closure?
To add some more background, this is basically an attempt at fitting elements into a constrained storage space (think, e.g. printing a report, where the page size is fixed), but in a priority order.
I start with vec![false; myvec.len()];, and then basically try to fit an element. If it fits, I mark it visible, subtract from available space, and move on. Otherwise I break, or I try to fit a shorter version of that information instead.
So myvec would be something like [date_short, date_long, tab, title_short, title_long, tab, page_number, "of", total_pages];. And then I would start with showing title_long. If it fits, I set othervec[4] = true and move on to date. Otherwise I try showing title_short, and set othervec[3] = true.
Once I've finished trying to fit the elements, it would be convenient to just discard the ones I didn't use, hence this question.