Find duplicates in a collection implemented as vector

Yet another question: I have a vector of elements, some of which may be equal to others. How can I reduce this to a vector of unique elements? Alternatively, how can I remove the duplicates?

Does your program have extra memory to put those elements into a HashSet, or mutable access to the vector so you can sort it, or extra time to run a check for each pair of elements (i.e. quadratic asymptotic time)?


If you can spare O(n) memory, you can use a HashSet to keep track of "seen" items and then rebuild the vector from it.
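A minimal sketch of this approach. This variant keeps the first occurrence of each element in its original position rather than collecting the set itself back into a vector (which would lose the order); `unique` is just an illustrative name:

```rust
use std::collections::HashSet;
use std::hash::Hash;

// O(n) extra memory: a "seen" set decides which elements to keep.
// HashSet::insert returns false if the value was already present,
// so the filter keeps only the first occurrence of each element.
fn unique<T: Hash + Eq + Clone>(items: &[T]) -> Vec<T> {
    let mut seen = HashSet::new();
    items
        .iter()
        .filter(|item| seen.insert((*item).clone()))
        .cloned()
        .collect()
}

fn main() {
    let v = vec![3, 1, 3, 2, 1];
    assert_eq!(unique(&v), vec![3, 1, 2]);
}
```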

If you want to do it without any extra memory overhead, you can do it in O(n log n) by sorting the vector first, because then duplicate elements will always be grouped together. You see an element and just chuck subsequent duplicates out until a new element starts. This can be done in place if you're clever, but you can also build a new vector and then discard the original, which is easier to implement.
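A sketch of the in-place variant on an already-sorted vector, using a write cursor so no extra allocation is needed (`dedup_sorted` is a hypothetical name; the standard library's `Vec::dedup` does the same job):

```rust
// Remove adjacent duplicates from a sorted vector in place.
// `write` marks the end of the deduplicated prefix; each value that
// differs from the last kept one is swapped forward into position.
fn dedup_sorted<T: PartialEq>(v: &mut Vec<T>) {
    let mut write = 1;
    for read in 1..v.len() {
        if v[read] != v[write - 1] {
            v.swap(write, read);
            write += 1;
        }
    }
    v.truncate(write.min(v.len()));
}

fn main() {
    let mut v = vec![3, 1, 2, 3, 1];
    v.sort_unstable(); // group duplicates: [1, 1, 2, 3, 3]
    dedup_sorted(&mut v);
    assert_eq!(v, vec![1, 2, 3]);
}
```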


Thanks to both posters. I conclude that there is no built-in unique function, such as exists in the programming languages REBOL and Red, and that I have to craft this function myself somehow.


I can think of two ways.
First, especially if there are memory restrictions, sort the vec in place using sort_unstable_by or sort_unstable_by_key, then either use Vec::dedup or manually go through the vec.

The second is to use the itertools crate.


Ah yes, I forgot: if you sort the vector ( vec.sort() ) you can just call vec.dedup() ! This will reorder the data, of course.
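For example (note that the original order is lost, since dedup only removes *consecutive* repeats and therefore needs the sort first):

```rust
fn main() {
    let mut v = vec![3, 1, 3, 2, 1];
    v.sort();  // duplicates become adjacent: [1, 1, 2, 3, 3]
    v.dedup(); // drops consecutive repeats
    assert_eq!(v, vec![1, 2, 3]);
}
```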

I forgot about this method, then remembered it and came back, only to see that @someotherself had just beaten me to it :slight_smile:


There's no reason to use a Vec for your case; this works better.