Hello, beginner here. I'm writing a program that has two string vectors, and I need to remove any string that's in both lists from both lists, leaving just the unique values. The lists are sorted beforehand.
After doing some research, I came up with the following function, which seems to be working:
fn main() {
let mut v1 = vec!["A1.2", "A2.2", "B2.1", "CR1.3", "D1.2", "U10.76"];
let mut v2 = vec!["A1.2", "A2.4", "B2.1", "CR1.A", "D1.K", "U10.76"];
// remove: **** **** ******
remove_matches(&mut v1, &mut v2);
assert_eq!(v1, vec!["A2.2", "CR1.3", "D1.2"]);
assert_eq!(v2, vec!["A2.4", "CR1.A", "D1.K"]);
}
fn remove_matches(v1: &mut Vec<&str>, v2: &mut Vec<&str>) {
let mut len2 = v2.len();
let mut i = 0;
while i < v1.len() {
let word = v1[i].clone();
v2.retain(|s| s != &word);
if v2.len() < len2 {
v1.retain(|s| s != &word);
len2 = v2.len();
} else {
i += 1;
}
}
}
Since I'm still getting used to chaining/closures, as well as the std methods available, I was wondering if there was a better way to implement this functionality.
This isn't quite right— it doesn't seem to handle duplicate items correctly. In particular, remove_matches(&mut vec!["";1024], &mut vec!["";1024]) leaves half of the items in both vectors instead of removing them all.
It also does O(n²) comparisons, which isn't great for large vectors.
I'd probably start with something like this to take advantage of the lists being presorted. There's still room for optimization here, but it's easy to understand and the asymptotic performance is at reasonable:
fn remove_matches(v1: &mut Vec<&str>, v2: &mut Vec<&str>) {
let mut v1_iter = std::mem::take(v1).into_iter().peekable();
let mut v2_iter = std::mem::take(v2).into_iter().peekable();
loop {
match (v1_iter.peek(), v2_iter.peek()) {
(None, None ) => return,
(Some(_), None ) => v1.extend(&mut v1_iter),
(None, Some(_)) => v2.extend(&mut v2_iter),
(Some(a), Some(b)) => {
use std::cmp::Ordering::*;
match a.cmp(b) {
Less => v1.push(v1_iter.next().unwrap()),
Greater => v2.push(v2_iter.next().unwrap()),
Equal => {
let _ = v1_iter.next();
let _ = v2_iter.next();
}
}
}
}
}
}
The test case does use sorted lists, and so does the problem statement:
Also, the problem statement doesn't state that the lists have been deduplicated, so that's not really something a correct solution can rely on.
I'm not surprised by those results at all, as the test lists are quite short and I didn't spend any time doing things like avoiding allocations. If you increase the list lengths a bit, the difference becomes more obvious:
For completeness, here's a testcase where this code misses to remove a duplicate, because it works with equal numbers of duplicates, only:
let v1 = vec!["A1.2", "A2.2", "B2.1", "B2.1", "CR1.3", "D1.2", "U10.76"];
let v2 = vec!["A1.2", "A2.4", "B2.1", "CR1.A", "D1.K", "U10.76"];
Also, none of these three versions removes or deduplicates duplicates appearing in only one vector, like
let v1 = vec!["A1.2", "A2.2", "A2.2"];
let v2 = vec!["A1.2", "A2.4"];
That said, I agree that "complex" code works quite well for big arrays. Taking sorting into account with "simple" code, I can get it faster for small arrays and as fast, but not faster for big arrays.