I'm very new to Rust so bear with me if this is a rookie/stupid question. Note also I come from a C++ background, so feel free to provide analogies with C++ if that helps.
I'm trying to implement a function which will take two (Hash)Sets A and B and do the following:
- Modify A -> A n (A\B)
- Return AnB (before A is modified)
Put another way, I want to remove everything in B from A and return what was removed (but don't assume that B is contained in A).
I'd like to know how to do this efficiently, i.e. I don't want to create lots of "copies" of objects (possibly I mean clones here in Rust language). In my mind it should be possible to basically split A into two parts and transfer ownership in such a way that no new allocations (copies/clones) are needed.
I tried this originally with .intersection
and .difference
and fell flat on my face. After being lost in the woods with references and lifetimes for a while I came to the conclusion that these methods wouldn't allow me to do what I want without needless copying.
The solution I came up with is below. I've simplified it a little as my original code had generics. I now just have HashSets of some custom struct SomeData
.
My questions are:
-
Is this as efficient as it can be?
-
Am I right in my conclusion that
.intersection
and.difference
wouldn't be able to help me here?#[derive(Eq, PartialEq, Hash, Debug, Clone)] struct SomeData { i: i32, } fn remove_values(all_values: &mut HashSet<SomeData>, remove_values: & HashSet<SomeData> ) -> HashSet<SomeData> { // Whatever way I do this, I am always going to have to create a new container for the return // values let mut removed_values: HashSet<SomeData> = HashSet::new(); for remove_me in remove_values.iter() { // My understanding here is that `take` will not just remove the value from all_values (if // it exists) it will also give me ownership of the value, i.e. C++ equivalent to move // semantics? if let Some(removed) = all_values.take(remove_me) { removed_values.insert(removed); } } removed_values }
As an extra note, one thing I tried to do was prove this was as efficient as I thought. I took my naive C++ brain and figured I could just print out pointer values and see what was in my sets before and after calling this method:
let mut a: HashSet<_> = [SomeData{i:1}, SomeData{i:2}, SomeData{i:3}, SomeData{i:4}, SomeData{i:5}].iter().cloned().collect();
let b: HashSet<_> = [SomeData{i:4}, SomeData{i:5}, SomeData{i:6}].iter().cloned().collect();
for aval in a.iter() {
println!("a before: {:?} {:p}", aval, aval);
}
for bval in b.iter() {
println!("b before: {:?} {:p}", bval, bval);
}
let x = remove_values(& mut a, &b);
for aval in a.iter() {
println!("a after: {:?} {:p}", aval, aval);
}
for xval in x.iter() {
// Expect the addresses here to have been originally in a
println!("removed: {:?} {:p}", xval, xval);
}
My expectation was that the "removed" pointers would have originally been in set a
but they weren't. After some reflection, I believe this makes sense. My bet is that this is really equivalent to C++ move semantics. The values in the sets are actually on the stack. Whilst my code might be efficient, there's basically a bunch of places in the code which create new stack values and "move" (transfer ownership) of the data in the SomeData
structs. So it's likely not a surprise that pointers are different.
If this is true then I'm not 100% sure how I could prove this any other way. My thought was to implement Clone
myself and log out clones. I saw none, which I assume means I'm only getting Copy
ies, which I think mean I'm only getting moves in C++ language.
Sorry if this is a little unclear, I'm still getting my head around a few things.
Thansk