I have a HashSet that I'd like to push values into using this closure. Additionally, there's a bug in the for_each where insert returns a bool that I'm not using, but I'm more curious how to modify the closure to accept a reference to my HashSet, rather than try to copy it.
use std::collections::HashSet;
fn e32_i() -> u32 {
let mut s: HashSet<u32> = HashSet::new();
(12..99)
.filter(|x| is_pandigital(format!("{}", x)))
.flat_map(|x| {
(123..988)
.filter(|y| is_pandigital(format!("{}", y)))
.filter(move |y| x * y / 10000 == 0)
.filter(move |y| is_pandigital(format!("{}{}{}", x, y, x * y)))
.for_each(move |y| s.insert(x * y))
});
s.iter().sum()
}
fn is_pandigital(s: String) -> bool {
let mut res = true;
for (i, v) in s.chars().enumerate() {
if s[i + 1..].contains(v) {
res = false;
break;
}
}
res
}
This compiler suggests that the Set should be type copy for me to do this, which I think I understand: the closure copies the environment variable, but can't do that with the Set (as HashSet does not implement Copy).
Whenever a closure is move, it will take ownership of the "names" it captures. So the trick is to make the name be, in and on itself, something that does not own the Set:
.for_each({
let s = &mut s; // now `s` is a binding that does not own the set, it just borrows it
move |y| { s.insert(x * y); } // use braces with a semicolon to discard the `bool`
})
I'm learning a lot from how you refactored my code. I have some questions regarding some of the things you did.
Refactoring my is_pandigital function:
You rewrote the function to accept a reference to a string. I think you did this because it's:
Idiomatic: Rustaceans seem to prefer &str to String when I read their code, meaning there's probably compiler optimizations around that type or something, though I'm not explicitly certain why you did this.
You used the &'_ lifetime pattern, which I don't often see and don't understand, but it's actually pretty difficult to search for what that pattern is. If I remove it altogether, the code still compiles. Alternatively, I can explicitly elide the function, fn blah <'a>(s: &a str) -> bool`. Could you elaborate how you chose that pattern?
It doesn't match any of patterns I'm aware of.
You also used the all and not methods, which I was unaware of. These are experienced things to do and good things to know about, but less to me interesting tautologically.
Avoiding mutable state vars:
Twice you avoided using mutable state variables I used. I suspect this is a functional programming move to kill mutable state wherever it arises. I suspect it's also idiomatic for Rust.
One of my observations in this style is that it can be difficult for me to incrementally design and test chains of functions; I think there were around seven chained functions in that assignment. Since I can't directly watch you program, I'm curious what your process steps are in chaining functions and testing whether your chain is valid, especially when the layers become a bit more nuanced. I often find myself getting lost in that process.
.iter to .into_iter:
It looks like you refactored the last line. I'm not sure why. Both compile. Into would consume the set. Am I correct in interpreting that without using into, the Set would continue to live for the entire lifetime of whatever receives the result of e32_i() in main?
Bonus round! Suppose I'm not done with the set. I have more stuff to shove in it later (say, a similar product of pandigital multiples of a different set of ranges). Is there a better solution than this one which lets s be mutable? Note, that one has errors, I may update it shortly.
Update, this one actually compiles, but is pretty wasteful, and throws a mutable variable back in the mix.
I claim that this is all pretty ugly, so I also wrote a solution that does all that in a pair of for loops. Comparing the two, the imperative solution is much simpler IMO, allowing that we're doing things with a mutable HashSet. Is that because I walked down some weird functional paths or is this a problem that lends itself to mutability?
In Rust the two main "text" types are String and &'some_lifetime str. String represents
an owned "handle" over the
heap-allocatedstr contents
which (thus) happen to be mutable (e.g., you can .push_str() to a String).
&'some_lifetime str, on the other hand, represents
a borrowed (for a limited 'lifetime) sharedview over
anywhere-(al)located str contents
which (thus) happen to be immutable
So you use String when you need ownership (and maybe mutation) over the str contents, so that you can keep them around for as long as you like (e.g., forever), without the compiler complaining that the str may not live long enough; you use &'borrow mut String when you just want to mutate a string you are temporarily borrowing, and &'borrow str in all the other cases, i.e., when you just want to read / scan the string and don't care about keeping it afterwards, which was the case for you is_pandigital function.
Indeed, if you use String you are affectively taking ownership of the string you are given, meaning the caller cannot use that string anymore, since your function is (implicitly) dropping (and thus heap-freeing) it when it returns. If you use &'borrow mut String or &'borrow String, then you avoid this problem, but are still requiring that the string you are scanning be a heap-allocated one (in the former case to be able to mutate it, which makes sense if you need to do it, and in the latter case for no special reason except being annoying ). This means that a string literal such as "Hello!" (which is a &'static str, i.e., a infinitely-lived borrow over some string contents that happen to live in the static / global memory of the binary) would not be a valid input for a function taking a &'borrow [mut] String. That's why &'borrow str is the most versatile input when just wanting to scan the input (at the cost of requiring a leading & in front of a format! call, which can be circumvented with a more complex function signature argument, impl AsRef<str>, but really, that's just nitpicking; since Rust likes to be explicit about ownership vs. borrows having to add leading & here and there should not be an issue).
'_ is the anonymous lifetime. Using it in a case like yours is rather my opinionated stance; many people would suggest on the contrary to just write &str to keep things "simple". I, however, think otherwise: Rust is a language that allows us to elide some unnecessary copies through the process of borrowing, which means "copy"ing the address of an object rather than the object itself. This is something that low level languages such as C thrived on doing. Too much. Indeed, if the initial object is moved or deallocated, then all these pointers dangle to "garbage memory" and all things can go wrong. Rust allows the best of all worlds with the mechanism of borrows: a borrow is like a pointer, but with a reduced area of statically (i.e., at compile-time) guaranteed "non-dangling-ness": the lifetime of the borrow.
This means that whenever you meet a type such as &SomeThing or &mut SomeThing, such type is actually a lie. There is no &[mut] SomeThing type. The & "type constructors" takes not one but two parameters: the type of the borrowee / borrowed thing, and a lifetime: &'some_lifetime [mut] SomeThing. Not writing the lifetime is just an elision of that lifetime parameter, a way of being implicit "for the sake of readability".
So the type of is_pandigital function is rather, as you mentioned:
which is indeed explicit, but it is also indeed cumbersome.
That's what the '_ lifetime is for: we get to be explicit about there being a lifetime, while also gently "dismissing it" as "accounted for":
fn is_pandigital (name: &'_ str) -> bool
Regarding .all(), and other such ergonomic methods, I agree that these are just "encyclopedic knowledge" and thus not as much conceptually interesting. But it is always nice to discover some of these "functiona programming" gems. The one you could look for is .fold(): you can construct most iterator adaptors using it, although some may be less efficient because .fold() is not short-circuiting (for that, you should look for .try_fold(): you should be able to implement any iterator adaptor with it).
Rust is less allergic to mutable state than other languages; I, for instance, usually go for mutable state and .for_each() rather than "immutably" fold()ing. But it turns out that mutable-state-based patterns can be written in a less optimal way than immutable ones: calling .collect() once, for instance, instead of .for_each(... .push() / .insert() / etc.) is the perfect example w.r.t.reallocations. Indeed, if you take a collection that is able to pre-allocate (through .reserve(), or through the ::with_capacity() constructor), you get to directly insert the elements you are appending in place, and not move / copy them anymore. This is something that .collect(), under the hood, does for you. Whereas when you do not preallocate and just use a ::new() constructor like you did; the collection cannot know how much capacity it needs to pre-allocate, and thus may exceed its capacity in the middle of the iteration, requiring a reallocation and thus a bitwise copy of the already written elements to the new location. And this can happen again and again (most algorithms choose to double the capacity each time, so as to make the amortized cost be logarithmic; but this is still worse than the "constant" time of using ::with_capacity()-based (pre)allocation, or the .collect() construct).
.iter()vs..into_iter()
You can dismiss / ignore that change, as it doesn't change much for your code. You are right that with .iter(), much like with using a &'_ str parameter explained above, you get to avoid taking ownership of the collection during the iteration, by borrowing each element. This can be both good and bad, hence my saying there is not one that is strictly better than the other.
The rules of thumb (in order of importance), would be:
If you do not use the collection afterwards, feel free to .into_iter().
Mainly: if you do not use the collection afterwards, you can save / avoid .clone()ing each value (when needing ownership of the values) by using .into_iter() for the iteration.
If you do, then do not use .clone() before the .into_iter() call to avoid "losing ownership". Instead, do use .iter() or .iter_mut() for borrowed-iteration.
And if you do need owned values during the iteration, then you can .clone() each value; or directly use the .cloned() iterator adaptor. In other words, .iter().cloned() ≥ .clone().into_iter() (and the only case where they are equal that I can think of is when the collection is a Vec, and the iteration is directly followed by .collect()ing into a second Vec, thanks to Vec optimizing .into_iter().collect() into a no-op).