I have this function that counts frequencies of case-insensitive-letters of some text.
fn frequency(text: &[&str]) -> HashMap<char, usize> {
let mut freq = HashMap::new();
for line in text {
for ch in line.to_lowercase().chars() {
if ch.is_alphabetic() {
freq.entry(ch).and_modify(|cnt| *cnt += 1).or_insert(1);
}
}
}
freq
}
I want to rewrite this in iterator style. So I (try to) do
The reason that your first attempt resulted in a compiler error is because to_lowercase() actually returns an owned String. So when you returned an iterator of chars to the string in the flat_map, it would mean that the String returned from to_lowercase() would be free, as the function has ended and the owner has not been transfer.
This will result in a dangling pointer, hence the compiler error.
But regarding your second problem, I did a test on the rust playground, and found that the iterator version is actually faster by around 10%. So the first question is, are you running it with the --release flag with cargo run --release ?
But really, if you want this to be faster, I would look into using char::to_lowercase() or char::to_ascii_lowercase(), as you can avoid that allocation, which is where most of the time is taking. The following example runs around 2x the allocated version in the example within the playground.
Yes I understand the bug. It was included to lead into the next solution. I'm not for performance per se, but rather astonished that I couldn't write an iterator solution that should have outperformed a same-logic-loop-solution.
Your playground time assessments are not accurate. Had you swapped the order you would get different results.
Thank you. This is nice and all. But does not address the issue. How could I write an iterator solution that captures the same logic and outperforms the loop solution? I Specifically called str::to_lowercase not char::to_lowercase.
It's not necessarily possible. There's nothing magic about iterators, and Rust's for loops use them under the hood anyway. As far as I understand it, the way most iterator-based solutions gain performance is by short-circuiting unnecessary operations; I don't see how that could apply to this problem.
If any significant work is happening in the loop body (Allocating on the heap, for instance), the performance difference between iterators and loops is likely to be lost in the noise.
Iterators, loops, and recursion are all mathematically equivalent, and can be transformed into each other as optimizations. The choice of representation has no inherent performance value, it's just notation. And on that front, once you start using non-trivial folds (or custom adapters), you're really stretching the limits of where iterators are considered a good notation.