Iterators over for..in

gomain · March 4, 2021, 9:47am

I have this function that counts the letter frequencies in some text, ignoring case.

use std::collections::HashMap;

fn frequency(text: &[&str]) -> HashMap<char, usize> {
    let mut freq = HashMap::new();
    for line in text {
        for ch in line.to_lowercase().chars() {
            if ch.is_alphabetic() {
                freq.entry(ch).and_modify(|cnt| *cnt += 1).or_insert(1);
            }
        }
    }
    freq
}

I want to rewrite this in iterator style, so I tried:

    text.into_iter()
        .flat_map(|line| line.to_lowercase().chars()) // bug!
        .filter(|ch| ch.is_alphabetic())
        .fold(HashMap::new(), |mut freq, ch| {
            freq.entry(ch).and_modify(|cnt| *cnt += 1).or_insert(1);
            freq
        })

This does not compile, because the closure passed to flat_map returns a reference to a temporary value! Here's one that does compile:

    text.into_iter()
        .fold(HashMap::new(), |freq, line| {
            line.to_lowercase()
                .chars()
                .filter(|ch| ch.is_alphabetic())
                .fold(freq, |mut freq, ch| {
                    freq.entry(ch).and_modify(|cnt| *cnt += 1).or_insert(1);
                    freq
                })
        })

But this performs significantly slower than the original loop version.

There were so many claims that iterators would outperform hand-rolled loops. Am I missing an angle here?

Sykout · March 4, 2021, 10:41am

Your first attempt fails to compile because to_lowercase() returns an owned String. The closure passed to flat_map returns an iterator over the chars of that String, but the String is a temporary: nothing takes ownership of it, so it is dropped when the closure returns.
The iterator would then borrow from freed memory (a dangling pointer), hence the compiler error.
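One way to keep str::to_lowercase inside flat_map is to have the closure return something that owns its data. The following is only a minimal sketch: the name frequency_flat_map is made up for illustration, and collecting into a Vec<char> adds a second allocation per line, so this shows how to satisfy the borrow checker rather than how to go faster.

```rust
use std::collections::HashMap;

// Hypothetical name; same logic as the loop version, still calling
// str::to_lowercase once per line.
fn frequency_flat_map(text: &[&str]) -> HashMap<char, usize> {
    text.iter()
        // The closure returns an owned Vec<char>, so nothing borrows from
        // the temporary String after the closure returns.
        .flat_map(|line| line.to_lowercase().chars().collect::<Vec<char>>())
        .filter(|ch| ch.is_alphabetic())
        .fold(HashMap::new(), |mut freq, ch| {
            *freq.entry(ch).or_insert(0) += 1;
            freq
        })
}
```

Calling frequency_flat_map(&["Hello", "WORLD 42"]) counts 'l' three times and 'o' twice, and skips the digits.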

As for your second problem, I ran a test on the Rust playground and found that the iterator version is actually about 10% faster. So the first question is: are you building in release mode, with cargo run --release?

But really, if you want this to be faster, I would look into using char::to_lowercase() or char::to_ascii_lowercase(), since that avoids the allocation, which is where most of the time goes. The following example runs about 2x as fast as the allocating version in my playground test.

fn frequency_notalloc_iter(text: &[&str]) -> HashMap<char, usize> {
    text.into_iter()
        .flat_map(|line| line.chars())
        .map(|ch| ch.to_ascii_lowercase())
        .filter(|ch| ch.is_ascii_lowercase())
        .fold(HashMap::new(), |mut freq, ch| {
            freq.entry(ch).and_modify(|cnt| *cnt += 1).or_insert(1);
            freq
        })
}
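Note that the ASCII-only methods change the logic slightly compared with str::to_lowercase plus is_alphabetic: non-ASCII letters are left untouched by to_ascii_lowercase and are then rejected by is_ascii_lowercase. A small sketch of the difference, with helper names (ascii_lower, unicode_lower) invented for illustration:

```rust
// Hypothetical helper: ASCII-only lowercasing, leaves other chars unchanged.
fn ascii_lower(ch: char) -> char {
    ch.to_ascii_lowercase()
}

// Hypothetical helper: Unicode-aware lowercasing. char::to_lowercase returns
// an iterator because one char may lowercase to more than one char.
fn unicode_lower(ch: char) -> String {
    ch.to_lowercase().collect()
}
```

With these, ascii_lower('É') is still 'É', while unicode_lower('É') is "é"; and 'é' passes is_alphabetic but not is_ascii_lowercase, so the ASCII version would not count it at all.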

2e71828 · March 4, 2021, 10:46am

My instinct, which may be overkill in this case, is to define a frequency-counting map type. I haven't tested the performance, though.

use std::collections::HashMap;
use std::hash::Hash;
use std::iter::FromIterator;

fn frequency(text: &[&str]) -> FreqMap<char> {
    text.iter().copied()
        .flat_map(str::chars)
        .flat_map(char::to_lowercase)
        .filter(|c| c.is_alphabetic())
        .collect()
}

pub struct FreqMap<T>(HashMap<T,usize>);

impl<T:Hash+Eq> FreqMap<T> {
    pub fn get(&self, t:&T)->usize {
        self.0.get(t).cloned().unwrap_or(0)
    }
}

impl<T> Default for FreqMap<T> {
    fn default()->Self { FreqMap(Default::default()) }
}

impl<T> FromIterator<T> for FreqMap<T> where Self: Extend<T> {
    fn from_iter<I>(iter: I) -> Self where
        I: IntoIterator<Item = T>,
    {
        let mut result:Self = Default::default();
        result.extend(iter);
        result
    }
}

impl<T:Hash+Eq> Extend<T> for FreqMap<T> {
    fn extend<I>(&mut self, iter: I) where
        I: IntoIterator<Item = T>,
    {
        for t in iter {
            *self.0.entry(t).or_default() += 1;
        }
    }
}

impl<'a, T:Hash+Eq+Clone+'a> Extend<&'a T> for FreqMap<T> {
    fn extend<I>(&mut self, iter: I) where
        I: IntoIterator<Item = &'a T>,
    {
        // TODO: Rewrite to avoid unnecessary clones
        self.extend(iter.into_iter().cloned());
    }
}

(Playground)

Sykout · March 4, 2021, 10:56am

2e71828 wrote (quoting Sykout's original example):

        .filter(|ch| ch.is_ascii_lowercase())
        .map(|ch| ch.to_ascii_lowercase())

    These are in the wrong order; the filter will reject non-lowercase characters before they get to the map.

Oops, indeed.
I've fixed my example.

gomain · March 4, 2021, 4:51pm

Yes, I understand the bug; I included it to lead into the next solution. I'm not after performance per se. Rather, I'm astonished that I couldn't write an iterator solution that outperforms the loop solution with the same logic, as iterators supposedly should.

Your playground timings are not accurate. Had you swapped the order in which the two versions run, you would have gotten different results.

gomain · March 4, 2021, 4:58pm

Thank you. This is nice and all, but it does not address the issue: how could I write an iterator solution that captures the same logic and outperforms the loop solution? I specifically called str::to_lowercase, not char::to_lowercase.

2e71828 · March 4, 2021, 5:03pm

It's not necessarily possible. There's nothing magic about iterators, and Rust's for loops use them under the hood anyway. As far as I understand it, the way most iterator-based solutions gain performance is by short-circuiting unnecessary operations; I don't see how that could apply to this problem.

If any significant work is happening in the loop body (allocating on the heap, for instance), the performance difference between iterators and loops is likely to be lost in the noise.
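The two points above can be sketched roughly as follows. The function names are invented for illustration, and sum_desugared is only an approximation of what the compiler generates for a for loop, not its exact expansion.

```rust
// A plain for loop over a slice.
fn sum_for(values: &[u32]) -> u32 {
    let mut total = 0;
    for v in values {
        total += v;
    }
    total
}

// Roughly what the loop above expands to: obtain an iterator, then call
// next() until it returns None.
fn sum_desugared(values: &[u32]) -> u32 {
    let mut total = 0;
    let mut iter = values.into_iter();
    loop {
        match iter.next() {
            Some(v) => total += v,
            None => break,
        }
    }
    total
}

// Short-circuiting: `any` stops pulling chars as soon as one digit is found,
// so the rest of the text is never visited.
fn contains_digit(text: &[&str]) -> bool {
    text.iter()
        .copied()
        .flat_map(str::chars)
        .any(|ch| ch.is_ascii_digit())
}
```

A frequency count has to visit every character, so there is nothing for a short-circuiting adapter like any or find to skip.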

skysch · March 4, 2021, 5:07pm

Iterators, loops, and recursion are all mathematically equivalent, and can be transformed into each other as optimizations. The choice of representation has no inherent performance value; it's just notation. And on that front, once you start using non-trivial folds (or custom adapters), you're really stretching the limits of where iterators are considered a good notation.
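As a small illustration of that equivalence, here is the same computation (counting the alphabetic characters in a string) written in all three notations. The function names are hypothetical, and the recursive form is shown only for comparison; it is not something you would normally write in Rust.

```rust
// Iterator notation.
fn count_iter(s: &str) -> usize {
    s.chars().filter(|c| c.is_alphabetic()).count()
}

// Loop notation.
fn count_loop(s: &str) -> usize {
    let mut n = 0;
    for c in s.chars() {
        if c.is_alphabetic() {
            n += 1;
        }
    }
    n
}

// Recursive notation: handle the first char, then recurse on the rest.
fn count_rec(s: &str) -> usize {
    let mut chars = s.chars();
    match chars.next() {
        None => 0,
        Some(c) => usize::from(c.is_alphabetic()) + count_rec(chars.as_str()),
    }
}
```

All three return the same count for any input; which one reads best depends on the problem, not on any built-in speed advantage.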

system · June 2, 2021, 5:07pm

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.
