Mixing `step_by` with `peekable` on iterators

I want to iterate over a string two characters at a time; in other words, on each iteration I want to consume two characters from the string. I tried using `step_by` and `peekable` on the iterator returned by `chars()`, but I did not succeed. What is the best way to consume two chars per iteration over a string?
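For context, `step_by(2)` on its own skips every other item rather than grouping items, so `"abcd".chars().step_by(2)` yields only `'a'` and `'c'`. A std-only sketch that still uses `step_by` zips the even-indexed chars with the odd-indexed ones. This is an illustrative alternative, not one of the answers in this thread, and the helper name `pairs_via_step_by` is hypothetical:

```rust
/// Hypothetical helper: pairs up consecutive chars by zipping the
/// even-indexed chars with the odd-indexed ones. A trailing unpaired
/// char is dropped because `zip` stops at the shorter iterator.
fn pairs_via_step_by(s: &str) -> Vec<(char, char)> {
    let evens = s.chars().step_by(2); // chars at indices 0, 2, 4, ...
    let odds = s.chars().skip(1).step_by(2); // chars at indices 1, 3, 5, ...
    evens.zip(odds).collect()
}

fn main() {
    for (a, b) in pairs_via_step_by("Hello world!") {
        println!("{} {}", a, b);
    }
}
```

Note that this walks the string twice (once per `chars()` call), which is cheap but not free for long strings.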

The `itertools` crate has several useful methods for this, such as `Itertools::tuples`. Example:

```rust
use itertools::Itertools;

fn main() {
    // The tuple size (here 2) is inferred from the pattern `(a, b)`
    for (a, b) in "Hello world!".chars().tuples() {
        println!("{} {}", a, b);
    }
}
```

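If pulling in a dependency is not an option, pairs specifically can be taken with a plain `while let` loop over `chars()`. A minimal std-only sketch (the helper name `char_pairs` is hypothetical); as far as I know it matches `tuples()` in yielding only complete pairs:

```rust
/// Hypothetical helper: collects consecutive pairs of chars using only std.
fn char_pairs(s: &str) -> Vec<(char, char)> {
    let mut chars = s.chars();
    let mut pairs = Vec::new();
    // Tuple fields are evaluated left to right, so `a` is consumed before `b`.
    // The loop ends as soon as either call returns `None`, so a trailing
    // odd char is discarded.
    while let (Some(a), Some(b)) = (chars.next(), chars.next()) {
        pairs.push((a, b));
    }
    pairs
}

fn main() {
    for (a, b) in char_pairs("Hello world!") {
        println!("{} {}", a, b);
    }
}
```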


You can also implement this fairly easily without a dependency, and without being limited to tuples of some maximum length, by writing an iterator that is generic over arrays instead:

```rust
// Yields chunks of type `T` (e.g. `[char; 2]`), filled one `char` at a time.
// An incomplete chunk at the end of the string is discarded.
fn by_chars<'a, T>(s: &'a str) -> impl Iterator<Item=T> + 'a
    where
        T: Default + 'a,
        // `T` must allow iterating over mutable references to its `char` slots
        for<'b> &'b mut T: IntoIterator<Item=&'b mut char>
{
    struct NChars<'c, U> {
        inner: std::str::Chars<'c>,
        _phantom: std::marker::PhantomData<U>
    }
    
    impl<'c, U> Iterator for NChars<'c, U>
        where
            U: Default + 'c,
            for<'d> &'d mut U: IntoIterator<Item=&'d mut char>
    {
        type Item = U;
        
        fn next(&mut self) -> Option<Self::Item> {
            let mut item = U::default();
            
            // Fill every slot; `?` ends iteration if the string runs out mid-chunk
            for ptr in &mut item {
                *ptr = self.inner.next()?;
            }
            
            Some(item)
        }
    }
    
    NChars {
        inner: s.chars(),
        _phantom: std::marker::PhantomData,
    }
}

fn main() {
    for chunk in by_chars::<[char; 2]>("Hello World!") {
        println!("{} {}", chunk[0], chunk[1]);
    }
}
```
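On Rust 1.51 or newer, const generics let the array length be a plain parameter, which removes the need for the `Default` and higher-ranked `IntoIterator` bounds. A minimal sketch using `std::iter::from_fn` (a swapped-in technique, not what the answer above uses); the name `char_arrays` is hypothetical:

```rust
use std::iter;

/// Hypothetical helper: yields `[char; N]` chunks of `s`.
/// `N` must be non-zero, otherwise it would yield empty arrays forever.
fn char_arrays<const N: usize>(s: &str) -> impl Iterator<Item = [char; N]> + '_ {
    assert!(N > 0, "chunk size must be non-zero");
    let mut chars = s.chars();
    iter::from_fn(move || {
        let mut chunk = ['\0'; N]; // placeholder values, overwritten below
        for slot in chunk.iter_mut() {
            *slot = chars.next()?; // stop if the string runs out mid-chunk
        }
        Some(chunk)
    })
}

fn main() {
    for chunk in char_arrays::<2>("Hello World!") {
        println!("{} {}", chunk[0], chunk[1]);
    }
}
```

Like the version above, it drops an incomplete trailing chunk.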

You can also adjust this so that it returns subslices (`&str`) of the string instead of arrays of `char`, like this:

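```rust
use std::num::NonZeroUsize;

/// `n: NonZeroUsize` because one can't iterate by 0 chars.
/// Unlike the array version, a shorter final piece is yielded when the
/// string's char count isn't a multiple of `n`.
fn by_chars(s: &str, n: NonZeroUsize) -> impl Iterator<Item=&str> {
    struct NChars<'c> {
        inner: std::str::Chars<'c>,
        n: usize,
    }
    
    impl<'c> Iterator for NChars<'c> {
        type Item = &'c str;
        
        fn next(&mut self) -> Option<Self::Item> {
            // The part of the string not consumed yet
            let string = self.inner.as_str();
            // Advance by up to `n` chars, summing their UTF-8 byte lengths
            let len: usize = self.inner.by_ref().take(self.n).map(char::len_utf8).sum();

            if len > 0 {
                Some(&string[..len])
            } else {
                None
            }
        }
    }
    
    NChars {
        inner: s.chars(),
        n: n.into(),
    }
}

fn main() {
    let two = NonZeroUsize::new(2).unwrap();
    for chunk in by_chars("Hello World!", two) {
        println!("{}", chunk);
    }
}
```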
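Another way to get `&str` pieces, sketched here with a hypothetical `str_chunks` helper that returns a `Vec` rather than a lazy iterator: take the byte offset of every n-th char via `char_indices` and `step_by`, then slice between consecutive offsets with `windows(2)`. Like the `NChars` version, the last piece may be shorter than `n` chars.

```rust
/// Hypothetical helper: splits `s` into `&str` pieces of `n` chars each
/// (the last piece may be shorter).
fn str_chunks(s: &str, n: usize) -> Vec<&str> {
    assert!(n > 0, "chunk size must be non-zero");
    // Byte offset where each piece starts: every n-th char boundary
    let mut bounds: Vec<usize> = s.char_indices().map(|(i, _)| i).step_by(n).collect();
    // The final piece ends at the end of the string
    bounds.push(s.len());
    // Each pair of consecutive boundaries delimits one piece
    bounds.windows(2).map(|w| &s[w[0]..w[1]]).collect()
}

fn main() {
    for piece in str_chunks("Hello World!", 2) {
        println!("{}", piece);
    }
}
```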

Incidentally, what are you using this for? Remember that `char` does not correspond to a human-readable character; it's a Unicode scalar value (roughly, a code point) instead. If you want to iterate over human-perceived "characters", you'd have to use grapheme clusters, e.g. the `graphemes` method of the `UnicodeSegmentation` trait from the `unicode-segmentation` crate, and a very slight variation on the above piece of code.
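To make the distinction concrete, the sketch below assumes an accented letter written in decomposed form, as `e` followed by U+0301 COMBINING ACUTE ACCENT, and shows that grouping by `char` splits the accent from its letter. The helper `code_point_pairs` is hypothetical, and the grapheme-aware version is not shown because it needs the external crate.

```rust
/// Hypothetical helper: groups `s` into runs of two `char`s (code points),
/// the last run possibly shorter.
fn code_point_pairs(s: &str) -> Vec<String> {
    let chars: Vec<char> = s.chars().collect();
    chars.chunks(2).map(|pair| pair.iter().collect::<String>()).collect()
}

fn main() {
    // Renders as "aéb": three visible letters, but four `char`s
    let s = "ae\u{301}b";
    println!("{} chars", s.chars().count());
    // The accent ends up in a different pair than the 'e' it modifies
    for piece in code_point_pairs(s) {
        println!("{:?}", piece);
    }
}
```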


Perhaps a combination of `windows` and `step_by`:

```rust
let iter = slice.windows(2).step_by(2);
```

...even better, from @cuviper: `slice.chunks_exact(2)`.
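A small check, assuming `slice` is a `&[char]`, that the two suggestions agree: `windows(2).step_by(2)` and `chunks_exact(2)` yield the same pairs, and `chunks_exact` additionally exposes an odd leftover item through `remainder()`. The helper name `pairs_two_ways` is hypothetical.

```rust
/// Hypothetical helper: returns (windows + step_by result,
/// chunks_exact result, chunks_exact remainder).
fn pairs_two_ways(slice: &[char]) -> (Vec<&[char]>, Vec<&[char]>, &[char]) {
    // Overlapping windows of length 2, keeping every other one
    let via_windows: Vec<&[char]> = slice.windows(2).step_by(2).collect();
    // Non-overlapping chunks of exactly 2; an odd leftover is kept aside
    let chunks = slice.chunks_exact(2);
    let leftover = chunks.remainder();
    let via_chunks: Vec<&[char]> = chunks.collect();
    (via_windows, via_chunks, leftover)
}

fn main() {
    let slice = ['H', 'e', 'l', 'l', 'o', '!', '?'];
    let (a, b, rest) = pairs_two_ways(&slice);
    println!("{:?}\n{:?}\nleftover: {:?}", a, b, rest);
}
```

`chunks_exact` also avoids creating the intermediate overlapping windows that `step_by` then skips.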



If the window and step size are the same, you can use `chunks_exact(2)`.
(This assumes you have a slice, but I think the OP has a string instead.)


Even better.

It's possible the conversion to a slice is an issue: for a `&str`, it means first collecting the chars into something like a `Vec<char>`. But it did seem like something to consider anyway.
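If the input is a `&str`, the conversion in question is roughly one of the two below (both helpers are hypothetical sketches). Collecting into a `Vec<char>` costs an allocation of 4 bytes per char, while slicing `as_bytes()` is allocation-free but only correct when the string is pure ASCII, since a multi-byte char would be split across chunks.

```rust
/// Hypothetical helper: pairs of chars via an intermediate `Vec<char>`;
/// a trailing odd char is returned separately.
fn pairs_via_vec(s: &str) -> (Vec<[char; 2]>, Option<char>) {
    let chars: Vec<char> = s.chars().collect(); // one allocation
    let chunks = chars.chunks_exact(2);
    let leftover = chunks.remainder().first().copied();
    let pairs: Vec<[char; 2]> = chunks.map(|c| [c[0], c[1]]).collect();
    (pairs, leftover)
}

/// Hypothetical ASCII-only shortcut: works on bytes without allocating a
/// copy of the text, so it refuses non-ASCII input.
fn pairs_ascii(s: &str) -> Option<Vec<&[u8]>> {
    if !s.is_ascii() {
        return None;
    }
    Some(s.as_bytes().chunks_exact(2).collect())
}

fn main() {
    let (pairs, leftover) = pairs_via_vec("Hello!?");
    println!("{:?} leftover: {:?}", pairs, leftover);
    println!("{:?}", pairs_ascii("abcd"));
}
```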