I want to iterate over a string two characters at a time; in other words, each iteration should consume two characters from the string. I tried using step_by and peekable on the chars() iterator, but I did not succeed. What is the best way to consume two chars per iteration over a string?
The itertools crate has several useful methods for this, such as Itertools::tuples. Example:
use itertools::Itertools;

fn main() {
    for (a, b) in "Hello world!".chars().tuples() {
        println!("{} {}", a, b);
    }
}
You can also implement this fairly easily without a dependency and without confining the iterator item to tuples of any maximum length, by writing an iterator that is generic over arrays instead:
fn by_chars<'a, T>(s: &'a str) -> impl Iterator<Item=T> + 'a
where
    T: Default + 'a,
    for<'b> &'b mut T: IntoIterator<Item=&'b mut char>
{
    struct NChars<'c, U> {
        inner: std::str::Chars<'c>,
        _phantom: std::marker::PhantomData<U>
    }
    impl<'c, U> Iterator for NChars<'c, U>
    where
        U: Default + 'c,
        for<'d> &'d mut U: IntoIterator<Item=&'d mut char>
    {
        type Item = U;
        fn next(&mut self) -> Option<Self::Item> {
            let mut item = U::default();
            for ptr in &mut item {
                *ptr = self.inner.next()?;
            }
            Some(item)
        }
    }
    NChars {
        inner: s.chars(),
        _phantom: std::marker::PhantomData,
    }
}
fn main() {
    for chunk in by_chars::<[char; 2]>("Hello World!") {
        println!("{} {}", chunk[0], chunk[1]);
    }
}
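If only pairs are needed, a smaller dependency-free version may be enough. The following is a minimal sketch, not the generic approach above: `pairs` is a hypothetical helper name, and it simply calls `next()` twice per iteration, dropping a trailing unpaired char.

```rust
/// Hypothetical helper: collects consecutive, non-overlapping pairs of chars.
/// A trailing unpaired char is dropped.
fn pairs(s: &str) -> Vec<(char, char)> {
    let mut chars = s.chars();
    let mut out = Vec::new();
    // Pull two chars per iteration; stop as soon as either one is missing.
    while let (Some(a), Some(b)) = (chars.next(), chars.next()) {
        out.push((a, b));
    }
    out
}

fn main() {
    for (a, b) in pairs("Hello world!") {
        println!("{} {}", a, b);
    }
}
```

The same `while let` loop can be used inline, without collecting into a Vec, if the pairs are only consumed once.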
You can also adjust this so that it returns a subslice of the string instead of an array of char, like this:
use std::num::NonZeroUsize;

/// `n: NonZeroUsize` because one can't iterate by 0 chars
fn by_chars(s: &str, n: NonZeroUsize) -> impl Iterator<Item=&str> {
    struct NChars<'c> {
        inner: std::str::Chars<'c>,
        n: usize,
    }
    impl<'c> Iterator for NChars<'c> {
        type Item = &'c str;
        fn next(&mut self) -> Option<Self::Item> {
            // Remember the unconsumed remainder before advancing.
            let string = self.inner.as_str();
            // Byte length of the next (up to) `n` chars.
            let len: usize = self.inner.by_ref().take(self.n).map(char::len_utf8).sum();
            if len > 0 {
                Some(&string[..len])
            } else {
                None
            }
        }
    }
    NChars {
        inner: s.chars(),
        n: n.into(),
    }
}
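To see how a subslice-returning iterator behaves with multi-byte characters and a leftover chunk, here is a self-contained sketch. It is an assumption-laden variant, not the code above: `str_chunks` is a hypothetical name, and it uses `std::iter::from_fn` with `char_indices` instead of a custom struct. Like the version above, it yields a shorter final piece when the char count is not a multiple of `n`.

```rust
use std::num::NonZeroUsize;

/// Hypothetical helper: yields consecutive `&str` pieces of up to `n` chars each.
fn str_chunks(s: &str, n: NonZeroUsize) -> impl Iterator<Item = &str> {
    let n = n.get();
    let mut rest = s;
    std::iter::from_fn(move || {
        if rest.is_empty() {
            return None;
        }
        // Byte offset where the (n+1)-th char starts, or the end of the string.
        let split = rest.char_indices().nth(n).map_or(rest.len(), |(i, _)| i);
        let (head, tail) = rest.split_at(split);
        rest = tail;
        Some(head)
    })
}

fn main() {
    let n = NonZeroUsize::new(2).unwrap();
    // 'é' takes two bytes in UTF-8, so the first piece is three bytes long.
    let chunks: Vec<&str> = str_chunks("héllo", n).collect();
    println!("{:?}", chunks); // prints ["hé", "ll", "o"]
}
```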
Incidentally, what are you using this for? Remember that a char does not correspond to a human-readable character; it is a Unicode scalar value (roughly, a code point) instead. If you want to iterate over human-readable "characters", you'd have to use something like UnicodeSegmentation::graphemes() from the unicode-segmentation crate, and a very slight variation on the above piece of code.
Perhaps a combination of windows and step_by:
let iter = slice.windows(2).step_by(2);
...even better from @cuviper: slice.chunks_exact(2).
If the window and step size are the same, you can use chunks_exact(2).
(Assuming you have a slice, but I think the OP does not.)
Even better.
It's possible the conversion to slice is an issue. But it did seem like something to consider anyway.
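For completeness, the slice-based suggestions above can be sketched as follows, assuming it is acceptable to first copy the chars into a `Vec<char>` (that copy is the conversion cost mentioned above). The helper names `char_pairs` and `char_pairs_windows` are hypothetical. Both variants drop a trailing unpaired char.

```rust
/// Hypothetical helper: copies the chars into a Vec so that slice methods apply,
/// then takes non-overlapping chunks of exactly two elements.
fn char_pairs(s: &str) -> Vec<(char, char)> {
    let chars: Vec<char> = s.chars().collect();
    chars.chunks_exact(2).map(|pair| (pair[0], pair[1])).collect()
}

/// Same result via windows + step_by, shown only for comparison.
fn char_pairs_windows(s: &str) -> Vec<(char, char)> {
    let chars: Vec<char> = s.chars().collect();
    chars.windows(2).step_by(2).map(|w| (w[0], w[1])).collect()
}

fn main() {
    // "Hello" has five chars, so the final 'o' is left out of the pairs.
    println!("{:?}", char_pairs("Hello"));
    println!("{:?}", char_pairs_windows("Hello"));
}
```

If the leftover element matters, the iterator returned by `chunks_exact` exposes it through its `remainder()` method.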