Nice! I had an answer partially finished which I'm going to go ahead and post. You may or may not want to use parts of it.
Slightly adapted from an answer to this Stack Overflow question:
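/// Yields every substring of `line` that spans `n` consecutive `char`s.
fn str_windows(line: &str, n: usize) -> impl Iterator<Item = &str> {
    line.char_indices()
        // Pair each start index with the index `n` chars later;
        // the chained `line.len()` closes the final window.
        .zip(line.char_indices().skip(n).chain(Some((line.len(), ' '))))
        .map(move |((i, _), (j, _))| &line[i..j])
}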
Especially since you've made a crate of it, I feel it's more usual to make behavior like this (which extends the capabilities of a type) part of a trait, and implement it for str.
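For what it's worth, the trait approach could look roughly like the sketch below. The trait name `StrWindows` is made up for illustration, and the return type is boxed only to keep the trait signature simple; the body is the same zip-based logic as the free function.

```rust
/// Hypothetical extension trait; the name `StrWindows` is an assumption.
pub trait StrWindows {
    /// Iterates over substrings spanning `n` consecutive `char`s.
    fn str_windows<'a>(&'a self, n: usize) -> Box<dyn Iterator<Item = &'a str> + 'a>;
}

impl StrWindows for str {
    fn str_windows<'a>(&'a self, n: usize) -> Box<dyn Iterator<Item = &'a str> + 'a> {
        // End index of each window: the start of the char `n` positions
        // later, with `self.len()` closing the final window.
        let ends = self
            .char_indices()
            .skip(n)
            .map(|(j, _)| j)
            .chain(Some(self.len()));
        Box::new(
            self.char_indices()
                .zip(ends)
                .map(move |((i, _), j)| &self[i..j]),
        )
    }
}
```

With the trait in scope, `"abcd".str_windows(2)` yields `ab`, `bc` and `cd`.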
But I always feel the need to point out the caveats when dealing with chars, because if you're not careful you might accidentally be invaded by Bulgaria.
let whole_str = "🇬🇧🇬🇧🇬🇧";
for substr in str_windows(whole_str, 2) {
    print!("{} ", substr);
}
The above prints 🇬🇧 🇧🇬 🇬🇧 🇧🇬 🇬🇧. (For those with limited font rendering capability, these are flag emoji for the UK and Bulgaria. The input looks like UK-UK-UK and the output is UK-Bulgaria-UK-Bulgaria-UK.)
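The reason is that each flag is encoded as two regional indicator chars, so a char-based window can start in the middle of one flag and end in the middle of the next. A quick way to see this is to print the code points; `code_points` is a throwaway helper name for this sketch:

```rust
/// Formats every `char` (Unicode scalar value) of `s` as `U+XXXX`.
fn code_points(s: &str) -> Vec<String> {
    s.chars().map(|c| format!("U+{:X}", c as u32)).collect()
}
```

For the UK flag this gives `U+1F1EC` and `U+1F1E7` (regional indicators G and B). The same pair in the opposite order, B then G, is the Bulgarian flag.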
Iteration by grapheme clusters is usually the way to go for general purpose text wrangling, but that requires an external dependency:
extern crate unicode_segmentation;
use unicode_segmentation::UnicodeSegmentation;

/// Yields every substring of `line` that spans `n` consecutive grapheme clusters.
fn str_windows(line: &str, n: usize) -> impl Iterator<Item = &str> {
    // `true` selects extended grapheme clusters.
    line.grapheme_indices(true)
        .zip(line.grapheme_indices(true).skip(n).chain(Some((line.len(), ""))))
        .map(move |((i, _), (j, _))| &line[i..j])
}
(playground)
There may be a way to make unicode-segmentation an optional dependency of your crate, and have e.g. str_windows_graphemes when it's available. I've never published a crate, so I don't know.
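As far as I can tell, Cargo supports this through optional dependencies: marking a dependency `optional = true` in Cargo.toml implicitly defines a feature with the same name, which the code can check with `#[cfg(feature = "...")]`. A rough sketch, assuming the manifest entry shown in the comment (the version requirement is a placeholder) and the function name suggested above:

```rust
// Assumed Cargo.toml entry (the version requirement is a placeholder):
//
//     [dependencies]
//     unicode-segmentation = { version = "1", optional = true }

#[cfg(feature = "unicode-segmentation")]
use unicode_segmentation::UnicodeSegmentation;

/// Always available: windows of `n` consecutive `char`s.
pub fn str_windows(line: &str, n: usize) -> impl Iterator<Item = &str> {
    line.char_indices()
        .zip(line.char_indices().skip(n).chain(Some((line.len(), ' '))))
        .map(move |((i, _), (j, _))| &line[i..j])
}

/// Only compiled when the `unicode-segmentation` feature is enabled:
/// windows of `n` consecutive extended grapheme clusters.
#[cfg(feature = "unicode-segmentation")]
pub fn str_windows_graphemes(line: &str, n: usize) -> impl Iterator<Item = &str> {
    line.grapheme_indices(true)
        .zip(line.grapheme_indices(true).skip(n).chain(Some((line.len(), ""))))
        .map(move |((i, _), (j, _))| &line[i..j])
}
```

Users who want the grapheme version would then build with `--features unicode-segmentation`, or list that feature on the dependency in their own Cargo.toml; everyone else avoids the extra dependency.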