Is there a better way to get windows of chars from &str?

I'm using Advent Of Code 2022 to learn Rust. For day 6 we need to find the first group of n characters that are all unique. My approach is to use windows to make HashSets of characters until there is a HashSet whose length is n. Here's what I finally got to work:

fn find_start(signal: &str, window_length: u32) -> Option<u32> {
    let chars: Vec<char> = signal.chars().collect();
    for (index, window) in chars.windows(window_length as usize).enumerate() {
        let char_set: HashSet<_> = HashSet::from_iter(window.iter());
        if char_set.len() == window_length as usize {
            return Some(index as u32 + window_length);
        }
    }

    None
}

(I know this isn't necessarily the most efficient algorithm but learning Rust is a higher priority than efficiency.)

My questions are:

  1. Is there are more idiomatic way of getting windows of the characters in &str than creating a Vec<char> then calling windows on that Vec?

  2. Is there a cleaner way of making the HashSet from the &[char] than HashSet::from_iter(window.iter())? It seems like a lot of iters.

If they're ASCII, you can use str.as_bytes() and then operate on slices.

You can also iterate on just .chars() and keep n previous ones separately from the iterator (in an array for example).

And then there's:

1 Like

I'd have to say that I think your approach is reasonable. I did it the same way:

For text stuff, it usually depends alot on whether you want to solve the problem or learn how to do it properly.

In competitive programming, they essentially never actually make you deal with unicode. So the most expedient way is just to deal in u8s because really when they say "character" in such problems they mean "C's char and only ever ASCII".

Doing it properly, though, means not using char either, and instead using something like https://docs.rs/unicode-segmentation/latest/unicode_segmentation/trait.UnicodeSegmentation.html#tymethod.graphemes.

5 Likes
  1. I used as_bytes() and I thought it was pretty clean. Full solution here
fn detect(signal: &str, n: usize) -> usize {
    signal
        .as_bytes()
        .windows(n)
        .position(|w| HashSet::from_iter(w).len() == n)
        .map(|i| i + n)
        .unwrap()
}

Besides itertools's tuple_windows another option is to use the iterwindows crate which provides array_windows. (Disclaimer: this is my crate.)

use iterwindows::IterArrayWindows;

fn detect<const N: usize>(signal: &str) -> usize {
    signal
        .chars()
        .array_windows::<N>()
        .position(|w| HashSet::from_iter(w).len() == N)
        .map(|i| i + N)
        .unwrap()
}
  1. By the way the argument to FromIterator::from_iter just has to implement IntoIterator so you don't need to actually call .iter() on the window :slight_smile:

If this was real-world input, I'd definitely be Unicode-aware. But this input is constrained and defined so I'm ok with treating it as 7-bit ASCII. Thanks for the link though. I'll keep that in mind when I need to process non-ASCII text.

That's great, thanks. I'm still at the "keep typing until it works" phase so I probably added .iter() before I did HashSet::from_iter() and didn't try removing it.

This is absolutely correct, and using the HashSet is the way to write this for AoC.

But for fun, here's a way to rewrite this on ASCII with just one allocation and no HashSet:

fn find_start(signal: &str, window_length: usize) -> Option<usize> {
    let bytes = signal.as_bytes();
    let mut buf = Vec::with_capacity(window_length);
    for (index, window) in bytes.windows(window_length).enumerate() {
        window.clone_into(&mut buf);
        buf.sort_unstable();
        buf.dedup();
        if buf.len() == window_length {
            return Some(index + window_length);
        }
    }

    None
}

https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=6377a0f4c9cfed1f9345a0dae4830da2

Note that I switched it to usize instead of u32 -- might as well avoid all the casts.

1 Like