Split without removing the character

I want to split a string on certain conditions (in this case if a char doesn't match its predecessor), but I do not want to remove the char it matches (like str::split does).
ASCII only (in fact, only digits), so no fancy Unicode.
If possible, without any intermediate allocation.

I'd like to have a function signature like this:

fn extract_consecutive(s: &str) -> impl Iterator<Item = &str> + '_;
#[test]
fn split_without_remove() {
    assert_eq!(extract_consecutive("12233444").collect::<Vec<_>>(), vec!["1", "22", "33", "444"]);
}

Does anyone have a good idea?

Just use find and split_at!

split_at takes a single usize index and returns a tuple of two &str, but I want to split at multiple points.

You can use it in a loop, then.
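To illustrate the loop idea, here is a minimal sketch that wraps find and split_at in std::iter::from_fn so the pieces are produced lazily, with no intermediate allocation. The variable names (rest, first, end) are made up for illustration; this is one possible reading of the suggestion, not necessarily what was meant.

```rust
use std::iter;

/// Yields maximal runs of identical characters, borrowing from `s`.
fn extract_consecutive(s: &str) -> impl Iterator<Item = &str> + '_ {
    // `rest` is the part of the input that has not been yielded yet.
    let mut rest = s;
    iter::from_fn(move || {
        // Stop once the remaining input is empty.
        let first = rest.chars().next()?;
        // Byte index of the first char that differs from `first`,
        // or the end of the string if the whole remainder is one run.
        let end = rest.find(|c: char| c != first).unwrap_or(rest.len());
        let (head, tail) = rest.split_at(end);
        rest = tail;
        Some(head)
    })
}
```

Each call to next does one find and one split_at, so the whole string is scanned exactly once overall.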

However, I don't think this problem particularly fits the notion of "split on a specific character", so I'd try to reduce it to something else, or even just implement it manually (playground).

@H2CO3 nice one. I always forget that defining a struct and implementing Iterator for it can be easier than building a custom iterator with e.g. std::iter::from_fn. Thanks for that!
In particular, I like the mem::replace. I hadn't thought of it and used a temporary variable instead, which is very ugly.

Just two nitpicks, plus formatting:

impl<'a> Iterator for ExtractConsecutive<'a> {
    type Item = &'a str;

    fn next(&mut self) -> Option<&'a str> {
        let fst = self.string.chars().next()?;

        for (i, ch) in self.string.char_indices() {
            if ch != fst {
                let (head, tail) = self.string.split_at(i);
                self.string = tail;
                return Some(head);
            }
        }

        Some(mem::replace(&mut self.string, ""))
    }
}
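The impl above does not compile on its own, since the struct definition, the mem import, and the wrapper function live in the linked playground. A self-contained sketch could look like the following, assuming the struct has a single field named string (inferred from self.string in the snippet) and that extract_consecutive simply constructs it:

```rust
use std::mem;

/// Iterator state: the part of the input that has not been yielded yet.
/// The single `string` field is an assumption inferred from the impl.
struct ExtractConsecutive<'a> {
    string: &'a str,
}

impl<'a> Iterator for ExtractConsecutive<'a> {
    type Item = &'a str;

    fn next(&mut self) -> Option<&'a str> {
        // An empty remainder ends the iteration.
        let fst = self.string.chars().next()?;

        // Cut at the first char that differs from the leading one.
        for (i, ch) in self.string.char_indices() {
            if ch != fst {
                let (head, tail) = self.string.split_at(i);
                self.string = tail;
                return Some(head);
            }
        }

        // The whole remainder is a single run: yield it and leave "" behind.
        Some(mem::replace(&mut self.string, ""))
    }
}

/// Wrapper matching the signature asked for in the question.
fn extract_consecutive(s: &str) -> impl Iterator<Item = &str> + '_ {
    ExtractConsecutive { string: s }
}
```

With this in place, the test from the question should pass: collecting extract_consecutive("12233444") gives "1", "22", "33", "444".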

Oh, that is right, ? works here, as does for … in. Good catch.

For an alternate implementation which uses iterator methods instead of loops:
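The code this post linked to is not reproduced in the thread. As a rough sketch of what an iterator-method version might look like (not the poster's actual code), the following uses filter and scan over byte indices. It assumes ASCII-only input, as the question states; with multi-byte characters the byte-based slicing could panic.

```rust
/// Yields maximal runs of identical bytes as string slices.
/// Assumes `s` is ASCII, so every byte index is a valid char boundary.
fn extract_consecutive(s: &str) -> impl Iterator<Item = &str> + '_ {
    let bytes = s.as_bytes();
    let len = bytes.len();

    // End index of each run: every position where a byte differs from
    // its predecessor, plus the end of the string itself.
    let run_ends = (1..=len).filter(move |&i| i == len || bytes[i] != bytes[i - 1]);

    // Turn consecutive end indices into slices; the scan state holds
    // the start index of the next run.
    run_ends.scan(0, move |start, end| {
        let piece = &s[*start..end];
        *start = end;
        Some(piece)
    })
}
```

For an empty string the range 1..=0 is empty, so the iterator yields nothing, which matches the behavior of the struct-based version.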
