Adapt function with iterators

I want to refactor this function that I wrote in a more idiomatic way, but I'm having some difficulty since iterators are a fairly new concept for me.

// take a string and return in a vector of strings
// all the parts inside square brackets
fn parser(msg: &str) -> Vec<String> {
    let mut v: Vec<String> = vec![];
    let mut buffer = String::new();
    let mut brackets = false;

    for c in msg.chars() {
        if c == '[' {
            brackets = true;
            continue;
        }

        if c == ']' {
            brackets = false;
            v.push(buffer.clone());
            buffer.clear();
        }

        if brackets {
            buffer.push(c);
        }
    }

    v
}


fn main() {
    let space_msg = "IF[14E]LG[5O]D[20O]HELLO[1Q]";
    println!("{space_msg}");

    let parsed_space_msg = parser(space_msg);
    println!("{parsed_space_msg:?}");
}

as you can see, I've just wrote C code, how can I refactor it with iterators? Thanks

Assuming the brackets are always balanced and never nested:

fn parser(msg: &str) -> Vec<String> {
    msg
        .split(&['[', ']'])
        .skip(1)
        .step_by(2)
        .map(|s| s.to_string())
        .collect()
}

You can also avoid allocating new Strings and instead borrow from the original one:

fn parser(msg: &str) -> Vec<&str> {
    msg
        .split(&['[', ']'])
        .skip(1)
        .step_by(2)
        .collect()
}
4 Likes

Here is another possible solution:

fn parser(msg: &str) -> Vec<String> {
    let mut result = vec![];
    
    let mut iter = msg.chars().peekable();
    let iter = iter.by_ref();
    
    while iter.peek().is_some() {
        iter.take_while(|c| *c != '[').for_each(drop);
        let s = iter.take_while(|c| *c != ']').collect();
        result.push(s);
    }

    result
}
1 Like

Another version, but not as readable I think

use itertools::Itertools;
fn parser(msg: &str) -> impl Iterator<Item=String> + '_ {
    msg.chars().map(String::from).coalesce(|accum, curr| {
        if curr == "]" {
            // return collected string, start with new empty accumulator
            Err((accum, String::new()))
        } else if curr == "[" {
            // trash the accumulator
            Ok(String::new())
        } else {
            // append the current char to the accumulator
            Ok(accum + &curr)
        }
    }).filter(|s| !s.is_empty())
}
1 Like

Might be faster to use the actual str::find method than going char-by-char with an iterator.

// take a string and return in a vector of strings
// all the parts inside square brackets
fn parser(msg: &str) -> Vec<&str> {
    let mut msg = &msg[..];
    let mut v = vec![];
    
    while !msg.is_empty() {
        match msg.find('[') {
            None => break,
            Some(start) => {
                let tail = &msg[start+1..];
                match tail.find(']') {
                    None => panic!("oh no, unbalanced braces!"),
                    Some(end) => {
                        v.push(&tail[..end]);
                        msg = &tail[end+1..];
                    },
                }
            },
        }
    }
    
    v
}

fn main() {
    let space_msg = "IF[14E]LG[5O]D[20O]HELLO[1Q]";
    println!("{space_msg}");

    let parsed_space_msg = parser(space_msg);
    println!("{parsed_space_msg:?}");
}

Edit: by the way, almost every parser I write in Rust uses a simple character loops, just because they're so much easier for me to reason about. Don't feel too bad. :slight_smile:

3 Likes

Let’s see about changes without changing the logic…

There’s two things that could be changed in terms of ownership in this code, on one hand the Strings being returned could be &str slices into the original string, as @SkiFire13 pointed out. The other thing, where iterators can play a role, is that the return type could also be a impl Iterator<Item = …> type instead of a Vec, allowing the user to choose whether to .collect() into a Vec or to just iterator over the result once and consume it directly.

A third thing that could be done is to use functionality such as .find(…) from str instead of looping over the values. The previous answer demonstrated .split(…), which is similar higher-level functionality. Also, this kind of code is probably straightforward to implement using Regexes.

Let’s see how small changes could be made to the original code in order to do things described above.

E.g. in order to return &str slices into the original string, you’d need to record the range of indices instead of collecting a buffer, and then return a &msg[start…end] view:

fn parser(msg: &str) -> Vec<&str> {
    let mut v: Vec<&str> = vec![];
    let mut start = None;

    for (i, c) in msg.char_indices() {
        if c == '[' {
            if start.is_none() {
                start = Some(i + 1);
            }
            continue;
        }

        if c == ']' {
            v.push(start.map(|s| &msg[s..i]).unwrap_or_default());
            start = None;
        }
    }

    v
}

Here char_indices is used instead of char, so we know where the characters come from and can thus record our boundaries and create the string accordingly. The variables buffer and brackets are replaced by a start: Option<usize> which is None when brackets was false, and Some(i) where i is the index where the chars in buffer started, when brackets is true.

This code preserves the behavior if ]s is encountered when no [ was seen. It changes the behavior for multiple [s though, as previously, the parser would have returned a String consisting of anything but the [ symbols themself, as in something like parser("this [message[has[many[brackets]") returning ["messagehasmanybrackets"] which is not a slice of the original string, so the above will return ["message[has[many[brackets"] instead.

Of course, even more ways of handling mismatching or nested brackets could be discussed, be it by considering proper nesting, or reporting errors.


If you want to return an impl Iterator, then std::iter::from_fn is a useful tool to re-structure the code. Let me first suggest a minor change to the original code, where std::mem::take(&mut buffer) seems nicer to me than the operation of cloning and clearing the buffer for the case

        if c == ']' {
            brackets = false;
            v.push(buffer.clone());
            buffer.clear();
        }

becoming

        if c == ']' {
            brackets = false;
            v.push(std::mem::take(&mut buffer));
        }

Now here is how to return impl Iterator instead (with minimal change):

fn parser(msg: &str) -> impl Iterator<Item = String> + '_ {
    let mut buffer = String::new();
    let mut brackets = false;
    let mut chars = msg.chars();

    std::iter::from_fn(move || {
        for c in chars.by_ref() {
            if c == '[' {
                brackets = true;
                continue;
            }

            if c == ']' {
                brackets = false;
                return Some(std::mem::take(&mut buffer));
            }

            if brackets {
                buffer.push(c);
            }
        }
        None
    })
}

The use of for c in &mut chars iterates the iterator char = msg.chars() by mutable-reference access, which has the effect that the iteration can be aborted and later resumed as the for loop didn’t consume the iterator yet. It’s aborted for every completed parsed string, via an early return of Some(mem::take(&mut buffer)); one it does complete, out iterator returns None indicating the end of the original Vec.

Of course, both approaches could be combined:

fn parser(msg: &str) -> impl Iterator<Item = &str> {
    let mut start = None;
    let mut char_indices = msg.char_indices();

    std::iter::from_fn(move || {
        for (i, c) in char_indices.by_ref() {
            if c == '[' {
                if start.is_none() {
                    start = Some(i + 1);
                }
                continue;
            }

            if c == ']' {
                let start = start.take(); // sets to `None`
                return Some(start.map(|s| &msg[s..i]).unwrap_or_default());
            }
        }
        None
    })
}

By the way, if you want to display a thing Vec-style without converting into a Vec, the .format(…) method from the itertools crate is quite useful, e.g.

use itertools::Itertools;

fn main() {
    let space_msg = "IF[14E]LG[5O]D[20O]HELLO[1Q]";
    println!("{space_msg}");

    let parsed_space_msg = parser(space_msg);
    println!("[{:?}]", parsed_space_msg.format(", "));
}

Finally, as another demonstration of find, though of course it significantly changes the way mismatching or nested brackets are treated, here’s a from_fn-version of the code using find that @DanielKeep provided above:

fn parser(mut msg: &str) -> impl Iterator<Item = &str> {
    std::iter::from_fn(move || {
        let start = msg.find('[')?;
        let tail = &msg[start + 1..];
        let end = tail.find(']').expect("oh no, unbalanced braces!");
        msg = &tail[end + 1..];
        Some(&tail[..end])
    })
}
5 Likes