Ownership in nested `match` statements

I have written the function below, which takes the result of a fuzzy matcher (the matched indices) and the string that was passed to the matcher, and splits the string into chunks of either matched or non-matched fragments. See the unit test at the bottom.

While writing it, I had a tough time with the compiler and ended up re-creating and cloning some values more often than I think is appropriate. I have added a few comments in the code below and would appreciate feedback on how to improve it.

Thank you very much!


#[derive(Debug, PartialEq)] // needed for `assert_eq!` in the test below
pub enum Chunk {
    Match(String),
    NoMatch(String),
}

pub fn aggregate_chunks(string: &str, indices: &[usize]) -> Vec<Chunk> {
    let mut chunks = Vec::new();

    // this is mostly an option because I don't know whether the
    // first character will be matched or not.
    let mut current_chunk: Option<Chunk> = None;

    for (i, c) in string.char_indices() {
        let contained = indices.contains(&i);

        match current_chunk {
            None => {
                let mut s = String::new();
                s.push(c);
                current_chunk = match contained {
                    true => Some(Chunk::Match(s)),
                    false => Some(Chunk::NoMatch(s)),
                }
            }
            // here, I have `ref mut m` to be able to mutate the string
            Some(ref mut m) => match (m, contained) {
                // is there a way to condense these four arms to two, given that there is some
                // duplication in the arms?
                (Chunk::Match(ref mut s), true) => {
                    s.push(c);
                }
                (Chunk::Match(s), false) => {
                    // initially I wanted to push `m` or at least `m.clone()`, but `m` was moved
                    // in the `match (m, contained)` statement above and I have no idea how to
                    // annotate this differently
                    chunks.push(Chunk::Match(s.clone()));
                    let mut s = String::new();
                    s.push(c);
                    current_chunk = Some(Chunk::NoMatch(s));
                }
                (Chunk::NoMatch(s), true) => {
                    chunks.push(Chunk::NoMatch(s.clone()));
                    let mut s = String::new();
                    s.push(c);
                    current_chunk = Some(Chunk::Match(s));
                }
                (Chunk::NoMatch(ref mut s), false) => {
                    s.push(c);
                }
            },
        }
    }

    if let Some(c) = current_chunk {
        chunks.push(c);
    }

    chunks
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_aggregate() {
        let string = "Hallo Welt";
        let indices = vec![0, 1, 2, 3, 8, 9];

        let chunks = aggregate_chunks(string, &indices);

        let expected_result = vec![
            Chunk::Match(String::from("Hall")),
            Chunk::NoMatch(String::from("o We")),
            Chunk::Match(String::from("lt")),
        ];

        assert_eq!(chunks, expected_result);
    }
}

The simplest, most naïve way seems to work.

I can't understand your problem. The test case seems clear, but the implementation is complicated. What is the function supposed to do? Can you give more test cases? Maybe there's a better algorithm.

As I can't figure out your algorithm, I can't say for sure, but I would guess you are looking for the `|` operator, which lets different patterns share the same bindings. Example:

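let result: Result<String, String> = Ok(String::from("hello"));
// both alternatives bind `msg: &String`, so one arm handles both variants
match &result {
    Ok(msg) | Err(msg) => println!("{msg}"),
}
// shorthand using `let` syntax (a top-level or-pattern needs parentheses)
let (Ok(msg) | Err(msg)) = &result;
println!("{msg}");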
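Applied to the `Chunk` enum from the question, the same `|` trick can collapse the four arms to two. The following is a minimal sketch, not a drop-in replacement. It assumes three helper methods that are not in the original code (`new`, `is_match` and `text_mut` are hypothetical names). It also uses `Vec::last_mut()`, so the last element of `chunks` plays the role of `current_chunk` and no `Option` is needed:

```rust
#[derive(Debug, PartialEq)]
pub enum Chunk {
    Match(String),
    NoMatch(String),
}

impl Chunk {
    // Hypothetical helper: start a chunk of the right kind from one char.
    fn new(matched: bool, c: char) -> Self {
        if matched {
            Chunk::Match(String::from(c))
        } else {
            Chunk::NoMatch(String::from(c))
        }
    }

    // Hypothetical helper: which kind of chunk is this?
    fn is_match(&self) -> bool {
        matches!(self, Chunk::Match(_))
    }

    // Hypothetical helper: one or-pattern arm covers both variants,
    // because both bind a `String` under the same name.
    fn text_mut(&mut self) -> &mut String {
        match self {
            Chunk::Match(s) | Chunk::NoMatch(s) => s,
        }
    }
}

pub fn aggregate_chunks(string: &str, indices: &[usize]) -> Vec<Chunk> {
    let mut chunks: Vec<Chunk> = Vec::new();
    for (i, c) in string.char_indices() {
        let contained = indices.contains(&i);
        match chunks.last_mut() {
            // Same kind as the chunk being built: extend it in place.
            Some(last) if last.is_match() == contained => last.text_mut().push(c),
            // First character, or the kind flipped: start a new chunk.
            _ => chunks.push(Chunk::new(contained, c)),
        }
    }
    chunks
}

fn main() {
    let chunks = aggregate_chunks("Hallo Welt", &[0, 1, 2, 3, 8, 9]);
    println!("{:?}", chunks);
}
```

Because a finished chunk is never taken out of `chunks`, nothing has to be moved or cloned when the kind changes.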

Although I don't understand your code, it seems you may be looking for `Option::as_ref()` and `Option::as_mut()`.
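To illustrate the `as_mut()` hint, here is a sketch under the assumption that the original `Option<Chunk>` state is kept. Matching on `(current_chunk.as_mut(), contained)` yields an `Option<&mut Chunk>`, which lets the outer and inner `match` be flattened into one. `Option::replace` then hands back the finished chunk by value, so no `clone()` is needed:

```rust
#[derive(Debug, PartialEq)]
pub enum Chunk {
    Match(String),
    NoMatch(String),
}

pub fn aggregate_chunks(string: &str, indices: &[usize]) -> Vec<Chunk> {
    let mut chunks = Vec::new();
    let mut current_chunk: Option<Chunk> = None;

    for (i, c) in string.char_indices() {
        let contained = indices.contains(&i);

        // `as_mut()` turns `&mut Option<Chunk>` into `Option<&mut Chunk>`,
        // so the variant of the inner chunk can be matched in the same pattern.
        match (current_chunk.as_mut(), contained) {
            // Same kind as the current chunk: append in place.
            (Some(Chunk::Match(s)), true) | (Some(Chunk::NoMatch(s)), false) => s.push(c),
            // No chunk yet, or the kind changed: start a new chunk.
            _ => {
                let new = if contained {
                    Chunk::Match(String::from(c))
                } else {
                    Chunk::NoMatch(String::from(c))
                };
                // `replace` returns the previous chunk (if any) by value.
                if let Some(done) = current_chunk.replace(new) {
                    chunks.push(done);
                }
            }
        }
    }

    if let Some(c) = current_chunk {
        chunks.push(c);
    }
    chunks
}

fn main() {
    println!("{:?}", aggregate_chunks("Hallo Welt", &[0, 1, 2, 3, 8, 9]));
}
```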


How do you fancy something like this to enhance your function?

pub fn aggregate_chunks(string: &str, indices: &[usize]) -> Vec<Chunk> {
    let mut chunks = Vec::new();

    let mut current_chunk: Option<Chunk> = None;

    for (i, c) in string.char_indices() {
        let contained = indices.contains(&i);

        current_chunk = match current_chunk {
            None => match contained {
                true => Some(Chunk::Match(String::from(c))),
                false => Some(Chunk::NoMatch(String::from(c))),
            },

            Some(mut m) => match (&mut m, contained) {
                (Chunk::Match(ref mut s), true) | (Chunk::NoMatch(ref mut s), false) => {
                    s.push(c);
                    Some(m)
                }
                (Chunk::Match(_), false) => {
                    chunks.push(m);
                    Some(Chunk::NoMatch(String::from(c)))
                }
                (Chunk::NoMatch(_), true) => {
                    chunks.push(m);
                    Some(Chunk::Match(String::from(c)))
                }
            },
        };
    }

    if let Some(c) = current_chunk {
        chunks.push(c);
    }

    chunks
}



Here is a way where `current_chunk` is not optional (which removes the outer of the two nested `match` statements):

pub fn aggregate_chunks(string: &str, indices: &[usize]) -> Vec<Chunk> {
    // `indices` may be empty (nothing matched), but the string needs a first character
    let first = string.chars().next().expect("string must not be empty");

    let mut chunks = Vec::new();

    let mut current_chunk = match indices.contains(&0) {
        true => Chunk::Match(String::from(first)),
        false => Chunk::NoMatch(String::from(first)),
    };

    for (i, c) in string.char_indices().skip(1) {
        let contained = indices.contains(&i);

        current_chunk = match (&mut current_chunk, contained) {
            (Chunk::Match(ref mut s), true) | (Chunk::NoMatch(ref mut s), false) => {
                s.push(c);
                current_chunk
            }
            (Chunk::Match(_), false) => {
                chunks.push(current_chunk);
                Chunk::NoMatch(String::from(c))
            }
            (Chunk::NoMatch(_), true) => {
                chunks.push(current_chunk);
                Chunk::Match(String::from(c))
            }
        };
    }

    chunks.push(current_chunk);

    chunks
}



And just as food for thought, here is something where we compute the chunks independently of the string. I assume this is more performant, as it involves fewer iterations (as long as `indices` is shorter than the number of characters in `string`) and doesn't require as many string allocations: playground.
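The linked playground is not reproduced here. As a rough idea of the approach, the following sketch turns the indices into ranges without looking at the characters at all. It makes three assumptions:

- `indices` is sorted and free of duplicates.
- Every index is below the string length.
- Every character is one byte long (ASCII), so byte offsets and character positions coincide.

`ChunkRange` and `chunk_ranges` are hypothetical names, not taken from the original playground. The resulting ranges can then be used to slice `&str`s out of the input without allocating:

```rust
use std::ops::Range;

// Hypothetical range-based counterpart of `Chunk`.
#[derive(Debug, PartialEq)]
pub enum ChunkRange {
    Match(Range<usize>),
    NoMatch(Range<usize>),
}

/// Groups matched positions into runs and fills the gaps with `NoMatch`.
/// Assumes `indices` is sorted, deduplicated, and every index is `< len`.
pub fn chunk_ranges(len: usize, indices: &[usize]) -> Vec<ChunkRange> {
    let mut ranges = Vec::new();
    let mut pos = 0; // end of the last emitted range
    let mut iter = indices.iter().copied().peekable();

    while let Some(start) = iter.next() {
        // Unmatched gap before this run of matches.
        if start > pos {
            ranges.push(ChunkRange::NoMatch(pos..start));
        }
        // Extend the run while the next index is consecutive.
        let mut end = start + 1;
        while iter.peek() == Some(&end) {
            iter.next();
            end += 1;
        }
        ranges.push(ChunkRange::Match(start..end));
        pos = end;
    }

    // Trailing unmatched part, if any.
    if pos < len {
        ranges.push(ChunkRange::NoMatch(pos..len));
    }
    ranges
}

fn main() {
    let string = "Hallo Welt";
    for chunk in chunk_ranges(string.len(), &[0, 1, 2, 3, 8, 9]) {
        // Slicing borrows from `string`; no new `String` is allocated.
        match chunk {
            ChunkRange::Match(r) => println!("match:    {:?}", &string[r]),
            ChunkRange::NoMatch(r) => println!("no match: {:?}", &string[r]),
        }
    }
}
```

The loop only walks `indices` once, so its cost is proportional to the number of matches rather than the length of the string. For non-ASCII input, the "next consecutive index" check would have to account for multi-byte characters.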


Thank you @nerditation and @jofas!

@jofas: Thank you for the comprehensive correction. It shows the annotations required for `m` and how to directly push a `Chunk` once it is complete. Your alternative version with the `Range` is also very instructive!

@nerditation: Thank you for going through the code and for the valuable input in the form of the `|` operator for `match` and the `Option::as_ref()` hint, which I will look up in the docs.

