Does a String start with any one of the Strings in a Vec?

Hi All,

I have some code with a collection of Strings (a), and I want to exclude every String that starts with any entry in an exclusion collection (excl_strings). I have implemented the logic below and would like to know whether there is a more idiomatic way to achieve this.

fn main() {
    let a = vec![String::from("A11"), 
                 String::from("A12"),
                 String::from("A13"),
                 String::from("A21"), 
                 String::from("A22"),
                 String::from("A23"),
                 String::from("A24"), 
                 String::from("B31"),
                 String::from("B32"),];
                 
    let excl_strings = vec![String::from("C2"), String::from("A13"), String::from("A2")];
    
    let result: Vec<&String> = a.iter().filter(|s| {
        let mut exclude = false;
        for search_string in &excl_strings {
            if s.starts_with(search_string) {
                exclude = true;
                break;
            }
            else {
                exclude = false;
            };
        }
        !exclude
    }).collect();
    
    println!("{:?}", result);
}

I would be happy if there is a crate that already does this :slight_smile:

If you can use regexes to represent the search collection, then regex::RegexSet may help.

The other option is to use fst. I think that would be something like:

use fst::automaton::{Automaton, Subsequence};
use fst::{IntoStreamer, Streamer, Set};

fn example() -> Result<(), Box<dyn std::error::Error>> {
    let set = Set::from_iter(&[
        "a foo bar", "foo", "foo1", "foo2", "foo3", "foobar",
    ]).unwrap();

    let matcher = Subsequence::new("for").starts_with();
    let mut stream = set.search(&matcher).into_stream();

    let mut keys = vec![];
    while let Some(key) = stream.next() {
        keys.push(String::from_utf8(key.to_vec())?);
    }
    assert_eq!(keys, vec![
        "a foo bar", "foobar",
    ]);

    Ok(())
}

(adapted from the docs for Set::search)

Here's what came to mind for me.

    let result: Vec<_> = a
        .iter()
        .filter(|s| !excl_strings.iter().any(|n| s.starts_with(n)))
        .collect();

You can pass the same closure to Vec::retain if you want to permanently remove those strings from a.
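
A minimal sketch of the Vec::retain variant, using only the standard library. The helper name remove_excluded is hypothetical (it does not appear anywhere in this thread); the closure is the same one as in the filter version above, with an explicit as_str() added for clarity.

```rust
/// Removes, in place, every string in `a` that starts with any entry of `excl`.
/// NOTE: `remove_excluded` is a hypothetical helper name chosen for illustration.
fn remove_excluded(a: &mut Vec<String>, excl: &[String]) {
    // `retain` keeps only the elements for which the closure returns true,
    // so we keep a string when none of the exclusion prefixes match it.
    a.retain(|s| !excl.iter().any(|n| s.starts_with(n.as_str())));
}

fn main() {
    let mut a = vec![
        String::from("A11"),
        String::from("A12"),
        String::from("A13"),
        String::from("A21"),
        String::from("A22"),
        String::from("A23"),
        String::from("A24"),
        String::from("B31"),
        String::from("B32"),
    ];
    let excl_strings = vec![String::from("C2"), String::from("A13"), String::from("A2")];

    remove_excluded(&mut a, &excl_strings);

    // prints ["A11", "A12", "B31", "B32"]
    println!("{:?}", a);
}
```

Unlike the filter/collect version, this mutates a directly and yields owned Strings rather than a Vec of references, so it only fits if the original contents are no longer needed.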

If:

  • You only have a couple strings you want to exclude
  • Every string you're searching is small
  • Performance isn't a big concern

then I'd go with @trentj's solution.

Otherwise, the "right" way to do this is with Aho-Corasick:

use aho_corasick::{AhoCorasickBuilder, MatchKind};

fn main() {
    let a = vec![
        String::from("A11"),
        String::from("A12"),
        String::from("A13"),
        String::from("A21"),
        String::from("A22"),
        String::from("A23"),
        String::from("A24"),
        String::from("B31"),
        String::from("B32"),
    ];

    let excl_strings = vec![String::from("C2"), String::from("A13"), String::from("A2")];
    let excluder = AhoCorasickBuilder::new()
        .anchored(true)
        .match_kind(MatchKind::LeftmostFirst)
        .auto_configure(&excl_strings)
        .build(&excl_strings);

    let result: Vec<&String> = a.iter().filter(|s| !excluder.is_match(s)).collect();

    println!("{:?}", result);
}

N.B. For a small number of strings, this won't even use Aho-Corasick. It will instead use a very fast vectorized algorithm.

If you don't mind pulling in regex, then you could also just do it that way (there's no need for a RegexSet here):

use regex::Regex;

fn main() {
    let a = vec![
        String::from("A11"),
        String::from("A12"),
        String::from("A13"),
        String::from("A21"),
        String::from("A22"),
        String::from("A23"),
        String::from("A24"),
        String::from("B31"),
        String::from("B32"),
    ];

    let excluder = Regex::new(r"^(C2|A13|A2)").unwrap();
    let result: Vec<&String> = a.iter().filter(|s| !excluder.is_match(s)).collect();

    println!("{:?}", result);
}

This response looks precise and elegant. Thanks a lot :slight_smile:

I also like your regex example. I think I will have a use for it when I need more complex pattern matching. Thanks for your suggestions.