Hi all,
I have a collection of Strings (a) and I want to exclude every String that starts with any entry of a search collection (excl_strings). I have implemented the logic below and would like to know whether there is a more idiomatic way to achieve this.
fn main() {
    let a = vec![
        String::from("A11"),
        String::from("A12"),
        String::from("A13"),
        String::from("A21"),
        String::from("A22"),
        String::from("A23"),
        String::from("A24"),
        String::from("B31"),
        String::from("B32"),
    ];
    let excl_strings = vec![String::from("C2"), String::from("A13"), String::from("A2")];
    let result: Vec<&String> = a.iter().filter(|s| {
        let mut exclude = false;
        for search_string in &excl_strings {
            if s.starts_with(search_string) {
                exclude = true;
                break;
            }
            else {
                exclude = false;
            };
        }
        !exclude
    }).collect();
    println!("{:?}", result);
}
I would also be happy to hear of any crate that already does this.
bjorn3
June 10, 2020, 1:29pm
2
If you can use regexes to represent the search collection, then regex::RegexSet may help.
bjorn3
June 10, 2020, 1:33pm
3
The other option is to use fst. I think that would be something like:
use fst::automaton::{Automaton, Subsequence};
use fst::{IntoStreamer, Streamer, Set};

fn example() -> Result<(), Box<dyn std::error::Error>> {
    let set = Set::from_iter(&[
        "a foo bar", "foo", "foo1", "foo2", "foo3", "foobar",
    ]).unwrap();
    let matcher = Subsequence::new("for").starts_with();
    let mut stream = set.search(&matcher).into_stream();
    let mut keys = vec![];
    while let Some(key) = stream.next() {
        keys.push(String::from_utf8(key.to_vec())?);
    }
    assert_eq!(keys, vec![
        "a foo bar", "foobar",
    ]);
    Ok(())
}
(adapted from the docs for Set::search)
trentj
June 10, 2020, 1:41pm
4
Here's what came to mind for me.
let result: Vec<_> = a
    .iter()
    .filter(|s| !excl_strings.iter().any(|n| s.starts_with(n)))
    .collect();
You can pass the same closure to Vec::retain if you want to permanently remove those strings from a.
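For the in-place variant, here is a minimal sketch using only the standard library. The helper name remove_prefixed and the shortened input list are made up for illustration; the closure body is the same predicate as in the filter version.

```rust
/// Removes, in place, every string that starts with any of the given prefixes.
/// (`remove_prefixed` is a hypothetical helper name, not part of any crate.)
fn remove_prefixed(items: &mut Vec<String>, prefixes: &[String]) {
    // `retain` keeps the elements for which the closure returns true,
    // so the "starts with any prefix" test is negated.
    items.retain(|s| !prefixes.iter().any(|p| s.starts_with(p.as_str())));
}

fn main() {
    let mut a = vec![
        String::from("A11"),
        String::from("A13"),
        String::from("A21"),
        String::from("B31"),
    ];
    let excl_strings = vec![String::from("C2"), String::from("A13"), String::from("A2")];
    remove_prefixed(&mut a, &excl_strings);
    println!("{:?}", a); // ["A11", "B31"]
}
```

Unlike the filter version, this mutates a and leaves no borrowed Vec<&String> behind, which may be simpler if the original list is not needed afterwards.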
If:
You only have a couple strings you want to exclude
Every string you're searching is small
Performance isn't a big concern
then I'd go with @trentj's solution.
Otherwise, the "right" way to do this is with Aho-Corasick:
use aho_corasick::{AhoCorasickBuilder, MatchKind};

fn main() {
    let a = vec![
        String::from("A11"),
        String::from("A12"),
        String::from("A13"),
        String::from("A21"),
        String::from("A22"),
        String::from("A23"),
        String::from("A24"),
        String::from("B31"),
        String::from("B32"),
    ];
    let excl_strings = vec![String::from("C2"), String::from("A13"), String::from("A2")];
    let excluder = AhoCorasickBuilder::new()
        .anchored(true)
        .match_kind(MatchKind::LeftmostFirst)
        .auto_configure(&excl_strings)
        .build(&excl_strings);
    let result: Vec<&String> = a.iter().filter(|s| !excluder.is_match(s)).collect();
    println!("{:?}", result);
}
N.B. For a small number of strings, this won't even use Aho-Corasick. It will instead use a very fast vectorized algorithm.
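To give a rough intuition for why a dedicated multi-pattern matcher can scale better than checking every prefix in turn, here is a simplified stand-in: a byte-wise prefix trie built with the standard library only. This is not the aho-corasick crate and not its algorithm; the type PrefixTrie and its methods are made up for this sketch. The point is only that each input is walked once, however many prefixes are stored.

```rust
use std::collections::HashMap;

/// A minimal byte-wise prefix trie (illustrative only).
#[derive(Default)]
struct PrefixTrie {
    children: HashMap<u8, PrefixTrie>,
    /// True if a stored prefix ends at this node.
    terminal: bool,
}

impl PrefixTrie {
    /// Builds a trie containing every given prefix.
    fn new<S: AsRef<str>>(prefixes: &[S]) -> Self {
        let mut root = PrefixTrie::default();
        for p in prefixes {
            let mut node = &mut root;
            for &b in p.as_ref().as_bytes() {
                node = node.children.entry(b).or_default();
            }
            node.terminal = true;
        }
        root
    }

    /// Returns true if `s` starts with any stored prefix.
    fn matches_prefix_of(&self, s: &str) -> bool {
        let mut node = self;
        if node.terminal {
            return true; // an empty prefix matches everything
        }
        for b in s.bytes() {
            match node.children.get(&b) {
                Some(next) => node = next,
                None => return false,
            }
            if node.terminal {
                return true;
            }
        }
        false
    }
}

fn main() {
    let a = ["A11", "A12", "A13", "A21", "A22", "A23", "A24", "B31", "B32"];
    let excluder = PrefixTrie::new(&["C2", "A13", "A2"]);
    let result: Vec<&str> = a
        .iter()
        .copied()
        .filter(|s| !excluder.matches_prefix_of(s))
        .collect();
    println!("{:?}", result); // ["A11", "A12", "B31", "B32"]
}
```

For a handful of short prefixes the nested any loop is likely just as good; a structure like this only starts to pay off when the exclusion list grows.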
If you don't mind pulling in regex, then you could also just do it that way (there's no need for a RegexSet here):
use regex::Regex;

fn main() {
    let a = vec![
        String::from("A11"),
        String::from("A12"),
        String::from("A13"),
        String::from("A21"),
        String::from("A22"),
        String::from("A23"),
        String::from("A24"),
        String::from("B31"),
        String::from("B32"),
    ];
    let excluder = Regex::new(r"^(C2|A13|A2)").unwrap();
    let result: Vec<&String> = a.iter().filter(|s| !excluder.is_match(s)).collect();
    println!("{:?}", result);
}
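The pattern above is written out by hand. If the prefixes live in a collection such as excl_strings, the anchored alternation can be assembled from it. The following is a std-only sketch of that step; both helper names are hypothetical. With the regex crate in scope, regex::escape should be preferred over the hand-rolled escape_literal shown here, and the resulting string would then be passed to Regex::new.

```rust
/// Hypothetical helper: backslash-escapes regex metacharacters so that a
/// prefix is matched literally. Prefer `regex::escape` when the regex crate
/// is available; this stand-in exists only to keep the sketch std-only.
fn escape_literal(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        if r"\.+*?()|[]{}^$#&-~".contains(c) {
            out.push('\\');
        }
        out.push(c);
    }
    out
}

/// Builds an anchored alternation such as `^(?:C2|A13|A2)` from the prefixes.
fn prefix_pattern(prefixes: &[String]) -> String {
    let alternatives: Vec<String> = prefixes.iter().map(|p| escape_literal(p)).collect();
    format!("^(?:{})", alternatives.join("|"))
}

fn main() {
    let excl_strings = vec![String::from("C2"), String::from("A13"), String::from("A2")];
    println!("{}", prefix_pattern(&excl_strings)); // ^(?:C2|A13|A2)
}
```

One caveat: an empty prefix list yields ^(?:), which matches every string and would therefore exclude everything, whereas the any-based filter excludes nothing in that case. It is probably worth handling the empty list separately.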
This response looks precise and elegant. Thanks a lot.
I also like your regex example. I think I will have a use for it when I do more complex pattern matching. Thanks for your suggestions.