RegexSet vs. Vec<Regex>

Hi dear Rustaceans,

I'm reading a file line by line and testing each line against a list of regexes. If a line matches, I'm interested in the capture groups. To keep things simple, reading stops after the first match.

I could use either a RegexSet or a Vec<Regex>.

So,

  • is it faster to use a RegexSet and, if there is a match, get the capture groups by calling captures() on the regex that triggered the match,
  • or to call captures() on each element of the vector and test whether the result is not None?

I suppose it depends on the number of regexes to test, and also on the probability of a match. If a match is unlikely, maybe it's faster to call RegexSet::matches() first and then, only if a match is found, get the captures in a second step?

Do you have any advice on that?

Thanks for any hint.

I'm the author of regex and the best and only advice you're going to get to this question is to try both and measure. There are almost certainly cases where one is better than the other, but they are going to be highly dependent on both what you're searching and your regexes.

@BurntSushi Thanks for your hint.

There are indeed too many parameters to draw a definitive conclusion. That's why I opted to make the strategy selectable at run time.
