Generally the best way to solve a problem like this, whether on your own or when asking on a forum, is to reduce it to a minimal example that demonstrates the problem. Take your big complicated regex and keep trimming parts until the problem goes away; then add that part back and trim down the rest.
It is a little unclear from your question whether the problem you are having is with compiling the regex (Regex::new("...").unwrap()) or with matching the text. If the problem is with matching, then also try to cut down the input to a small example that you think should match, but which doesn't.
This may be enough for you to solve the problem on your own. If not, it will give you something small enough that you can ask about it here.
For example, after this process, you should wind up with an example something like the following:
extern crate regex;

use regex::Regex;

fn main() {
    let re = Regex::new("(ab*)|(ba*)|(ca*)|(da*)").unwrap();
    for cap in re.captures_iter("abracadabra") {
        println!("Captures: {} {} {} {}", &cap[1], &cap[2], &cap[3], &cap[4]);
    }
}
Then your question might be something like:
I ran the above code, and got the following error:
thread 'main' panicked at 'no group at index '2'', /root/.cargo/registry/src/github.com-1ecc6299db9ec823/regex-1.0.4/src/re_unicode.rs:1019:32
To which the answer would be that, in any single match, only one of those capture groups actually participates, so you have to check which capture groups contain a match with cap.get() or cap.iter() instead of just indexing with cap[i], which panics for a group that did not match.
As a note on your question about regex101.com, each regular expression engine has slightly different syntax and behavior. That site does not support the Rust regular expression engine. The one on that site that is probably the closest is the golang one, since both the golang and Rust regular expression engines and syntax derived some inspiration from RE2, and both use finite-automaton-based engines rather than backtracking engines (which limits the features, such as not supporting back-references, but improves the worst-case performance).
@krdln's suggestion of using a RegexSet is a much cleaner way of matching multiple regular expressions simultaneously than combining them yourself. It is essentially equivalent to combining them yourself with capture groups, but you don't have to go to the extra effort of adding the extra syntax for capture groups.
One final note: if you are having trouble with the syntax and your regex contains any backslashes, check whether you're using raw strings; otherwise, you'll have to double up all of the backslashes. r"foo\.bar" can be easier to read and verify than "foo\\.bar".