I am attempting to find the canonical usage of named capture groups for a regex.
I have:
let content = "\
This is code...
# This is a comment!
";
lazy_static! {
    static ref COMMENT_PATTERN: Regex = Regex::new(
        r##"(?x)
        ^
        \s*
        \#
        \s*
        (?P<comment>\S.*?)
        \s*
        $
        "##
    ).unwrap();
}
for line in content.lines() {
    if let Some(comment) = COMMENT_PATTERN.captures(line) {
        let comment = Some(comment.name("comment").unwrap().as_str()).unwrap().to_owned();
        println!("Comment: >>{}<<", comment);
    }
}
which yields:
Comment: >>This is a comment!<<
but the multiple unwrap calls and the extra Some make me think I'm doing something non-canonical.
How do other folks use named capture groups with regexes?
I don't understand the problem: you have put in a superfluous Some just to immediately unwrap() it, and you needlessly convert the result to an owned String using to_owned(). Can't you just… not do that?
for line in content.lines() {
    if let Some(comment) = COMMENT_PATTERN.captures(line) {
        let comment = comment.name("comment").unwrap().as_str();
        println!("Comment: >>{}<<", comment);
    }
}
You can also replace the remaining unwrap with an expect like so:
for line in content.lines() {
    if let Some(comment) = COMMENT_PATTERN.captures(line) {
        let comment = comment
            .name("comment")
            .expect("COMMENT_PATTERN missing 'comment' capture group")
            .as_str();
        println!("Comment: >>{}<<", comment);
    }
}
In this case there's no sensible way to recover if the statically defined regex is missing the comment group, so panicking is perfectly valid error handling. Using expect instead of unwrap, with an error string explaining the problem, works the same way but makes it clearer that the panic is intentional (and not just "I haven't set up proper error handling yet").
If we are going to panic on a missing capture group anyway, why not just use the [] operator?
for line in content.lines() {
    if let Some(captures) = COMMENT_PATTERN.captures(line) {
        let comment = &captures["comment"];
        println!("Comment: >>{}<<", comment);
    }
}
The API docs for the regex crate are automatically published to docs.rs. When you open the page for regex::Captures, scroll down to the "trait implementations" section and you'll see that it implements the Index trait for both usize and &str, letting you index either by the group number (&captures[4]) or name (&captures["name"]).
The Panics section also mentions that indexing with something that doesn't exist will panic. This is actually a good thing, because this sort of thing should never fail (you were the one to write both the regex and the indexing code, after all), and panicking will cause your program to blow up loudly so you know to fix it.
There is also a crate, lazy_regex, that checks regex syntax at compile time and allows checked group assignment with the regex_captures macro, which I find very cool and useful for getting things correct as soon as they compile.