Regex named capture group canonical usage?

Greetings,

I am attempting to find the canonical usage of named capture groups for a regex.

I have:

    let content = "\
This is code...
# This is a comment!
";

    lazy_static! {
        static ref COMMENT_PATTERN: Regex = Regex::new(
            r##"(?x)
                ^
                \s*
                \#
                \s*
                (?P<comment>\S.*?)
                \s*
                $
            "##
        ).unwrap();
    }   

    for line in content.lines() {
        if let Some(comment) = COMMENT_PATTERN.captures(line) {
            let comment = Some(comment.name("comment").unwrap().as_str()).unwrap().to_owned();
            println!("Comment: >>{}<<", comment);
        }
    }   

which yields:

Comment: >>This is a comment!<<

but the multiple unwrap calls and the multiple Some wrappers make me think I'm doing something non-canonical.

How do other folks use named capture groups with regexes?

Thanks for any help or hints!

I don't understand the problem: you have put in the superfluous Some, just to immediately unwrap() it, and needlessly convert the result to an owned String using to_owned(). Can't you just… not do that?

    for line in content.lines() {
        if let Some(comment) = COMMENT_PATTERN.captures(line) {
            let comment = comment.name("comment").unwrap().as_str();
            println!("Comment: >>{}<<", comment);
        }
    }

Hi @H2CO3 ,

Thanks for the reply. Yes, indeed I put in the extra Some and unwrap. Still learning my way around.

Thanks again for helping me along the way.

-m

You can also replace the remaining unwrap with an expect like so:

    for line in content.lines() {
        if let Some(comment) = COMMENT_PATTERN.captures(line) {
            let comment = comment.name("comment").expect("COMMENT_PATTERN missing 'comment' capture group").as_str();
            println!("Comment: >>{}<<", comment);
        }
    }

In this case there's no sensible way to recover if the statically defined regex is missing the comment group, so an unwrap is perfectly valid error handling. Instead of unwrap, though, you can use expect with a message explaining the problem. It behaves the same way as unwrap, but makes it clearer that the panic is intentional (and not just "I haven't set up proper error handling yet").

If we are going to panic on a missing capture group anyway, why not just use the [] operator?

    for line in content.lines() {
        if let Some(captures) = COMMENT_PATTERN.captures(line) {
            let comment = &captures["comment"];
            println!("Comment: >>{}<<", comment);
        }
    }

Well, that is pretty slick. Pretty much exactly what I was hoping for.

Thanks @Michael-F-Bryan for the hint!

I've looked (a bit) for the documentation for the operator, but couldn't find it. Any pointers?

The API docs for the regex crate are automatically published to docs.rs. When you open the page for regex::Captures, scroll down to the "trait implementations" section and you'll see that it implements the Index trait for both usize and &str, letting you index either by the group number (&captures[4]) or name (&captures["name"]).

The Panics section also mentions that indexing with something that doesn't exist will panic. This is actually a good thing, because this sort of thing should never fail (you were the one to write both the regex and the indexing code, after all), and panicking will cause your program to blow up loudly so you know to fix it.

Great work @Michael-F-Bryan . Thanks again for helping out!

There is also a crate, lazy_regex, that checks regex syntax at compile time and allows checked group assignment with the regex_captures macro, which I find very cool and useful for getting things correct as soon as they compile.
