Regex named capture group canonical usage?

mzagrabe · December 27, 2021, 7:04pm

Greetings,

I am attempting to find the canonical usage of named capture groups for a regex.

I have:

    let content = "\
This is code...
# This is a comment!
";

    lazy_static! {
        static ref COMMENT_PATTERN: Regex = Regex::new(
            r##"(?x)
                ^
                \s*
                \#
                \s*
                (?P<comment>\S.*?)
                \s*
                $
            "##
        ).unwrap();
    }   

    for line in content.lines() {
        if let Some(comment) = COMMENT_PATTERN.captures(line) {
            let comment = Some(comment.name("comment").unwrap().as_str()).unwrap().to_owned();
            println!("Comment: >>{}<<", comment);
        }
    }

which yield...

Comment: >>This is a comment!<<

but the multiple unwrap and the multiple Some make me think I'm doing something not-canonical.

How do other folks use named capture groups with regexes?

Thanks for any help or hints!

H2CO3 · December 27, 2021, 7:10pm

I don't understand the problem – you have put in the superfluous Some, just to immediately unwrap() it, and needlessly convert it to an owned String using to_string(). Can't you just… not do that?

for line in content.lines() {
    if let Some(comment) = COMMENT_PATTERN.captures(line) {
        let comment = comment.name("comment").unwrap().as_str();
        println!("Comment: >>{}<<", comment);
    }
}

mzagrabe · December 27, 2021, 7:42pm

Hi @H2CO3 ,

Thanks for the reply. Yes, indeed I put in the extra Some and unwrap. Still learning my way around.

Thanks again for helping me along the way.

-m

dbr · December 28, 2021, 11:41pm

You can also replace the remaining unwrap with an expect like so:

for line in content.lines() {
    if let Some(comment) = COMMENT_PATTERN.captures(line) {
        let comment = comment.name("comment").expect("COMMENT_PATTERN missing 'comment' capture group").as_str();
        println!("Comment: >>{}<<", comment);
    }
}

In this case there's no sensible way to recover if the statically defined regex is missing the comment group, so an unwrap is perfectly valid error handling - just instead of unwrap you can use expect with a nice error string explaining the problem. It'll still work the same way as an unwrap, just makes it clearer this is intentional (and not just "I haven't set up proper error handling yet")

Michael-F-Bryan · December 29, 2021, 8:06am

If we are going to panic on a missing capture group anyway, why not just use the [] operator?

for line in content.lines() {
    if let Some(captures) = pattern.captures(line) {
        let comment = &captures["comment"];
        println!("Comment: >>{}<<", comment);
    }
}

(playground)

mzagrabe · December 29, 2021, 7:55pm

Well, that is pretty slick. Pretty much exactly what I was hoping for.

Thanks @Michael-F-Bryan for the hint!

I've looked (a bit) for the documentation for the operator, but couldn't find it. Any pointers?

mzagrabe · December 29, 2021, 7:56pm

[] operator

Michael-F-Bryan · December 29, 2021, 8:03pm

The API docs for the regex crate are automatically published to docs.rs. When you open the page for regex::Captures, scroll down to the "trait implementations" section and you'll see that it implements the Index trait for both usize and &str, letting you index either by the group number (&captures[4]) or name (&captures["name"]).

The Panics section also mentions that indexing with something that doesn't exist will panic. This is actually a good thing, because this sort of thing should never fail (you were the one to write both the regex and the indexing code, after all), and panicking will cause your program to blow up loudly so you know to fix it.

mzagrabe · December 29, 2021, 8:26pm

Great work @Michael-F-Bryan . Thanks again for helping out!

kaj · December 30, 2021, 12:55am

There is also a crate, lazy_regex, that checks regex syntax at compile-time and allows checked group assignment with the regex_captures macro, which I find very cool and usefull get things correct as soon as they compiles.

system · March 30, 2022, 12:56am

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.

Topic		Replies	Views
Iterate on captures, either names or number? help	5	1941	February 20, 2021
Matching a regex at a particular index? help	7	1013	March 18, 2023
Problem with regex capture?. What am I missing? help	5	4797	September 5, 2020
Regular expression and captures in an if condition help	4	694	March 29, 2023
Dealing with trying to grab the value of a regex capture (noob help) help	2	521	February 26, 2023

Regex named capture group canonical usage?

Related topics