How to use r#"" in RegEx

I have the regex below, which works fine:

use regex::Regex;

    println!("command {}", command);
    let re = Regex::new(r#"(?i)(What is|What's|Calculate|How much is)\s(\d)\s(\+|and|plus|\-|less|minus|x|by|multiplied by|/|over|divided by)\s(\d)"#).unwrap();
    let matching = re.is_match(command);
    
    println!("re: {:#?}\n Command: {}\n matching: {}", re, command, matching);

    if let Some(caps) = re.captures(command) {
        println!("groups: {} {} {} {}", 
            caps.get(1).unwrap().as_str(),
            caps.get(2).unwrap().as_str(),
            caps.get(3).unwrap().as_str(),
            caps.get(4).unwrap().as_str()
        );
    }

And I got the expected output:

command What is 1 / 2
re: (?i)(What is|What's|Calculate|How much is)\s(\d)\s(\+|and|plus|\-|less|minus|x|by|multiplied by|/|over|divided by)\s(\d)
 Command: What is 1 / 2
 matching: true
 groups: What is 1 / 2

When I tried to split the pattern across several lines inside the raw string literal r#"…"#, like this:

    let re = Regex::new(r#"
(?i)(What is|What's|Calculate|How much is)
\s(\d)
\s(\+|and|plus|\-|less|minus|x|by|multiplied by|/|over|divided by)
\s(\d)
"#).unwrap();

It did not work, and gave me:

 Command: What is 1 / 2
 matching: false

I changed the End of Line Sequence, but that did not help.

I even tried retain and filter, but apparently neither of them works with the regex.

It looks like either a space or a \n has been added to the pattern. Any idea how to solve this?

The whole point of a raw string literal (r#"" as opposed to just "") is to preserve all characters in the literal, including whitespace, so this is doing exactly what it should be doing.

I think you're asking for a string literal that lacks escape sequences so you don't need to escape backslashes, but also ignores linebreaks, and AFAIK that's just not a thing that exists (unless someone wrote a crate for it). You're probably better off breaking your regex into multiple string literals and concatenating them.
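One way to do the concatenation, as a minimal sketch: the standard concat! macro accepts raw string literals and joins them at compile time, so no newline ever ends up in the pattern. The function name build_pattern is made up for this example, and the check only compares strings; in real code you would pass the result to Regex::new.

```rust
/// Hypothetical helper: joins several raw string literals into one pattern.
/// Nothing is inserted between the pieces, so no newline reaches the regex.
fn build_pattern() -> &'static str {
    concat!(
        r#"(?i)(What is|What's|Calculate|How much is)"#,
        r#"\s(\d)"#,
        r#"\s(\+|and|plus|\-|less|minus|x|by|multiplied by|/|over|divided by)"#,
        r#"\s(\d)"#
    )
}

fn main() {
    // The original single-line pattern from the question.
    let single_line = r#"(?i)(What is|What's|Calculate|How much is)\s(\d)\s(\+|and|plus|\-|less|minus|x|by|multiplied by|/|over|divided by)\s(\d)"#;

    // The concatenated pieces are byte-for-byte the same string.
    assert_eq!(build_pattern(), single_line);
    println!("{}", build_pattern());
}
```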

Your first regex already uses a raw string literal (r#"…"#) and, as you say, it works.

Perhaps what you want is to enable "insignificant whitespace mode" in the regex library (r#"(?x)…"#). Note that literal spaces, e.g. in What is, are then ignored as well, so you'd need to escape them, for example with \x20.

Alternatively, you can call Regex::new(&r#"…"#.replace('\n', "")) to quickly get rid of just the newlines. This is a rather hacky solution, though.
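A rough sketch of the newline-stripping variant, using only the standard library so it can be checked without the regex crate. The helper name pattern_without_newlines is made up here. Note that it only works because the pattern lines are not indented: any leading spaces would stay in the pattern.

```rust
/// Hypothetical helper: the pattern is written across several lines and the
/// newlines are removed afterwards. The lines are deliberately NOT indented,
/// because replace('\n', "") leaves every other character untouched.
fn pattern_without_newlines() -> String {
    r#"
(?i)(What is|What's|Calculate|How much is)
\s(\d)
\s(\+|and|plus|\-|less|minus|x|by|multiplied by|/|over|divided by)
\s(\d)
"#
    .replace('\n', "")
}

fn main() {
    let single_line = r#"(?i)(What is|What's|Calculate|How much is)\s(\d)\s(\+|and|plus|\-|less|minus|x|by|multiplied by|/|over|divided by)\s(\d)"#;

    // After stripping the newlines we are back at the one-line pattern,
    // which could then be handed to Regex::new(&pattern).
    let pattern = pattern_without_newlines();
    assert_eq!(pattern, single_line);
    println!("{}", pattern);
}
```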

I tried the code below, but it did not work:

    let mut re_str = String::from(r#"
                (?i)(What is|What's|Calculate|How much is)
                \s(\d)
                \s(\+|and|plus|\-|less|minus|x|by|multiplied by|/|over|divided by)
                \s(\d)
            "#); //
    re_str.retain(|c| !c.is_whitespace());

    let re = Regex::new(re_str.as_str()).unwrap();
    let matching = re.is_match(command);
    
    println!("re: {:#?}\n Command: {}\n matching: {}", re, command, matching);

This did not work either:

    println!("command {}", command);
    let re_str: String = String::from(r#"
                (?i)(What is|What's|Calculate|How much is)
                \s(\d)
                \s(\+|and|plus|\-|less|minus|x|by|multiplied by|/|over|divided by)
                \s(\d)
            "#).chars().filter(|c| !c.is_whitespace()).collect();

    let re = Regex::new(re_str.as_str()).unwrap();
    let matching = re.is_match(command);
    
    println!("re: {:#?}\n Command: {}\n matching: {}", re, command, matching);
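For what it's worth, both attempts fail for the same reason: is_whitespace also matches the spaces inside the alternatives, so "What is" becomes "Whatis" and the command can no longer match. A small standard-library sketch of the effect (strip_all_whitespace is a made-up name, and the pattern is shortened to two lines):

```rust
/// Hypothetical helper doing the same as the filter/collect attempt above:
/// it drops EVERY whitespace character, not only newlines and indentation.
fn strip_all_whitespace(pattern: &str) -> String {
    pattern.chars().filter(|c| !c.is_whitespace()).collect()
}

fn main() {
    let stripped = strip_all_whitespace(
        r#"
        (?i)(What is|What's|Calculate|How much is)
        \s(\d)
    "#,
    );

    // The meaningful spaces in "What is" and "How much is" are gone too,
    // so this pattern would look for "Whatis" in the command.
    assert_eq!(stripped, r"(?i)(Whatis|What's|Calculate|Howmuchis)\s(\d)");
    println!("{}", stripped);
}
```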

@hyousef You need to use the x flag, as suggested above. This works for me:

use regex::Regex;

fn main() {
    let command = "what is 1 + 2";
    let re = Regex::new(r#"(?xi)
        (What\ is|What's|Calculate|How\ much\ is)
        \s(\d)
        \s(\+|and|plus|\-|less|minus|x|by|multiplied\ by|/|over|divided\ by)
        \s(\d)
    "#).unwrap();
    let matching = re.is_match(command);

    println!("re: {:#?}\n Command: {}\n matching: {}", re, command, matching);

    if let Some(caps) = re.captures(command) {
        println!("groups: {} {} {} {}",
            caps.get(1).unwrap().as_str(),
            caps.get(2).unwrap().as_str(),
            caps.get(3).unwrap().as_str(),
            caps.get(4).unwrap().as_str()
        );
    }
}

Note that in x mode you must escape any whitespace that you want to be significant. Notice that I wrote What\ is instead of What is.

Thanks, it is fine now, but I have a small question:
What is the difference between escaping a space and escaping s, i.e. '\ ' and '\s'?

'\ ' matches only the space character, code point U+0020, while '\s' matches any whitespace character, including tabs, form feeds, non-ASCII spaces used in other languages, etc.
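To make the difference concrete without pulling in the regex crate, here is an approximation using plain char checks. It assumes the regex crate's default Unicode mode, where \s corresponds to the Unicode White_Space property, which is also what char::is_whitespace tests. Both function names are made up for this sketch.

```rust
/// Hypothetical stand-in for `\ `: accepts only the space character U+0020.
fn matches_escaped_space(c: char) -> bool {
    c == '\u{20}'
}

/// Hypothetical stand-in for `\s` (assumption: Unicode mode, White_Space).
fn matches_whitespace_class(c: char) -> bool {
    c.is_whitespace()
}

fn main() {
    // space, tab, newline, form feed, no-break space, and a non-space letter
    let samples = [' ', '\t', '\n', '\u{0C}', '\u{A0}', 'x'];
    for &c in samples.iter() {
        println!(
            "{:?}: escaped space = {}, whitespace class = {}",
            c,
            matches_escaped_space(c),
            matches_whitespace_class(c)
        );
    }
}
```

Only the first sample passes both checks; the tab, newline, form feed and no-break space pass only the whitespace-class check.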
