How to use r#"" in RegEx

hyousef · May 3, 2020, 12:19pm

I have the regex below, which works fine:

use regex::Regex;

    println!("command {}", command);
    let re = Regex::new(r#"(?i)(What is|What's|Calculate|How much is)\s(\d)\s(\+|and|plus|\-|less|minus|x|by|multiplied by|/|over|divided by)\s(\d)"#).unwrap();
    let matching = re.is_match(command);
    
    println!("re: {:#?}\n Command: {}\n matching: {}", re, command, matching);

    if let Some(caps) = re.captures(command) {
        println!("groups: {} {} {} {}", 
            caps.get(1).unwrap().as_str(),
            caps.get(2).unwrap().as_str(),
            caps.get(3).unwrap().as_str(),
            caps.get(4).unwrap().as_str()
        );
    }

And I got the expected output:

command What is 1 / 2
re: (?i)(What is|What's|Calculate|How much is)\s(\d)\s(\+|and|plus|\-|less|minus|x|by|multiplied by|/|over|divided by)\s(\d)
 Command: What is 1 / 2
 matching: true
 groups: What is 1 / 2

When I tried spreading the raw string literal r#""# over several lines, like this:

    let re = Regex::new(r#"
(?i)(What is|What's|Calculate|How much is)
\s(\d)
\s(\+|and|plus|\-|less|minus|x|by|multiplied by|/|over|divided by)
\s(\d)
"#).unwrap();

It did not work, and gave me:

 Command: What is 1 / 2
 matching: false

I changed the End of Line Sequence, but nothing worked.

I even tried retain and filter, but apparently neither of them works with the regex.

It looks like either a space or a \n has been added to the pattern. Any idea how to solve it?

Ixrec · May 3, 2020, 12:23pm

The whole point of a raw string literal (r#"" as opposed to just "") is to preserve all characters in the literal, including whitespace, so this is doing exactly what it should be doing.

I think you're asking for a string literal that lacks escape sequences so you don't need to escape backslashes, but also ignores linebreaks, and AFAIK that's just not a thing that exists (unless someone wrote a crate for it). You're probably better off breaking your regex into multiple string literals and concatenating them.
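One way to do the concatenation suggested above is the standard `concat!` macro, which joins string literals (raw ones included) at compile time. This is only a sketch: the constant name `PATTERN` is made up for illustration, and plain `r"…"` is used because the pattern contains no double quotes. It also checks that a multi-line raw literal really does keep its line breaks.

```rust
// Sketch: build the pattern from raw string fragments joined at compile time.
// `PATTERN` is a hypothetical name used only for this illustration.
const PATTERN: &str = concat!(
    r"(?i)(What is|What's|Calculate|How much is)",
    r"\s(\d)",
    r"\s(\+|and|plus|\-|less|minus|x|by|multiplied by|/|over|divided by)",
    r"\s(\d)",
);

fn main() {
    // A multi-line raw literal keeps every line break verbatim.
    let multiline = r#"
(?i)(What is)
\s(\d)
"#;
    assert!(multiline.starts_with('\n'));

    // The concatenated fragments contain no line breaks at all.
    assert!(!PATTERN.contains('\n'));
    println!("{}", PATTERN);
}
```

The resulting `&str` can be passed to `Regex::new` exactly like the original one-line pattern.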

krdln · May 3, 2020, 12:39pm

Your first regex is already using a raw string literal (r#"…"#) and, as you say, it works.

Perhaps what you want is to enable "insignificant whitespace mode" in the regex library (r#"(?x)…"#). Note that literal spaces, e.g. in What is, are ignored in that mode, so you'd need to escape them, for example with \x20.

Alternatively, you can call Regex::new(&r#"…"#.replace('\n', "")) to quickly get rid of just the newlines. This sounds like a very hacky solution, though.
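A minimal sketch of the newline-stripping idea above, using only the standard library. The helper name `strip_newlines` is hypothetical. It gives back the one-line pattern only when the continuation lines are not indented, since any leading spaces would stay in the pattern.

```rust
/// Hypothetical helper: remove only the line breaks from a pattern.
fn strip_newlines(pattern: &str) -> String {
    pattern.replace('\n', "")
}

fn main() {
    // Continuation lines start in column 0, so no stray spaces are left behind.
    let multiline = r#"
(?i)(What is|What's|Calculate|How much is)
\s(\d)
\s(\+|and|plus|\-|less|minus|x|by|multiplied by|/|over|divided by)
\s(\d)
"#;
    let one_line = strip_newlines(multiline);

    assert!(!one_line.contains('\n'));
    // Spaces inside alternatives such as "What is" are untouched.
    assert!(one_line.contains("What is"));
    println!("{}", one_line);
}
```

Because `replace` returns a `String`, it has to be borrowed (`&one_line`) when handed to `Regex::new`.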

hyousef · May 3, 2020, 12:44pm

I tried the code below, but it did not work:

    let mut re_str = String::from(r#"
                (?i)(What is|What's|Calculate|How much is)
                \s(\d)
                \s(\+|and|plus|\-|less|minus|x|by|multiplied by|/|over|divided by)
                \s(\d)
            "#); //
    re_str.retain(|c| !c.is_whitespace());

    let re = Regex::new(re_str.as_str()).unwrap();
    let matching = re.is_match(command);
    
    println!("re: {:#?}\n Command: {}\n matching: {}", re, command, matching);

Also, this did not work:

    println!("command {}", command);
    let re_str: String = String::from(r#"
                (?i)(What is|What's|Calculate|How much is)
                \s(\d)
                \s(\+|and|plus|\-|less|minus|x|by|multiplied by|/|over|divided by)
                \s(\d)
            "#).chars().filter(|c| !c.is_whitespace()).collect();

    let re = Regex::new(re_str.as_str()).unwrap();
    let matching = re.is_match(command);
    
    println!("re: {:#?}\n Command: {}\n matching: {}", re, command, matching);
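The likely reason both attempts fail is that removing every whitespace character also removes the spaces that are part of the pattern, so "What is" becomes "Whatis" and can no longer match the command. The sketch below shows this with the standard library only. Both helper names are hypothetical, and the second one (trimming each line before joining) is offered as one possible alternative, on the assumption that no line needs significant leading or trailing spaces.

```rust
/// Hypothetical helper mirroring the `retain` attempt above.
fn strip_all_whitespace(pattern: &str) -> String {
    let mut s = String::from(pattern);
    s.retain(|c| !c.is_whitespace());
    s
}

/// Hypothetical alternative: trim each line, then join the lines.
/// Spaces in the middle of a line are kept.
fn join_trimmed_lines(pattern: &str) -> String {
    pattern.lines().map(str::trim).collect()
}

fn main() {
    let raw = r#"
        (?i)(What is|How much is)
        \s(\d)
    "#;

    // prints: (?i)(Whatis|Howmuchis)\s(\d)
    println!("{}", strip_all_whitespace(raw));

    // prints: (?i)(What is|How much is)\s(\d)
    println!("{}", join_trimmed_lines(raw));
}
```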

BurntSushi · May 3, 2020, 1:37pm

@hyousef You need to use x, as suggested above. This works for me:

use regex::Regex;

fn main() {
    let command = "what is 1 + 2";
    let re = Regex::new(r#"(?xi)
        (What\ is|What's|Calculate|How\ much\ is)
        \s(\d)
        \s(\+|and|plus|\-|less|minus|x|by|multiplied\ by|/|over|divided\ by)
        \s(\d)
    "#).unwrap();
    let matching = re.is_match(command);

    println!("re: {:#?}\n Command: {}\n matching: {}", re, command, matching);

    if let Some(caps) = re.captures(command) {
        println!("groups: {} {} {} {}",
            caps.get(1).unwrap().as_str(),
            caps.get(2).unwrap().as_str(),
            caps.get(3).unwrap().as_str(),
            caps.get(4).unwrap().as_str()
        );
    }
}

Note that in x mode you must escape any whitespace that you want to be significant. Notice that I wrote What\ is instead of What is.

hyousef · May 3, 2020, 5:42pm

Thanks, it works fine now, but I have one small question.
What is the difference between escaping a space and escaping s, i.e. '\ ' and '\s'?

TomP · May 3, 2020, 6:42pm

'\ ' matches only the space character, code point 0x20, while '\s' matches any whitespace character, including tabs, form feeds, non-ASCII spaces used in other languages, etc.
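The distinction above can be illustrated without the regex crate. As far as I know, `\s` in the regex crate's default Unicode mode corresponds to the Unicode White_Space property, which is also what the standard library's `char::is_whitespace` tests, so the sketch uses that method as a stand-in. The helper `is_plain_space` is hypothetical and models what '\ ' accepts.

```rust
/// Hypothetical helper: models '\ ', which accepts only U+0020.
fn is_plain_space(c: char) -> bool {
    c == ' '
}

fn main() {
    // Space, tab, form feed, no-break space, and an ordinary letter.
    for &c in &[' ', '\t', '\u{000C}', '\u{00A0}', 'x'] {
        println!(
            "{:?}: plain space = {}, whitespace = {}",
            c,
            is_plain_space(c),
            c.is_whitespace() // stand-in for the `\s` class
        );
    }
}
```

Only the first character counts as a plain space, while the first four all count as whitespace.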

