Regex with a newline character

I'm struggling to figure out how to capture a string that breaks across two lines. I've put a sample on the Playground. If you delete the \n character so the string is on a single line, it works, but with the \n it doesn't. I've added the (?s) flag to the pattern, so in theory it should treat a \n as an ordinary character.

Any help would be greatly appreciated.

use regex::Regex;

fn main() {
    let msting = " 24.
    Kh1 Qg2# {+1000.01/28}";
    let rxmove = Regex::new(r"(?s)[[:space:]]([[:digit:]]+[.\n.?])[[:space:]]([[:word:]]+)[[:space:]]([[:word:]]+)?").unwrap();

    for cap in rxmove.captures_iter(msting) {
        println!("One: {} Two: {} Three: {}", &cap[1], &cap[2], &cap[3]);
    }
}



Errors:

   Compiling playground v0.0.1 (/playground)
    Finished dev [unoptimized + debuginfo] target(s) in 1.52s
     Running `target/debug/playground`

Your problem seems to be the spaces rather than the line break. Removing the indentation from the second line of the string, i.e. changing it to

use regex::Regex;

fn main() {
    let msting = " 24.
Kh1 Qg2# {+1000.01/28}";
    let rxmove = Regex::new(r"(?s)[[:space:]]([[:digit:]]+[.\n.?])[[:space:]]([[:word:]]+)[[:space:]]([[:word:]]+)?").unwrap();

    for cap in rxmove.captures_iter(msting) {
        println!("One: {} Two: {} Three: {}", &cap[1], &cap[2], &cap[3]);
    }
}

seems to make it work. I haven't tried understanding the regex yet.
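
To see what each string literal actually contains, here is a small sketch using only the standard library. The helper name `indent_after_newline` is made up for illustration; it counts the whitespace characters that follow the first line break.

```rust
/// Counts the whitespace characters that directly follow the first '\n' in `s`.
/// (Hypothetical helper, only for illustration.)
fn indent_after_newline(s: &str) -> usize {
    match s.split_once('\n') {
        Some((_, rest)) => rest.chars().take_while(|c| c.is_whitespace()).count(),
        None => 0,
    }
}

fn main() {
    // Second line indented in the source: the indentation is part of the string.
    let indented = " 24.
    Kh1 Qg2# {+1000.01/28}";
    // Second line starting at column 0: only the '\n' separates "24." and "Kh1".
    let flush = " 24.
Kh1 Qg2# {+1000.01/28}";

    println!("{:?} -> {}", indented, indent_after_newline(indented));
    println!("{:?} -> {}", flush, indent_after_newline(flush));
}
```

Printing with `{:?}` shows the escaped form of each string, which makes any whitespace hidden after the `\n` visible.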

Yes, the spaces are the problem. I wasn't being "greedy" enough when checking for spaces. I've changed it to:

let rxmove = Regex::new(r"(?s)[[:space:]](\d+[.]+)\s+?(\w+)\s+?(\w+)").unwrap();

And it works in the playground, but in my actual code it will not pick up the last line from the file, which this is. So I think I have other problems.

(?s)[[:space:]]([[:digit:]]+[.\n.?])[[:space:]]([[:word:]]+)[[:space:]]([[:word:]]+)?
^^^^ <-- 1                2 --> ^^  ^^^^^^^^^^^ <-- 3
  1. You're not using the . metacharacter, so you don't need the (?s) flag. A . in a character class is just a literal . (and you have \n in the character class anyway)
  2. No need to have . twice in a character class. Or, if you meant "maybe any character", note that . and ? are both literal within a character class. Maybe you meant [.].? (with (?s))? Unclear.
  3. You only allow for a single whitespace character between the first capture and the second capture. Maybe you meant [[:space:]]* or [[:space:]]+ to allow 0 or more, or 1 or more.

If you're purposefully only allowing one space, probably it's your string that's the problem, like @steffahn said.

