Rust RegEx-Question - match does match an expression that should not match at all

Hi all,

I crafted the regex ^[+-]{0,1}P([0-9]{1,}W)|([0-9]{1,}D){0,1}([0-9]{1,}H){0,1}([0-9]{1,}M){0,1}([0-9]{1,}S){0,1} to validate durations as defined in iCalendar's section 3.3.6, Duration (iCalendar.org - 3.3.6. Duration).

I checked ^[+-]{0,1}P([0-9]{1,}W)|([0-9]{1,}D){0,1}([0-9]{1,}H){0,1}([0-9]{1,}M){0,1}([0-9]{1,}S){0,1} on https://regex101.com, and there it behaves as I expected: it does not match ~P7W, p7W or P7w.

When I use Rust's regex crate (regex - Rust), the behavior changes:

use regex::Regex;
fn main() {
        let r = Regex::new(r"^[+-]{0,1}P([0-9]{1,}W)|([0-9]{1,}D){0,1}([0-9]{1,}H){0,1}([0-9]{1,}M){0,1}([0-9]{1,}S){0,1}").unwrap();
        assert!(r.is_match("+P15DT5H0M20S"));
        assert!(r.is_match("-P15DT5H0M20S"));
        assert!(r.is_match("P15DT5H0M20S"));
        assert!(r.is_match("+P7W"));
        assert!(r.is_match("-P7W"));
        assert!(r.is_match("P7W"));
        // no fail?!
        assert!(!r.is_match("~P7W"));
        assert!(!r.is_match("p7W"));
        assert!(!r.is_match("P7w"));
}

Exited with status 101

Standard Error

   Compiling playground v0.0.1 (/playground)
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 1.35s
     Running `target/debug/playground`
thread 'main' panicked at src/main.rs:11:9:
assertion failed: !r.is_match("~P7W")

Either my RegEx is wrong or I am not using Rust's regex crate correctly.

Can anybody please give me a hint as to what is wrong here?

I'm not familiar with the iCalendar format, so I just went ahead and fixed the regex to conform to what I believe the format is, based on your asserts:

use regex::Regex;
fn main() {
    let r = Regex::new(r"^[+-]?P(([0-9]+W)|(([0-9]+D)?T([0-9]+H)?([0-9]+M)?([0-9]+S)?))$").unwrap();
    assert!(r.is_match("+P15DT5H0M20S"));
    assert!(r.is_match("-P15DT5H0M20S"));
    assert!(r.is_match("P15DT5H0M20S"));
    assert!(r.is_match("+P7W"));
    assert!(r.is_match("-P7W"));
    assert!(r.is_match("P7W"));
    // no fail?!
    assert!(!r.is_match("~P7W"));
    assert!(!r.is_match("p7W"));
    assert!(!r.is_match("P7w"));
}

Playground.

3 Likes

I note that your regex on regex101 does match (regex101: build, test, and debug regex).

It's subtle, but you can see the match highlights. The match isn't of the entire string, but rather of the empty string between each character. This is because your regex is basically of this format: ^P([0-9]+W)|([0-9]+D)?. The | operator has low precedence, so the ^P applies only to the first branch. The second branch has neither the anchor nor the P, and indeed, can match the empty string. You can test this with r.is_match("").

@jofas fixed this by overriding the precedence of | with parentheses, e.g., ^P(?:([0-9]+W)|([0-9]+D)?).

With all that said, the revised regex still matches PT, which is probably not valid. You can fix that with more branches in your regex, but it will be a much bigger regex. Alternatively, use a second regex, which would keep your regexes simpler but will be a little more kludgy overall.

6 Likes

Indeed :slightly_smiling_face:

use regex::Regex;
fn main() {
    let r = Regex::new(r"^[+-]?P(([0-9]+W)|(([0-9]+D)?T(([0-9]+H([0-9]+M)?([0-9]+S)?)|(([0-9]+H)?[0-9]+M([0-9]+S)?)|(([0-9]+H)?([0-9]+M)?[0-9]+S))))$").unwrap();
    assert!(r.is_match("+P15DT5H0M20S"));
    assert!(r.is_match("-P15DT5H0M20S"));
    assert!(r.is_match("P15DT5H0M20S"));
    assert!(r.is_match("+P7W"));
    assert!(r.is_match("-P7W"));
    assert!(r.is_match("P7W"));
    assert!(r.is_match("PT2H"));
    assert!(r.is_match("PT2M"));
    assert!(r.is_match("PT2S"));
    assert!(r.is_match("PT2H2M"));
    assert!(r.is_match("PT2H2S"));
    assert!(r.is_match("PT2M2S"));
    
    assert!(!r.is_match("PT"));
    assert!(!r.is_match("~P7W"));
    assert!(!r.is_match("p7W"));
    assert!(!r.is_match("P7w"));
}

Playground.

4 Likes