regex::bytes problem


#1

I think there is a problem between my keyboard and chair.

If I run this:

extern crate regex;
use regex::bytes::Regex;

fn main() {
    // Input bytes; the trailing 0xCC is not valid UTF-8 on its own.
    let vec: Vec<u8> = vec![0x57, 0x4A, 0xCC];
    let re: Regex = Regex::new(r"\x57\x4A").unwrap();
    if re.is_match(&vec) {
        println!("match");
    }
}

the output is this:
match

If I change the regex to r"\x57\x4A\xCC", there is no match, even though the input ends with the byte 0xCC.

Any help appreciated.


#2

I think you still need to explicitly disable Unicode mode with (?-u) to match a byte that would be invalid UTF-8, such as a lone \xCC. From the docs:

https://docs.rs/regex/1.0.5/regex/bytes/index.html#syntax

  1. Hexadecimal notation can be used to specify arbitrary bytes instead of Unicode codepoints. For example, in ASCII compatible mode, \xFF matches the literal byte \xFF, while in Unicode mode, \xFF is a Unicode codepoint that matches its UTF-8 encoding of \xC3\xBF. Similarly for octal notation when enabled.
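
To make the quoted rule concrete, here is a small std-only sketch of what the Unicode-mode escapes encode to. The helper utf8_bytes is a made-up name for this illustration, not part of the regex crate. It shows that the codepoint U+00CC encodes to the two bytes C3 8C, which is presumably why a pattern ending in \xCC finds nothing in the input 57 4A CC.

```rust
// Hypothetical helper for this sketch (not a regex crate API):
// returns the UTF-8 encoding of a single char.
fn utf8_bytes(c: char) -> Vec<u8> {
    let mut buf = [0u8; 4];
    c.encode_utf8(&mut buf).as_bytes().to_vec()
}

fn main() {
    // In Unicode mode, \xCC and \xFF name the codepoints U+00CC and U+00FF,
    // which need two bytes each in UTF-8.
    println!("U+00CC -> {:02X?}", utf8_bytes('\u{CC}'));
    println!("U+00FF -> {:02X?}", utf8_bytes('\u{FF}'));
    // ASCII-range escapes such as \x57 encode to the same single byte,
    // which is why the shorter pattern \x57\x4A matched either way.
    println!("U+0057 -> {:02X?}", utf8_bytes('\u{57}'));
}
```

The first line of output should list C3 and 8C rather than CC, in line with the \xFF to \xC3\xBF example from the docs.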

#3

You are right.

    let re: Regex = Regex::new(r"(?-u)\x57\x4A\xCC").unwrap();

does it.

Great, thanks a lot.