Optional regexes populating a struct


#1

I am reading in some events, and want to place the matches I find into a struct. Some of these will be optional, as one or more of those fields may not be present in the message.

I have used the regex crate to grab the matches and written some code to place Some(data) in the struct when a field exists. However, if my regex doesn’t match the message precisely, it bails out at the captures stage.

I am eventually going to have 30-40 possible fields per message, so my approach below feels like it will quickly get clunky. I looked at RegexSet, but I couldn’t see how the named matches might be exposed. Does anyone have suggestions for improving this? Would nom be better?

Thanks.

extern crate regex;
use regex::Regex;

#[derive(Debug)]
struct MyEvent {
    etyp: String,
    arch: String,
    idno: Option<u32>,
}

fn main() {

    let myevent: &str = "etyp=BLEG arch=amd64 idno=4";

    let re = Regex::new(r"etyp=(?P<etyp>\S+?)?\sarch=(?P<arch>\S+?)?\sidno=(?P<idno>\d+?)?\sjunk=(?P<junk>\d+?)?").unwrap();
    let cap = match re.captures(&myevent) {
        Some(data) => data,
        None => panic!("Regex capture problem")
    };

    let foo = MyEvent {
        // Match::as_str() yields the matched text in current versions of the regex crate.
        etyp: cap.name("etyp").unwrap().as_str().to_string(),
        arch: cap.name("arch").unwrap().as_str().to_string(),

        // Optional field: stays None when the group did not take part in the match.
        idno: cap
            .name("idno")
            .map(|m| m.as_str().parse::<u32>().expect("died in u32 parse")),
    };

    println!("{:#?}", foo);

}

#2

If the grammar is as simple as it seems, I would abandon regexes and instead split on whitespace and then on the = sign.
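
As a rough sketch of that idea, using only the standard library: it assumes every field is a bare key=value token with no spaces inside the value. The names parse_fields and build_event are made up for illustration, and the struct is copied from the first post.

```rust
use std::collections::HashMap;

/// Split an event line into key/value pairs: first on whitespace,
/// then on the first '=' of each token. Tokens without '=' are skipped.
fn parse_fields(line: &str) -> HashMap<&str, &str> {
    line.split_whitespace()
        .filter_map(|token| token.split_once('='))
        .collect()
}

#[derive(Debug, PartialEq)]
struct MyEvent {
    etyp: String,
    arch: String,
    idno: Option<u32>,
}

/// Hypothetical helper: returns None when a required field is absent.
/// An idno that is missing or not a valid u32 simply becomes None.
fn build_event(line: &str) -> Option<MyEvent> {
    let fields = parse_fields(line);
    Some(MyEvent {
        etyp: fields.get("etyp")?.to_string(),
        arch: fields.get("arch")?.to_string(),
        idno: fields.get("idno").and_then(|v| v.parse().ok()),
    })
}

fn main() {
    let event = build_event("etyp=BLEG arch=amd64 idno=4");
    println!("{:#?}", event);
}
```

With 30-40 fields, the map lookup keeps each struct field to a single line, and field order in the message no longer matters.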


#3

Thank you, but I perhaps oversimplified my example string. Some of the fields may have quoted strings containing spaces (oper="op: 1 error 3D"). I had anticipated setting one capture group per style of field, and then doing a subsequent parse of that field to extract sub-information.


#4

Yes, I think in this case you’ll benefit from separate “lexing” and “parsing” stages.
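
A minimal sketch of what the two stages might look like, using only the standard library. It assumes values are either bare (no whitespace) or wrapped in straight double quotes with no escaped quotes inside, and that every token has the form key=value. The functions lex and parse and the extra oper field are illustrative names, not something from the posts above.

```rust
use std::collections::HashMap;

/// Lexing stage: turn a line into (key, value) tokens.
/// Quoted values may contain spaces; the quotes themselves are dropped.
fn lex(line: &str) -> Vec<(&str, &str)> {
    let mut tokens = Vec::new();
    let mut rest = line.trim_start();
    while !rest.is_empty() {
        // Everything up to the next '=' is the key; stop if there is none.
        let (key, after_eq) = match rest.split_once('=') {
            Some(pair) => pair,
            None => break,
        };
        let (value, remainder) = if let Some(quoted) = after_eq.strip_prefix('"') {
            // Quoted value: runs to the closing quote (or end of line if unterminated).
            quoted.split_once('"').unwrap_or((quoted, ""))
        } else {
            // Bare value: runs to the next whitespace (or end of line).
            after_eq
                .split_once(char::is_whitespace)
                .unwrap_or((after_eq, ""))
        };
        tokens.push((key, value));
        rest = remainder.trim_start();
    }
    tokens
}

#[derive(Debug, PartialEq)]
struct MyEvent {
    etyp: String,
    arch: String,
    idno: Option<u32>,
    oper: Option<String>, // hypothetical optional field holding a quoted string
}

/// Parsing stage: build the struct from the tokens, reporting a missing
/// required field or a malformed number as an error instead of panicking.
fn parse(tokens: &[(&str, &str)]) -> Result<MyEvent, String> {
    let fields: HashMap<&str, &str> = tokens.iter().copied().collect();
    let required = |name: &str| {
        fields
            .get(name)
            .map(|v| v.to_string())
            .ok_or_else(|| format!("missing field: {}", name))
    };
    let idno = match fields.get("idno") {
        Some(v) => Some(v.parse::<u32>().map_err(|e| format!("bad idno: {}", e))?),
        None => None,
    };
    Ok(MyEvent {
        etyp: required("etyp")?,
        arch: required("arch")?,
        idno,
        oper: fields.get("oper").map(|v| v.to_string()),
    })
}

fn main() {
    let line = r#"etyp=BLEG arch=amd64 oper="op: 1 error 3D""#;
    println!("{:#?}", parse(&lex(line)));
}
```

Keeping the stages separate means the lexer only has to know about the field styles (bare or quoted), while any per-field sub-parsing, such as pulling numbers out of oper, can live in the parsing stage.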