Scanning a logfile using scan_rules::scanner

I've been looking for a good way to parse a logfile and thought I'd found it with the scan_rules::scanner macros. I like the regex feature and started using it, as below, but I find that if the pattern doesn't match, let_scan! panics, which makes it difficult to handle a mixture of line formats or to skip corrupted lines.

I tried panic::catch_unwind, as you can see, but the panic messages are still printed to the console, which is not acceptable.

This seems to make scan_rules::scanner unsuitable for production use, so is there a better way to do the following?

  /// Parse a line of the form:
  ///    INFO 2020-07-08T19:58:26.841778689+01:00 [src/bin/safe_vault.rs:114]
  ///    WARN 2020-07-08T19:59:18.540118366+01:00 [src/data_handler/idata_handler.rs:744] 552f45..: Failed to get holders metadata from DB
  fn parse_info_line(line: &str) -> Option<LogEntry> {
    use scan_rules::scanner::{
      exact_width_a,
      re_str, // runtime scanners
      NonSpace,
    };
    use std::panic;

    let scanned = panic::catch_unwind(|| {
      let_scan!(line; (
        let category <| re_str(r"^[A-Z]{1,4}"),
        let time_string <| exact_width_a::<NonSpace>(35),
        let remainder <| re_str(r".*")
      ));

      let parser_debug = format!(
        "category: {}, time: {}, rem: {}",
        category, time_string, remainder
      );

      LogEntry {
        logstring: String::from(line),
        category: String::from(category),
        time: None,
        parser_debug,
      }
    });
    match scanned {
      Ok(log_entry) => Some(log_entry),
      Err(_) => None,
    }
  }

Note: my logfile contains lines with a different format to the examples in the comment, so I need to parse different line formats.

I'd create an issue against the project's repository. Functions that may fail should usually return a Result or Option so the caller can choose how to deal with failure. Needing std::panic::catch_unwind() as a try-catch mechanism is a bit hacky, and I would avoid it.
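To make the Result point concrete, here is a minimal std-only sketch of a line splitter that reports failure as a value instead of panicking. The names ParseError, Fields and split_fields are made up for illustration and are not part of scan_rules. It is also worth knowing that catch_unwind does not silence the panic hook, which prints the message before unwinding starts; that is why the console output still appears.

```rust
/// Hypothetical error type for this sketch; not part of scan_rules or any crate.
#[derive(Debug, PartialEq)]
enum ParseError {
    MissingCategory,
    MissingTimestamp,
}

/// The three fields of a log line, borrowed from the input.
#[derive(Debug, PartialEq)]
struct Fields<'a> {
    category: &'a str,
    time_string: &'a str,
    remainder: &'a str,
}

/// Splits "CATEGORY TIMESTAMP rest..." on the first two spaces.
/// A missing field is returned as an error rather than causing a panic.
fn split_fields(line: &str) -> Result<Fields<'_>, ParseError> {
    let mut parts = line.splitn(3, ' ');
    let category = parts
        .next()
        .filter(|s| !s.is_empty())
        .ok_or(ParseError::MissingCategory)?;
    let time_string = parts
        .next()
        .filter(|s| !s.is_empty())
        .ok_or(ParseError::MissingTimestamp)?;
    // The remainder is optional: a line may end right after the timestamp.
    let remainder = parts.next().unwrap_or("");
    Ok(Fields { category, time_string, remainder })
}
```

The caller can then match on the result, log the error, or skip the line, and nothing is written to the console unless the caller chooses to.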

Have you thought of using a normal parser library? Something like nom would let you define a bunch of little parsers for each log pattern. As a bonus nom is considerably more popular and battle-tested, so you'll be able to find more examples and tutorials to build on.
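The "bunch of little parsers" idea looks roughly like this. To be clear, this is not nom: it is a std-only sketch of the same shape, with one small function per line format and a top-level function that tries them in turn. The Comment format (lines starting with '#') is invented purely to have a second format to show, and all the names here are hypothetical.

```rust
/// One variant per line format. `Comment` is a made-up second format.
#[derive(Debug, PartialEq)]
enum Line<'a> {
    Entry {
        category: &'a str,
        time_string: &'a str,
        remainder: &'a str,
    },
    Comment(&'a str),
}

/// Parses "CATEGORY TIMESTAMP rest", where CATEGORY is all uppercase ASCII.
fn parse_entry(line: &str) -> Option<Line<'_>> {
    let (category, rest) = line.split_once(' ')?;
    if category.is_empty() || !category.chars().all(|c| c.is_ascii_uppercase()) {
        return None;
    }
    // The text after the timestamp is optional.
    let (time_string, remainder) = rest.split_once(' ').unwrap_or((rest, ""));
    if time_string.is_empty() {
        return None;
    }
    Some(Line::Entry { category, time_string, remainder })
}

/// Parses the hypothetical comment format: a line starting with '#'.
fn parse_comment(line: &str) -> Option<Line<'_>> {
    line.strip_prefix('#').map(|text| Line::Comment(text.trim()))
}

/// Tries each parser in turn and returns the first match.
fn parse_line(line: &str) -> Option<Line<'_>> {
    parse_entry(line).or_else(|| parse_comment(line))
}

/// Keeps the lines that some parser recognises and skips the rest.
fn parse_log(input: &str) -> Vec<Line<'_>> {
    input.lines().filter_map(parse_line).collect()
}
```

Adding another format means writing one more small function and chaining it in parse_line. With nom you would get the same structure, with its combinators doing the work that split_once and strip_prefix do here.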


Thanks Michael, I'll take a look at nom. In the meantime I found a decent solution using Regex:

/// Parse a line of the form:
///    INFO 2020-07-08T19:58:26.841778689+01:00 [src/bin/safe_vault.rs:114]
///    WARN 2020-07-08T19:59:18.540118366+01:00 [src/data_handler/idata_handler.rs:744] 552f45..: Failed to get holders metadata from DB
fn parse_info_line(line: &str) -> Option<LogEntry> {
  use regex::Regex;
  match Regex::new(r"(?P<category>^[A-Z]{4}) (?P<time_string>[^ ]{35}) (?P<remainder>.*)") {
    Err(_) => None,
    Ok(re) => match re.captures(line) {
      None => None,
      Some(captures) => {
        let category = captures.name("category").map_or("", |m| m.as_str());
        let time_string = captures.name("time_string").map_or("", |m| m.as_str());
        let remainder = captures.name("remainder").map_or("", |m| m.as_str());

        let parser_debug = format!(
          "category: {}, time: {}, rem: {}",
          category, time_string, remainder
        );

        Some(LogEntry {
          logstring: String::from(line),
          category: String::from(category),
          time: None,
          parser_debug,
        })
      }
    },
  }
}

Any tweaks to make it more idiomatic appreciated as I'm very new to Rust.

If you are scanning through thousands of lines, you probably don't want to recompile the regex (Regex::new()) every time. A common approach is to use something like lazy_static to store it in an immutable static whose initializer runs only on first use.

use regex::Regex;

lazy_static::lazy_static! {
    // Compiled once, on first use, and then shared by every call.
    static ref LOG_LINE_PATTERN: Regex = Regex::new(
        r"(?P<category>^[A-Z]{4}) (?P<time_string>[^ ]{35}) (?P<remainder>.*)",
    )
    .expect("The regex failed to compile. This is a bug.");
}

You'll need to unwrap the result of Regex::new(...), either with unwrap() or, as here, with expect(), but it's fine to blow up in this case. You'll be testing the code, so you will notice pretty quickly if the regex doesn't compile.
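If you'd rather not pull in a crate for this, std::sync::OnceLock (stable since Rust 1.70) gives the same initialize-on-first-use behaviour. The sketch below swaps the Regex for a plain list of categories so that it stays std-only. The INIT_CALLS counter and the function names are hypothetical and exist only to make the "runs once" behaviour observable.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Counts how often the initializer runs; only here to make that visible.
static INIT_CALLS: AtomicUsize = AtomicUsize::new(0);

/// Stand-in for a value that is expensive to build, such as a compiled Regex.
fn known_categories() -> &'static [&'static str] {
    static CATEGORIES: OnceLock<Vec<&'static str>> = OnceLock::new();
    CATEGORIES.get_or_init(|| {
        // This closure runs on the first call only; later calls reuse the value.
        INIT_CALLS.fetch_add(1, Ordering::SeqCst);
        vec!["INFO", "WARN"]
    })
}

/// Returns true if `category` is one of the known log categories.
fn is_known_category(category: &str) -> bool {
    known_categories().iter().any(|known| *known == category)
}
```

However many times is_known_category is called, the initializer runs once. Storing a Regex in the OnceLock instead would follow the same pattern.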

Another thing is that if your function returns Option<T> and a function you call returns an Option, you can use the ? operator to bail early on the None case.

Those two tricks help to remove your rightward drift and keep the logic fairly straightforward.

struct LogEntry {
    logstring: String,
    category: String,
    time: Option<()>,
    parser_debug: String,
}

/// Parse a line of the form:
///    INFO 2020-07-08T19:58:26.841778689+01:00 [src/bin/safe_vault.rs:114]
///    WARN 2020-07-08T19:59:18.540118366+01:00 [src/data_handler/idata_handler.rs:744] 552f45..: Failed to get holders metadata from DB
fn parse_info_line(line: &str) -> Option<LogEntry> {
    let captures = LOG_LINE_PATTERN.captures(line)?;

    let category = captures.name("category").map_or("", |m| m.as_str());

    let time_string = captures.name("time_string").map_or("", |m| m.as_str());
    let remainder = captures.name("remainder").map_or("", |m| m.as_str());

    let parser_debug = format!(
        "category: {}, time: {}, rem: {}",
        category, time_string, remainder
    );

    Some(LogEntry {
        logstring: String::from(line),
        category: String::from(category),
        time: None,
        parser_debug,
    })
}



Thanks again Michael, I appreciate you taking the time to give me these tips.

EDIT:

Done.
