Using lazy_static! with regex in nom's re_match

I am having trouble using a regular expression in nom's (6.0) re_match parser. This works fine:

fn ident_parser(input: &str) -> IResult<&str, bool> {
  let re: Regex = Regex::new("[A-Za-z0-9_:.-]+").unwrap();
  match tuple((space0, re_match(re))) {
    . . . 

When I try to move the regular expression into a lazy_static! like so:

lazy_static! {
    static ref RE: Regex = Regex::new("[A-Za-z0-9_:.-]+").unwrap();
}

I get an error that implies that it's not actually being seen as a Regex type. Or, if I try to dereference it, I can't seem to get the correct combination of dereference operators.

re_match(RE)
         ^^ expected struct `regex::Regex`, found struct `parsers::variables::RE`

I saw a reference in an article that said something like "lazy_static actually lies about its type" and recommended something like &*RE but I haven't found a combination that works. Either the error refers to it by the module name or complains that it doesn't implement Copy.

re_match(*RE)
         ^^^ move occurs because value has type `regex::Regex`, which does not implement the `Copy` trait

re_match(&*RE)
         ^^^^
         |
         expected struct `regex::Regex`, found `&regex::Regex`
         help: consider removing the borrow: `*RE`

One slightly related question... if I'm not in a loop and my input to Regex::new is a static string, do I really need to put it in a lazy_static! block anyway? Or will the compiler recognize that it is fixed and do the optimizations for me?

I would just reimplement re_match, ideally it should have been

pub fn re_match<'a, R, E>(re: R) -> impl Fn(&'a str) -> IResult<&'a str, &'a str, E>
  where
    R: Borrow<Regex>,
    E: ParseError<&'a str>,
{
    use nom::traits::{InputLength, Slice};
    move |i| {
      if re.borrow().is_match(i) {
        Ok((i.slice(i.input_len()..), i))
      } else {
        Err(Err::Error(E::from_error_kind(i, ErrorKind::RegexpMatch)))
      }
    }
}

So that you could either borrow or pass in an owned regex, but alas it's not.
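As for the error messages in the question: lazy_static! generates a hidden wrapper type (named RE, like the static itself) that implements Deref<Target = Regex>. Here is a rough, std-only sketch of that shape. Pattern and LazyPattern are hypothetical stand-ins for regex::Regex and the generated type, and std::sync::OnceLock (Rust 1.70+) is only my assumption for imitating the lazy initialization; it is not what the macro actually expands to.

```rust
use std::ops::Deref;
use std::sync::OnceLock;

// Hypothetical stand-in for `regex::Regex`; like `Regex`, it is not `Copy`.
struct Pattern(String);

// Stand-in for the hidden wrapper type that `lazy_static!` generates.
// The real macro gives the type and the static the same name, `RE`.
struct LazyPattern;

static RE: LazyPattern = LazyPattern;

impl Deref for LazyPattern {
    type Target = Pattern;

    fn deref(&self) -> &Pattern {
        // Initialized on first access, then reused on every later call.
        static CELL: OnceLock<Pattern> = OnceLock::new();
        CELL.get_or_init(|| Pattern("[A-Za-z0-9_:.-]+".to_string()))
    }
}

// A function that only needs to borrow the pattern.
fn source(p: &Pattern) -> &str {
    &p.0
}

fn demo() -> (&'static str, &'static str) {
    // `RE` alone has type `LazyPattern`, not `Pattern`.
    // `*RE` would try to move a non-`Copy` value out of a static.
    // `&*RE` is a plain `&Pattern`; `&RE` also works via deref coercion.
    (source(&*RE), source(&RE))
}
```

So &*RE really is the right way to get a &Regex. It only fails in the question because nom's re_match wants an owned Regex, which you cannot move out of a static. Since Regex implements Clone, passing RE.clone() is another possible workaround.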

Thank you. I feel like the difference between the nom implementation of re_match and yours would make a great beginner's tutorial. If I fully understood what is going on, I'd definitely be leveling up my Rust.

I tried implementing it in my module (as fn bre_match), and it can't be used there because nom::traits is a private module in the nom library? Is that a correct interpretation of the following error?

27  |     use nom::traits::{InputLength, Slice};
    |              ^^^^^^ private module

P.S. Is there any difference between your signature:

pub fn re_match<'a, R, E>(re: R) -> impl Fn(&'a str) -> IResult<&'a str, &'a str, E>
  where
    R: Borrow<Regex>,
    E: ParseError<&'a str>,
{

and

pub fn re_match<'a, E>(re: &Regex) -> impl Fn(&'a str) -> IResult<&'a str, &'a str, E>
  where
    E: ParseError<&'a str>,
{

Yeah, I copied the implementation straight from the source and should have been more careful. If you remove the ::traits from the use statement, it should work.

use nom::{InputLength, Slice};

You're right! In case anyone else finds this answer, here's an explanation.

The only difference between the two signatures is the introduction of the generic parameter R: Borrow<Regex>. This is helpful because Borrow lets you abstract over borrowing a value through two implementations:

impl<T> Borrow<T> for T { ... }
impl<T> Borrow<T> for &T { ... }

note: these are not exactly the implementations, just simplified a bit for the example

This states that

  1. any type T can be borrowed as a T, just the obvious identity
  2. any type &T (borrow of T) can be borrowed as T, the same as reborrowing (&*t)

So it lets you pass in either a Regex or a &Regex, whichever you have.
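Here's a small std-only sketch of the same idea, in case it helps. Pattern is a hypothetical stand-in for regex::Regex (it does a plain substring check instead of real regex matching), and matcher is a made-up name, not anything from nom.

```rust
use std::borrow::Borrow;

// Hypothetical stand-in for `regex::Regex`, so the sketch needs only std.
// It deliberately implements neither `Copy` nor `Clone`.
struct Pattern {
    needle: String,
}

impl Pattern {
    fn new(needle: &str) -> Self {
        Pattern { needle: needle.to_string() }
    }

    // Substring check standing in for real regex matching.
    fn is_match(&self, haystack: &str) -> bool {
        haystack.contains(self.needle.as_str())
    }
}

// Accepts an owned `Pattern` or a `&Pattern`, because both types
// implement `Borrow<Pattern>`.
fn matcher<R>(pattern: R) -> impl Fn(&str) -> bool
where
    R: Borrow<Pattern>,
{
    move |input| {
        // The annotation pins down which `Borrow` impl is meant.
        let p: &Pattern = pattern.borrow();
        p.is_match(input)
    }
}
```

Calling matcher(Pattern::new("ab")) moves the pattern into the returned closure, while matcher(&p) only borrows p, so p stays usable afterwards. The second form is the one that would fit a lazy_static! regex.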

Unrelated question: what is the i in move |i|? In other words, what is being owned (if that's the correct terminology) here?

I think it's something like: "re_match is a function that takes a regular expression and returns a function that takes a string and returns an IResult" ... and the i is somehow the &'a str that is passed into that returned function?

Edit:

After rereading, I think that's it...

impl Fn(&'a str) -> IResult<&'a str, &'a str, E> is being implemented by

    move |i| {
      if re.borrow().is_match(i) {
        Ok((i.slice(i.input_len()..), i))
      } else {
        Err(Err::Error(E::from_error_kind(i, ErrorKind::RegexpMatch)))
      }
    }

I was thinking that was returning IResult but it's actually returning a Fn(&'a str) -> IResult<&'a str, &'a str, E>

That's closure syntax, you can read about it in detail in my article on closures

Exactly
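To make the split between captured values and closure parameters concrete, here is a minimal std-only sketch. prefix_parser is a hypothetical name, and Option stands in for IResult so that nothing from nom is needed.

```rust
// Returns a closure, much like `re_match` returns a parser.
// `prefix` is moved into the closure (that is what `move` does);
// `i` is just the closure's parameter, supplied on each call.
fn prefix_parser(prefix: String) -> impl Fn(&str) -> Option<&str> {
    move |i| i.strip_prefix(prefix.as_str())
}
```

So nothing owns i: it is the &str handed to the returned function every time it is called. The only thing the closure owns is prefix (or re in the nom code). For example, prefix_parser("ab".to_string()) gives back a function that returns Some("c") for "abc" and None for "xyz".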

I created a PR to get this into nom

https://github.com/Geal/nom/pull/1265

Note also that, because nom::re_match returns impl Fn(...) -> ..., you can lazy_static!/store the parser itself, rather than lazy_static!ing the Regex and calling re_match multiple times with a reference to the static regex. It doesn't help you much if you want to use the regex elsewhere, but it is an option.
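One wrinkle with storing the parser: a static needs a type you can write down, and impl Fn isn't allowed there, so one way is to box the closure as a trait object. Below is a rough std-only sketch of that. It assumes std::sync::OnceLock (Rust 1.70+) in place of lazy_static!, and leading_ident is a hypothetical hand-rolled parser standing in for re_match(...), so treat it as an illustration rather than the nom way of doing it.

```rust
use std::sync::OnceLock;

// A nameable type for the stored parser: a boxed, thread-safe closure.
type Parser = Box<dyn for<'a> Fn(&'a str) -> Option<&'a str> + Send + Sync>;

// Hypothetical stand-in for `re_match(...)`: matches a leading run of
// ASCII alphanumerics plus the characters in `extra`.
fn leading_ident(
    extra: &'static str,
) -> impl for<'a> Fn(&'a str) -> Option<&'a str> + Send + Sync {
    move |i| {
        // Byte index of the first character that is not allowed.
        let end = i
            .find(|c: char| !(c.is_ascii_alphanumeric() || extra.contains(c)))
            .unwrap_or(i.len());
        if end == 0 { None } else { Some(&i[..end]) }
    }
}

// Builds the parser on first use and reuses it on later calls.
fn ident_parser() -> &'static Parser {
    static PARSER: OnceLock<Parser> = OnceLock::new();
    PARSER.get_or_init(|| -> Parser { Box::new(leading_ident("_:.-")) })
}

// Convenience wrapper so callers never see the static.
fn parse_ident(input: &str) -> Option<&str> {
    (ident_parser())(input)
}
```

The boxing costs a dynamic dispatch per call, which is the price of giving the closure a nameable type.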
