Regular expressions, the Rust Cookbook, and lazy_static!

I'm starting a new Rust program where I will need regular expressions. I assume the regex crate is the way to go?

Reading the Rust Cookbook here, I get an example using lazy_static:


use lazy_static::lazy_static;
use regex::Regex;

fn extract_login(input: &str) -> Option<&str> {
    lazy_static! {
        static ref RE: Regex = Regex::new(r"(?x)
            ^(?P<login>[^@\s]+)@
            ([[:word:]]+\.)*
            [[:word:]]+$
            ").unwrap();
    }
    RE.captures(input).and_then(|cap| {
        cap.name("login").map(|login| login.as_str())
    })
}

If I understand correctly, lazy_static is now considered outdated? It was first superseded by once_cell, whose functionality then found its way into the standard library as std::sync::OnceLock in Rust 1.70, which was released on June 1st, 2023. This is also explained in the once_cell F.A.Q.:

Should I use std::cell::OnceCell, once_cell, or lazy_static?

If you can use std version (your MSRV is at least 1.70, and you don’t need extra features once_cell provides), use std. Otherwise, use once_cell. Don’t use lazy_static.

It explicitly says, "Don't use lazy_static". So I wonder if the idiomatic way to use regular expressions would then be as follows:

use regex::Regex;

use std::sync::OnceLock;

fn extract_login(input: &str) -> Option<&str> {
    static RE: OnceLock<Regex> = OnceLock::new();
    let re = RE.get_or_init(|| Regex::new(r"(?x)
        ^(?P<login>[^@\s]+)@
        ([[:word:]]+\.)*
        [[:word:]]+$
        ").unwrap()
    );
    re.captures(input).and_then(|cap| {
        cap.name("login").map(|login| login.as_str())
    })
}

(Playground)
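
To check that the closure passed to get_or_init really runs only once, a std-only sketch like the following may help. The names INIT_CALLS, vowel_table and count_vowels are made up for illustration, and the vowel table merely stands in for a compiled Regex:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Counts how often the initializer actually runs (illustration only).
static INIT_CALLS: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical stand-in for `Regex::new(..).unwrap()`: a value that is
/// worth building only once.
fn vowel_table() -> Vec<char> {
    INIT_CALLS.fetch_add(1, Ordering::SeqCst);
    vec!['a', 'e', 'i', 'o', 'u']
}

/// Same shape as `extract_login`: a function-local static that is
/// initialized on the first call and reused afterwards.
fn count_vowels(input: &str) -> usize {
    static TABLE: OnceLock<Vec<char>> = OnceLock::new();
    let table = TABLE.get_or_init(vowel_table);
    input.chars().filter(|c| table.contains(c)).count()
}
```

Calling count_vowels repeatedly should leave INIT_CALLS at 1, because later calls only read the already initialized value.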

Is that correct?

Also, should the Rust Cookbook be updated?

The chapter hasn't been updated for three years, so it definitely needs a refresh.

I'd prefer that once_cell or std provided a lazy_static!-style macro as a replacement.

Compared to other languages, we do need to write more verbose and explicit code in Rust, which is less ergonomic.

I see that it's also possible to use the yet-unstable LazyLock, which may come in handy if a regular expression is shared by several functions:

#![feature(lazy_cell)]

use regex::Regex;
use std::sync::LazyLock;

static RE_LOGIN: LazyLock<Regex> = LazyLock::new(|| Regex::new(r"(?x)
    ^(?P<login>[^@\s]+)@
    ([[:word:]]+\.)*
    [[:word:]]+$
    ").unwrap()
);

fn extract_login(input: &str) -> Option<&str> {
    RE_LOGIN.captures(input).and_then(|cap| {
        cap.name("login").map(|login| login.as_str())
    })
}

(Playground)

On stable, you can work around that like this, I guess:

-use std::sync::LazyLock;
+use once_cell::sync::Lazy as LazyLock;

(Playground)
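
For readers on a newer toolchain: std::sync::LazyLock was stabilized in Rust 1.80, so neither the feature gate nor the once_cell alias is needed there. Below is a std-only sketch of one lazily built value shared by two functions. RESERVED, is_reserved and reserved_count are hypothetical names, and the set stands in for a shared Regex:

```rust
use std::collections::HashSet;
use std::sync::LazyLock;

/// Built on first access, then shared by every function below.
/// (Assumes Rust 1.80 or later, where `LazyLock` is stable.)
static RESERVED: LazyLock<HashSet<&'static str>> =
    LazyLock::new(|| HashSet::from(["root", "admin", "daemon"]));

/// First user of the shared static.
fn is_reserved(login: &str) -> bool {
    RESERVED.contains(login)
}

/// Second user of the same static; no re-initialization happens here.
fn reserved_count() -> usize {
    RESERVED.len()
}
```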

But since the Cookbook is seemingly outdated: is regex itself still up to date, and is it still the crate people usually reach for?

I see it's been recently updated and it's co-owned by github:rust-lang:libs.

Yeah, LazyLock is great. I've already written a lazy_static! macro on top of it for convenience in my crates, so old code doesn't need to change; the only difference is that the lazy_static dependency is gone.

#[macro_export]
macro_rules! lazy_static {
    ($( $(#[$a:meta])* $v:vis static ref $i:ident : $t:ty = $e:expr ; )+) => {
        $(
            $(#[$a])* $v static $i: ::std::sync::LazyLock<$t> = ::std::sync::LazyLock::new(|| $e);
        )+
    };
}

For your case: Rust Playground
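
As a sketch of how such a macro reads at the call site (std only, assuming Rust 1.80+ where LazyLock is stable; GREETING, SQUARES and sum_of_squares are made-up names, and #[macro_export] is left out because everything lives in one crate here):

```rust
// Same macro as above, minus `#[macro_export]`.
macro_rules! lazy_static {
    ($( $(#[$a:meta])* $v:vis static ref $i:ident : $t:ty = $e:expr ; )+) => {
        $(
            $(#[$a])* $v static $i: ::std::sync::LazyLock<$t> = ::std::sync::LazyLock::new(|| $e);
        )+
    };
}

lazy_static! {
    /// Attributes and doc comments are forwarded to the generated static.
    static ref GREETING: String = format!("hello, {}", "world");
    pub static ref SQUARES: Vec<u32> = (1..=5).map(|n| n * n).collect();
}

/// Old call sites keep working: the static derefs to the inner value.
fn sum_of_squares() -> u32 {
    SQUARES.iter().sum()
}
```

Unlike the original lazy_static crate, the generated static has the visible type LazyLock<T> rather than a hidden wrapper type.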

Yes. Announcing regex 1.9 | Rust Blog


The Lazy impl using OnceLock is fortunately trivial:

use std::ops::Deref;
use std::sync::OnceLock;

struct Lazy<T> {
    once: OnceLock<T>,
    init: fn() -> T,
}

impl<T> Lazy<T> {
    const fn new(init: fn() -> T) -> Self {
        Lazy { once: OnceLock::new(), init }
    }
}

impl<T> Deref for Lazy<T> {
    type Target = T;

    fn deref(&self) -> &T {
        self.once.get_or_init(self.init)
    }
}
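
A usage sketch for this Lazy type, repeated here with its imports so that the block compiles on its own. PRIMES and nth_prime are hypothetical names; a non-capturing closure works as the initializer because it coerces to fn() -> T:

```rust
use std::ops::Deref;
use std::sync::OnceLock;

struct Lazy<T> {
    once: OnceLock<T>,
    init: fn() -> T,
}

impl<T> Lazy<T> {
    const fn new(init: fn() -> T) -> Self {
        Lazy { once: OnceLock::new(), init }
    }
}

impl<T> Deref for Lazy<T> {
    type Target = T;

    fn deref(&self) -> &T {
        // Runs `init` on the first access only; later calls return the cached value.
        self.once.get_or_init(self.init)
    }
}

/// Primes below 30, computed by trial division on first use.
static PRIMES: Lazy<Vec<u32>> = Lazy::new(|| {
    (2..30u32)
        .filter(|n| (2..*n).all(|d| n % d != 0))
        .collect()
});

/// Zero-based lookup into the lazily built table.
fn nth_prime(index: usize) -> Option<u32> {
    PRIMES.get(index).copied()
}
```

Storing a plain fn pointer keeps the type simple, at the cost of not accepting closures that capture their environment.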

I never really understood why people want a macro for lazy_static; the static ref hiding the type is really not helpful, only confusing, and creating a closure shouldn't be considered so hard/long/noisy as to be undesirable (or even worth creating a macro for).


I recently came across the lazy_regex crate, which seems very nice. It offers a macro for conveniently creating a static lazy regex, and it even reports syntax errors in the regex at compile time (all while using the ordinary regex crate as a dependency).

use lazy_regex::regex;

fn extract_login(input: &str) -> Option<&str> {
    regex!(
        r"(?x)
        ^(?P<login>[^@\s]+)@
        ([[:word:]]+\.)*
        [[:word:]]+$
        "
    )
    .captures(input)
    .and_then(|cap| cap.name("login").map(|login| login.as_str()))
}