If let {} else if let {} else {} - Better way to structure code?

Hi!

Just started into learning rust, and I'm very thrilled :slight_smile:

I currently writing a small post-consume script for paperless-ngx.

This involves to parse a string and match a German or an ISO date at the beginning of a filename:

let example_with_iso_date = "2012-03-12 - dummy document.pdf";
let example_with_german_date = "23.01.2023 - my cv.pdf";

I'd like to use a regex and named capture groups. As it appears, it is not possible to combine the two patterns into a single regex using or ("|"), because the regex crate complains about duplicated names.

So my task is to:

  1. check if first regex matches, if yes, use the parts for the date
  2. else check if second regex matches, if yes, use parts for the date
  3. else nothing to do

The following code works, but I believe there must be a better / more idiomatic way to do this.
Especially the variable declared outside of the "if let" being set inside feels ugly.
If there was only a single regex, I'd go for "let else" and return if there is no match.

use regex::{Captures, Regex};

fn match_date_at_start(filename: &str) {
    let date_parts: Captures;

    // match title for ISO date
    let iso_date_pattern =
        Regex::new(r"^(?<year>[0-9]{4})-(?<month>[0-9]{2})-(?<day>[0-9]{2})\s*-?\s*").unwrap();

    // match title for German
    let german_date_pattern =
        Regex::new(r"^(?<day>[0-9]{2})\.(?<month>[0-9]{2})\.(?<year>[0-9]{4})\s*-?\s*").unwrap();

    if let Some(matches) = iso_date_pattern.captures(filename) {
        date_parts = matches;
    } else if let Some(matches) = german_date_pattern.captures(filename) {
        date_parts = matches;
    } else {
        println!("no date match found - nothing to do");
        return;
   }
   
   // nice, so do something with the date parts
   println!(
        "here is the iso date version of whatever was matched: {}-{}-{}",
        &date_parts["year"],
        &date_parts["month"],
        &date_parts["day"]
    );
}

fn main() {
    match_date_at_start("2012-03-17 - my file.pdf");
    match_date_at_start("23.02.2023 - my file other file.pdf");
    match_date_at_start("no date here.pdf");
}

(Playground)

Output:

here is the iso date version of whatever was matched: 2012-03-17
here is the iso date version of whatever was matched: 2023-02-23
no date match found - nothing to do

Errors:

   Compiling playground v0.0.1 (/playground)
    Finished dev [unoptimized + debuginfo] target(s) in 1.08s
     Running `target/debug/playground`

Any help is very much appreciated :slight_smile:

Greetings and have a nice Christmas!

An if is an expression that can be assigned to, as in let date_parts = if ..., and branches of the if can still return out of the function (or break out of a loop).

3 Likes

Greetings! :smile:

You could use a list (an array, or a vector) to hold all the regexes, and iterate on that. This way, when adding support for a new date format, just add a new regex into the list, and the rest of the code will automatically use it. Of course, it would be best if you could determine which regex to use and quickly find the right regex without iterating on each one.

Also, you could use the lazy_static crate, so you build the regexes only once.

Here is the modified code demonstrating this:

use lazy_static;
use regex::Regex; // 1.4.0

lazy_static::lazy_static! {
    static ref PATTERNS:[Regex; 2] = [
        // match title for ISO date
        Regex::new(r"^(?<year>[0-9]{4})-(?<month>[0-9]{2})-(?<day>[0-9]{2})\s*-?\s*").unwrap(),
        // match title for German
        Regex::new(r"^(?<day>[0-9]{2})\.(?<month>[0-9]{2})\.(?<year>[0-9]{4})\s*-?\s*").unwrap(),
    ];
}

fn match_date_at_start(filename: &str) {
    let matches = PATTERNS
        .iter()
        .find_map(|pattern| pattern.captures(filename));

    let Some(date_parts) = matches else {
        println!("no date match found - nothing to do");
        return;
    };

    // nice, so do something with the date parts
    println!(
        "here is the iso date version of whatever was matched: {}-{}-{}",
        &date_parts["year"], &date_parts["month"], &date_parts["day"]
    );
}

fn main() {
    match_date_at_start("2012-03-17 - my file.pdf");
    match_date_at_start("23.02.2023 - my file other file.pdf");
    match_date_at_start("no date here.pdf");
}

(Playground)

Key differences: iterates on array of regexes, find_map, let-else, lazy_static crate

EDIT: In the same topic, see also:

10 Likes

Another way to do this is or_else and let-else.

let Some(date_parts) = iso_date_pattern
    .captures(filename)
    .or_else(|| german_date_pattern.captures(filename))
else {
    println!("no date match found - nothing to do");
    return;
};
5 Likes

Hi!

Thank you very much (all of you)!

To me, the suggestion using the „single“ let-else looks a bit cleaner.
I’ve read about lazy_static - good to know of these kinds of optimizations.

In my opinion using let-else and chaining with .or_else feels a bit strange, because it kind of mixes „regular“ code with a lambda just for the sake of adding another branch.

I’ll try assigning the if directly as well - need it
in front of my eyes to let it sink in :slight_smile:

Thanks for your input!

I feel a little refactoring can go a long way.

3 Likes

I like the refactoring + OnceLock + arrays:

use std::sync::OnceLock;

fn find_date_at_start(filename: &str) -> Option<Captures<'_>> {
    const PATTERN_STRS: [&str; 2] = [
        r"^(?<year>[0-9]{4})-(?<month>[0-9]{2})-(?<day>[0-9]{2})\s*-?\s*",
        r"^(?<day>[0-9]{2})\.(?<month>[0-9]{2})\.(?<year>[0-9]{4})\s*-?\s*",
    ];
    static PATTERNS: OnceLock<[Regex; PATTERN_STRS.len()]> =
        OnceLock::new();

    let patterns = PATTERNS
        .get_or_init(|| PATTERN_STRS.map(|s| Regex::new(s).unwrap()));

    patterns.iter().find_map(|p| p.captures(filename))
}
1 Like

Thanks!

OnceLock looks very interesting!

Not that it matters for my use case, but what happens to the get_or_init() call at runtime?
First it does init - and on subsequent calls it should just be a no-op? Or is there still the cost of a function call?

Greetings!

This is pretty similar to lazy_static in principle. A memory slot yet to be initialized is guarded behind a Once primitive.

Each time the inner value is accessed, a call to the call_once method is performed. (Or call_once_force, for that matter.)

Subsequent calls, can’t be complete no-ops. I don’t know for all platforms, but typically, this is protected by atomic reads and writes. First, an atomic read is performed to check if the value is already initialized. If it’s not, a lock is acquired by one of the calling thread, the routine to initialize the value is run and other threads which were trying to access the value are blocked until the routine is completed. If it is already initialized, the value is immediately returned. So, once the value has been initialized, there is a single atomic read and no thread will be blocked anymore. There is a small synchronization cost, but it’s minimal. (Atomic reads are very efficient operations and rarely a big deal.)

The difference with lazy_static is that:

  • OnceLock doesn’t have to be stored in a static variable. It’s possible to put the value anywhere you want (heap or stack), and for a non-'static lifetime (the value doesn’t have to live for the whole duration of the program) if you want. For this reason, OnceLock is more flexible.
  • lazy_static defines a wrapper type implementing Deref targeting the inner value, so that it’s possible to use the static item as if it was the inner value. In fact, each call to deref will call a function which does something akin to get_or_init under the hoods.

If targeting a 1.70 or newer Rust compiler, you can generally use OnceLock instead of lazy_static, because there is nothing you can do with lazy_static that you can’t with OnceLock, and OnceLock is part of the standard library, so that’s one dependency less in your graph. You could decide otherwise if you prefer the "static variable and initialization code at definition"-like ergonomic though.

(To tell the truth, I had forgotten about OnceLock when I sent my first answer, but I would have used it instead.)

4 Likes

Hi!

Thanks for all the information! :slight_smile:

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.