Idiomatic way to avoid match

Another idea of unstable features that would help deduplicate even more of the code: array’s try_map method:

#![feature(let_chains)]
#![feature(array_try_map)]

pub fn compare_paths_for_size1(path_a: &str, path_b: &str) -> Ordering {
    if let Ok([size_a, size_b]) = [path_a, path_b].try_map(file_info_size)
    && let Some(val) = size_a.partial_cmp(&size_b) {
        val
    } else {
        Ordering::Equal
    }
}

If you don’t use partial_cmp but cmp, it even becomes only dependent on the array_try_map.

#![feature(array_try_map)]

pub fn compare_paths_for_size1(path_a: &str, path_b: &str) -> Ordering {
    if let Ok([size_a, size_b]) = [path_a, path_b].try_map(file_info_size) {
        size_a.cmp(&size_b)
    } else {
        Ordering::Equal
    }
}
5 Likes

I found that using if let(...) sometimes is less convenient, because I still may need the None case, and is hardly applicable for Result unpacking.

An ugly but working solution, that I saw in a crate: nest matches. The example is made up.

let result = match match match input_var {
    Some(x) => x,
    None => panic!(),
} {
    Ok(VariantOne(y)) => y,
    Ok(VariantTwo(y)) => panic!(), // or return, or continue
    Err(_) => MakeSomethingUp(123),
} {
    VariantX(i) if i > 10 => do_a(),
    VarinatX(i) if i < 10 => do_b()
};

I found it useful for complicated unpacking where I needed to contitue or return in some earlier unpacking branches, and splitting it in multiple match clauses reduced repetition.

2 Likes

TBH, I'd just do this:

pub fn compare_paths_for_size(path_a: &str, path_b: &str) -> Result<Ordering, std::io::Error> {
    let size_a = file_info_size(path_a)?;
    let size_b = file_info_size(path_b)?;
    Ok(u64::cmp(&size_a, &size_b))
}

People can always .unwrap_or(Ordering::Equal) if they want, but one might as well tell the caller if you couldn't compare things for some reason.

23 Likes

Without avoiding match but matching on a tuple directly:

fn foo(path_a: &str, path_b: &str) -> Ordering {
    match (file_info_size(path_a), file_info_size(path_b)) {
        (Ok(size_a), Ok(size_b)) => len_a.cmp(&len_b), // u64 is Ord not only PartialOrd
        _ => Ordering::Equal,
    }
}

Though I prefer @scottmcm solution.

7 Likes

Phew, thought I was losing my mind. The OP has a simple question about a simple task and there comes a dozen solutions that make it unfathomable, employing all manner of obscure techniques.

Actually I thought the OP's code was suffering from unreadability as well. To many nesting levels for the sake of only having a single exit point. That would look messy even in C. Better to have early returns. See below.

How about a step further. Get rid of the redundant file_size_info() function:

pub fn compare_paths_for_size(path_a: &str, path_b: &str) -> Result<Ordering, std::io::Error> {
    let size_a = std::fs::metadata(path_a)?.len();
    let size_b = std::fs::metadata(path_b)?.len();
    Ok(size_a.cmp(&size_b))
}

I would argue that returning the error in this case is a perfectly idiomatic way of avoiding the match.

I wonder about the motivation of the OP's code. It returns Ordering::Equal even if one file is missing!

9 Likes

Anyhow, now that we have cleared away all the clutter, made the function readable and reduced it to some kind of minimum why not add some nice error checking, so we know which file fails to be read for example:

use anyhow::{Context, Result};
use std::cmp::Ordering;

pub fn compare_file_sizes(path_a: &str, path_b: &str) -> Result<Ordering> {
    let size_a = std::fs::metadata(path_a)
        .with_context(|| format!("Failed to read metadata for {}", path_a))?
        .len();
    let size_b = std::fs::metadata(path_b)
        .with_context(|| format!("Failed to read metadata for {}", path_b))?
        .len();
    Ok(size_a.cmp(&size_b))
}

fn main() -> Result<()> {
    let ordering = compare_file_sizes("path_a", "path_b")?;

    // Do something with file size ordering...

    Ok(())
}

See what I did there?

I'm a little bit late to the party, but would like to note the following.

There are actually two tasks to solve here:

  1. A sequence of instructions should be interrupted the first time a problem occurs.
  2. In case of a problem/interruption a default value should be returned.

Therefore solve the task in two steps:

pub fn compare_paths_for_size(path_a: &str, path_b: &str) -> Ordering {
    let compare = move || {
        let size_a = file_info_size(path_a).ok()?;
        let size_b = file_info_size(path_b).ok()?;
        size_a.partial_cmp(&size_b)
    };

    compare().unwrap_or(Ordering::Equal)
}

Each Result<T> can be converted to an Option<T> by .ok(). Exiting a sequence of computation is done easily with the ?-operator. Finally .unwrap_or(...) is used to set the default value of the computation in case it exited with a None value early.

1 Like

Thanks, that is really clean. I love it!

Yes, I like the way of employing anyhow for clear error report. TBH, In present state, your solution needs further refactoring, but yes, the direction is one I would choose as well. Thanks.

fn get_file_size(path: &str) -> Result<u64> {
    let size = std::fs::metadata(path)
        .with_context(|| format!("Failed to read metadata for {}", path))?
        .len();

    Ok(size)
}
pub fn compare_file_sizes(path_a: &str, path_b: &str) -> Result<Ordering> {
    let size_a = get_file_size(path_a)?;
    let size_b = get_file_size(path_b)?;

    Ok(size_a.cmp(&size_b))
}
4 Likes

I actually find it unhelpful, for the reason that if you get rid of the file_size_info, you immediately loose readability - you are adding additional burden (however small) on the next programmer to read and deduct what this line of code is doing instead of the immediately obvious, file_size. Also code reuse(in case later on you need the same functionality) is suffering etc.

1 Like

Now that we have the error handling in place I like the refactoring again.

1 Like

I'd like to add a way that I think is quite readable while preserving OP's code's semantic by not avoiding match but to match on both values together. This is quite common in other FP languages that support pattern-matching on complex constructs.

Hope it helps...

pub fn compare_paths_for_size(path_a: &str, path_b: &str) -> Ordering {
    match (file_info_size(path_a), file_info_size(path_b)) {
        (Ok(sz_a), Ok(sz_b)) => sz_a.partial_cmp(&sz_b).unwrap_or(Ordering::Equal),
        _ => Ordering::Equal,
    }
}
1 Like

Unlike Haskell, Rust isn't lazy, so both calls to file_info_size would be evaluated before the match this way. That's a different behavior from the original program that avoided the second call in case the first one errors.

3 Likes

Good point on the difference between lazy/non-lazy eval. I don't know about OP's exact requirement but this is definitely worth noting.

pub fn file_info_size(file: &str) -> Result<u64, std::io::Error> {
    let metadata = std::fs::metadata(file)?;
    Ok(metadata.len())
}

pub fn compare_paths_for_size(path_a: &str, path_b: &str) -> Ordering {
    file_info_size(path_a)
    .ok()
    .zip(file_info_size(path_b).ok())
    .and_then(|(size_a, size_b)| size_a.partial_cmp(&size_b))
    .unwrap_or(Ordering::Equal)
}
1 Like

I would have suggested similar implementation myself. I'm thinking one should always return result and use the question mark shorthand to mark places that may cause early return.

That way you basically get all the plus sides of C++ exceptions and none of the cons.

If function takes filenamed or files as input, it definitely shouldn't report "files are equal" if one of the files has IO error or is missing. Only caller can know the correct way to handle such errors.

Also code reuse(in case later on you need the same functionality) is suffering etc.

I totally agree about readability, but this part feels like a solid YAGNI case. It's usually it's unwise to extract functions based on speculations of what might be needed later on. Instead, make the code flexible enough that you can extract that function at the moment you actually need to use it elsewhere.

Typically when I pull code out into its own function it's not because I'm thinking I might need it again later. Mostly it's because it does something that I can give a sensible name to, by making a function/method it gets a meaningful name and so removes the need for so much commenting to explain what is going on. On the road to self documenting code.

The second motivation is just moving some messy detail out of the the main flow of the calling function to aid clarification.

I think arnie's example demonstrates both these motivations.

I would have thought that if I was thinking I would need it again it would have to get pulled all the way out into it's own crate so as to make it reusable.

2 Likes

He already is duplicating code, in the same function...

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.