Most efficient way to filter Lines<BufReader<File>> based on multiple criteria

I'm working on a mailcap parsing library as my first official Rust project.

I'm stuck on a problem for which I can't find what should be an ergonomic solution. After reading the file, I need to filter out any comment or empty lines (before the next step, which would be parsing the relevant lines). The only way I was able to achieve this is by chaining 3 filter() methods on the iterator. It works perfectly fine, but calling 3 filters doesn't feel like the correct way to do it.

I'm sure there's a better way of doing this. I'm most fluent in Python, where what I'm trying to achieve would look something like this:

correct_lines: list[str] = []
for line in lines:
    try:
        if line[0] not in ("#", "\n"):
            correct_lines.append(line)
    except IndexError:
        pass

Below is a snippet of the relevant lines for Rust. If this would be considered good practice in Rust, great! If not, I'd like to learn what would be a better way of parsing to leave only the desired lines.

use std::fs::File;
use std::io::{BufRead, BufReader};
use std::path::PathBuf;

#[allow(dead_code)]
fn get_mailcap_lines(filepath: &PathBuf) -> Result<Vec<String>, &'static str> {
    let file = BufReader::new(File::open(filepath).expect("Cannot open file."));
    let correct_lines: Vec<String> = file
        .lines()
        .map(|i| i.unwrap())
        .filter(|i| !i.starts_with("#"))
        .filter(|i| !i.starts_with("\n"))
        .filter(|i| !i.is_empty())
        .collect();

    if correct_lines.is_empty() {
        Err("No correct lines.")
    } else {
        Ok(correct_lines)
    }
}


I actually think your code is pretty easily readable as-is, and it should compile down to the same as this loop, so it will be just as fast:

let mut correct_lines = Vec::new();
for i in file.lines() {
    let i = i.unwrap();
    if i.starts_with("#") { continue; }
    if i.starts_with("\n") { continue; }
    if i.is_empty() { continue; }
    correct_lines.push(i);
}
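
If the three separate filter() calls still feel repetitive, the conditions can also be folded into one predicate. Below is a minimal sketch, not code from the original post: the names `is_relevant` and `filter_lines` are hypothetical, and the input is assumed to be any `BufRead` so the function can be tried on in-memory data. The `"\n"` check is left out because `lines()` already strips the trailing newline from each line, so a line can never start with one.

```rust
use std::io::BufRead;

// Hypothetical helper: true for lines worth parsing (not empty, not a comment).
fn is_relevant(line: &str) -> bool {
    !line.is_empty() && !line.starts_with('#')
}

// Same filtering as the chained version, expressed as a single `filter` call.
// Like the original snippet, this panics on a read error because of `unwrap`.
fn filter_lines<R: BufRead>(reader: R) -> Vec<String> {
    reader
        .lines()
        .map(|line| line.unwrap())
        .filter(|line| is_relevant(line))
        .collect()
}
```

Whether this reads better than three short filters is mostly a matter of taste; a named predicate mainly pays off when the same rule is needed in more than one place.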

starts_with doesn't necessarily take a &str: it accepts anything that implements the Pattern trait (see the Pattern API).
You could do something like:

use std::fs::File;
use std::io::{BufRead, BufReader};
use std::path::Path;

fn get_mailcap_lines(filepath: &Path) -> std::io::Result<Vec<String>> {
    let file = BufReader::new(File::open(filepath)?);
    let mut correct_lines = Vec::new();
    for line in file.lines() {
        let line = line?;
        if line.is_empty() { continue; }
        // here starts_with takes an array of char
        if line.starts_with(['#', '\n']) { continue; }
        correct_lines.push(line);
    }
    // I'm leaving handling the empty case to the caller, but you can do it here too
    Ok(correct_lines)
}
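
To make the pattern point concrete, here is a small sketch of three kinds of argument that `str::starts_with` accepts besides `&str`: a single `char`, an array of `char`, and a closure over `char`. The helper names are hypothetical and exist only for illustration; `str::starts_with` is the only real API being shown.

```rust
// Hypothetical helper names; only `str::starts_with` is the real API here.

// A single char as the pattern.
fn is_comment(line: &str) -> bool {
    line.starts_with('#')
}

// An array of chars: true if the line starts with any one of them.
fn starts_with_marker(line: &str) -> bool {
    line.starts_with(['#', '\n'])
}

// A closure over char: true if the first char satisfies the predicate.
fn starts_with_space(line: &str) -> bool {
    line.starts_with(|c: char| c.is_whitespace())
}
```

All three return false for an empty string, since there is no first character to match.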

I agree with alice that this is fine in Rust: the 3 filters are easily understood by a human reader and probably well handled by the compiler.
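
For anyone who wants to keep the iterator chain but propagate I/O errors instead of unwrapping, one possible sketch follows. The function name `try_filter_lines` is hypothetical, and the input is assumed to be any `BufRead` so it can be exercised without a file. The idea is to let `Err` items pass through the filter and collect into an `io::Result<Vec<String>>`.

```rust
use std::io::{self, BufRead};

// Hypothetical function name; generic over `BufRead` so it also works on in-memory input.
fn try_filter_lines<R: BufRead>(reader: R) -> io::Result<Vec<String>> {
    reader
        .lines()
        .filter(|line| match line {
            // Keep only non-empty, non-comment lines.
            Ok(text) => !text.is_empty() && !text.starts_with('#'),
            // Keep errors so that `collect` can report the first one.
            Err(_) => true,
        })
        // Collecting into `io::Result<Vec<_>>` stops at the first `Err`.
        .collect()
}
```

The caller can then use `?` on the result, just as with the explicit loop.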

Even in Python I might do it differently, since an empty line is not "exceptional" to me:
correct_lines = [
    line
    for line in lines
    if len(line) != 0 and line[0] not in ("#", "\n")
]

Thank you both, good to know I was on the right path. It's also good to know that starts_with can take an array; for some reason I was stuck on it only accepting &str.
