Comparing file extensions without type-conversion hell


#1

So I’m working on a program that has to scan a directory for HTML files. This is what I have so far:

fn scan_dir(src:&str) -> std::io::Result<()>{
    for entry in try!(fs::walk_dir(src)){
        let entry = try!(entry);
        if !entry.path().is_dir(){
            let extension = &entry.path().extension().unwrap().to_os_string().into_string().unwrap();
            if extension == "html"{
                println!("This is an HTML file");
            }
        }
    }
    Ok(())
}

It definitely works, but it feels like a Rube Goldberg device. First I get the entry’s path as a PathBuf, get its extension as an Option<&OsStr>, unwrap that to an &OsStr, convert the &OsStr into an OsString, convert the OsString into a Result<String, OsString>, and then unwrap it again. Only then do I have a String, which implements PartialEq, so I can compare it to a string literal.

Now, I’m new to Rust, so maybe this amount of type conversion is par for the course, but I feel like I’m doing something wrong. Is there a better way to do file extension comparisons?


#2

extension returns an Option<&OsStr>, and OsStr has PartialEq impls for str, so you can use == directly. No conversion needed.

I might write it like this:

fn scan_dir(src:&str) -> std::io::Result<()>{
    for entry in try!(fs::walk_dir(src)) {
        let entry = try!(entry);
        if !entry.path().is_dir()
           && entry.path().extension().map(|s| s == "html").unwrap_or(false) {
            println!("This is an HTML file");
        }
    }
    Ok(())
}

The trick is to do the equality comparison with map and use unwrap_or to convert the absence of an extension to “not equal to html.”
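
To see the comparison on its own, without any directory walking, here is a minimal sketch that works on plain Path values. The helper names has_extension and has_extension_alt are made up for illustration; the second one shows an alternative that lifts the string into an Option<&OsStr> instead of using map.

```rust
use std::ffi::OsStr;
use std::path::Path;

/// Returns true when `path` has exactly the extension `ext` (case-sensitive).
/// `has_extension` is a hypothetical helper name, not part of std.
fn has_extension(path: &Path, ext: &str) -> bool {
    // `extension()` yields Option<&OsStr>; since OsStr implements
    // PartialEq<str>, the closure can compare against `ext` directly.
    path.extension().map(|s| s == ext).unwrap_or(false)
}

/// Same check, written by wrapping `ext` as an Option<&OsStr> and
/// comparing the two Options (also a hypothetical helper).
fn has_extension_alt(path: &Path, ext: &str) -> bool {
    path.extension() == Some(OsStr::new(ext))
}
```

Note that only the part after the last dot counts as the extension, so a name like archive.tar.gz matches "gz" but not "tar", and the comparison is case-sensitive.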


#3

Here’s another way to write it that I think I like better:

fn scan_dir(src:&str) -> std::io::Result<()>{
    fn is_html(e: &fs::DirEntry) -> bool {
        let p = e.path();
        p.is_file() && p.extension().map(|s| s == "html").unwrap_or(false)
    }

    for entry in try!(fs::walk_dir(src)).filter_map(|e| e.ok()).filter(is_html) {
        println!("This is an HTML file: {:?}", entry.path());
    }
    Ok(())
}
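
One caveat for anyone reading this later: as far as I know, fs::walk_dir never made it into stable Rust, and try! has since been superseded by the ? operator. A rough sketch of the same idea using only fs::read_dir with manual recursion might look like the following. The name find_by_extension is made up for illustration, and the sketch assumes there are no symlink cycles under the starting directory.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Recursively collects every non-directory entry under `dir` whose
/// extension equals `ext`, appending the paths to `found`.
/// Hypothetical helper; it does not guard against symlink cycles.
fn find_by_extension(dir: &Path, ext: &str, found: &mut Vec<PathBuf>) -> io::Result<()> {
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        if path.is_dir() {
            // Descend into subdirectories, propagating any I/O error.
            find_by_extension(&path, ext, found)?;
        } else if path.extension().map(|s| s == ext).unwrap_or(false) {
            found.push(path);
        }
    }
    Ok(())
}
```

Unlike the filter_map(|e| e.ok()) version above, this sketch stops at the first I/O error instead of silently skipping unreadable entries; which behaviour is preferable depends on the program.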


#4

Thanks! I’ve expanded that to search for both HTML and JS files:

struct FileRegistry {
    html: Vec<std::path::PathBuf>,
    js: Vec<std::path::PathBuf>
}

fn scan_dir(src:&str) -> std::io::Result<FileRegistry>{
    fn is_file_type(e: &fs::DirEntry, ext: &str) -> bool{
        let p = e.path();
        p.is_file() && p.extension().map(|s| s == ext).unwrap_or(false)
    }
    fn is_js_html(e: &fs::DirEntry) -> bool{
        is_file_type(e, "html") || is_file_type(e, "js")
    }

    let mut html_files: Vec<std::path::PathBuf> = Vec::new();
    let mut js_files: Vec<std::path::PathBuf> = Vec::new();
    for entry in try!(fs::walk_dir(src)).filter_map(|e| e.ok()).filter(is_js_html){
        if is_file_type(&entry,"html"){
            html_files.push(entry.path());
        } else if is_file_type(&entry,"js"){
            js_files.push(entry.path());
        }
    }
    Ok(FileRegistry{
        html: html_files,
        js: js_files
    })
}

That could probably be simplified further into a single function that either pushes the file into the correct vector or returns false. It might even be possible to make the whole thing extension-agnostic and group everything into a separate vector per file extension, but the project I’m working on only looks for those two.
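
For the extension-agnostic idea, a minimal sketch might group paths in a map keyed by extension. Everything here is an assumption for illustration: the name group_by_extension, the choice of BTreeMap, and keeping the key as an OsString to avoid any lossy string conversion. It takes any iterator of PathBuf values, so it could be fed from whatever directory walk is in use.

```rust
use std::collections::BTreeMap;
use std::ffi::OsString;
use std::path::PathBuf;

/// Groups paths by their extension. Paths without an extension are skipped.
/// Hypothetical helper; the key stays an OsString so no conversion can fail.
fn group_by_extension<I>(paths: I) -> BTreeMap<OsString, Vec<PathBuf>>
where
    I: IntoIterator<Item = PathBuf>,
{
    let mut groups: BTreeMap<OsString, Vec<PathBuf>> = BTreeMap::new();
    for path in paths {
        // Copy the extension out so the borrow of `path` ends before the move below.
        let ext = match path.extension() {
            Some(ext) => ext.to_os_string(),
            None => continue,
        };
        // Create the vector for this extension on first use, then append.
        groups.entry(ext).or_default().push(path);
    }
    groups
}
```

Looking up one group afterwards is then something like groups.get(OsStr::new("html")), which returns None when no file had that extension.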