ELF processing in Rust

A friend posted the following pipeline to find all the firmware files for a set of Linux kernel modules:

find /lib/modules/4.9.0-2-amd64/ -name '*.ko' | xargs modinfo -F firmware > /dev/null

As an exercise in systems programming, I was curious how easily it could be recreated in Rust, and as a bonus, run in parallel on multiple modules using rayon. After doing so, I thought others might find it interesting as well.

(Note that the pipeline didn't have any particular performance issues, and already ran in a fraction of a second on a warm cache, making all of this unnecessary for optimization purposes; this was purely an exercise.)

Rust doesn't currently have bindings to the libkmod library, but we can recreate this particular bit of its functionality easily. Linux kernel modules store module information in the ELF section .modinfo, as a series of null-terminated strings, each of the form key=value. Firmware files (defined in kernel modules with the MODULE_FIRMWARE macro) appear as strings with the key "firmware" and a filename as the value.

As a first pass, I wrote the following code to parse kernel modules (sequentially):

extern crate memmap;
extern crate walkdir;
extern crate xmas_elf;
extern crate zero;

use std::ffi::OsStr;

const FIRMWARE: &'static str = "firmware=";

fn main() {
    let ext = Some(OsStr::new("ko"));
    for entry in walkdir::WalkDir::new("/lib/modules/4.9.0-2-amd64/") {
        let entry = entry.unwrap();
        if entry.path().extension() == ext {
            let m = memmap::Mmap::open_path(entry.path(), memmap::Protection::Read).unwrap();
            let bytes = unsafe { m.as_slice() };
            let elf = xmas_elf::ElfFile::new(bytes);
            let modinfo = elf.find_section_by_name(".modinfo").unwrap();
            let data = modinfo.raw_data(&elf);
            for s in zero::read_strs_to_null(data) {
                if s.starts_with(FIRMWARE) {
                    println!("{}", &s[FIRMWARE.len()..]);
                }
            }
        }
    }
}

(For this exercise, I intentionally hard-coded the module path here rather than implementing argument processing. Production code would also factor the ELF processing out into a function, and use proper error handling instead of .unwrap().)

This worked, and already matched the performance of the pipeline invoking the C modinfo implementation.

Next, I tried to use rayon to parallelize this. WalkDir doesn't implement rayon's parallel iterator traits, so calling .into_par_iter() on it directly doesn't work. So, I tried collecting the directory walk results into a vector and processing that in parallel:

extern crate memmap;
extern crate rayon;
extern crate walkdir;
extern crate xmas_elf;
extern crate zero;

use std::ffi::OsStr;

use rayon::prelude::*;

const FIRMWARE: &'static str = "firmware=";

fn main() {
    let ext = Some(OsStr::new("ko"));
    let vec = walkdir::WalkDir::new("/lib/modules/4.9.0-2-amd64/").into_iter().collect::<Result<Vec<_>, _>>().unwrap();
    vec.into_par_iter().for_each(|entry| {
        if entry.path().extension() == ext {
            let m = memmap::Mmap::open_path(entry.path(), memmap::Protection::Read).unwrap();
            let bytes = unsafe { m.as_slice() };
            let elf = xmas_elf::ElfFile::new(bytes);
            let modinfo = elf.find_section_by_name(".modinfo").unwrap();
            let data = modinfo.raw_data(&elf);
            for s in zero::read_strs_to_null(data) {
                if s.starts_with(FIRMWARE) {
                    println!("{}", &s[FIRMWARE.len()..]);
                }
            }
        }
    });
}

This also worked, but didn't produce any speedup.

I also wanted to try more parallelism, and in particular to run the directory walk and the processing in parallel. I dropped by the rayon Gitter channel and asked about how to handle iterators that don't support the parallel iterator traits. @nikomatsakis suggested that he had plans to implement a general-purpose iter-to-par-iter adapter that would allow using rayon on any iterator. But in the meantime, he explained how to manually spawn work in rayon's thread pool using the scope API, leading to the last version:

extern crate memmap;
extern crate rayon;
extern crate walkdir;
extern crate xmas_elf;
extern crate zero;

use std::ffi::OsStr;

const FIRMWARE: &'static str = "firmware=";

fn main() {
    let ext = Some(OsStr::new("ko"));
    rayon::scope(|s| {
        for entry in walkdir::WalkDir::new("/lib/modules/4.9.0-2-amd64/") {
            s.spawn(move |_| {
                let entry = entry.unwrap();
                if entry.path().extension() == ext {
                    let m = memmap::Mmap::open_path(entry.path(), memmap::Protection::Read).unwrap();
                    let bytes = unsafe { m.as_slice() };
                    let elf = xmas_elf::ElfFile::new(bytes);
                    let modinfo = elf.find_section_by_name(".modinfo").unwrap();
                    let data = modinfo.raw_data(&elf);
                    for s in zero::read_strs_to_null(data) {
                        if s.starts_with(FIRMWARE) {
                            println!("{}", &s[FIRMWARE.len()..]);
                        }
                    }
                }
            });
        }
    });
}

With the additional parallelism, this ran consistently 10-20% faster on my machine. (Profiling suggests that the directory walk takes the majority of the time, and the printing takes the rest, so any further speedup would likely involve making the directory walk itself parallel and eliminating the printing in favor of directly performing any subsequent processing.)

14 Likes

Nice! On the walkdir issue you filed, I forgot to mention that ripgrep's parallel directory iterator is actually available in the ignore crate. It's a bit heavy handed though, and includes a lot of options for filtering, but it looks like you could almost just drop it in with defaults and see if it gets a speed up.

1 Like

Yeah, that produces the same performance as the last version, but much more simply. Some feedback on the ignore crate:

It'd be nice to have a shorthand for walk.ignore(false).git_global(false).git_ignore(false).git_exclude(false) (and all future such options).

The sample call to run in the documentation for build_parallel doesn't include the requirement to box the thread-specific closure.

Also, I didn't see how using either the "types" or "overrides" mechanism to set a matching pattern, like *.ko; did I miss something there, or do they not have any constructors that allow setting those?

They do. The constructors are builders. There are examples for types. And there is an OverrideBuilder.