Refactoring: parsing mpstat with 2 Enums and 1 unsafe

Hello, Rustaceans!

New to this forum, so this is my welcome message.

I have been following Rust a bit before the 1.0 beta (I think around 2014), but in these years I have been mostly reading blogs, libraries and the Book now and then. The first time I tried Rust (2015) I needed like 5 hours to achieve something done in Bash in 20 secs, and I "suffered" quite a lot. I have given a try again a year ago, and these days, and I feel much better. It is a pitty that I cannot use Rust in my work day, and just keep it as a "hobby", maybe a few hours per week, if lucky. Still, I have tried an exercise yesterday, trying to port a code from JavaScript to Rust. I achieve it in 1-2 hours, but I am not happy with the result (I would give myself a 7/10).

I am asking about a proper refactor, the way a more advanced Rustacean would produce this code.

In this code I discarded the mpstat launch, and I just use the output string.

The code in the playground: playground

enum Header {
    Cpu = 0,
    Usr,
    Sys,
    Iowait,
    Idle,
    Other,
}

#[derive(Debug)]
enum HeaderValue<'a> {
    Cpu(&'a str),
    Usr(f32),
    Sys(f32),
    Iowait(f32),
    Idle(f32),
    Other(f32),
}

fn main() {
    let str = "
Linux 4.9.68-1-lts (xxx)    13/03/18        _x86_64_        (8 CPU)

Average:     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
Average:     all    2.80    0.00    1.01    0.13    0.00    0.16    0.00    0.00    0.00   95.92
Average:       0    6.78    0.00    1.26    0.75    0.00    0.25    0.00    0.08    0.00   90.95
Average:       1    1.25    0.00    0.75    0.25    0.10    0.75    0.00    0.03    0.00   97.00
Average:       2    1.75    0.00    1.75    0.00    0.00    0.25    0.00    0.00    0.00   96.25
Average:       3    1.75    0.00    0.75    0.00    0.00    0.00    0.00    0.00    0.00   97.51
Average:       4    7.07    0.00    0.51    0.00    0.00    0.00    0.00    0.00    0.00   92.42
Average:       5    1.26    0.00    0.50    0.00    0.20    0.00    0.00    0.00    0.00   98.24
Average:       6    1.77    0.00    1.52    0.00    0.00    0.00    0.00    0.00    0.00   96.71
Average:       7    0.76    0.00    1.01    0.00    0.00    0.00    0.00    0.00    0.00   98.23
";

    println!("{}", str);

    let mut headers_parsed = false;

    let mut headers_idx: Vec<Header> = vec![];

    let mut cpus_values: Vec<Vec<HeaderValue>> = vec![];

    for line in str.lines() {
        if line.starts_with("Average") {
            let cols: Vec<&str> = line.split(' ').filter(|c| *c != "").collect();
            if headers_parsed {
                let mut all_values: Vec<HeaderValue> = vec![
                    HeaderValue::Cpu(""),
                    HeaderValue::Usr(0.0f32),
                    HeaderValue::Sys(0.0f32),
                    HeaderValue::Iowait(0.0f32),
                    HeaderValue::Idle(0.0f32),
                    HeaderValue::Other(0.0f32),
                ];

                for (idx, val) in cols.iter().skip(1).enumerate() {
                    let header = &headers_idx[idx];
                    let fval = match val.parse::<f32>() {
                        // unneeded for Cpu
                        Ok(v) => v,
                        _ => 0.0f32,
                    };
                    match *header {
                        Header::Cpu => {
                            all_values[Header::Cpu as usize] = HeaderValue::Cpu(val);
                        }
                        Header::Usr => {
                            all_values[Header::Usr as usize] = HeaderValue::Usr(fval);
                        }
                        Header::Sys => {
                            all_values[Header::Sys as usize] = HeaderValue::Sys(fval);
                        }
                        Header::Iowait => {
                            all_values[Header::Iowait as usize] = HeaderValue::Iowait(fval);
                        }
                        Header::Idle => {
                            all_values[Header::Idle as usize] = HeaderValue::Idle(fval);
                        }
                        Header::Other => {
                            if let HeaderValue::Other(prev_fval) =
                                all_values[Header::Other as usize]
                            {
                                all_values[Header::Other as usize] =
                                    HeaderValue::Other(fval + prev_fval);
                            }
                        }
                    };
                }
                cpus_values.push(all_values);
            } else {
                headers_idx.reserve(cols.len());
                unsafe {
                    headers_idx.set_len(cols.len());
                }
                for (idx, head) in cols.iter().skip(1).enumerate() {
                    let header = match head.to_lowercase().as_ref() {
                        "cpu" => Header::Cpu,
                        "%usr" => Header::Usr,
                        "%sys" => Header::Sys,
                        "%iowait" => Header::Iowait,
                        "%idle" => Header::Idle,
                        _ => Header::Other,
                    };
                    headers_idx[idx] = header;
                }

                headers_parsed = true;
            }
        }
    }

    for values in cpus_values.iter() {
        println!("{:?}", values);
    }
}

The output:

[Cpu("all"), Usr(2.8), Sys(1.01), Iowait(0.13), Idle(95.92), Other(0.16)]
[Cpu("0"), Usr(6.78), Sys(1.26), Iowait(0.75), Idle(90.95), Other(0.32999998)]
[Cpu("1"), Usr(1.25), Sys(0.75), Iowait(0.25), Idle(97.0), Other(0.88)]
[Cpu("2"), Usr(1.75), Sys(1.75), Iowait(0.0), Idle(96.25), Other(0.25)]
[Cpu("3"), Usr(1.75), Sys(0.75), Iowait(0.0), Idle(97.51), Other(0.0)]
[Cpu("4"), Usr(7.07), Sys(0.51), Iowait(0.0), Idle(92.42), Other(0.0)]
[Cpu("5"), Usr(1.26), Sys(0.5), Iowait(0.0), Idle(98.24), Other(0.2)]
[Cpu("6"), Usr(1.77), Sys(1.52), Iowait(0.0), Idle(96.71), Other(0.0)]
[Cpu("7"), Usr(0.76), Sys(1.01), Iowait(0.0), Idle(98.23), Other(0.0)]

The algorithm:
It parses the lines starting with "Average:". Each line is read by columns (spaces as delimiter). The CPU column is a string, the rest are f32. The most interesting columns are saved: usr, sys, iowait and idle. The rest f32 are added to a Other "virtual" column.

First I thought about implementing it with HashMaps, coming from JavaScript, but seems quite unnatural. Then, I used Enums, but I had some problems regarding match and different data inside every enum (&str vs f32). Regarding HeaderValue::Other I wanted to use sum_other_header_value + new_other_header_value but it seemed a bit hard how to implement the Add trait for every HeaderValue, and finally used HeaderValue::Other(prev_header_value + new_other_header_value). Also, I don't like that the parse function is used also with Cpu, because it is not necessary, but having a previous if for Cpu seemed wrong regarding maintainability.

As you can see, this code works, but I would like to improve it: any refactor/maintainability and performance tricks? Still, If you want to give any advice and you have to decide between those two aspects, I prefer the maintainability (the cleaner and simpler the code, the better).

Have a good day.

I would make just a few changes.

First, I do not like the initialization of all_values. I prefer to have

let mut all_values: Vec<HeaderValue> = vec![HeaderValue::Uninitialized;6];

This requires adding the Unitialized variant to the enum, to derive Clone on it and a little condition to initialize the Other variant.

Second, it is the issue of the unsafe block, which is completely unjustified. You can just push the headers.

                headers_idx.reserve(cols.len());
                for head in cols.iter().skip(1) {
                    let header = match head.to_lowercase().as_ref() {
                        "cpu" => Header::Cpu,
                        "%usr" => Header::Usr,
                        "%sys" => Header::Sys,
                        "%iowait" => Header::Iowait,
                        "%idle" => Header::Idle,
                        _ => Header::Other,
                    };
                    headers_idx.push(header);
                }

Link to updated playgorund Rust Playground

Hi, @nakacristo

Yes, in this case both modifications are Ok ;), although they are just little ones. I was expecting another approach (better algorithm/ergonomics usage) that would be for an experienced user, the "Rust way", in case there is one single common way.

Thanks

I have given another look at the code. We can remove some redundancy by moving all the update of the values into a function of the Header.

impl Header{
    fn update_value<'a>(&self, hv: &mut HeaderValue<'a>, val:&'a str){
        let fval = match val.parse::<f32>() {
            // unneeded for Cpu
            Ok(v) => v,
            _ => 0.0f32,
        };
        *hv= match *self
        {
            Header::Cpu => HeaderValue::Cpu(val),
            Header::Usr => HeaderValue::Usr(fval),
            Header::Sys => HeaderValue::Sys(fval),
            Header::Iowait => HeaderValue::Iowait(fval),
            Header::Idle => HeaderValue::Idle(fval),
            Header::Other => {
                if let HeaderValue::Other(prev_fval) = *hv
                {
                    HeaderValue::Other(fval + prev_fval)
                }
                else
                {
                    HeaderValue::Other(fval)
                }
            }
        }
    }
}

New playground: Rust Playground

I think the general approach is the correct one. Perhaps someone more veteran with Rust may say if it fits into Rust's spirit.

Why do you need that .clone()? It seems a bit fishy.

I think the ideal solution should have only one Enum, probably mixing both.

If you do not like having a HeaderValue because of the redundancy with Header you may define a simpler enum.

#[derive(Debug,Clone)]
enum Value<'a>{
    Uninitialized,
    Str(&'a str),
    Float(f32),
}

You know the header of the value from the index, so there is no loss of information.

playground: Rust Playground

I do not usually worry about cloning a basic enum, since it is just an integer.

Now it is better. Although I would rather include the method inside the loop.

Done :wink: