How to DRY that code?

Hi!
I would like to know how to refactor my code in order to DRY.
I am working on parsing EDF file format and converting bytes from the file to string.

// HEADER RECORD
// 8 ascii : version of this data format (0)
// 80 ascii : local patient identification
// 80 ascii : local recording identification
// 8 ascii : startdate of recording (dd.mm.yy)
// 8 ascii : starttime of recording (hh.mm.ss)
// 8 ascii : number of bytes in header record
// 44 ascii : reserved
// 8 ascii : number of data records (-1 if unknown)
// 8 ascii : duration of a data record, in seconds
// 4 ascii : number of signals (ns) in data record
// ns * 16 ascii : ns * label (e.g. EEG FpzCz or Body temp)
// ns * 80 ascii : ns * transducer type (e.g. AgAgCl electrode)
// ns * 8 ascii : ns * physical dimension (e.g. uV or degreeC)
// ns * 8 ascii : ns * physical minimum (e.g. -500 or 34)
// ns * 8 ascii : ns * physical maximum (e.g. 500 or 40)
// ns * 8 ascii : ns * digital minimum (e.g. -2048)
// ns * 8 ascii : ns * digital maximum (e.g. 2047)
// ns * 80 ascii : ns * prefiltering (e.g. HP:0.1Hz LP:75Hz)
// ns * 8 ascii : ns * nr of samples in each data record
// ns * 32 ascii : ns * reserved

use chrono::prelude::*;
use std::fs::File;
use std::io::{self, Read};
use std::path::Path;

pub struct EDF {
    header: Header,
    data: Vec<Data>,
}

struct Header {
    ver: u32,
    patient_ID: String,
    record_ID: String,
    start_date: DateTime<Local>,
    bytes: u32,
    reserved: u32,
    records: u32,
    duration: f64,
    ns: u32, // number of signals
    label: String,
    transducer: String,
    units: String,
    physical_min: String,
    physical_max: String,
    digital_min: String,
    digital_max: String,
    prefilter: String,
    samples: String,
    frequency: String,
}

struct Data {}

impl EDF {
    pub fn read(path: &Path) -> Result<(), io::Error> {
        let mut records = File::open(path)?;
        dbg!(&records);

        let mut ver = vec![0; 8];
        records.read(&mut ver);
        let ver = String::from_utf8(ver).unwrap();
        println!("ver: {}", ver);

        let mut patient_ID = vec![0; 80];
        records.read(&mut patient_ID);
        let patient_ID = String::from_utf8(patient_ID).unwrap();
        println!("patient_ID: {}", patient_ID);

        let mut ...= vec![0; 80];
        records.read(&mut ...);
        let ...= String::from_utf8(...).unwrap();
        println!("...: {}", ...);

        ...

        Ok(())
    }
}

What I am trying to 'DRY' is that part:

        let mut ...= vec![0; 80];
        records.read(&mut ...);
        let ...= String::from_utf8(...).unwrap();
        println!("...: {}", ...);

Thank's in advance!

Noah.

just write a function with the required arguments or perhaps a closure if the compiler does not complain...

1 Like

For example

fn read_fixed_width_string<R: Read>(reader: &mut R, width: usize) -> Result<String, io::Error> {
    let mut buffer = vec![0; width];
    reader.read_exact(&mut buffer)?;
    String::from_utf8(buffer)
        .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))
}
4 Likes

Using read_exact instead of read as in @quinedot's example is important, you weren't handling the return value of read, which is allowed to not fill the whole buffer.

You also might want to consider wrapping your file in a BufReader since you're doing a bunch of relatively small reads. It can make a big difference in how fast your program runs if you do a bunch of small IO

4 Likes

What you want is basically a parser for a custom data format. Writing correct parsers is actually tricky business. I suggest you look at something well-developed, like nom. It would allow you to define the grammar of your format separately from the parsing stage itself.

Some more general recommendations:

  • Don't forget to handle your errors. There are many records.read(&mut patient_ID); calls in your example code. Now, perhaps you have just written some example code for the question and made an easy mistake, but if you didn't and it is an edited copy of actual code, than ignoring errors can cause many problems in the future. Don't forget to run cargo clippy on your code before committing it, and try to fix all warnings and errors that Clippy produces (it is occasionally wrong, but usually right).

  • Separate the parsing and processing stages of the process. Try to declare a separate structure, which would hold all parsed data from the record. Do the parse, return an error if any kind of error occurred during the parsing (due to I/O errors or a format error), and only then process the returned data. Depending on your use case, you may consider adding an option to recover the parsing even if the data is corrupted (which unfortunately happens too often in real life).

  • You can get some simple factoring of your code using the read_fixed_width_string function from @quinedot 's answer. If there is more structure in your data format, you may consider declaring separate types for different fields of the record, and defining the parse functions for each individual type.

  • For primitive integers, consider using the byteorder crate, which provides convenient traits (ReadBytesExt/WriteBytesExt) for reading and writing integers.

4 Likes

Do you have an example of how you can the file in a BuffReader? Thanks in advance.

1 Like

You should be able to just do

BufReader::new(File::open(path)?)

The Read trait is implemented on it too so you'll be able to do the same operations on it.

3 Likes

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.