Skipping garbage at the top of a CSV file

I am using the csv crate to read CSV files. So far everything was fine, but I've just encountered a file that has some notes before the header line, which I'd like to skip.

I have been fighting this for a while but could not yet figure out how to skip 3 lines and then use the rest of the file for reading.

    let filepath = "Connections.csv";
    let mut records: Vec<Connection> = vec![];
    let fh = File::open(&filepath)?;
    let br = std::io::BufReader::new(&fh);
    let lines = br.lines();
    let _ = lines.skip(3);

    let mut rdr = csv::Reader::from_reader(fh);
    for result in rdr.deserialize() {
        let record: Connection = result?;
        records.push(record);
    }

In this example the CSV reader starts from the beginning of the file (as is probably expected; I am only showing this so you can see I did fight with it).

    let br = std::io::BufReader::new(&fh);
    let lines = br.lines();
    let _ = lines.skip(3);

These lines do nothing to fh. You need to pass the reader that has actually been advanced past the notes (not fh) to csv::Reader.

Quick hack version:

    let mut buf = BufReader::new(f);
    for _ in 0..3 {
        buf.skip_until(b'\n').expect("header garbage");
    }
    let mut rdr = csv::Reader::from_reader(buf);

Would it work if you use std::io::Cursor instead?

    let filepath = "Connections.csv";
    let mut records: Vec<Connection> = vec![];
    let fh = File::open(&filepath)?;
    let cursor = std::io::Cursor::new(&fh);
    let lines = cursor.lines();
    let _ = lines.skip(3);

    let mut rdr = csv::Reader::from_reader(cursor);
    for result in rdr.deserialize() {
        let record: Connection = result?;
        records.push(record);
    }

You can use BufRead::skip_until to skip over data until you reach a delimiter. Since that delimiter can only be one byte, you skip until a newline character \n 3 times. (skip_until also skips the \n terminator).

    let filepath = "Connections.csv";
    let mut records: Vec<Connection> = vec![];
    let fh = File::open(filepath)?;
    let mut br = std::io::BufReader::new(fh);
    // This is the new part!
    // (You can also just write br.skip_until(b'\n')?; 3 times instead of a loop)
    for _ in 0..3 {
        // skip_until returns an io::Result, so propagate errors instead of ignoring them
        br.skip_until(b'\n')?;
    }
    //                              NOT fh vv
    let mut rdr = csv::Reader::from_reader(br);
    for result in rdr.deserialize() {
        let record: Connection = result?;
        records.push(record);
    }
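If your toolchain doesn't have BufRead::skip_until yet (I believe it was stabilized fairly recently, around Rust 1.83, but treat that version as an assumption), the same idea can be sketched with read_until and a scratch buffer. The skip_lines helper below is a hypothetical name of mine, not something from std or the csv crate; it also reports how many lines were actually skipped, in case the file is shorter than expected.

```rust
use std::io::{self, BufRead};

/// Discards up to `n` lines from `reader` and returns how many were skipped.
/// (Hypothetical helper for illustration; not part of std or the csv crate.)
fn skip_lines<R: BufRead>(reader: &mut R, n: usize) -> io::Result<usize> {
    let mut scratch = Vec::new();
    for skipped in 0..n {
        scratch.clear();
        // read_until consumes bytes up to and including b'\n';
        // it returns Ok(0) only when the input is exhausted.
        if reader.read_until(b'\n', &mut scratch)? == 0 {
            return Ok(skipped);
        }
    }
    Ok(n)
}
```

You would call skip_lines(&mut br, 3)? on the BufReader and then hand br to csv::Reader::from_reader, exactly as in the snippet above.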

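Your code example didn’t work because calling .skip() on an iterator doesn’t do anything by itself: iterator adapters are lazy, so it only returns a new iterator that will skip three items once it is actually iterated. In fact, you would normally get an unused_must_use warning, which you suppressed with the let _ = lines.skip(3). And even if you had consumed those lines, the BufReader reads ahead into its own buffer, so fh would not be left positioned at the start of the fourth line.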
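To see what the original attempt actually does to the underlying reader, here is a small sketch that uses an in-memory Cursor as a stand-in for the File (that substitution, and the function name positions_after_skip, are my own assumptions for illustration). It records the source position after merely building lines().skip(n), and again after pulling one item from it.

```rust
use std::io::{BufRead, BufReader, Cursor};

/// Returns the position of the underlying source
/// (a) after only constructing `lines().skip(n)`, and
/// (b) after pulling a single item from that iterator.
fn positions_after_skip(data: &[u8], n: usize) -> (u64, u64) {
    let mut source = Cursor::new(data);

    {
        // Same shape as the original attempt: the adapter is built but never iterated.
        let reader = BufReader::new(&mut source);
        let _unused = reader.lines().skip(n);
    }
    let after_lazy = source.position();

    {
        // Now actually drive the iterator once, which skips n lines and reads one more.
        let reader = BufReader::new(&mut source);
        let mut lines = reader.lines().skip(n);
        let _ = lines.next();
    }
    let after_consumed = source.position();

    (after_lazy, after_consumed)
}
```

The first value stays at 0 because nothing was read. For a small input the second value should be the full length of the input rather than the start of the next line, since the BufReader fills its whole buffer in one go; that read-ahead is why you can't go back to the original reader afterwards.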

If you have a specific byte offset into the file to skip to (say, you know there’s always 80 bytes of junk before the CSV data begins), use Seek on your File.
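A minimal sketch of the Seek approach, assuming the junk really has a fixed length. The read_after_offset name is hypothetical, and it returns the remaining text as a String only to keep the example self-contained; in your program you would seek on the File and then pass it to csv::Reader::from_reader instead.

```rust
use std::io::{self, Read, Seek, SeekFrom};

/// Jumps `offset` bytes from the start of `source` and returns everything after that.
/// (Hypothetical helper for illustration.)
fn read_after_offset<R: Read + Seek>(mut source: R, offset: u64) -> io::Result<String> {
    // SeekFrom::Start is measured in bytes from the beginning of the stream.
    source.seek(SeekFrom::Start(offset))?;
    let mut rest = String::new();
    source.read_to_string(&mut rest)?;
    Ok(rest)
}
```

Do the seek before wrapping the File in any buffered reader, so that no stale buffered data gets in the way.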

As another note on your code, your final loop of deserializing and pushing into a Vec can be replaced with Iterator::collect:

    // ...
    let mut rdr = csv::Reader::from_reader(br);
    let records = rdr.deserialize::<Connection>()
        .collect::<Result<Vec<_>, _>>()?;

This uses impl FromIterator<Result<Value, Error>> for Result<Collection<Value>, Error>, which collects the Ok values of the iterator into a collection (in your case, Vec), returning early with the first Error the iterator yields.
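The same collect-into-Result pattern works for any iterator of Result values, not just csv records. Here is a std-only sketch that parses integers instead of deserializing rows (parse_all is a made-up name for illustration):

```rust
use std::num::ParseIntError;

/// Parses every field as an i32, stopping at the first field that fails.
/// (Hypothetical helper for illustration.)
fn parse_all(fields: &[&str]) -> Result<Vec<i32>, ParseIntError> {
    fields
        .iter()
        .map(|field| field.trim().parse::<i32>())
        // An iterator of Result<i32, E> can be collected into Result<Vec<i32>, E>.
        .collect()
}
```

If every field parses, you get Ok with the whole Vec; if any field fails, you get that Err and the remaining items are not processed.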
