Deserialising multiple types from CSV (with serde)

I am trying to deserialize a bunch of CSV files into a sequence of structs with Serde. I am learning both Rust and Serde with the wonderful csv tutorial.

Each CSV file maps 1:1 to one of the types I am trying to create. For example, consider two files:

*type1.csv*

id,name
1,"X"
2,"Y"

*type2.csv*


id,name
1,"A"
2,"B"

For each of these files, I have a corresponding struct:

#[derive(Hash, Eq, PartialEq, Debug, Ord, PartialOrd, Deserialize)]
struct Type1 {
    id: u8,
    name: String,
}
#[derive(Hash, Eq, PartialEq, Debug, Ord, PartialOrd, Deserialize)]
struct Type2 {
    id: u8,
    name: String,
}

I'm using serde, but I find myself writing the same loop over and over again, so I wanted to see what the idiomatic Rust (rustic?) way to do this would be.

    let file_path = "path/to/type1.csv"; // placeholder: path to type1.csv
    let mut rdr = csv::Reader::from_path(file_path)?;
    for result in rdr.deserialize() {
        let type1: Type1 = result?;
        // do useful stuff with type1, like adding it to a vector
    }

  1. How do I avoid rewriting the same loop for the two dozen or so types I need to read and instantiate?
  2. Can I use serde to get a vector of structs directly, rather than using a for loop?

Indeed, you can! In fact, that function can be written in one line. Here's a complete example:

use std::io;

use serde::de::DeserializeOwned;
// Requires serde's "derive" feature to be enabled in Cargo.toml.
use serde::Deserialize;

#[derive(Hash, Eq, PartialEq, Debug, Ord, PartialOrd, Deserialize)]
struct Type1 {
    id: u8,
    name: String,
}

#[derive(Hash, Eq, PartialEq, Debug, Ord, PartialOrd, Deserialize)]
struct Type2 {
    id: u8,
    name: String,
}

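/// Deserializes every record from `rdr` into a `D`, stopping at the first error.
fn parse_csv<D: DeserializeOwned, R: io::Read>(rdr: R) -> csv::Result<Vec<D>> {
    // Collecting an iterator of `Result<D>` gives a `Result<Vec<D>>`.
    csv::Reader::from_reader(rdr).into_deserialize().collect()
}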

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let data1 = "\
id,name
1,\"X\"
2,\"Y\"
";
    let data2 = "\
id,name
1,\"A\"
2,\"B\"
";

    let records1: Vec<Type1> = parse_csv(data1.as_bytes())?;
    let records2: Vec<Type2> = parse_csv(data2.as_bytes())?;
    
    println!("{:?}", records1);
    println!("{:?}", records2);
    
    Ok(())
}

Playground link: https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=5e9fd1e031d217af71de74fd1df60441


Many thanks!

Is there a better way that saves me from writing out the parse call for each type? E.g. I have a top-level struct:

struct World {
    type1: Vec<Type1>,
    type2: Vec<Type2>,
    ...
}

I could always just do the parsing in a single function with every type typed out, but is there a better pattern for handling such cases?

Thanks again!

I don't understand your question, sorry. If one type works for all your CSV files, then just use one type.

Apologies, I was not clear.

Each type is read from a different csv file. I have a lot of csv files to read.

In effect, I end up with a function containing many nearly identical lines:

    world.type1 = parse_csv(std::fs::File::open("Type1.csv")?)?;
    world.type2 = parse_csv(std::fs::File::open("Type2.csv")?)?;
    ...
    world.type30 = parse_csv(std::fs::File::open("Type30.csv")?)?;

where world is a value of type World.

Is there a better option than listing out all 30+ types and their parse calls like this?

If they are all distinct types, then no. You could probably get away with using a macro if you really did not want to write out all of them. But unless it's an obscene amount, I would just write them all out. Chances are you only have to do it once.

Agreed, thanks.