Deserialising multiple types from CSV (with serde)

I am trying to deserialize a bunch of CSV files into a sequence of structs with Serde. I am learning both Rust and Serde from the wonderful csv tutorial.

The CSV files map 1:1 to the types I am trying to create. For example, consider two files:

*type1.csv*

id,name
1,"X"
2,"Y"

*type2.csv*


id,name
1,"A"
2,"B"

For each of these files I have a corresponding struct:

#[derive(Hash, Eq, PartialEq, Debug, Ord, PartialOrd, Deserialize)]
struct Type1 {
    id: u8,
    name: String,
}
#[derive(Hash, Eq, PartialEq, Debug, Ord, PartialOrd, Deserialize)]
struct Type2 {
    id: u8,
    name: String,
}

I'm using Serde, but I find that I am writing the same function over and over again, so I wanted to see what the idiomatic Rust (rustic?) way to do this would be.

        let file_path = //path to type1.csv
        let mut rdr = csv::Reader::from_path(file_path)?;
        for result in rdr.deserialize() {
            let type1: Type1 = result?;
            // do useful stuff with type1 like add it to a vector
        }

  1. How do I avoid rewriting the same loop for the two dozen or so types I need to read and instantiate?
  2. Can I use Serde to get a vector of structs rather than using the for loop?

Indeed, you can! In fact, that function can be written in one line. :slight_smile: But here's a complete example:

#[macro_use]
extern crate serde_derive;

use std::io;

use csv;
use serde::de::DeserializeOwned;

#[derive(Hash, Eq, PartialEq, Debug, Ord, PartialOrd, Deserialize)]
struct Type1 {
    id: u8,
    name: String,
}

#[derive(Hash, Eq, PartialEq, Debug, Ord, PartialOrd, Deserialize)]
struct Type2 {
    id: u8,
    name: String,
}

fn parse_csv<D: DeserializeOwned, R: io::Read>(rdr: R) -> csv::Result<Vec<D>> {
    csv::Reader::from_reader(rdr).into_deserialize().collect()
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let data1 = "\
id,name
1,\"X\"
2,\"Y\"
";
    let data2 = "\
id,name
1,\"A\"
2,\"B\"
";

    let records1: Vec<Type1> = parse_csv(data1.as_bytes())?;
    let records2: Vec<Type2> = parse_csv(data2.as_bytes())?;
    
    println!("{:?}", records1);
    println!("{:?}", records2);
    
    Ok(())
}

Playground link: https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=5e9fd1e031d217af71de74fd1df60441
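As a side note on why the one-liner works: the standard library lets an iterator of Result values be collected into a single Result holding a Vec, stopping at the first error. The sketch below shows that behaviour with std only; parse_ids is a hypothetical name used purely for illustration and is not part of csv or Serde.

```rust
use std::num::ParseIntError;

/// Parses every field as a u8. Collecting into `Result<Vec<_>, _>`
/// stops at the first field that fails and returns that error.
fn parse_ids(fields: &[&str]) -> Result<Vec<u8>, ParseIntError> {
    fields.iter().map(|field| field.parse::<u8>()).collect()
}

fn main() {
    // All fields are valid, so this is an Ok holding the parsed values.
    println!("{:?}", parse_ids(&["1", "2", "3"]));
    // The second field is not a number, so the whole result is an Err.
    println!("{:?}", parse_ids(&["1", "oops", "3"]));
}
```

The same thing happens in parse_csv: the deserializing iterator yields one Result per row, and collect() turns those into either the full vector or the first error.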

Many thanks!

Is there a better way that saves me from writing out the parse call for each type? E.g. I have a top-level struct:

struct World {
    type1: Vec<Type1>,
    type2: Vec<Type2>,
    ...
}

I could always just do the parsing in a single function with every type typed out -- but is there a better pattern to handle such cases?

Thanks again!

I don't understand your question, sorry. If one type works for all your CSV files, then just use one type.

Apologies - I was not clear.

Each type is read from a different CSV file, and I have a lot of CSV files to read.

In effect, I will have a function with many lines reading almost the same:

    world.type1 = parse_csv("Type1.csv")?;
    world.type2 = parse_csv("Type2.csv")?;
    ...
    world.type30 = parse_csv("Type30.csv")?;

Here, world is a value of type World.

Is there a better option than listing out all 30+ types and the parsing functions like so?

If they are all distinct types, then no. You could probably get away with using a macro if you really did not want to write out all of them. But unless it's an obscene amount, I would just write them all out. Chances are you only have to do it once.
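For what it's worth, here is a rough sketch of what the macro route might look like. To keep it free of external crates, parse_csv below is a std-only stand-in for the Serde-based version earlier in the thread (it handles no quoting or escaping rules), and FromRow, id_and_name, load_world! and load are all hypothetical names invented for this example. Only the macro itself is the point.

```rust
use std::error::Error;

/// Hypothetical std-only stand-in for serde's `DeserializeOwned`.
trait FromRow: Sized {
    fn from_row(fields: &[&str]) -> Result<Self, Box<dyn Error>>;
}

#[derive(Debug, PartialEq)]
struct Type1 {
    id: u8,
    name: String,
}

#[derive(Debug, PartialEq)]
struct Type2 {
    id: u8,
    name: String,
}

/// Parses one `id,name` row; shared by both impls below.
fn id_and_name(fields: &[&str]) -> Result<(u8, String), Box<dyn Error>> {
    match fields {
        [id, name] => {
            let id = id.trim().parse::<u8>()?;
            Ok((id, name.trim().trim_matches('"').to_string()))
        }
        _ => Err("expected exactly two fields".into()),
    }
}

impl FromRow for Type1 {
    fn from_row(fields: &[&str]) -> Result<Self, Box<dyn Error>> {
        let (id, name) = id_and_name(fields)?;
        Ok(Type1 { id, name })
    }
}

impl FromRow for Type2 {
    fn from_row(fields: &[&str]) -> Result<Self, Box<dyn Error>> {
        let (id, name) = id_and_name(fields)?;
        Ok(Type2 { id, name })
    }
}

/// Stand-in for the thread's `parse_csv`: skips the header line and
/// splits each remaining line on commas. Real code would use the csv crate.
fn parse_csv<D: FromRow>(data: &str) -> Result<Vec<D>, Box<dyn Error>> {
    data.lines()
        .skip(1)
        .filter(|line| !line.trim().is_empty())
        .map(|line| {
            let fields: Vec<&str> = line.split(',').collect();
            D::from_row(&fields)
        })
        .collect()
}

#[derive(Debug)]
struct World {
    type1: Vec<Type1>,
    type2: Vec<Type2>,
}

/// Expands `field => source` pairs into a `World` literal with one
/// `parse_csv(source)?` call per field; each field's type drives inference.
macro_rules! load_world {
    ($($field:ident => $source:expr),+ $(,)?) => {
        World {
            $($field: parse_csv($source)?,)+
        }
    };
}

fn load() -> Result<World, Box<dyn Error>> {
    Ok(load_world! {
        type1 => "id,name\n1,\"X\"\n2,\"Y\"\n",
        type2 => "id,name\n1,\"A\"\n2,\"B\"\n",
    })
}

fn main() -> Result<(), Box<dyn Error>> {
    let world = load()?;
    println!("{:?}", world);
    Ok(())
}
```

Note that this still lists every field once, so it mostly saves repeating the parse_csv call and the question mark. That is why writing the lines out by hand is usually just as good.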

Agree, thanks.
