Serde, csv: empty fields are the string "null"

I've got a CSV file that I'm trying to parse, and I want to use the serde support in the csv crate, but I can't figure out how to handle nulls.

Some example data that I'd like to parse

1234,3.0,ABC
231,null,A1B

I want to deserialize this into

struct MyStruct {
    id: i32,
    value: Option<f32>,
    code: String
}

I'd like something generic, so I thought I'd try making a type like

pub struct Nullable<T>(Option<T>);

and implement Deserialize for it.

The format is ambiguous if null is a valid value for T, but I don't control the input format, so I'm ignoring this.
What I want to do is try the deserializer with visit_str and check whether the string == "null", and if that fails, try the inner deserializer, but I can't see how to do this. Does anyone have any suggestions?

That is what Option does. Just wrap the values that you want to be nullable in Option.

I believe that what Option does depends on the Deserializer, and in the case of csv, it looks for the empty string.

Just using Option<T> isn't enough. This is explained in the tutorial. In particular, Option distinguishes between absence and presence. The only standard way of expressing absence in CSV data is by leaving the field empty. Values like the string null are merely conventions and cannot always be relied on.

This is why csv::invalid_option exists. Namely, if you have a numeric field, and some of its values are null, then tagging that field with csv::invalid_option will cause null to get translated to None instead of producing an error.

However, csv::invalid_option doesn't work if your field is a normal string value, since null is a valid String. It also doesn't work if you want to distinguish between missing values and invalid values. So in those cases, yes, you have to do an additional check yourself. Personally, the way I'd do this is in the same style as csv::invalid_option. Here's a full example:

use serde::Deserialize;

#[derive(Clone, Debug, Deserialize)]
struct Row {
    city: String,
    country: String,
    #[serde(deserialize_with = "nullable")]
    mayor: Option<String>,
    #[serde(deserialize_with = "nullable")]
    pop: Option<u64>,
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let data = "\
city,country,mayor,pop
Marlborough,United States,Arthur Vigeant,39825
Boston,United States,null,null
";
    let mut rdr = csv::Reader::from_reader(data.as_bytes());
    for result in rdr.deserialize() {
        let row: Row = result?;
        eprintln!("{:?}", row);
    }
    Ok(())
}

fn nullable<'de, D, T, E>(de: D) -> Result<Option<T>, D::Error>
where
    D: serde::Deserializer<'de>,
    Option<T>: serde::Deserialize<'de>,
    T: std::str::FromStr<Err=E>,
    E: std::error::Error,
{
    use serde::de::Error;

    let val = String::deserialize(de)?;
    if val.is_empty() || val == "null" {
        Ok(None)
    } else {
        val.parse().map(Some).map_err(|e: E| D::Error::custom(e.to_string()))
    }
}

And the output is:

$ cargo run
   Compiling csv-nullable v0.1.0 (/tmp/csv-nullable)
    Finished dev [unoptimized + debuginfo] target(s) in 0.57s
     Running `target/debug/csv-nullable`
Row { city: "Marlborough", country: "United States", mayor: Some("Arthur Vigeant"), pop: Some(39825) }
Row { city: "Boston", country: "United States", mayor: None, pop: None }

With that said, this is needlessly hard (it didn't take me long to come up with the solution, but it took me about 30 minutes to confirm that I couldn't do any better). Notice also that the above solution relies on FromStr to parse the contents of a non-null field into the expected type. This is probably fine in practice for most cases, but may not be appropriate in every case. So I filed this issue to make defining "null" in your CSV data configurable.
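To make the FromStr caveat a bit more concrete, here is a std-only sketch of the same check with serde stripped out. `parse_nullable` is a hypothetical helper, not something from the csv crate; it just mirrors the logic inside `nullable` above.

```rust
use std::str::FromStr;

// Hypothetical helper: the same empty-or-"null" check as `nullable`,
// but operating on a plain &str instead of a serde Deserializer.
fn parse_nullable<T: FromStr>(field: &str) -> Result<Option<T>, T::Err> {
    if field.is_empty() || field == "null" {
        Ok(None)
    } else {
        field.parse().map(Some)
    }
}

fn main() {
    // The common cases behave as expected.
    assert_eq!(parse_nullable::<u64>("39825"), Ok(Some(39825)));
    assert_eq!(parse_nullable::<u64>("null"), Ok(None));
    assert_eq!(parse_nullable::<u64>(""), Ok(None));

    // FromStr for integers does not trim whitespace, so a padded
    // field is an error rather than a number.
    assert!(parse_nullable::<u64>(" 39825").is_err());

    // FromStr for bool only accepts "true" and "false".
    assert!(parse_nullable::<bool>("TRUE").is_err());
}
```

The other limitation is that the field type has to implement FromStr at all. A type that only derives Deserialize (an enum of country codes, say) can't be used with `nullable` without also writing a FromStr impl for it.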

This is the best answer I've ever had to a question!!
