Just using Option<T> isn't enough. This is explained in the tutorial. In particular, Option distinguishes between absence and presence. The only standard way of expressing absence in CSV data is by leaving the field empty. Values like the string null are merely conventions and cannot always be relied on.
This is why csv::invalid_option exists. Namely, if you have a numeric field, and some of its values are null, then tagging that field with csv::invalid_option will cause null to get translated to None instead of producing an error.
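To make the effect concrete, here is a rough std-only sketch of that "invalid becomes None" behavior. The lenient helper is a hypothetical name of mine, and FromStr stands in for serde deserialization, so treat it as an illustration of the semantics rather than how the csv crate actually implements csv::invalid_option.

```rust
use std::str::FromStr;

/// Hypothetical helper (not part of the csv crate): an empty field, or any
/// field that fails to parse as `T`, becomes `None` instead of an error.
/// `FromStr` is used here as a stand-in for serde deserialization.
fn lenient<T: FromStr>(field: &str) -> Option<T> {
    if field.is_empty() {
        // An empty field is the standard CSV way of saying "absent".
        None
    } else {
        // Any parse failure is swallowed and reported as `None`.
        field.parse().ok()
    }
}
```

With this, lenient::<u64>("null") and lenient::<u64>("") both give None, while lenient::<u64>("39825") gives Some(39825). Note that lenient::<String>("null") gives Some("null"), because parsing a string as a String never fails. In real code you'd get the lenient behavior by putting #[serde(deserialize_with = "csv::invalid_option")] on the field.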
However, csv::invalid_option doesn't work if your field is a normal string value, since null is a valid String. It also doesn't work if you want to distinguish between missing values and invalid values. So in those cases, yes, you have to do an additional check yourself. Personally, the way I'd do this is in the same style as csv::invalid_option. Here's a full example:
use serde::Deserialize;

#[derive(Clone, Debug, Deserialize)]
struct Row {
    city: String,
    country: String,
    #[serde(deserialize_with = "nullable")]
    mayor: Option<String>,
    #[serde(deserialize_with = "nullable")]
    pop: Option<u64>,
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let data = "\
city,country,mayor,pop
Marlborough,United States,Arthur Vigeant,39825
Boston,United States,null,null
";
    let mut rdr = csv::Reader::from_reader(data.as_bytes());
    for result in rdr.deserialize() {
        let row: Row = result?;
        eprintln!("{:?}", row);
    }
    Ok(())
}

// Treats an empty field or the literal string "null" as None; anything
// else is parsed with FromStr, and a parse failure becomes a serde error.
fn nullable<'de, D, T, E>(de: D) -> Result<Option<T>, D::Error>
where
    D: serde::Deserializer<'de>,
    Option<T>: serde::Deserialize<'de>,
    T: std::str::FromStr<Err = E>,
    E: std::error::Error,
{
    use serde::de::Error;

    let val = String::deserialize(de)?;
    if val.is_empty() || val == "null" {
        Ok(None)
    } else {
        val.parse().map(Some).map_err(|e: E| D::Error::custom(e.to_string()))
    }
}
And the output is:
$ cargo run
Compiling csv-nullable v0.1.0 (/tmp/csv-nullable)
Finished dev [unoptimized + debuginfo] target(s) in 0.57s
Running `target/debug/csv-nullable`
Row { city: "Marlborough", country: "United States", mayor: Some("Arthur Vigeant"), pop: Some(39825) }
Row { city: "Boston", country: "United States", mayor: None, pop: None }
With that said, this is needlessly hard (it didn't take me long to come up with the solution, but it took me about 30 minutes to confirm that I couldn't do any better). Notice also that the above solution relies on FromStr to parse the contents of a non-null field into the expected type. This is probably fine in practice for most cases, but may not be appropriate in every case. So I filed an issue to make defining "null" in your CSV data configurable.
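To sketch what configurable null handling might look like, and how missing and invalid values could be kept apart, here's a small std-only example. Field, classify, and the null_markers parameter are names I made up for illustration. None of this is part of the csv crate, and it leans on FromStr just like nullable above, so the same caveat applies.

```rust
use std::str::FromStr;

/// Outcome of reading one raw CSV field (hypothetical type for this sketch).
#[derive(Debug, PartialEq)]
enum Field<T> {
    /// The field was empty.
    Missing,
    /// The field matched one of the configured null markers.
    Null,
    /// The field held text that could not be parsed as `T`.
    Invalid(String),
    /// The field parsed successfully.
    Value(T),
}

/// Classify `raw`, treating any entry of `null_markers` as an explicit null.
/// Emptiness is checked first, then the markers, then `FromStr` parsing.
fn classify<T: FromStr>(raw: &str, null_markers: &[&str]) -> Field<T> {
    if raw.is_empty() {
        Field::Missing
    } else if null_markers.iter().any(|m| *m == raw) {
        Field::Null
    } else {
        match raw.parse() {
            Ok(v) => Field::Value(v),
            // Keep the offending text so the caller can report it.
            Err(_) => Field::Invalid(raw.to_string()),
        }
    }
}
```

For example, classify::<u64>("NA", &["null", "NA"]) gives Field::Null, classify::<u64>("abc", &["null"]) gives Field::Invalid("abc"), and passing an empty marker list for a String field lets the literal text null through as an ordinary value. Making the markers a parameter is the point: what counts as "null" is a convention of your data, so the caller should decide it.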