Currently, I'm using csv and serde to process a CSV file on Windows. The structure of that file is like this:
| A | B | C | <- headers
| data_A | data_B | data_C |
I only use data from A and C, with code that looks like this:
use serde::Deserialize;

// Only columns A and C are mapped; column B has no matching field.
#[derive(Deserialize)]
struct Record {
    #[serde(rename = "A")]
    field_a: String,
    #[serde(rename = "C")]
    field_c: String,
}
But some rows of B (not the header) contain non-UTF-8 characters, and the program fails at runtime with an error saying it won't accept non-UTF-8 data.
This seems strange to me because I only use A and C. Shouldn't Deserialize leave B untouched? Even deserializing the data from a ByteRecord does not fix the error.
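To make the failure concrete, here is a minimal std-only sketch of what strict versus lossy UTF-8 decoding does with a field like the ones in column B. It does not use the csv crate and is not meant to show that crate's actual code path; the row contents and the helper name `decode_strict` are made up for illustration.

```rust
/// Hypothetical helper: strictly decode one raw field as UTF-8.
fn decode_strict(field: &[u8]) -> Result<&str, std::str::Utf8Error> {
    std::str::from_utf8(field)
}

fn main() {
    // Made-up row: columns A and C are plain ASCII, while column B ends in
    // the byte 0xE9 ("é" in Windows-1252), which is not valid UTF-8 here.
    let row: [&[u8]; 3] = [b"data_A", b"caf\xE9", b"data_C"];

    // The two columns I actually care about decode without trouble.
    assert_eq!(decode_strict(row[0]), Ok("data_A"));
    assert_eq!(decode_strict(row[2]), Ok("data_C"));

    // Column B is the only one that fails strict decoding.
    assert!(decode_strict(row[1]).is_err());

    // Lossy decoding replaces the invalid byte with U+FFFD instead of failing.
    let lossy = String::from_utf8_lossy(row[1]);
    assert_eq!(lossy, "caf\u{FFFD}");
    println!("B (lossy) = {lossy}");
}
```

So the bad bytes only matter if something tries to turn B into a string, which is why I expected skipping that column to be enough.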
Right now I'm using encoding_rs to decode the whole file before feeding it to csv. But I still feel this approach is not the right one. I mean, why do I need to fix something I don't use?