I’m using the csv crate to parse a bunch of log formats into structs with serde_derive. It works great and is super convenient. But in a small number of records (which I’ve just been ignoring until now) there is non-UTF8 data in one of the fields, and of course those records fail to parse into a String.
Looking at the docs, I can easily see how to use the ByteRecord struct for lower-level handling, but there’s also this advice: “If you are using the Serde (de)serialization APIs, then you probably never need to interact with a ByteRecord or a StringRecord.” That sounds great, but I can’t figure out how to get the non-UTF8 data into a struct field, i.e. what type to make the field.
- I thought I’d use a Vec<u8>, but that runs afoul of the (slightly odd, but well-described) CSV field flattening rules: https://docs.rs/csv/1.0.2/csv/struct.Reader.html#rules
- I thought I’d use an OsString, but that wants to parse “Unix” or “Windows” enum variants, so it’s not quite as simple as the brief description of that type suggests.
- I thought I’d use a [u8; n], but that only implements Deserialize for rather small values of n (and n would need to be large for just a few records, wasting lots of memory on most of them).
- I know I could use a &[u8], but that seems to mean using ByteRecord and switching to the zero-allocation borrowing pattern described near the end of the tutorial: https://docs.rs/csv/1.0.2/csv/tutorial/index.html#serde-and-zero-allocation
The latter is probably what I will end up doing (and probably would have eventually, for optimisation), but I feel like I’m missing something simple. I “shouldn’t need” to use ByteRecord with serde, and since it’s only one field that can contain non-UTF8 string data, I just want a field type that captures it so I can deal with fixing the record afterwards.
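For reference, here is a minimal std-only sketch of what “fixing the record afterwards” could look like, assuming the problem field has somehow ended up as owned bytes. It deliberately leaves out the csv/serde side, which is the open question here. The struct `LogRecord`, its fields, and both method names are hypothetical, made up for illustration.

```rust
use std::borrow::Cow;
use std::str::Utf8Error;

/// Hypothetical record: `message` is kept as raw bytes because it may not be
/// valid UTF-8. (How serde fills this field is not shown here.)
struct LogRecord {
    host: String,
    message: Vec<u8>,
}

impl LogRecord {
    /// Borrows the message as text when it is already valid UTF-8; otherwise
    /// allocates a copy with each invalid sequence replaced by U+FFFD.
    fn message_lossy(&self) -> Cow<'_, str> {
        String::from_utf8_lossy(&self.message)
    }

    /// Strict variant: the error reports how many leading bytes were valid.
    fn message_strict(&self) -> Result<&str, Utf8Error> {
        std::str::from_utf8(&self.message)
    }
}

fn main() {
    // 0xE9 is "é" in Latin-1, but on its own it is not valid UTF-8.
    let rec = LogRecord {
        host: "web1".to_string(),
        message: b"caf\xe9 open".to_vec(),
    };

    match rec.message_strict() {
        Ok(text) => println!("{}: {}", rec.host, text),
        Err(e) => println!(
            "{}: invalid UTF-8 after {} valid bytes; lossy form: {}",
            rec.host,
            e.valid_up_to(),
            rec.message_lossy()
        ),
    }
}
```

The lossy conversion only allocates for the few records that actually contain bad bytes, which fits the case where almost every record is clean.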
What is the type I want?