Hello, I need to deserialize a stream of JSON objects, where some of the objects may differ from what is expected in the stream.
When a "different" object is encountered, the deserializer should return an error and continue with the next item in the stream.
I'm using serde_json::StreamDeserializer to deserialize the stream, but it stops as soon as it returns an error.
This behavior has already been reported in issue #70, but as far as I can see, the proposed change hasn't been implemented yet.
A minimal reprex:
use serde::{de, Deserialize, Deserializer}; // 1.0.147
use serde_json; // 1.0.87

#[derive(Deserialize, Debug)]
struct Intermediary {
    id: String,
}

#[derive(Debug)]
struct Final {
    id: String,
}

impl<'de> Deserialize<'de> for Final {
    fn deserialize<D>(deserializer: D) -> std::result::Result<Self, D::Error>
    where
        D: Deserializer<'de>,
    {
        let im = Intermediary::deserialize(deserializer)?;
        println!("{:?}", im);
        if im.id == "this_is_ok" {
            Ok(Final { id: im.id })
        } else {
            Err(de::Error::custom(format!("error: {:?}", im.id)))
        }
    }
}

fn main() {
    let feature_sequence = r#"{"id":"this_is_ok"}
{"id":"this_is_err"}
{"id":"this_is_ok"}"#;
    let stream = serde_json::Deserializer::from_str(&feature_sequence).into_iter::<Final>();
    for final_res in stream {
        match final_res {
            Ok(cf) => println!("{:?}", cf),
            Err(e) => println!("error: {:?}", e),
        }
    }
}
This doesn't work in all situations. If Final errors, then the StreamDeserializer still stops. For example, with this input:
let feature_sequence = r#"{"id":"this_is_ok"}
{"id":1}
{"id":"this_is_ok"}"#;
The only trick I know that works reliably is deserializing into a type which cannot fail, something like serde_json::Value. You then perform the fallible deserialization from that. That is basically how serde's untagged enums work too. You do get a lot of downsides from using untagged, though.
That's because apparently the JSON deserializer doesn't skip the rest of the value when a type error is encountered. That's unfortunate and I would consider that a bug, but it can still be worked around using byte_offset(), and by deserializing a generic Value only when typed deserialization fails: Playground.
The above is a bit messy, I'll try to clean it up soon.
One workaround that I could come up with is to implement an implicit Deserialize through FromStr.
This is not my preference at all, however, because I can only deserialize from strings (unless I implement from_slice and from_reader too).
Plus, using serde's Deserializer gives a whole range of options which I don't have with my FromStr implementation, and in general it feels more ergonomic in my opinion.
I'm building a public API for a library, and de/serializing data is at the center of it. So at the end of the day, I am after an ergonomic way to deserialize a potentially heterogeneous (not only Final-s) stream.
I think the way to do this is to return an Error if an item in the stream is not a Final, let the caller handle the error, and continue with the next item in the stream.
Because Final is a public struct in the library, I'm not convinced about having a FinalResult as @erelde demonstrated, although it is probably the closest to what I had in mind initially.
My problem with FinalResult in particular is that the library user has to match on Ok values which are not really ok, since they can also be a FinalResult::Err, and that's a bit confusing I think.
Long story short, I thought that with the StreamDeserializer I could have the most ergonomic, idiomatic implementation, as opposed to using Cursors and FromStr as in the playground I linked above.
... am I making sense? Or maybe there are simpler ways for this?
@jonasbb This playground contains a cleaned-up version of the previous solution, which also passes the other test case you provided.
@balazsdukai it doesn't need FinalResult, either; it provides a continuous stream of Result<T, JsonError> without needing to wrap your type in a newtype.
No problem. I realized I forgot something, so I made another small tweak. This saves you from having to check manually whether the JSON was valid upon each iteration (in order to stop the iteration if it wasn't), so this latest version will never return an infinite stream of identical errors.