Step past errors in serde_json StreamDeserializer

Hello, I need to deserialize a stream of JSON objects, where some of the objects can be different than what is expected in the stream.
When a "different" object is encountered, the deserializer should return an error and continue with the next item in the stream.

I'm using serde_json::StreamDeserializer to deserialize the stream; however, it stops when it returns an error.
This behavior has already been reported in issue #70, but as far as I can see, the proposed change hasn't been implemented yet.

A minimal reprex:

use serde::{de, Deserialize, Deserializer}; // 1.0.147
use serde_json; // 1.0.87

#[derive(Deserialize, Debug)]
struct Intermediary {
    id: String,
}

#[derive(Debug)]
struct Final {
    id: String,
}

impl<'de> Deserialize<'de> for Final {
    fn deserialize<D>(deserializer: D) -> std::result::Result<Self, D::Error>
    where
        D: Deserializer<'de>,
    {
        let im = Intermediary::deserialize(deserializer)?;
        println!("{:?}", im);
        if im.id == "this_is_ok" {
            Ok(Final { id: im.id })
        } else {
            Err(de::Error::custom(format!("error: {:?}", im.id)))
        }
    }
}

fn main() {
    let feature_sequence = r#"{"id":"this_is_ok"}
    {"id":"this_is_err"}
    {"id":"this_is_ok"}"#;
    let stream = serde_json::Deserializer::from_str(feature_sequence).into_iter::<Final>();
    for final_res in stream {
        match final_res {
            Ok(cf) => println!("{:?}", cf),
            Err(e) => println!("error: {:?}", e),
        }
    }
}

Does anyone have a workaround for this?

You could make your own result type and implement Deserialize for it: playground.

This doesn't work in all situations. If Final errors, then the StreamDeserializer still stops. For example with this input.

let feature_sequence = r#"{"id":"this_is_ok"}
{"id":1}
{"id":"this_is_ok"}"#;

The only trick I know that works reliably is deserializing into a type which cannot fail on valid JSON, something like serde_json::Value. You then perform the fallible deserialization from that. That is basically how untagged enums work too, although using untagged comes with a lot of downsides.

That's trivial to fix, by mapping over the outer error instead of returning it immediately (which is what ? did).

Is it? With a slight tweak {"id":1, "abc": 2} I get an endless list of inner error: "expected value at line 2 column 12".

That's because apparently the JSON deserializer doesn't skip the rest of the value when a type error is encountered. That's unfortunate and I would consider that a bug, but it can still be worked around using byte_offset(), and by deserializing a generic Value only when typed deserialization fails: Playground.

The above is a bit messy, I'll try to clean it up soon.

One workaround that I could come up with is to do an implicit Deserialize through FromStr.
This is not my preference at all, however, because I can only deserialize from strings (unless I implement from_slice and from_reader too).

Plus, using serde's Deserializer gives a whole range of options that I don't have with my FromStr implementation, and in general it feels more ergonomic in my opinion.

I'm building a public API for a library, and de/serializing data is at the center of it. So at the end of the day, I am after an ergonomic way to deserialize a potentially heterogeneous (not only Final-s) stream.
I think the way to do this is to return an Error if an item in the stream is not a Final, let the caller handle the error, and continue with the next item in the stream.

Because Final is a public struct in the library, I'm not convinced about having a FinalResult as @erelde demonstrated, although this is probably the closest to what I had in mind initially.
My problem with FinalResult in particular is that the library user has to match on Ok values that are not necessarily ok, because they can also hold a FinalResult::Err, and I think that's a bit confusing.

        match final_res {
            Ok(FinalResult::Ok(cf)) => println!("{:?}", cf),
            Ok(FinalResult::Err(e)) => println!("error: {:?}", e),
            Err(e) => println!("error: {:?}", e),
        }

Long story short, I thought that with the StreamDeserializer I could have the most ergonomic, idiomatic implementation, as opposed to using Cursors and FromStr as in the playground I linked above.

... am I making sense? :) Or maybe there are simpler ways to do this?

@jonasbb This playground contains a cleaned-up version of the previous solution, which also passes the other test case you provided.

@balazsdukai it doesn't need FinalResult, either; it provides a continuous stream of Result<T, JsonError> without needing to wrap your type in a newtype.

Thank you!

No problem. I realized I forgot something, so I made another small tweak. This saves you from having to check manually whether the JSON was valid upon each iteration (in order to stop the iteration if it wasn't), so this latest version will never return an infinite stream of identical errors.
