Parsing truncated (i.e. invalid) JSON

I have JSON of the form { "foo": { "bar": "baz" }, "val": [1, 2 (note that the data is truncated).

This means that serde_json::Deserializer fails to process it with Error("EOF while parsing a list", ....; in other words, serde_json::StreamDeserializer, which mentions the call Deserializer::from_str(data).into_iter::<Value>(), does not support this kind of streaming.

Still, I would like to get to "foo", "bar" and "baz" before the parsing fails later on.

While formulating this question, I found that struson by Marcono1234 can be used for this:

use struson::reader::JsonStreamReader;
use struson::reader::ValueType;

fn struson_example() -> Result<(), struson::reader::ReaderError> {
    let data = r#"{ "foo": { "bar": "baz" }, "val": [1, 2  "#;

    let mut j = JsonStreamReader::new(data.as_bytes());

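    // Enter the top-level object and read its first member name ("foo").
    j.begin_object()?;
    dbg!(j.next_name());
    // Enter the nested object and read its member name ("bar").
    j.begin_object()?;
    dbg!(j.next_name());

    // Peek at the type of the next value before consuming it ("baz" is a string).
    match dbg!(j.peek()) {
        Ok(ValueType::String) => { dbg!(j.next_string()); }
        Ok(ValueType::Number) => { dbg!(j.next_number_as_str()); }
        Ok(_) => todo!(),
        Err(e) => return Err(e),
    };

    // Leave the nested object, then move on to "val" and its first array element.
    dbg!(j.end_object());
    dbg!(j.next_name());
    dbg!(j.begin_array());
    dbg!(j.next_number::<i32>());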

    Ok(())
}

Funny that you ask this just now; a few days ago someone else (I assume) raised a similar question on the Struson GitHub repository. Maybe the solution I proposed there works for your case as well. I am not going to repeat the code here since it is quite verbose. Essentially, the idea is to write a custom JsonReader implementation which wraps a JsonStreamReader but handles EOF as if the end of all open JSON arrays and objects had been reached, ignoring any incomplete values. A Struson JsonReaderDeserializer is then created on top of it, which allows using Serde on this incomplete JSON data.

Maybe you can even directly use the code proposed there, something like:

// See linked GitHub discussion for `deserialize_partial!` implementation
let value = deserialize_partial!(
    json.as_bytes(),
    |d| serde_json::Value::deserialize(d)
)?;
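To illustrate the "treat EOF as the end of all open arrays and objects" idea in isolation, here is a minimal std-only sketch. It is not the code from the GitHub discussion and does not use Struson: it is a plain textual repair that only computes the missing closing brackets. The helper name closing_suffix is hypothetical, and the sketch assumes the input was cut off right after a complete value; a dangling comma, member name or half-written value would need extra trimming.

```rust
/// Hypothetical helper (not part of Struson or serde_json): returns the
/// closing brackets needed to balance a truncated JSON document.
/// Assumption: the input ends right after a complete value.
fn closing_suffix(input: &str) -> String {
    let mut open = Vec::new(); // closers for the currently open containers
    let mut in_string = false;
    let mut escaped = false;

    for c in input.chars() {
        if in_string {
            // Brackets inside strings must not be counted.
            if escaped {
                escaped = false;
            } else if c == '\\' {
                escaped = true;
            } else if c == '"' {
                in_string = false;
            }
            continue;
        }
        match c {
            '"' => in_string = true,
            '{' => open.push('}'),
            '[' => open.push(']'),
            '}' | ']' => {
                open.pop();
            }
            _ => {}
        }
    }
    // The innermost container has to be closed first.
    open.iter().rev().collect()
}

fn main() {
    let data = r#"{ "foo": { "bar": "baz" }, "val": [1, 2"#;
    let repaired = format!("{}{}", data, closing_suffix(data));
    println!("{repaired}"); // prints { "foo": { "bar": "baz" }, "val": [1, 2]}
}
```

The repaired string could then be handed to a regular JSON parser, at the cost of scanning the input twice.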

What you show covers a more complete use case than what I need: getting the entire JSON data up to that point, i.e. implicitly closing all objects so that the document becomes valid.

I am only looking to navigate to a few nodes in the AST and extract the data directly. Both have an edge case of truncated numbers (any other value can either be parsed in full, or not at all, right?). As I see it, in my case just checking that the next object is valid should suffice.
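To make the truncated-number edge case concrete: a number that runs right up to the end of the input cannot be told apart from a cut-off one (2 might have been 23), whereas a number followed by a delimiter is known to be complete. Below is a small std-only sketch of such a check. The helper name number_is_terminated and the byte-offset parameter are my own assumptions, not part of Struson or serde_json, and the function only checks termination; it does not validate the number grammar.

```rust
/// Hypothetical helper: returns true if the number token starting at byte
/// offset `start` is followed by a delimiter, i.e. it cannot have been cut off.
/// It does not check that the token is a well-formed JSON number.
fn number_is_terminated(input: &str, start: usize) -> bool {
    let rest = &input.as_bytes()[start..];
    // Length of the run of characters that may appear in a JSON number.
    let len = rest
        .iter()
        .take_while(|&&b| matches!(b, b'0'..=b'9' | b'-' | b'+' | b'.' | b'e' | b'E'))
        .count();
    // Complete only if something other than EOF ends the token.
    len > 0
        && matches!(
            rest.get(len).copied(),
            Some(b',' | b']' | b'}' | b' ' | b'\t' | b'\n' | b'\r')
        )
}

fn main() {
    // The `2` is followed by whitespace, so it is known to be complete.
    println!("{}", number_is_terminated("[1, 2  ", 4)); // prints true
    // The `2` sits directly at EOF and might be a prefix of a longer number.
    println!("{}", number_is_terminated("[1, 2", 4)); // prints false
}
```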
