How to parse invalid JSON with serde_json?

I am trying to parse more than 10,000 files containing JSON with format errors like this:

{
   "some_entry": [
      1,
      2,
   ],
}

Note the trailing commas after "2" and "]".
Or:

{
    "some_entry":"data1""another_entry":"data1"
}

Here the comma between "data1" and "another_entry" is missing.

When I try to parse that with serde_json, e.g.

let json: Value = serde_json::from_str(&contents).expect("Error parsing JSON");

it fails on all of these files because of errors like those above. Is there a way to have serde_json ignore these errors and just continue parsing?

What I am ultimately trying to achieve is to convert these invalid JSON files into valid JSON: read the file, fix the JSON, write it back. But so far I don't see a way to achieve this with either serde_json or any other Rust crate.

(:

There appears to be a fork that allows trailing commas.


It does, but this fork cannot deal with missing commas. Thanks for your answer (:
I also tried https://crates.io/crates/serde_json_lenient, which does the same but also cannot deal with missing commas.

The real answer is: you don't use serde for that. Serde's data model doesn't allow for syntax errors in the input of Deserialize.

What you need is an error-correcting JSON parser.


If you are constantly encountering all kinds of broken input to work around, it becomes less likely over time that an off-the-shelf solution covers your needs. I would recommend writing your own preprocessor for the input string. It's unlikely you'll end up writing a proper JSON parser; most likely it will be just a simple state machine. We have done that for the types of problematic JSON we encounter: GitHub - getsentry/rust-json-forensics: Lossily convert non-standard JSON and overflowing integers to something serde-json can parse
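
To make the state-machine idea concrete, here is a minimal sketch. It is not taken from rust-json-forensics, and `repair_json` is a hypothetical name chosen for illustration. It assumes the only defects are the two from the question (trailing commas and missing commas between values); anything else, such as a missing colon or an unterminated string, passes through and will still be rejected by serde_json.

```rust
/// Hypothetical helper (an assumption, not part of any crate): lossily repairs
/// trailing commas and missing commas so that a strict parser can read the text.
fn repair_json(input: &str) -> String {
    let mut out = String::with_capacity(input.len() + 16);
    let mut in_string = false; // currently inside a "..." literal
    let mut escaped = false; // previous char inside the string was a backslash
    let mut in_scalar = false; // currently inside a number / true / false / null
    let mut after_value = false; // last token was a complete value
    // Byte index in `out` of a comma that is dropped if a '}' or ']' follows it.
    let mut pending_comma: Option<usize> = None;

    for c in input.chars() {
        if in_string {
            // Copy string contents verbatim; only look for the closing quote.
            out.push(c);
            if escaped {
                escaped = false;
            } else if c == '\\' {
                escaped = true;
            } else if c == '"' {
                in_string = false;
                after_value = true;
            }
            continue;
        }
        if c.is_whitespace() {
            in_scalar = false; // whitespace ends a number or literal
            out.push(c);
            continue;
        }
        match c {
            ',' => {
                pending_comma = Some(out.len());
                out.push(c);
                after_value = false;
                in_scalar = false;
            }
            '}' | ']' => {
                // A comma directly before a closing bracket is a trailing comma.
                if let Some(pos) = pending_comma.take() {
                    out.remove(pos);
                }
                out.push(c);
                after_value = true;
                in_scalar = false;
            }
            ':' => {
                out.push(c);
                after_value = false;
                in_scalar = false;
                pending_comma = None;
            }
            '{' | '[' | '"' => {
                // A new value right after a finished one means a comma is missing.
                if after_value {
                    out.push(',');
                }
                out.push(c);
                in_string = c == '"';
                after_value = false;
                in_scalar = false;
                pending_comma = None;
            }
            _ => {
                // Part of a number or a bare literal such as true/false/null.
                if after_value && !in_scalar {
                    out.push(',');
                }
                out.push(c);
                in_scalar = true;
                after_value = true;
                pending_comma = None;
            }
        }
    }
    out
}
```

The repaired string can then be passed to `serde_json::from_str` as usual, so files that are still broken after the repair show up as ordinary parse errors instead of being silently accepted.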


Yeah, I just found out that serde_json is only compatible with strict JSON. And maybe there is no Rust crate yet that can really deal with all the sorts of errors made by humans who maintain JSON files by hand, which is what I believe the files I am trying to parse are.

But I found this: GitHub - josdejong/jsonrepair: Repair invalid JSON documents
I'm going to play around with that; their online tool could fix all the errors I know of in those files. Thanks for your answer (:
