Serde: parse almost-JSON where array commas are missing

I have a large amount of data that is perfect JSON, except that there are arrays of numbers with missing commas ("[1 2 3 4]" instead of "[1, 2, 3, 4]"). I want to use serde to parse it, so I will need a custom deserializer for the dodgy arrays.

This is what I have so far (simplified example):

The serde docs ("Implementing a Deserializer") show how to create a custom impl for SeqAccess that parses a comma-separated array, so I could easily adapt that to parse a space-separated array. But I can't see how to use that custom impl. By the time we get to visit_seq(), serde has already provided the SeqAccess, and I can't change it. (I feel like I'm on the wrong track here.)

serde_json can only parse syntactically valid JSON. A custom Deserialize impl (or a deserialize_with function) lets you turn the parsed JSON values into your own Rust type; it does not let you control the JSON parser itself.


Good info, thanks (if not what I wanted to hear 🙂).

Looks like a from-scratch custom deserializer is the way forward.

You could also make a copy of serde_json and modify it, which would likely be less work (though of course if your format is significantly simpler than JSON in other ways, you may wish to use simpler code).


The easiest option might be to write a preprocessor that turns your data into valid JSON. If the only difference from JSON is that arrays of numbers lack commas, you can scan the data for a [ that is not inside a string; that is where an array starts. To do this you need to track whether you are inside a string, which you can do by watching for quotes while skipping escaped quotes (\") inside strings.
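As a rough illustration of that preprocessing idea, here is a minimal std-only sketch. The function name insert_array_commas is made up for this example, and it assumes the input is valid JSON apart from the missing commas and that array elements are separated by whitespace. It tracks string state (including backslash escapes) and a stack of open brackets, and inserts a comma wherever whitespace separates two values directly inside an array.

```rust
/// Hypothetical helper: insert the missing commas into arrays like `[1 2 3]`.
/// Assumes the input is otherwise valid JSON and that array elements are
/// separated by at least one whitespace character.
fn insert_array_commas(input: &str) -> String {
    let mut out = String::with_capacity(input.len() + input.len() / 8);
    let mut stack: Vec<char> = Vec::new(); // currently open '[' and '{'
    let mut in_string = false;
    let mut escaped = false; // previous char inside a string was a backslash
    let mut prev: Option<char> = None; // last non-whitespace char outside strings
    let mut pending_ws = String::new(); // whitespace seen but not yet written

    for c in input.chars() {
        if in_string {
            // Copy string contents verbatim; only an unescaped quote ends the string.
            out.push(c);
            if escaped {
                escaped = false;
            } else if c == '\\' {
                escaped = true;
            } else if c == '"' {
                in_string = false;
            }
            continue;
        }
        if c.is_whitespace() {
            pending_ws.push(c);
            continue;
        }
        let in_array = stack.last() == Some(&'[');
        // A value has just ended unless we are right after '[' or ','.
        let prev_ends_value = !matches!(prev, None | Some('[') | Some(','));
        // Any token other than ',' or ']' begins a new value here.
        let starts_value = c != ',' && c != ']';
        if in_array && !pending_ws.is_empty() && prev_ends_value && starts_value {
            out.push(',');
        }
        out.push_str(&pending_ws);
        pending_ws.clear();
        out.push(c);
        match c {
            '[' | '{' => stack.push(c),
            ']' | '}' => {
                stack.pop();
            }
            '"' => in_string = true,
            _ => {}
        }
        prev = Some(c);
    }
    out.push_str(&pending_ws);
    out
}
```

The returned string can then be passed to serde_json::from_str as usual. Input that already has its commas passes through unchanged, so the function should be safe to run over mixed data. It builds the whole output in memory, so for very large files a streaming variant of the same state machine would probably be a better fit.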


Alternative approach would be to tell whoever decided on this format to stop doing that.

