How do I do streaming parsing with serde_json?

I’m trying to understand how to do streaming JSON parsing in Rust.

For example, I have a JSON file containing a serialized array of billions of Point(i32, i32, i32) structs, and I want to aggregate this data. If the file were small, I could just do:

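let contents = std::fs::read_to_string("file.txt")?;
let points: Vec<Point> = serde_json::from_str(&contents)?;
// Accumulate in i64: summing billions of i32 values would overflow i32.
let aggregated_result = points
    .into_iter()
    .fold((0i64, 0i64, 0i64), |(a, b, c), Point(x, y, z)| {
        (a + i64::from(x), b + i64::from(y), c + i64::from(z))
    });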

The problem is that this reads the whole file into memory and builds a Vec of every point, when we only ever need one element at a time.

How could this be done more efficiently? My current implementation performs nasty string indexOf operations to find object boundaries and then calls from_str::<Point>() on each substring, but it looks very hacky, unreliable and unwise.

This should be exactly what you need: https://serde.rs/stream-array.html


Great! Gonna try it. Thanks

@birkenfeld it’s almost what I’d like to have, but unfortunately deserialize_with doesn’t work on the root object. When I try to apply it to the root object, I get:

error: unknown serde container attribute `deserialize_with`
  --> src\main.rs:13:9
   |
13 | #[serde(deserialize_with = "deserialize_debetors")]
   |         ^^^^^^^^^^^^^^^^

But maybe I could just implement Deserialize myself…

“Deserialize_with on the root object” is what you get by writing a Deserialize impl.
