Any hacks to get spans with serde?

Hello, I have a program that takes JSON (or YAML, the format can be changed) as input and uses it. One of the requirements is to be very user friendly and report exactly where an error occurred, including semantic errors, which are domain specific. So I would need line and column ranges for each node. Currently I parse with serde and serde_json, which works, but there's no straightforward way to get this information.

What would you advise in this situation? I really like the serde way of having structs describe the structure. It would be a pity to write it all by hand, and there are a lot of them.

Can you provide a code example of your current approach that apparently doesn't contain the error position information?

Or is the thing you're talking about the fact that serde_json only refers to points in the text, not ranges?

Edit: Oh, you want to report errors that point back to the JSON for errors that happen way later than during parsing?

Short version:
It may be possible to write your own deserializer that records span info (the source text and ranges) per value, and somehow look up that span info when an error happens.

Long version:
(disclaimer, this is just one idea on my wishlist, which may be unrealistic)

I think that reporting "where exactly the value came from", given that the value may pass through many different types, needs:

  • for every field that is on the data type, a "span info" wrapper type
  • the span info wrapper would hold a reference to the source of that information (whether that is a file path or a request/response body) as well as the actual value for the underlying field
  • when you need to serve an error to the user, you can extract the information from the span info wrapper and present it nicely
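
To make the wrapper idea concrete, here is a minimal std-only sketch. The names `Span`, `Spanned`, `line_col` and `render_error` are made up for illustration and are not from serde or any crate mentioned in this thread. It assumes the parser hands you a byte range for each value, and it uses 1-based lines with 0-based columns.

```rust
use std::ops::Range;
use std::sync::Arc;

/// Where a value came from: a label for the source plus a byte range into it.
/// (Hypothetical type, not part of serde.)
#[derive(Debug, Clone, PartialEq)]
pub struct Span {
    /// For example a file path, or a label such as "request body".
    pub source: Arc<str>,
    pub bytes: Range<usize>,
}

/// A field value paired with its origin.
#[derive(Debug, Clone, PartialEq)]
pub struct Spanned<T> {
    pub value: T,
    pub span: Span,
}

/// Converts a byte offset into a 1-based line and a 0-based column (in chars).
/// Panics if `offset` is out of bounds or not on a char boundary.
pub fn line_col(text: &str, offset: usize) -> (usize, usize) {
    let before = &text[..offset];
    let line = before.matches('\n').count() + 1;
    let col = before.rsplit('\n').next().map_or(0, |s| s.chars().count());
    (line, col)
}

/// Extracts the location from the wrapper and formats a user-facing message.
pub fn render_error<T: std::fmt::Display>(
    text: &str,
    item: &Spanned<T>,
    message: &str,
) -> String {
    let (line, col) = line_col(text, item.span.bytes.start);
    format!("{}:{}:{}: {} (got {})", item.span.source, line, col, message, item.value)
}
```

With the text `"{\n  \"count\": 0\n}"` and a `Spanned` covering bytes `13..14`, `render_error` produces `config.json:2:11: must not be zero (got 0)`. The `Arc<str>` is only there so that many spans can share one source label cheaply.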

There are variations on how that can be implemented, such as the span info wrapper being an id, with a "global map of id to span infos" that you look up when you need to present the error.

Writing to that map may be sending the span info to a channel, and receiving an id back (and needing to record that id on the data type somehow).
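
The id variant could look roughly like this. It is only a sketch: `SpanId`, `SpanInfo`, `SpanRegistry` and `Port` are hypothetical names, and the registry is a plain `Vec` owned by the caller rather than a real global map or a channel, to keep the example short.

```rust
use std::ops::Range;

/// Opaque handle stored on the data type instead of the full span info.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct SpanId(usize);

#[derive(Debug, Clone, PartialEq)]
pub struct SpanInfo {
    pub source: String,
    pub bytes: Range<usize>,
}

/// Maps ids to span infos. A real program might put this behind a
/// `Mutex`, or feed it through a channel as described above.
#[derive(Debug, Default)]
pub struct SpanRegistry {
    spans: Vec<SpanInfo>,
}

impl SpanRegistry {
    /// Stores the span info and hands back the id to record on the value.
    pub fn record(&mut self, source: &str, bytes: Range<usize>) -> SpanId {
        self.spans.push(SpanInfo { source: source.to_owned(), bytes });
        SpanId(self.spans.len() - 1)
    }

    /// Looks the span info up again when an error has to be presented.
    pub fn lookup(&self, id: SpanId) -> Option<&SpanInfo> {
        self.spans.get(id.0)
    }
}

/// Example domain type that carries only the small id.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Port {
    pub value: u16,
    pub origin: SpanId,
}
```

The upside is that the domain types stay `Copy` and small. The downside is that the id is meaningless without the registry, so the registry has to outlive every value that refers to it.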

Then there's the question of "what if you have a derived value that came from two source locations, e.g. by concatenating strings", where you would want to show both span infos in the error. Maybe for every data transformation you do, you also need to populate the span info.
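
One way to handle derived values, again only as a sketch with made-up names (`Origin`, `Tracked`, `combine`), is to let every value carry a list of origins and to merge the lists whenever two values are combined:

```rust
use std::ops::Range;

#[derive(Debug, Clone, PartialEq)]
pub struct Origin {
    pub source: String,
    pub bytes: Range<usize>,
}

/// A value together with every source location that contributed to it.
#[derive(Debug, Clone, PartialEq)]
pub struct Tracked<T> {
    pub value: T,
    pub origins: Vec<Origin>,
}

impl<T> Tracked<T> {
    /// Wraps a value that was read directly from one location.
    pub fn new(value: T, source: &str, bytes: Range<usize>) -> Self {
        let origin = Origin { source: source.to_owned(), bytes };
        Tracked { value, origins: vec![origin] }
    }

    /// Combines two tracked values with `f`, keeping the origins of both.
    pub fn combine<U, R>(
        self,
        other: Tracked<U>,
        f: impl FnOnce(T, U) -> R,
    ) -> Tracked<R> {
        let mut origins = self.origins;
        origins.extend(other.origins);
        Tracked { value: f(self.value, other.value), origins }
    }
}
```

Concatenating a string from `a.json` with one from `b.json` through `combine` yields a value whose `origins` has two entries, so an error about the result can point at both places. The cost is that every transformation has to go through something like `combine`, which is the burden mentioned above.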

Not sure if that helps at all, but it's something I've thought about a lot.

do you mean annotated parsing result (like an AST or similar thing)? that's not what a "deserialization" library is designed for. you need a proper parser generator for that.

there is serde_spanned, but it's mostly only supported by the TOML serde library, no serde_json support yet.

granit-parser can parse both YAML and JSON and provides span information. This crate supports surrogate pairs and other edge cases that were traditionally different between YAML and JSON. It supports both char-based and byte-based coordinates. No hacks.

If your goal is to get the exact error location while doing ordinary serde deserialization, you can also use serde-saphyr, which builds on top of granit-parser and would report the precise error location with a snippet, the same way the Rust compiler does. It will not report locations of tokens where there is no error, but you can use it with garde or validator, and these two tools may cover your domain specific cases. If you use very aggressive Serde renames or flattening, then Spanned may be required for domain specific diagnostics.

Here is how to get spans from granit-parser:

    let yaml = "name: Alice\nitems:\n  - book\n  - pen\n";

    for next in Parser::new_from_str(yaml) {
        let (event, span) = next?;
        let source = span
            .byte_range()
            .map(|range| &yaml[range])
            .unwrap_or_default();

        println!(
            "{event:?}: chars={}..{}, bytes={:?}, start={}:{}, end={}:{}, indent={:?}, source={source:?}",
            span.start.index(),
            span.end.index(),
            span.byte_range(),
            span.start.line(),
            span.start.col(),
            span.end.line(),
            span.end.col(),
            span.indent,
        );
    }

The output:

StreamStart: chars=0..0, bytes=Some(0..0), start=1:0, end=1:0, indent=None, source=""
DocumentStart(false): chars=0..0, bytes=Some(0..0), start=1:0, end=1:0, indent=None, source=""
MappingStart(0, None): chars=0..0, bytes=Some(0..0), start=1:0, end=1:0, indent=None, source=""
Scalar("name", Plain, 0, None): chars=0..4, bytes=Some(0..4), start=1:0, end=1:4, indent=Some(0), source="name"
Scalar("Alice", Plain, 0, None): chars=6..11, bytes=Some(6..11), start=1:6, end=1:11, indent=None, source="Alice"
Scalar("items", Plain, 0, None): chars=12..17, bytes=Some(12..17), start=2:0, end=2:5, indent=Some(0), source="items"
SequenceStart(0, None): chars=21..21, bytes=Some(21..21), start=3:2, end=3:2, indent=None, source=""
Scalar("book", Plain, 0, None): chars=23..27, bytes=Some(23..27), start=3:4, end=3:8, indent=None, source="book"
Scalar("pen", Plain, 0, None): chars=32..35, bytes=Some(32..35), start=4:4, end=4:7, indent=None, source="pen"
SequenceEnd: chars=36..36, bytes=Some(36..36), start=5:0, end=5:0, indent=None, source=""
MappingEnd: chars=36..36, bytes=Some(36..36), start=5:0, end=5:0, indent=None, source=""
DocumentEnd: chars=36..36, bytes=Some(36..36), start=5:0, end=5:0, indent=None, source=""
StreamEnd: chars=36..36, bytes=Some(36..36), start=5:0, end=5:0, indent=None, source=""

I'll try this out! Hopefully it will be manageable.

If the value is 0, and a value of 0 is disallowed for that field, that is exactly a "domain specific error". To find where it happened in the document, span information is required.
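
As a rough illustration of that case, the sketch below checks a value after parsing and renders the offending line with a caret underline. Everything here (`Diagnostic`, `check_nonzero`, `render`) is a hypothetical helper, not an API of serde_json, serde-saphyr or granit-parser. It assumes you already have the byte range of the value, and that the range sits on a single line of ASCII text.

```rust
use std::ops::Range;

#[derive(Debug, PartialEq)]
pub struct Diagnostic {
    pub message: String,
    pub bytes: Range<usize>,
}

/// Domain rule: 0 is not allowed for this field.
pub fn check_nonzero(value: u32, bytes: Range<usize>) -> Result<u32, Diagnostic> {
    if value == 0 {
        Err(Diagnostic { message: "value must not be 0".to_owned(), bytes })
    } else {
        Ok(value)
    }
}

/// Renders the line containing the span, with carets under the span.
/// Assumes a single-line ASCII span, so bytes and columns coincide.
pub fn render(text: &str, d: &Diagnostic) -> String {
    let start = d.bytes.start;
    // Byte offsets of the first and one-past-last byte of the line.
    let line_start = text[..start].rfind('\n').map_or(0, |i| i + 1);
    let line_end = text[start..].find('\n').map_or(text.len(), |i| start + i);
    let line_no = text[..line_start].matches('\n').count() + 1;
    let pad = " ".repeat(start - line_start);
    let carets = "^".repeat(d.bytes.len().max(1));
    format!(
        "error: {}\n{:>3} | {}\n    | {}{}",
        d.message,
        line_no,
        &text[line_start..line_end],
        pad,
        carets
    )
}
```

For the input `"name: x\nretries: 0\n"` and the range `17..18`, `check_nonzero(0, 17..18)` fails and `render` prints the message, then line 2 (`retries: 0`), then a caret under the `0`.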

I got my hands on this today and unfortunately this approach doesn't work for me :sweat_smile:. The Read trait from serde_json is sealed, so I can't write my own implementation that intercepts the position in the stream. Other approaches don't work either, for various reasons, mostly because the relevant items are private.

I decided that the best solution in my case would be to fork and add the support.

maybe submit your changes as a PR to serde_json?