[Serde-YAML] Deserialize inner block only

Hello,

I have a small question about serde deserialization. I have a yaml file like this

basket:
  apple:
    x: "12"

and here's a short program to deserialize it in a standard way

use serde::Deserialize;

#[derive(Debug, Clone, Deserialize)]
struct Data {
    basket: Basket,
}

#[derive(Debug, Clone, Deserialize)]
struct Basket {
    apple: Apple,
}

#[derive(Debug, Clone, Deserialize)]
struct Apple {
    x: String,
}

fn main() {
    let s = "basket:\n  apple:\n    x: hello";
    println!("{}", s);
    let data: Data = serde_yaml::from_str(s).unwrap();
    println!("{:?}", data);
}

However, I would like to deserialize only the inner apple block without needing to define an intermediary Basket struct. Is it possible? How can I do it?

I don't see how that could work: the serde de/serialization process is based on the type system, and uses that to parse the input.

That means that leaving out the container struct(s) implies serde has no idea how much input to read before reaching the apple entry, since that's exactly what (part of) the job would be of the Deserialize impl for Basket.

It could work to some extent if the parser first parses into a structure independent from the type system. Then still everything gets parsed, but one could pick out the chunk which should get mapped into the type system.

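At the risk of having fingers pointed at me for mentioning that json crate yet again, it can do this with JSON and manual mapping. Along the lines of (untested):

// Assumes `s` holds the JSON equivalent of the YAML document above.
let data = json::parse(s).unwrap();
// Borrow the subtree instead of moving it out of `data`.
let apple_json = &data["basket"]["apple"];
let apple = Apple {
    // as_str() returns None if the value is not a string.
    x: apple_json["x"].as_str().unwrap().to_string(),
};
println!("{:?}", apple);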

No idea whether serde has that under the hood somewhere or whether an equivalent crate for YAML exists.

Yeah, tackling such a challenge is on my roadmap as well. :slight_smile:

stappers@juli:~/src/rust/workingprogressproject
$ cat note.txt | tail -n 5

Partial Parse

* https://users.rust-lang.org/t/serde-how-to-parse-json-partially-7-years-later/97030
* https://docs.rs/serde_yaml/latest/serde_yaml/value/index.html
stappers@juli:~/src/rust/workingprogressproject
$

Sharing that note is all I can do for now.

The problem with this approach is that even if it works, it ends up parsing the entire input string anyway, so you don't gain performance, but you do lose type safety relative to leveraging the type system.

Thank you for the quick response.

For my use case I think I'll use the from (or try_from) attribute to parse an intermediary structure.

use std::collections::HashMap;
use serde::Deserialize;

#[derive(Debug, Clone, Deserialize)]
struct Data {
    #[serde(rename="basket")]
    apple: Apple,
}

#[derive(Debug, Clone, Deserialize)]
#[serde(from="HashMap<String,HashMap<String,String>>")]
struct Apple {
    x: String
}

impl From<HashMap<String, HashMap<String, String>>> for Apple {
    fn from(mut value: HashMap<String, HashMap<String,String>>) -> Self {
        let mut apple = value.remove("apple").unwrap();
        Self {
            x: apple.remove("x").unwrap(),
        }
    }
}

On importing YAML/JSON data from a foreign remote source, there can't be type safety. They don't have my data structure, I don't have theirs. Reading such data is a mix of some guessing and a lot of plausibility checks, at least that's what I do.

I assume that if the opening question were about self-written data, @vvrably wouldn't write unwanted data to his storage to begin with, or would already have all the required type structures in place.

Normally there is the type of the value you serialize to/deserialize from, which provides structure in the context of other values to deserialize. Granted, on its own that doesn't guarantee that the data is deserialized correctly, though it does make it trivial to make it so, since the type itself describes the de/serialization details, at least when using serde.
In effect it's a declarative way to provide de/serialization instructions.

It also provides guardrails in the face of future code maintenance. In a tiny example like this it's tempting to make the argument that that difference doesn't matter. Scale up the size of the code base however, and those guardrails go from nice to have to a definite must-have.

On another note: the json library works, but the API looks and feels a lot as if a Pythonista or especially a PHPer thought to themselves "meh I don't like the Rust way" and wrote a serialization library without understanding what exactly it is that makes serde so useful, or why it was written that way.
It's... something that can surely be done, but one cannot help but wonder about the wisdom of such a library and its use in terms of software reliability over time, relative to serde and the philosophy behind it. I guess only time will tell.

Right. Now your code downloads a blob of JSON from some public web server and you want to extract data from that blob.

  • What if JSON is valid, but content is an error message instead of the expected data? Parsing the error message gives an answer on how to get the wanted data.
  • What if a field is present on Thursday, but missing on Friday?
  • What if a number is a number one day, but a string of digits on another day? JavaScript easily confuses these two, both are usable numbers.
  • What if there are fields you've never seen before?
  • What if you need just that item 5 levels deep and don't care about all the other stuff? Similar to the opening question here.
  • What if JSON structure is different from what Rust code needs? (I think I know the answer here: one has to build both structures and write a mapping)

All these are very valid questions when dealing with foreign data. So far I can't see how serde allows one to deal with them. serde appears to be brilliant when data fits the expected structure, but falls apart quickly when it doesn't.

Both PHP and Python have proven themselves in countless battles over several decades. Hard to argue their approach isn't useful.

Well, what? You don't have the data, then. That's hardly the fault of Rust or Serde or type systems.

That's spelled Option<T>.

That's an enum.

Then you can't do anything useful with them anyway. Either ignore them (the default behavior of derived Deserialize impls), or #[serde(flatten)] them into a HashMap.

But I have to note at this point that all of these points strongly contradict the way you should design modern APIs. You should have a strongly-typed schema and emit/parse data according to it. Anything else only ends up being a horrible, unmaintainable mess.

It's not hard to extract fields from structs. If you think it is, you may be in for a bad surprise w.r.t. how much effort programming requires in general.

Then you either:

  • customize the behavior of Deserialize impls (and read the documentation of serde thoroughly)
  • or write occasional manual impls
  • or ask the data source to be re-implemented reasonably.

Yeah, but they are the exception, rather than the norm. You can't expect libraries to be optimized for subpar, inconsistent, non-idiomatic, carelessly-written APIs.

Then you must read its documentation. If your data is so badly mangled that after applying all the possible #[serde(…)] attributes, you still can't parse it, it's time to re-evaluate your approach. (But again, you can always just write a manual impl, and still delegate most of it to derived impls for the parts that are regularly typed.)

Yeah, they are proven to have lost those countless battles. Anyone arguing that these languages are easy to use for writing robust APIs has clearly never debugged large production systems when they broke hopelessly in a manner that would have been completely preventable if only there were a strong type system. The fact that historically, we didn't know better is regrettable, but this doesn't mean that we shouldn't be trying to improve the status quo.

See, and that's the point where one is back to writing manual code, just with a complex abstraction on top.

So I should write a nice letter to Google or Yahoo and ask them to change their software to make it fit for my hobby project. They'll have a good laugh.

Being one who has written code for dealing with quite a number of such APIs, I can tell that inconsistencies are the norm rather than the exception. The world is lazy, "works for me" is the maximum level of quality many developers can imagine. Maybe that's disappointing, but that's how it is, and what real world code has to deal with.

The manual impls should be the exception, not the rule, and even most of them can be delegated to existing code in practice.

Maybe if you shared some actual details and data with us, you'd get more useful feedback. I can't fathom how it can be so fundamentally hard to parse Google's APIs. They know what they are doing.

16 posts were split to a new topic: [Serde-JSON] Deserializing and Transposing Yahoo Finance data
