Why are there 2 types for deserializing in serde?

To expand a bit on @RustyYato's notes about the serde data model: serde has a sort of two-phase process for deserialization. One part deals with generic data formats, and the other deals with specific data structures.

The job of the Deserializer is to deal with the "general-purpose" data format. It parses the input and converts it into the basic serde data model. It's kind of like taking all of your inputs, whatever their format (TOML, CBOR, etc.), and converting them to JSON so that the rest of your code only has to know about JSON. In this case, though, instead of JSON you have the serde data model, which is basically a custom intermediate data format that looks like a cross between JSON and basic Rust types (integers, strings, sequences, maps, structs, and so on). The whole point of this phase is to handle everything that's specific to a given data format and compartmentalize it so no other code has to care about it.

Note that I'm talking about generic data formats like JSON or XML, not specific schemas for those formats (such as a JSON-based RPC protocol that defines particular object types you need to handle). The schema is handled on the "other" side of serde.
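To make the "convert everything into one intermediate format" analogy concrete, here is a toy, std-only sketch. Everything in it is hypothetical: `Value` stands in for the serde data model (cut down to just integers and maps), and `parse_text` / `parse_binary` are two made-up formats. Real serde does not build an intermediate tree like this; it streams data-model values into a visitor instead. The sketch only shows that each parser owns its format's quirks (decimal text vs. little-endian bytes) while everything downstream sees the same `Value`.

```rust
use std::collections::BTreeMap;

/// Hypothetical, drastically simplified stand-in for the serde data model.
#[derive(Debug, PartialEq)]
enum Value {
    Int(i64),
    Map(BTreeMap<String, Value>),
}

/// Hypothetical format A: text like "x=3,y=-4", integers written as decimal text.
fn parse_text(input: &str) -> Result<Value, String> {
    let mut map = BTreeMap::new();
    for pair in input.split(',') {
        let (key, val) = pair
            .split_once('=')
            .ok_or_else(|| format!("missing '=' in {pair:?}"))?;
        let n = val
            .trim()
            .parse::<i64>()
            .map_err(|e| format!("bad integer {val:?}: {e}"))?;
        map.insert(key.trim().to_string(), Value::Int(n));
    }
    Ok(Value::Map(map))
}

/// Hypothetical format B: repeated [name length: u8][name bytes][i64, little-endian].
fn parse_binary(mut input: &[u8]) -> Result<Value, String> {
    let mut map = BTreeMap::new();
    while let Some((&len, rest)) = input.split_first() {
        let len = usize::from(len);
        if rest.len() < len + 8 {
            return Err("truncated entry".to_string());
        }
        let name = std::str::from_utf8(&rest[..len]).map_err(|e| e.to_string())?;
        let mut bytes = [0u8; 8];
        bytes.copy_from_slice(&rest[len..len + 8]);
        map.insert(name.to_string(), Value::Int(i64::from_le_bytes(bytes)));
        input = &rest[len + 8..];
    }
    Ok(Value::Map(map))
}

/// Helper (also hypothetical) that builds format-B bytes from (name, value) pairs.
fn encode_binary(entries: &[(&str, i64)]) -> Vec<u8> {
    let mut out = Vec::new();
    for &(name, n) in entries {
        out.push(name.len() as u8);
        out.extend_from_slice(name.as_bytes());
        out.extend_from_slice(&n.to_le_bytes());
    }
    out
}

fn main() {
    let a = parse_text("x=3,y=-4").unwrap();
    let b = parse_binary(&encode_binary(&[("x", 3), ("y", -4)])).unwrap();
    // Code past this point only needs to understand `Value`, not either format.
    assert_eq!(a, b);
    println!("{a:?}");
}
```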

Once the Deserializer has translated the input data into serde's data model, it passes that data to the visitor. (Sort of. As I understand it, the "phase 2" code actually drives the Deserializer: it tells the Deserializer what it expects next, such as a struct with certain fields or an integer, and actively asks for the "next value" instead of passively receiving a bunch of converted data. Self-describing formats like JSON can often get by without that hint, but formats that aren't self-describing need it to know how to interpret the input at all. But that's not very important when it comes to understanding the basic concept.) The visitor is supplied by the Deserialize implementation, so the Deserialize side receives the generic data through it and tries to map that data onto a Rust data structure. It only cares about the serde data model, so it doesn't have to worry (for the most part) about format-specific details such as how special characters are escaped in strings or whether the format stores integers as text or binary. The Deserializer has already handled that part. This is the phase that performs the task of your hypothetical deserialize_struct method.
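A similarly hedged, std-only sketch of phase 2, including the "phase 2 drives the Deserializer" point. The traits are loosely modeled on serde's `Deserializer`, `Visitor`, and `Deserialize`, but they are simplified, hypothetical versions with made-up signatures (`deserialize_struct` here borrows the name of serde's real method, not the question's hypothetical one). `Point`'s impl hands a `PointVisitor` to whatever Deserializer it gets and says it expects fields `x` and `y`. `TextFormat` is self-describing and ignores that hint; `BinaryFormat` (assumed to be nothing but the field values as little-endian `i64`s, in order) can't be decoded without it.

```rust
// Hypothetical, simplified traits loosely modeled on serde's; NOT serde's real API.

/// Phase 2 callback: receives data-model values, whatever format they came from.
trait Visitor {
    type Value;
    fn visit_map<I: Iterator<Item = (String, i64)>>(self, entries: I) -> Result<Self::Value, String>;
}

/// Phase 1: one implementation per data format.
trait Deserializer: Sized {
    /// `fields` is the hint from the Deserialize side about what it expects.
    fn deserialize_struct<V: Visitor>(self, fields: &[&str], visitor: V) -> Result<V::Value, String>;
}

/// Phase 2 entry point: one implementation per Rust type, independent of formats.
trait Deserialize: Sized {
    fn deserialize<D: Deserializer>(d: D) -> Result<Self, String>;
}

#[derive(Debug, PartialEq)]
struct Point {
    x: i64,
    y: i64,
}

struct PointVisitor;

impl Visitor for PointVisitor {
    type Value = Point;
    fn visit_map<I: Iterator<Item = (String, i64)>>(self, entries: I) -> Result<Point, String> {
        let (mut x, mut y) = (None, None);
        for (key, value) in entries {
            match key.as_str() {
                "x" => x = Some(value),
                "y" => y = Some(value),
                other => return Err(format!("unknown field {other:?}")),
            }
        }
        Ok(Point {
            x: x.ok_or("missing field x")?,
            y: y.ok_or("missing field y")?,
        })
    }
}

impl Deserialize for Point {
    fn deserialize<D: Deserializer>(d: D) -> Result<Self, String> {
        // Phase 2 drives phase 1: state the expected shape, hand over the visitor.
        d.deserialize_struct(&["x", "y"], PointVisitor)
    }
}

/// Hypothetical self-describing text format, e.g. "x=3,y=-4".
struct TextFormat<'a>(&'a str);

impl<'a> Deserializer for TextFormat<'a> {
    fn deserialize_struct<V: Visitor>(self, _fields: &[&str], visitor: V) -> Result<V::Value, String> {
        // Field names are in the input itself, so the hint isn't needed here.
        let mut entries = Vec::new();
        for pair in self.0.split(',') {
            let (key, value) = pair.split_once('=').ok_or("missing '='")?;
            let n = value.trim().parse::<i64>().map_err(|e| e.to_string())?;
            entries.push((key.trim().to_string(), n));
        }
        visitor.visit_map(entries.into_iter())
    }
}

/// Hypothetical non-self-describing format: only the values, as little-endian i64s.
struct BinaryFormat<'a>(&'a [u8]);

impl<'a> Deserializer for BinaryFormat<'a> {
    fn deserialize_struct<V: Visitor>(self, fields: &[&str], visitor: V) -> Result<V::Value, String> {
        // No names or types in the bytes: the `fields` hint is the only way to
        // know how many values to read and what to call them.
        if self.0.len() != fields.len() * 8 {
            return Err("input length doesn't match expected fields".to_string());
        }
        let entries = fields.iter().zip(self.0.chunks_exact(8)).map(|(name, chunk)| {
            let mut bytes = [0u8; 8];
            bytes.copy_from_slice(chunk);
            (name.to_string(), i64::from_le_bytes(bytes))
        });
        visitor.visit_map(entries)
    }
}

fn main() {
    // The same Point impl works with both formats.
    let from_text = Point::deserialize(TextFormat("y=-4, x=3")).unwrap();
    let mut bytes = Vec::new();
    bytes.extend_from_slice(&3i64.to_le_bytes());
    bytes.extend_from_slice(&(-4i64).to_le_bytes());
    let from_binary = Point::deserialize(BinaryFormat(&bytes)).unwrap();
    assert_eq!(from_text, Point { x: 3, y: -4 });
    assert_eq!(from_text, from_binary);
}
```

`Point` is written once and neither format knows anything about it, which is the payoff of the two-phase split. Real serde's visitor interface is pull-based (`MapAccess` with `next_key`/`next_value`) and covers the whole data model, so treat this as an illustration of the overall shape rather than of the actual API.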

This model does have its flaws. It's more complex than deserializing in one step, and some formats have features that don't map well onto the serde data model (XML's distinction between attributes and child elements is a common example). However, it turns out that most general-purpose data interchange formats share the same basic concepts, and the two-phase model makes it possible to deserialize the same data structure from a wide variety of underlying formats.
