Serde vs TryFrom

Hi all, I have been digging deeper and deeper into serde, mostly using Deserialize to read CSVs, and I have ended up with a big question.

From what I understand now (which could be wrong):

Deserialize: we use it to read from some input into a format
Visitor: we transform from type x into type y (a format)

Have you noticed that the visitor logic is just too similar to TryFrom/From? I can't actually distinguish them, and it seems serde does not mix Visitors with TryFrom; I tried implementing a TryFrom and it does not work for deserialization.

So the question is... what is the fundamental difference between Visitor and TryFrom?

Note: an example where the two differ in the expected logical result would be nice.

do you mean Deserializer? the trait Deserialize is implemented on your own type representing some domain data model, while the Deserializer is an abstraction of the concrete data in some specific format like json.

a visitor is just a complicated (usually stateful) multi-entry callback object. it's more like a builder that incrementally builds the data piece by piece; it's not related to the conversion trait TryFrom at all.

for example, suppose you have a struct. when you deserialize from a format, say, json, you ask the Deserializer of the json format to deserialize a "map", a.k.a. a json object, and the json deserializer will call your callback (visitor) for each field of the json object.


right, but the Visitor trait is a way to transform a known value into another type, which is the equivalent of TryFrom.

We could basically replace all of its functions with TryFrom bounds.

And then we could use that to work with the data model. But I suppose serde must have a reason to declare a new trait instead of using the TryFrom one.

A type T can implement a trait at most once. What happens if there are multiple ways to transform data of type T2 into T? For example, T can implement TryFrom<&str> one way, but there may be another way that one could reasonably transform a &str into T.

The visitor pattern solves that problem.

I think the question here is: why not express deserialization support as

impl<'de> TryFrom<serde::DeserializationIntermediateEnum<'de>> for MyType {...}

One small reason is that the Visitor trait has useful default method implementations. I am not sure what more significant other reasons there are.

@philomathic_life could you give an example? I don't know which situation this applies to, nor how the code would distinguish these cases.

@kpreid is there any method that cannot be replaced with TryFrom? (I didn't find one.)

Maybe, in simple terms, an implementation of Visitor is almost equivalent indeed to an implementation of TryFrom<Value> for some specific enum type such as this one or this one.

The main differences are probably these 2:

  1. the use of a visitor pattern can have performance benefits; if you think of a T: Deserialize implementation as the moral equivalent of TryFrom<Value> for T, and some D: Deserializer as the moral equivalent of TryFrom<D> for Value, then the use of visitor-style traits fully avoids the need for building (and storing in memory) such an intermediate data structure. Edit: I guess, for something that’s more than just performance benefits: eliminating the intermediate data structure also enables deserialization in the first place, in contexts without any (global) allocator.
  2. the traits from serde allow for data formats that aren’t “self-describing”.
    • If you think of what serde does as a tightly fused combined usage of TryFrom<D> for Value and TryFrom<Value> for T (for an overall fallible conversion from D to T), then the additional thing serde allows for is the type T generating “hints” – essentially – about on which enum variant (of that Value enum) to expect when.
      • With a self-describing data format (e.g. JSON), this can merely aid, effectively, with enforcing a particular “schema” in a more principled manner; and the practical benefit is perhaps limited to allowing nicer error messages[1] that can simultaneously incorporate an understanding of the error in terms of the datatype being deserialized (as provided e.g. through Visitor::expecting) while being able to point towards the issue in the source format (e.g. JSON) directly
      • However, non-self-describing data formats, e.g. very compact binary representations of data of known type, wouldn’t be able to function (through serde) at all in the first place, without this.
    • the relevant mechanism for this “generating hints” process is through the methods of the Deserializer trait, which are available to implementors of Deserialize (don’t confuse these traits :wink:), so those impls involve not only providing the Visitor but also choosing the right Deserializer-method
    • the Visitor trait itself does not involve this “hint”-generation on the top level; but the Visitor trait itself still differs from a TryFrom<Value> impl in that the recursive case of handling arbitrary values in compound value such as a map or seq value, with Visitor, cycles back to involving the Deserialize trait, allowing the next “hint” to be provided, recursively; whereas for a plain Value enum, the shape of all constituent values would need to be pre-determined

On the question of Deserialize/Visitor vs general TryFrom usage, of course (hopefully clear from my explanations above) there’s another view on this: If you view the serde API as a way to do one single, overall, TryFrom<D> for T-kind of conversion – for what’s spelled D: Deserializer, T: Deserialize on the serde side of things – TryFrom would of course be much more general, because serde is specifically about 2-step conversion through the serde data model as an intermediate step. This has the great advantage that you can – in principle[2] – freely combine any “deserializable data types” T with any type of “deserializer” D (the latter generally corresponding to a particular kind of concrete data format). (Also, there’s the fact that Deserializers themselves are already a sort-of pre-processed – or pre-wrapped – data type around the underlying data, e.g. a file, or a string containing JSON data. The closer moral equivalent might hence be found with the FromStr trait. Incidentally, there are other crates that can offer ways of combining these traits & somewhat bridging between them.)


  1. I haven’t thought deeply on this nor specifically tested it for this reply, so maybe nicer error messages would work just as well without those hints; or error messages aren’t even as nice as I’m claiming they could be ↩︎

  2. one counter example being e.g. non-self-describing data formats, whose deserializers won’t work with data types that might rely on the self-describing nature of a data format – compare e.g. the documentation of deserialize_any ↩︎


What would DeserializationIntermediateEnum<'de> look like? I think that's one of the points @steffahn is talking about here:

whereas for a plain Value enum, the shape of all constituent values would need to be pre-determined

Check out the links I gave above:

Another possible approach to such an enum can be found within serde itself, used internally for certain features (e.g. untagged enums).

Ah, I was drafting something too abstract. I was thinking such an enum shouldn't even dictate what a "number" should be and thus be defined generically and not as an enum like Number. This would seem to have a performance hit since you're forced to load a HashMap/Vec for "struct-like" types, no?

Indeed it does! And it’s still relevant as such an intermediate representation is actually used in cases like “untagged enum” representation. Compare e.g. issues like this one.

Hi all, thx for all the answers! Things are becoming clearer; serde is bigger than it seems.

There is something I'm still confused about. I get the part about TryFrom and Visitors; what I don't quite get is the Deserialize trait, because in the TryFrom trait we have D -> U as generics, but Deserialize is a single implementation: a deserializer object which will be converted into only one struct.

IIRC from what you explained, this helps to deserialize from different formats to construct our object... like binary files, maybe compressed files, and other things.

With what I know from this thread: if we have two ways to save a struct in binary files, how does serde recognize them (format 1 vs format 2)? We only have the option to implement one Deserialize without any other generics, and the deserializer object could be either of them.

When we try to deserialize, serde also consumes the object, making it impossible to try a second time... but I'm sure there is a way to do it, because enums are a case where it should try each variant until one of them works.

serde does not know the specific format. It abstracts away such details. At some point calling code will dictate what format is being used. For example, if you know you're dealing with JSON; then calling code would use a separate JSON-specific library that is based on the Serde framework (e.g., serde_json). In that library a concrete Deserializer (e.g., Deserializer) is defined that does know the format (e.g., JSON).

This abstraction allows you to define a format-agnostic way to deserialize your type since the specifics of the format are handled externally by the concrete Deserializer implementations. So now you can have a type, Foo, that implements Deserialize and "magically" calling code can construct a Foo from JSON, TOML, CSV, etc. This not only helps calling code since a bunch of separate formats can be used (via format-specific Serde-framework libraries (e.g., serde_json)), but it also helps you the author of Foo since you don't deal with the specific formats since that is handled by the Deserializer.

I recommend you read about Serde and actually code some examples as both the library author of some type and a user. For example, use serde_json to see how a JSON payload is converted into your type and toml for how a TOML payload is converted.


Sadly, I have read the serde docs and checked examples, but it has been really hard to get the big picture of serde and how it is organized; even reading code I do not always know which concept is used to organize and abstract it.

This last part is not just "read code": when we read and there is a design pattern, it is not random; usually there are reasons why that one was chosen, and those reasons determine how it will be organized and used. This topic is one of the key parts of why serde can be so hard to understand.

If we check the serde_json crate, it is not small; not small enough, at least, to just read it and understand what serde is doing and how. If you don't know how it works, have no big picture, and the docs are also not able to show this, we end up deciphering a crate, which could take hours and days.

As a note, the serde docs are nice for using the library, and the libraries are very easy to use, but maybe reaching that point is what makes it more complex (?)

We can even implement some simple deserializers without understanding serde! The hard part starts when we try to develop more robust ones and to understand the internal workflow to use them better.

Following the last part: if each crate specifies the original format, the fundamental logic of TryFrom again gets closer to Serde, but with a lot more flexibility. Maybe that is the key point: TryFrom is not able to handle all the features Serde needs (as mentioned in this thread above by several users).

One implements Deserialize if they expect/desire the type to be deserialized from multiple formats or because the ecosystem has defined generic libraries that are based on Serde. For example, even if you know for sure that your type will always be sent as JSON from the client, you will still (likely) want to use serde_json since it is battle tested and knows how to correctly parse JSON; thus you'd want to implement Deserialize so that it can be easily created via serde_json.

If you don't meet those conditions, then sure TryFrom will meet your needs; however you are now responsible for manually parsing the specific format.

use core::fmt::{self, Formatter};
use serde::de::{Deserialize, Deserializer, Error, MapAccess, Visitor};
fn main() {
    let json = br#"{"x":10,"y":true}"#;
    let Foo { x, y } = serde_json::from_slice(json.as_slice())
        .unwrap_or_else(|_e| unreachable!("Bug in Foo::deserialize"));
    assert_eq!(x, 10);
    assert!(y);
}
/// I don't care how this type is represented when exchanged somewhere.
/// If JSON is desired, then an instance of `Foo` will look like `{"x":10,"y":true}`.
/// If TOML is desired, then an instance of `Foo` will look like:
/// ```toml
/// x = 10
/// y = true
/// ```
pub struct Foo {
    pub x: u32,
    pub y: bool,
}
/// I don't want to implement JSON parsing or TOML parsing or _any_ parsing. As long as the format has a
/// "`struct`-like" option, then I will only think of a generic "map".
/// It is the specific Serde-based library's responsibility to have implemented such a thing correctly.
///
/// Even if I _know_ JSON is to be used, it's _much_ easier to rely on an already-defined JSON parser this way
/// I don't have to parse arbitrary bytes and look for "weird" things like `'{'` as well as ensure I'm properly
/// ignoring "whitespace" as defined by [RFC 8259](https://www.rfc-editor.org/rfc/rfc8259) as well as meeting
/// all the other requirements defined in that RFC.
impl<'de> Deserialize<'de> for Foo {
    fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
    where
        D: Deserializer<'de>,
    {
        /// Visits a "map" and transforms it into a `Foo`.
        struct FooVisitor;
        impl<'d> Visitor<'d> for FooVisitor {
            type Value = Foo;
            fn expecting(&self, formatter: &mut Formatter<'_>) -> fmt::Result {
                formatter.write_str("Foo")
            }
            fn visit_map<A>(self, mut map: A) -> Result<Self::Value, A::Error>
            where
                A: MapAccess<'d>,
            {
                enum Field {
                    X,
                    Y,
                }
                impl<'e> Deserialize<'e> for Field {
                    fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
                    where
                        D: Deserializer<'e>,
                    {
                        struct FieldVisitor;
                        impl Visitor<'_> for FieldVisitor {
                            type Value = Field;
                            fn expecting(&self, formatter: &mut Formatter<'_>) -> fmt::Result {
                                write!(formatter, "'{X}' or '{Y}'")
                            }
                            fn visit_str<E>(self, v: &str) -> Result<Self::Value, E>
                            where
                                E: Error,
                            {
                                match v {
                                    X => Ok(Field::X),
                                    Y => Ok(Field::Y),
                                    _ => Err(E::unknown_field(v, FIELDS)),
                                }
                            }
                        }
                        deserializer.deserialize_identifier(FieldVisitor)
                    }
                }
                let mut x = None;
                let mut y = None;
                while let Some(key) = map.next_key()? {
                    match key {
                        Field::X => {
                            if x.is_some() {
                                return Err(Error::duplicate_field(X));
                            }
                            x = map.next_value().map(Some)?;
                        }
                        Field::Y => {
                            if y.is_some() {
                                return Err(Error::duplicate_field(Y));
                            }
                            y = map.next_value().map(Some)?;
                        }
                    }
                }
                x.ok_or_else(|| Error::missing_field(X)).and_then(|x| {
                    y.ok_or_else(|| Error::missing_field(Y))
                        .map(|y| Foo { x, y })
                })
            }
        }
        const X: &str = "x";
        const Y: &str = "y";
        const FIELDS: &[&str; 2] = &[X, Y];
        deserializer.deserialize_struct("Foo", FIELDS, FooVisitor)
    }
}

or much easier when relying on the derive feature of serde:

#[derive(Deserialize)]
#[serde(deny_unknown_fields)]
pub struct Foo {
    pub x: u32,
    pub y: bool,
}

@philomathic_life thx for the example! I think that code... is a nice example of something really hard to understand for those who don't know serde deeply enough.

I can read the code, but I don't quite get why it is written like that; it's not usual for me to put an impl like that inside another impl. I suppose it is to hide it from other things, but it could live out in a module, right?
How are you able to pass FooVisitor to deserialize_struct? It should be a variable, not a type.
We could also implement the visitor on Foo itself; why must it be done like this?

I think simple cases are already handled by the derive features, but we would need more examples like these to get into this.

Thx a lot for all the help!

Why is it necessary to check for duplicate fields when deserializing from a map? I assume that whatever is reading from some format to prepare the map would already have decided whether to reject any duplicate fields (or keys), or choose whether the last one or the first one wins.

Why is it necessary to check for duplicate fields

"Necessary" is a strong word. It is my choice as the author of Foo if I want to forbid duplicate fields just as it's my choice if I want to require the x to be a u32. There is a difference between invalid JSON/TOML/etc. and valid JSON/TOML/etc. but an invalid "map" based on my "schema".

A downside of a Deserializer enforcing that duplicate "keys" don't exist is performance. Clearly such an implementation must be internally storing all of the keys in some collection which adds overhead; therefore as long as it's documented, I wouldn't be surprised if there exist libraries that parse a certain format without enforcing the uniqueness of each "key" even if <insert RFC/spec> states "keys" MUST be unique. At the very least, it's reasonable for said library to offer an alternative Deserializer that doesn't enforce that in the interest of performance and simply states it's on the Deserialize impl to enforce it.

Fortunately, derive(Deserialize) automatically forbids duplicate fields; so if you don't want that, then you have to implement it manually yourself and "hope" calling code uses a format that doesn't forbid duplicate fields and uses a library that doesn't error when duplicates do exist (if allowed by the spec), and that the library either stores all instances allowing your code to choose the "correct" one or makes the choice you want.

For example, JSON allows duplicate keys in JSON objects; so without checking for the duplicates myself, I would have to decide "arbitrarily" which one I should keep (or perhaps I'll store all instances of the key). As the author of Foo, I don't want that though, so I ensure there are no duplicates even if the format says it's OK. Additionally, there is likely a performance benefit for calling code to use JSON since the parser shouldn't have to store much state. Of course I'm assuming the JSON parser is implemented in a performant way which includes not storing each key (e.g., serde_json). Obviously nothing stops one from implementing their own JSON parser in a non-performant way that either enforces uniqueness of each key (despite the spec allowing it) or allows duplicates but arbitrarily picks which one or even stores all of them. Such implementations will obviously have additional overhead spatially and temporally compared to something like serde_json. Also, implementations that arbitrarily choose which key to use actually prevent me from erring when a duplicate key exists since those parsers are "lossy" and don't expose such info.

If TOML is used, however, then duplicates are forbidden by the spec; so my checks are unnecessary since an error will occur anyway so long as the library (e.g., toml) actually enforces it—the error will be "different" in that it would be an actual TOML error and not just a "data"/"schema" error. Calling code that decides to use TOML will have to accept the performance hit of the Deserializer storing each and every key internally.

Since Deserialize impls are format-agnostic, I can't "assume" the Deserializer will enforce unique keys just like I can't "assume" it'll forbid/allow "unknown" keys. In the special case where I know the format and the library/parser for said format, then of course I could specialize my implementation around that. For example if Foo implements FromStr such that it parses the passed &str as TOML where internally it uses toml; then Foo could certainly implement Deserialize such that it doesn't check for the duplicates since we know toml already does that. Of course it would be better to create a private newtype around Foo that does this implementation since you want to "force" TOML since otherwise calling code could use Foo::deserialize for any format instead of being "forced" to use Foo::from_str.

Addendum

Because one may incorrectly assume "map" implies "key" uniqueness, it stands to reason one may also assume that a "map" implies a missing "key" to be the same as a "key" with a "null" value for those formats that have a notion of "null" (e.g., JSON). Unsurprisingly, that's not the case.

In particular the current impl will have a "type" error if x is "null" and will have a "missing field" error if x doesn't exist. The below impls would be used for different behavior:

  • Allow a missing x but forbid "null":
pub struct Foo {
    pub x: Option<u32>,
    pub y: bool,
}
impl<'de> Deserialize<'de> for Foo {
    // ⋮
                y.ok_or_else(|| Error::missing_field(Y))
                    .map(|y| Foo { x, y })
    // ⋮
}
  • Allow "null" but forbid a missing x:
pub struct Foo {
    pub x: Option<u32>,
    pub y: bool,
}
impl<'de> Deserialize<'de> for Foo {
    // ⋮
                            x = map.next_value::<Option<_>>().map(Some)?;
    // ⋮
                x.ok_or_else(|| Error::missing_field(X)).and_then(|x| {
                    y.ok_or_else(|| Error::missing_field(Y))
                        .map(|y| Foo { x: x.flatten(), y })
                })
    // ⋮
}
  • Allow both a missing x and "null" and treat them the same
pub struct Foo {
    pub x: Option<u32>,
    pub y: bool,
}
impl<'de> Deserialize<'de> for Foo {
    // ⋮
                y.ok_or_else(|| Error::missing_field(Y))
                    .map(|y| Foo { x: x.flatten(), y })
    // ⋮
}
  • Allow both a missing x and "null" but distinguish between them (specifically None iff x does not exist, Some(None) iff x exists and is "null", Some(Some(_)) iff x exists and is not "null"):
pub struct Foo {
    pub x: Option<Option<u32>>,
    pub y: bool,
}
impl<'de> Deserialize<'de> for Foo {
    // ⋮
                y.ok_or_else(|| Error::missing_field(Y))
                    .map(|y| Foo { x, y })
    // ⋮
}

Addendum 2

I should also warn that there are some "gotchas" that probably should be documented. For example, Visitor::visit_str warns:

It is never correct to implement visit_string without implementing visit_str. Implement neither, both, or just visit_str.

However there is no similar warning for visit_borrowed_str even though many parsers will always error (e.g., toml). Even more problematic are the visit_* methods for "integers" which probably should have a warning, or even requirement, that one must implement visit_i64/visit_u64 anytime any of the other "integer" methods are implemented. I've been bitten by this a few times after enough time has passed between encountering it. For example, if you have a type Even(u32) that requires the contained u32 to be even and you define EvenVisitor that only implements Visitor::visit_u32, many parsers will error since many will call EvenVisitor::visit_u64 even if you hint for the Deserializer to call Visitor::visit_u32 via Deserializer::deserialize_u32. For example:

use core::fmt::{self, Formatter};
use serde::de::{Deserialize, Deserializer, Error, Unexpected, Visitor};
fn main() {
    let json = b"2";
    // This will be an "invalid type" error; however if I uncomment `EvenVisitor::visit_u64`, then this will succeed.
    let err = serde_json::from_slice::<Even>(json.as_slice());
}
pub struct Even(u32);
impl<'de> Deserialize<'de> for Even {
    fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
    where
        D: Deserializer<'de>,
    {
        struct EvenVisitor;
        impl Visitor<'_> for EvenVisitor {
            type Value = Even;
            fn expecting(&self, formatter: &mut Formatter<'_>) -> fmt::Result {
                formatter.write_str("even 32-bit unsigned integer")
            }
            fn visit_u32<E>(self, v: u32) -> Result<Self::Value, E>
            where
                E: Error,
            {
                if v & 1 == 1 {
                    Err(E::invalid_value(
                        Unexpected::Unsigned(u64::from(v)),
                        &"even 32-bit unsigned integer",
                    ))
                } else {
                    Ok(Even(v))
                }
            }
            // I must implement this in addition to, or instead of, `Visitor::visit_u32`.
            // fn visit_u64<E>(self, v: u64) -> Result<Self::Value, E>
            // where
            //     E: Error,
            // {
            //     u32::try_from(v)
            //         .map_err(E::custom)
            //         .and_then(|val| self.visit_u32(val))
            // }
        }
        deserializer.deserialize_u32(EvenVisitor)
    }
}

Thank you for the very thorough and thoughtful reply. I learned yet more from reading it.

What remains confusing to me is the use of the word map in the serde library. I am used to the concept of a(n in-memory) map implying that the keys are unique. The next_key method that you're calling in your earlier example appears to be defensive against receiving the same key multiple times.

I follow your discussion about how a JSON object can contain members with duplicate names—per RFC 8259 §4—but I expected that by the time the serde library is talking in terms of maps that it would have eliminated the possibility of duplicates, or yielded a nonempty set of values for a given key.

Perhaps I am misunderstanding what the serde library means by map, though. If you have any documentation that you can suggest that I read, I'd appreciate it.

What remains confusing to me is the use of the word map in the serde library. I am used to the concept of a(n in-memory) map implying that the keys are unique.

Like anything words mean different things depending on the context. From the perspective of Serde which is designed to be format-agnostic, "map" does not imply "key" uniqueness. In fact, it doesn't even imply that each "key" is the same "type"[1].

This more general definition, like anything that's more "general"/abstract, allows it to be used in more contexts and possibly allows for performance optimizations. For example, JSON only recommends that keys in objects SHOULD be unique; this allows for optimizations for 100%-conforming parsers since such parsers, despite going against the recommendation, can almost be stateless. Read the source code for serde_json::Deserializer for more information. If Serde defined "map" in a less general way that required key uniqueness, then that would prevent such optimizations even if the spec/RFC allows for them, making Serde stricter, and thus less useful, than it needs to be.

Additionally when dealing with libraries, one shouldn't assume more than what the documentation and public API states[2]; so "map" is literally anything that implements MapAccess which does not state nor imply "key" uniqueness. If you wish serde used a different name for that trait, then that's more of a philosophical discussion.


  1. One could work around this by simply defining the "key" as a sum type/enum, but hopefully you know what I mean. ↩︎

  2. Obviously that's not to say that documentation is a formal specification, so one must expect a certain level of "informal" guarantees. ↩︎
