Serde: should I implement both visit_str and visit_bytes?

xOlX2Xuq · April 11, 2020, 4:51pm

I looked at the implementation of the ArrayString deserialization from the arrayvec crate (array_string.rs.html -- source), and ArrayStringVisitor there defines both visit_str and visit_bytes methods. They both do the same thing, the only difference is that visit_bytes checks if the bytes are valid ut8 first.

The same is true for the StringVisitor from the serde crate which is used to deserialize String: impls.rs - source (it also implements visit_string and visit_byte_buf for owned data).

I also looked at the chrono crate and its DateTimeVisitor, on the other hand, implements only visit_str: https:// docs.rs/chrono/0.4.11/src/chrono/datetime.rs.html#1853.

Won't the utf8 check be already performed when you call deserialize_str or deserialize_string so that it is enough to implement just visit_str or visit_string, and if the data is not valid ut8 the deserializer will fail with a type error?

If I want to implement a custom deserializer for a type that is serialized as a string (I need to do some validations in a similar way like chrono so can't just derive), should I implement both visit_str and visit_bytes? If yes, then why chrono doesn't do it? And if visit_str is enough, then why serde and arrayvec also implement visit_bytes?

alice · April 11, 2020, 5:12pm

It would probably make sense to implement visit_bytes in the case of chrono too. That said, if you're just serializing as a String, I recommend just using String's implementation. Here's an example from a crate of mine:

impl Serialize for BytesString {
    fn serialize<S: Serializer>(&self, serializer: S) -> Result<S::Ok, S::Error> {
        serializer.serialize_str(self.as_str())
    }
}

impl<'de> Deserialize<'de> for BytesString {
    fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
    where
        D: Deserializer<'de>,
    {
        String::deserialize(deserializer)
            .map(BytesString::from)
    }
}

xOlX2Xuq · April 11, 2020, 5:40pm

Thanks for you answer.

That said, if you're just serializing as a String , I recommend just using String 's implementation.

The reason I don't do this is that my type doesn't hold a String, it is constructed from an &str which always results in a copy:

pub struct Id(ArrayString<[u8; MAX_LEN]>);

impl Id {
    pub fn new(s: &str) -> Result<Self, Error> {
        // validate the data, then make an ArrayString from it
    }
}

so I want to avoid allocating a String .

It would probably make sense to implement visit_bytes in the case of chrono too.

I don't mind implementing both visit_str and visit_bytes manually, but I'm curious to know why this is necessary and what are the possible issues if I don't do it.

alice · April 11, 2020, 5:43pm

You can deserialize directly into a &str.


impl<'de> Deserialize<'de> for BytesString {
    fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
    where
        D: Deserializer<'de>,
    {
        let s: &str = Deserialize::deserialize(deserializer)?;
        Ok(Id::validate(s))
    }
}

This will fail if e.g. you're deserializing { "foo": "bar \n test" } because of the escape. If you need to fix that, you can deserialize into a Cow<str>, which allocates only when necessary.

As for potential issues, check out what the default implementations on the trait are. Afaik visit_str calls visit_bytes if not implemented, so the latter is more general. Note that just because you told serde you want a string, it's allowed to call visit_bytes. Whether it will do so depends on the file format implementation.

H2CO3 · April 11, 2020, 6:24pm

Unfortunately, Cow always deserializes into Cow::Owned. (And I don't see an easy fix, because the lifetimes don't match up, so making it conditionally deserialize without allocation would be a breaking change.)

alice · April 11, 2020, 6:25pm

I didn't know that. That's unfortunate.

xOlX2Xuq · April 11, 2020, 6:28pm

Cow deserializes into Owned unless used with #[serde(borrow)]: Deserializer lifetimes · Serde

The following solution seems to work good:

impl<'de> Deserialize<'de> for Id {
    fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
    where
        D: Deserializer<'de>,
    {
        #[derive(Deserialize)]
        struct Value<'a>(#[serde(borrow)] Cow<'a, str>);

        let value = Value::deserialize(deserializer)?.0;

        Id::new(&value).map_err(serde::de::Error::custom)
    }
}

H2CO3 · April 11, 2020, 8:05pm

Oh, nice. TIL.

system · July 10, 2020, 8:05pm

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Why are there 2 types for deserializing in serde?	18	4120	March 20, 2020
Csv + serde vs non-utf8 (easily) help	5	1079	January 12, 2023
What is wrong with my serde deserializer/visitor? help	5	724	February 14, 2022
Beginner question : api lookup question help	5	401	September 20, 2021
Serde Deserializer Visitor Error: Expected Type Parameter E help	6	199	January 4, 2024

Serde: should I implement both visit_str and visit_bytes?

Related Topics