I looked at the implementation of ArrayString deserialization in the arrayvec crate (array_string.rs.html - source), and the ArrayStringVisitor there defines both visit_str and visit_bytes methods. They both do the same thing; the only difference is that visit_bytes first checks that the bytes are valid UTF-8.
The same is true for the StringVisitor from the serde crate, which is used to deserialize String: impls.rs - source (it also implements visit_string and visit_byte_buf for owned data).
The DateTimeVisitor from the chrono crate, on the other hand, implements only visit_str: https://docs.rs/chrono/0.4.11/src/chrono/datetime.rs.html#1853.
Won't the UTF-8 check already have been performed when you call deserialize_str or deserialize_string, so that it is enough to implement just visit_str or visit_string, and the deserializer will fail with a type error if the data is not valid UTF-8?
If I want to implement a custom deserializer for a type that is serialized as a string (I need to do some validation, similar to chrono, so I can't just derive), should I implement both visit_str and visit_bytes? If yes, why doesn't chrono do it? And if visit_str is enough, why do serde and arrayvec also implement visit_bytes?
It would probably make sense to implement visit_bytes in the case of chrono too. That said, if you're just serializing as a String, I recommend just using String's implementation.
That said, if you're just serializing as a String, I recommend just using String's implementation.
The reason I don't do this is that my type doesn't hold a String; it is constructed from an &str, which always results in a copy:
pub struct Id(ArrayString<[u8; MAX_LEN]>);

impl Id {
    pub fn new(s: &str) -> Result<Self, Error> {
        // validate the data, then make an ArrayString from it
    }
}
so I want to avoid allocating a String.
It would probably make sense to implement visit_bytes in the case of chrono too.
I don't mind implementing both visit_str and visit_bytes manually, but I'm curious why it's necessary and what the possible issues are if I don't do it.
impl<'de> Deserialize<'de> for Id {
    fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
    where
        D: Deserializer<'de>,
    {
        // reuse &str's Deserialize impl, then validate
        let s: &str = Deserialize::deserialize(deserializer)?;
        Id::new(s).map_err(serde::de::Error::custom)
    }
}
This will fail if e.g. you're deserializing { "foo": "bar \n test" }, because the escape means the string can't be borrowed directly from the input, so deserializing to &str errors. If you need to fix that, you can deserialize into a Cow<str>, which allocates only when necessary.
As for potential issues, check out the default implementations on the Visitor trait: each visit_* method defaults to returning a type error rather than forwarding, so a visitor that only implements visit_str will fail if the deserializer hands it bytes. Note that just because you told serde you want a string, the deserializer is still allowed to call visit_bytes; whether it will do so depends on the data format implementation.
Unfortunately, Cow always deserializes into Cow::Owned. (And I don't see an easy fix, because the lifetimes don't match up, so making it conditionally deserialize without allocation would be a breaking change.)