I have the following parser, which deserializes a UTF-8 buffer with the serde_json crate:
```rust
pub fn from_bytes(buffer: &[u8]) -> Result<Self, Error> {
    match serde_json::from_slice::<Self>(buffer) {
        Ok(request) => {
            // validate the header
            if request.header.to_string() != header::VALUE {
                return Err(Error::Header);
            }
            // validate the data members by building a copy
            // with the external builder/validator
            if let Err(e) = Data::build(
                request.data.username().to_string(),
                request.data.password().to_string(),
            ) {
                return Err(Error::Data(e));
            }
            // return the original deserialized value;
            // the validated test copy has already been dropped
            Ok(request)
        }
        Err(e) => Err(Error::Json(e)),
    }
}
```
I also want to apply some additional validation to the parsed struct members. The example above works, but it requires an extra copy (i.e. building a new struct just to run the builder's validation).
I found this thread, but it seems I'd have to re-implement the entire parser logic from scratch. I just want to apply some filters without rewriting the whole `Deserialize` logic. Are there any options for this case?
P.S. Since I'm working with a byte buffer (client-server API), I suspect I may not want a JSON parser at all for performance reasons, on top of the double-copy issue during validation.
You don't want to change the logic of your deserialiser; you want to change how your types are deserialised. That's the beauty of serde: it lets you worry only about your data types, and it makes them work with a variety of data formats.
Depending on how complex or self-contained your validation is, adding it to the deserialisation process might mean writing custom `Deserialize` implementations, using newtype wrappers like `struct Email(String);`, or putting simple `#[serde(deserialize_with = "...")]` attributes on the fields that need to be validated. Helper crates like `serde_with` provide some common validations.
For newtypes wrapping strings that require validation (e.g. `Username(String)`), what I do is implement the validation via the `FromStr` trait and then derive `Deserialize` for the type with `serde_with`'s `DeserializeFromStr`.
You can avoid the boilerplate of implementing Visitor by delegating to an existing Deserialize implementation (and hence its visitor). I use this technique all the time.
```rust
impl<'de> Deserialize<'de> for Username {
    fn deserialize<D: Deserializer<'de>>(deserializer: D) -> Result<Self, D::Error> {
        let v = String::deserialize(deserializer)?;
        if RE_USERNAME.is_match(&v) {
            Ok(Self(v))
        } else {
            Err(D::Error::custom(format!(
                "username doesn't match {}",
                RE_USERNAME.to_string()
            )))
        }
    }
}
```
Almost done; the analyzer just can't find the `D::Error::custom` function:
```text
no function or associated item named `custom` found for associated type `<D as error::response::_::_serde::Deserializer<'de>>::Error` in the current scope
items from traits can only be used if the trait is in scope
username.rs(1, 1): trait `Error` which provides `custom` is implemented but not in scope; perhaps you want to import it: `use crate::error::response::_::_serde::de::Error;`
```
What performance benefit were you hoping for, and which first implementation are you referring to? If you want zero-copy deserialisation, you can't work with owned types like `String`. Instead, you need to borrow from the input buffer. In your case, that would mean storing a `&str` rather than a `String` in `Username` and using a data format that lets you get a valid UTF-8 string directly from the buffer's bytes (JSON is such a format, as long as the string contains no escape sequences; see quinedot's post below).
My tip: before adding complexity to your code base to avoid extra allocations, make sure you are actually optimizing a bottleneck in your application.
I've removed my last request so as not to distract, because yes, there are a lot of possible optimizations, and sometimes I don't know exactly what I want in the end. My last question was just about zero-copy, yes.
This thread was one of them, and thanks to your help it is solved.
Finally, I have the following code; at least now I can't parse an unchecked/invalid struct:
I made a crate for creating newtypes with custom validation logic in deserialization: `serde_newtype`.
The docs aren't great, but the examples should get you going. It's quite barebones, but I used it a lot for a project at work and it did the job just fine.
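All of the solutions so far (e.g. delegating to `String`) technically create inconsistent `Serialize` vs. `Deserialize` implementations: the derived `Serialize` for a newtype struct goes through `serialize_newtype_struct`, while these deserializers skip the corresponding `deserialize_newtype_struct` call.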
serde already has good support for deserialization via (fallible) conversion, so the solution can be as simple as:
```rust
use serde::{Deserialize, Serialize};

#[derive(Serialize, Deserialize, Debug)]
#[serde(try_from = "UncheckedUsername")] // use the fallible `TryFrom` conversion below
pub struct Username(String);

// private intermediate helper type for deserialization
#[derive(Deserialize)]
#[serde(rename = "Username")] // rename to match the `Serialize` impl above
struct UncheckedUsername(String);

impl TryFrom<UncheckedUsername> for Username {
    type Error = Error; // works with `serde(try_from = ...)` if your error type implements `Display`
    fn try_from(unchecked: UncheckedUsername) -> Result<Self, Error> {
        Self::build(unchecked.0) // can just delegate to the existing `::build` function
    }
}
```
Bonus hint: avoid rebuilding the Regex on every call to `build`; put it into a lazy global instead.
(If you still want to propagate the regex error, you could store the `Result` in the lazy static and use a static reference in your `Error` variant, i.e. `Regex(&'static regex::Error)`.)
Or you could skip this manual work entirely and use one of the existing helper macros. (This crate even checks the regex at compile time, so there's no `regex::Error` to deal with at run time at all.)