Validating struct members during serde_json deserialization

I have the following parser, which deserializes a UTF-8 buffer with the serde_json crate:

pub fn from_bytes(buffer: &[u8]) -> Result<Self, Error> {
    match serde_json::from_slice::<Self>(buffer) {
        Ok(request) => {
            // validate header
            if request.header.to_string() != header::VALUE {
                return Err(Error::Header);
            }
            // validate data members
            // by creating a copy with the external builder/validator
            if let Err(e) = Data::build(
                request.data.username().to_string(),
                request.data.password().to_string(),
            ) {
                return Err(Error::Data(e));
            }
            // return the original deserialized result;
            // the temporary test copy has already been dropped
            Ok(request)
        }
        Err(e) => Err(Error::Json(e)),
    }
}

But I also want to apply some additional validation to the parsed struct members. The example above works, but it requires an extra copy (i.e. building a new struct through the validating builder).

I found this subject, and it seems I would have to re-implement the entire parser logic from scratch. But I just want to apply some filters without rewriting the entire Deserialize logic. Are there any options for this case?

P.S. Since I'm working with a byte buffer (client-server API), I doubt whether I want a JSON parser at all for performance reasons, especially with this double-copy issue on validation.

1 Like

You don't want to change the logic of your deserialiser, you want to change how your types are deserialised. That's the beauty of serde, it allows you to only worry about your data types, making them work with a variety of data formats.

Adding validation to the deserialisation process can mean writing custom Deserialize implementations, using newtype wrappers like struct Email(String), or putting simple #[serde(deserialize_with = "...")] attributes on the fields that need to be validated, depending on how complex or specialised your validation is. Helper crates like serde_with provide some common validations.

1 Like

Thanks, but I'm still not sure I understand how I can validate the username, for example.

The username is the struct Username(String) with the following implementation:

// username.rs

use regex::Regex;
use serde::{Deserialize, Serialize};

pub const PATTERN: &str = r"^[\w]{2,16}$";

#[derive(Serialize, Deserialize, Debug)]
pub struct Username(String);

impl std::fmt::Display for Username {
    fn fmt(&self, f: &mut std::fmt::Formatter) -> std::fmt::Result {
        write!(f, "{}", self.0)
    }
}

impl Username {
    pub fn build(value: String) -> Result<Self, Error> {
        match Regex::new(PATTERN) {
            Ok(regex) => {
                if regex.is_match(&value) {
                    Ok(Self(value))
                } else {
                    Err(Error::Value(PATTERN.to_string()))
                }
            }
            Err(e) => Err(Error::Regex(e)),
        }
    }
}

So it has regex validation when built through Self::build, whereas serde just parses it as a String without any checks (since it knows nothing about my build method).

#[serde(deserialize_with = "...")] requires a deserialization impl, and serde_with, as you said, is useful for common validations.

I'd implement Deserialize manually for Username, putting the logic of your build function into the deserialisation process:

use std::fmt;
use std::sync::LazyLock;

use regex::Regex;
use serde::{
    de::{Deserialize, Deserializer, Error, Visitor},
    Serialize,
};

const PATTERN: &str = r"^[\w]{2,16}$";

static RE_USERNAME: LazyLock<Regex> = LazyLock::new(|| Regex::new(PATTERN).unwrap());

#[derive(Serialize, Debug, PartialEq)]
struct Username(String);

impl std::fmt::Display for Username {
    fn fmt(&self, f: &mut std::fmt::Formatter) -> std::fmt::Result {
        write!(f, "{}", self.0)
    }
}

struct UsernameVisitor;

impl<'de> Visitor<'de> for UsernameVisitor {
    type Value = String;

    fn expecting(&self, formatter: &mut fmt::Formatter) -> fmt::Result {
        formatter.write_str("a correctly formatted username")
    }

    fn visit_str<E: Error>(self, v: &str) -> Result<Self::Value, E> {
        if RE_USERNAME.is_match(v) {
            Ok(v.to_owned())
        } else {
            Err(E::custom(format!(
                "username doesn't match {}",
                RE_USERNAME.as_str()
            )))
        }
    }
}

impl<'de> Deserialize<'de> for Username {
    // Required method
    fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
    where
        D: Deserializer<'de>,
    {
        Ok(Self(deserializer.deserialize_string(UsernameVisitor)?))
    }
}

fn main() {
    let username: Username = serde_json::from_str(r#""validusername""#).unwrap();
    assert_eq!(username, Username("validusername".to_owned()));

    let invalid_username = serde_json::from_str::<Username>(r#""__invalidusername""#);
    assert!(invalid_username.is_err());
}

Playground.

2 Likes

What I do for newtypes wrapping strings (e.g. Username(String)) that require validation is to implement the validation via the FromStr trait, and then derive DeserializeFromStr from serde_with for the type.
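A minimal sketch of the FromStr half of this approach, using only the standard library. The character check is a hand-written approximation of the thread's ^[\w]{2,16}$ pattern, and InvalidUsername is a hypothetical error type introduced for illustration. With serde_with added as a dependency, deriving DeserializeFromStr on Username should then route deserialization through this from_str; that derive is left out here so the block compiles on its own.

```rust
use std::fmt;
use std::str::FromStr;

/// Hypothetical error type for this sketch (not from the thread).
#[derive(Debug, PartialEq)]
pub struct InvalidUsername;

impl fmt::Display for InvalidUsername {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str("username must be 2 to 16 word characters")
    }
}

#[derive(Debug, PartialEq)]
pub struct Username(String);

impl FromStr for Username {
    type Err = InvalidUsername;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        // Approximation of ^[\w]{2,16}$ without the regex crate:
        // count characters (not bytes) and allow alphanumerics plus '_'.
        let len = s.chars().count();
        let word_chars = s.chars().all(|c| c.is_alphanumeric() || c == '_');
        if (2..=16).contains(&len) && word_chars {
            Ok(Self(s.to_owned()))
        } else {
            Err(InvalidUsername)
        }
    }
}
```

With this in place, "validusername".parse::<Username>() succeeds, while the 17-character "__invalidusername" is rejected for its length.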

4 Likes

You can avoid the boilerplate of implementing Visitor by delegating to an existing Deserialize implementation (and hence its visitor). I use this technique all the time.

impl<'de> Deserialize<'de> for Username {
    fn deserialize<D: Deserializer<'de>>(deserializer: D) -> Result<Self, D::Error> {
        let v = String::deserialize(deserializer)?;
        if RE_USERNAME.is_match(&v) {
            Ok(Self(v))
        } else {
            Err(D::Error::custom(format!(
                "username doesn't match {}",
                RE_USERNAME.as_str()
            )))
        }
    }
}

4 Likes

Almost done, but the analyzer can't find the D::Error::custom function:

no function or associated item named `custom` found for associated type `<D as error::response::_::_serde::Deserializer<'de>>::Error` in the current scope
items from traits can only be used if the trait is in scope
username.rs(1, 1): trait `Error` which provides `custom` is implemented but not in scope; perhaps you want to import it: `use crate::error::response::_::_serde::de::Error;`

That sounds like a glitch in RA (maybe try restarting your editor), kpreid's code works just fine.

Edit: Ah, sorry, you need to import the serde::de::Error trait.

1 Like

Ah, right, I forgot about this magic trick that I usually do all the time :)

Update: got it right after saving: serde::de::Error::custom

Thanks a lot for your help, guys!

1 Like

What performance benefit were you hoping for and what first implementation are you referring to? If you want to have zero copy deserialisation, you can't work with owned types like String. Instead you need to borrow from the input buffer. In your case that'd mean storing a &str rather than String in Username and using a data format that allows you to get a valid utf-8 string from the buffer's bytes directly (JSON would be such a format—sans escape sequences; see quinedot's post below).

My tip would be that before going down the route of adding complexity to your code base to avoid extra allocations, be sure you are actually optimizing a bottleneck of your application.
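To illustrate the borrowing described above, here is a rough sketch without serde: a hypothetical UsernameRef<'a> that validates a byte slice in place and keeps a &str pointing into the original buffer. The type name, from_bytes, as_str, and the hand-written character check (an approximation of the thread's regex) are all assumptions made for this example.

```rust
use std::str;

/// Borrowed username: holds a slice of the input buffer instead of a String.
#[derive(Debug, PartialEq)]
pub struct UsernameRef<'a>(&'a str);

impl<'a> UsernameRef<'a> {
    /// Validates `bytes` in place; nothing is allocated or copied.
    pub fn from_bytes(bytes: &'a [u8]) -> Option<Self> {
        // UTF-8 validation only inspects the bytes; the &str aliases them.
        let s = str::from_utf8(bytes).ok()?;
        let len = s.chars().count();
        let valid = (2..=16).contains(&len)
            && s.chars().all(|c| c.is_alphanumeric() || c == '_');
        valid.then_some(Self(s))
    }

    /// Returns the borrowed text, tied to the buffer's lifetime.
    pub fn as_str(&self) -> &'a str {
        self.0
    }
}
```

The price is the lifetime parameter: a UsernameRef cannot outlive the buffer it was parsed from, which is part of the added complexity mentioned above.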

2 Likes

I removed my last question so as not to bother anyone, because yes, there are a lot of possible optimizations. Sometimes I don't know exactly what I want as a result :) The last question was just about zero-copy, yes.

This topic is one of them, and thanks to your help it's solved.

In the end I have the following code; at least now an unchecked / invalid struct can no longer be parsed:

pub mod error;
pub use error::Error;

use regex::Regex;
use serde::{Deserialize, Deserializer, Serialize};

pub const PATTERN: &str = r"^[\w]{2,16}$";

#[derive(Serialize, Debug)]
pub struct Username(String);

impl std::fmt::Display for Username {
    fn fmt(&self, f: &mut std::fmt::Formatter) -> std::fmt::Result {
        write!(f, "{}", self.0)
    }
}

impl<'de> Deserialize<'de> for Username {
    fn deserialize<D: Deserializer<'de>>(deserializer: D) -> Result<Self, D::Error> {
        let value = String::deserialize(deserializer)?;
        match validate(&value) {
            Ok(()) => Ok(Self(value)),
            Err(e) => Err(serde::de::Error::custom(e.to_string())),
        }
    }
}

impl Username {
    pub fn build(value: String) -> Result<Self, Error> {
        match validate(&value) {
            Ok(()) => Ok(Self(value)),
            Err(e) => Err(e),
        }
    }
}

pub fn validate(username: &str) -> Result<(), Error> {
    match Regex::new(PATTERN) {
        Ok(regex) => {
            if regex.is_match(username) {
                Ok(())
            } else {
                Err(Error::Value(PATTERN.to_string()))
            }
        }
        Err(e) => Err(Error::Regex(e)),
    }
}

1 Like

JSON cannot be zero-copy in the general case.
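One way to see why: an escape sequence such as \n takes two bytes in the JSON text but decodes to a single byte, so the decoded string does not exist anywhere in the input buffer and has to be built in a new allocation. The sketch below uses a hypothetical unescape helper (not serde_json's implementation, and deliberately ignoring \uXXXX escapes) that borrows when it can and allocates only when it must.

```rust
use std::borrow::Cow;

/// Decodes the raw text between the quotes of a JSON string.
/// Handles only single-character escapes; `\uXXXX` is left out of this sketch.
pub fn unescape(raw: &str) -> Option<Cow<'_, str>> {
    if !raw.contains('\\') {
        // No escapes: the decoded value is exactly the input, so borrow it.
        return Some(Cow::Borrowed(raw));
    }
    // At least one escape: the decoded text differs from the input bytes.
    let mut out = String::with_capacity(raw.len());
    let mut chars = raw.chars();
    while let Some(c) = chars.next() {
        if c != '\\' {
            out.push(c);
            continue;
        }
        // A trailing backslash makes `next()` return None, which propagates.
        out.push(match chars.next()? {
            '"' => '"',
            '\\' => '\\',
            '/' => '/',
            'n' => '\n',
            't' => '\t',
            'r' => '\r',
            'b' => '\u{0008}',
            'f' => '\u{000C}',
            _ => return None, // unsupported or invalid escape
        });
    }
    Some(Cow::Owned(out))
}
```

Input without backslashes comes back as Cow::Borrowed, while anything containing an escape comes back as Cow::Owned.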

1 Like

If you use the newtype pattern for validation, you can use the nutype crate to make it easier.

1 Like

I made a crate for creating newtypes with custom validation logic in deserialization: serde_newtype.
The docs aren't great, but the examples should get you going. It's quite barebones, but I used it a lot for a project at work and it did the job just fine.

1 Like

All of the solutions so far, e.g. delegating to String, technically create inconsistent Serialize vs. Deserialize implementations (they skip the appropriate deserialize_newtype_struct call).

serde already has good support for deserialization via (fallible) conversion. In other words, the solution should be as simple as

#[derive(Serialize, Deserialize, Debug)]
#[serde(try_from = "UncheckedUsername")] // use fallible `TryFrom` conversion below
pub struct Username(String);

// private intermediate helper type for deserialization
#[derive(Deserialize)]
#[serde(rename = "Username")] // rename to match `Serialize` from above
struct UncheckedUsername(String);
impl TryFrom<UncheckedUsername> for Username {
    type Error = Error; // this works with `serde(try_from …)` if your error type is `Display`
    fn try_from(unchecked: UncheckedUsername) -> Result<Self, Error> {
        Self::build(unchecked.0) // can just delegate to existing `::build` function
    }
}

(playground)

bonus hint: try to avoid the need to re-build the Regex on every call to build and put that into a lazy global instead.

(If you still want to propagate the regex error, you could even consider storing the Result in the lazy static, and using a static reference in your Error variant, i.e. Regex(&'static regex::Error),)

Or you could even skip this manual work, and use existing helper macros out there. (This crate even test-runs the regex compilation at compile-time so there's no regex::Error to deal with at run-time at all.)
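The lazy-global hint, including the variant that stores the Result and hands out a &'static reference to the error, could be sketched roughly as follows. To keep the block free of external crates, Pattern and PatternError are hypothetical stand-ins for regex::Regex and regex::Error, and ValidationError stands in for the thread's Error enum; only the LazyLock wiring is the point here.

```rust
use std::sync::LazyLock;

/// Hypothetical stand-in for `regex::Regex`.
#[derive(Debug)]
pub struct Pattern {
    min: usize,
    max: usize,
}

/// Hypothetical stand-in for `regex::Error`.
#[derive(Debug, PartialEq)]
pub struct PatternError(String);

impl Pattern {
    fn new(min: usize, max: usize) -> Result<Self, PatternError> {
        if min <= max {
            Ok(Self { min, max })
        } else {
            Err(PatternError(format!("empty range {min}..={max}")))
        }
    }

    fn is_match(&self, s: &str) -> bool {
        (self.min..=self.max).contains(&s.chars().count())
            && s.chars().all(|c| c.is_alphanumeric() || c == '_')
    }
}

// Built once on first use; success and failure are both cached.
static USERNAME: LazyLock<Result<Pattern, PatternError>> =
    LazyLock::new(|| Pattern::new(2, 16));

#[derive(Debug, PartialEq)]
pub enum ValidationError {
    Value,
    // Borrowed from the static, so the error needs no Clone and no copy.
    Pattern(&'static PatternError),
}

pub fn validate(username: &str) -> Result<(), ValidationError> {
    // Deref the LazyLock to the cached Result, then borrow both sides of it.
    let compiled: &'static Result<Pattern, PatternError> = &USERNAME;
    let pattern = compiled.as_ref().map_err(ValidationError::Pattern)?;
    if pattern.is_match(username) {
        Ok(())
    } else {
        Err(ValidationError::Value)
    }
}
```

Every call after the first reuses the cached value, so the per-call cost of validate is the match itself rather than the compilation.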

4 Likes

I have a related question. What should you do when a struct requires validation across fields? For a contrived example say you have

struct Foo {
   a: u32,
   b: u32,
}

and you want to enforce some weird constraint, say a > b.

Same idea, right? E.g. (sans a custom error type) that would be

use serde::{Deserialize, Serialize};

#[derive(Serialize, Deserialize, Debug)]
#[serde(try_from = "UncheckedFoo")]
pub struct Foo {
    a: u32,
    b: u32,
}

#[derive(Deserialize)]
#[serde(rename = "Foo")]
struct UncheckedFoo {
    a: u32,
    b: u32,
}

impl TryFrom<UncheckedFoo> for Foo {
    type Error = String;
    fn try_from(unchecked: UncheckedFoo) -> Result<Self, String> {
        let UncheckedFoo { a, b } = unchecked;
        if a > b {
            Ok(Foo { a, b })
        } else {
            Err(format!(
                "invalid value, a must be larger than b, got a={a} and b={b}"
            ))
        }
    }
}

3 Likes