So far so good we have a type with some invariant that gets checked when instantiating it. But now I want to make this type deserializable. If I know just derive(Deserialize) then the check of new won't be used and I could deserialize an invalid state, so basically the invariant is not guaranteed anymore.
Of course I can implement Deserialize myself but that seems a bit tedious for such a simple and very regular task as validating some invariant. Is there any better option than implementing Deserialize myself? Just note that this is a simple example in practice the structs might be bigger and the invariants more complicated.
A simplest way would be to have two structs: one with #[derive(Serialize, Deserialize)] on it, and another with something like TryFrom/Into targeting the first one. This way, you'll decouple storage format (handled by serde) from your invariants (handled by TryFrom).
Yes, this is one approach I took in the past. I see the benefit of the separation in some cases but it also has the drawback that documentation is now scattered across two types. I would love my deserialization layer to kind of also document what format it expects AS WELL as what conditions it expects on it.
Also would you recommend just duplicating the fields or would you go with a newtype on top of the raw type for such things?
You may know this, but you can have the validation struct be private so that there is only one (documented) public struct using serde(try_from). You still have the field duplication unless you have them both be wrappers to some inner type.
#[derive(Deserialize)]
#[serde(try_from = "ValueRangeUnchecked")]
pub struct ValueRange { min: i32, max: i32 } // Guarantees min <= max
impl ValueRange {
pub fn new(min: i32, max: i32) -> Result<Self, &'static str> {
if min <= max {
Ok(Self { min, max })
} else {
Err("Invalid")
}
}
}
#[derive(Deserialize)]
struct ValueRangeUnchecked { min: i32, max: i32 } // private!
impl TryFrom<ValueRangeUnchecked> for ValueRange {
type Error = &'static str;
fn try_from(src: ValueRangeUnchecked) -> Result<Self, Self::Error> {
ValueRange::new(src.min, src.max)
}
}
fn parse() {
let pair = ValueRange::deserialize(input).unwrap(); // Guaranteed min <= max
}
Thanks for this addition. Yes I was aware of this.
I still feel like there should be a simpler solution than this. It feels unnecessary and in our case it lead developers be tempted to go with the "easy" solution of just validating i.e. do
I'm not sure the the verify approach is any "easier" (I've added an actual implementation to my original post) - the number of lines is pretty much the same. I do admit that one needs to know how to do it, but that would be true for any other approach as well.
You don’t necessarily need any TryFrom implementation, or even (in this simple case) a second custom type. You can fit everything into one impl Deserialize:
impl<'de, T> serde::Deserialize<'de> for OrderedPair<T>
where
T: Ord + serde::Deserialize<'de>,
{
fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
where
D: serde::Deserializer<'de>,
{
let (left, right) = <(T, T)>::deserialize(deserializer)?;
OrderedPair::new(left, right).ok_or_else(|| serde::de::Error::custom("mis-ordered pair"))
}
}
impl Deserialize from scratch is complex, but impl Deserialize that delegates to another impl Deserialize is not.
Which is in my opinion bad practice because at some point we will just forget to call verify for some of the deserialized stuff.
While I understand your hesitancy to expose functions that allow one to construct an instance of your type without upholding its invariants, at some point that is not possible. Nothing stops a Deserializeimpl from being incorrectly implemented nor is there anything that stops your own developers from constructing an instance directly via its constructor. Try to define such additional functions which the same accessibility as its constructor; thus making OrderedPair::verify private just like its constructor and not pub as you have done in your example.
You may be able to get Clippy to assist you with the single_call_fn lint. By adding something like
#[deny(dead_code, reason = "want to only call this in one specific area")]
#[expect(clippy::single_call_fn, reason = "want to only call this in one specific area")]
fn foo() {}
you will ensure that there is exactly one place where that function is called; of course if you need to call it more than once, then this won't help.
Last
I would love my deserialization layer to kind of also document what format it expects
What do you mean by "format" exactly? You don't mean to say you want the format to be JSON or some other specific data format do you? If so, you have no choice but to define a private type that implements Deserialize, then define a pub function on OrderedPair that takes in a slice/str that specifically calls something like serde_json::from_str; otherwise anyone will be able to deserialize your type using any format. Of course that's one of the main points of Serde: to allow library authors to define a single "deserialize" function without forcing downstream code to conform to some pre-determined format but instead allowing downstream code to dictate that specific format. If you do intend to force a specific format, you may be able to avoid Serde altogether depending on the library you use. For example, toml only needs the parse feature enabled to be able to deserialize TOML; then instead of using serde, you'd use the toml API directly.