How do I future-proof (de)serialization of types that might change across versions?

I would like to make sure that different versions of my serialized type can be deserialized into the appropriate version of the type. The goal is to facilitate migration between versions.
What follows might be an instance of the XY problem.

One idea is to include a version tag as a field and match against this tag. I think this requires wrapping the serialized type in a stable wrapper type.

struct VersionedMyStruct {
    version: u32,
    serialized_my_struct: Vec<u8>,
}

struct MyStruct { /* … */ }

type OldStruct = same_crate_old_version::MyStruct;

fn main() -> Result<(), _> {
    let versioned_struct = VersionedMyStruct::deserialize(/* … */)?;
    let serialized = versioned_struct.serialized_my_struct;
    let my_struct = match versioned_struct.version {
        0 => MyStruct::from(OldStruct::deserialize(serialized)?),
        1 => MyStruct::deserialize(serialized)?,
        v => return Err(UnknownVersion(v)),
    };
    Ok(())
}
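
A minimal sketch of this first idea, assuming a toy wire format invented here for illustration (a 4-byte little-endian version, then a version-specific payload) and hypothetical fields `foo` and `bar`. None of these names or layouts come from a real crate. The old layout is decoded by code kept in the current crate, so no dependency on an old release is needed.

```rust
/// Current in-memory type (hypothetical fields, for illustration only).
#[derive(Debug, PartialEq)]
pub struct MyStruct {
    pub foo: String,
    pub bar: u32,
}

#[derive(Debug, PartialEq)]
pub enum DecodeError {
    Truncated,
    InvalidUtf8,
    UnknownVersion(u32),
}

/// Always writes the newest layout (version 1):
/// 4-byte LE version, 4-byte LE `bar`, then `foo` as UTF-8.
pub fn encode(value: &MyStruct) -> Vec<u8> {
    let mut out = Vec::new();
    out.extend_from_slice(&1u32.to_le_bytes());
    out.extend_from_slice(&value.bar.to_le_bytes());
    out.extend_from_slice(value.foo.as_bytes());
    out
}

/// Reads the version header and dispatches to the matching layout.
pub fn decode(data: &[u8]) -> Result<MyStruct, DecodeError> {
    let (h, payload) = split(data, 4)?;
    match u32::from_le_bytes([h[0], h[1], h[2], h[3]]) {
        // Assumed version 0 layout: `bar` was a one-byte flag.
        0 => {
            let (flag, rest) = split(payload, 1)?;
            Ok(MyStruct { foo: utf8(rest)?, bar: u32::from(flag[0] != 0) })
        }
        1 => {
            let (b, rest) = split(payload, 4)?;
            let bar = u32::from_le_bytes([b[0], b[1], b[2], b[3]]);
            Ok(MyStruct { foo: utf8(rest)?, bar })
        }
        v => Err(DecodeError::UnknownVersion(v)),
    }
}

/// Splits off the first `n` bytes, or reports that the input is too short.
fn split(data: &[u8], n: usize) -> Result<(&[u8], &[u8]), DecodeError> {
    if data.len() < n {
        Err(DecodeError::Truncated)
    } else {
        Ok(data.split_at(n))
    }
}

fn utf8(bytes: &[u8]) -> Result<String, DecodeError> {
    String::from_utf8(bytes.to_vec()).map_err(|_| DecodeError::InvalidUtf8)
}
```

Calling `decode` on bytes written by an older build yields the current `MyStruct`, and an unrecognized header surfaces as `DecodeError::UnknownVersion` instead of a confusing parse failure.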

Another idea is to have a version field in the struct itself and fail deserialization with a specific error if its value is not as expected.

const CURRENT_VERSION: u32 = 1;

struct MyStruct {
    version: u32,
    /* … */
}

impl Deserialize for MyStruct {
    fn deserialize(data: &[u8]) -> Result<Self, _> {
        /* … */
        if the_version != CURRENT_VERSION {
            return Err(IncorrectVersion);
        }
        /* … */
    }
}

type OldStruct = same_crate_old_version::MyStruct;

fn main() -> Result<(), _> {
    let serialized = data_from_somewhere();
    let my_struct = match MyStruct::deserialize(serialized) {
        Ok(my_struct) => my_struct,
        Err(IncorrectVersion) => MyStruct::from(OldStruct::deserialize(serialized)?),
        _ => return Err(UnknownVersion),
    };
    Ok(())
}

Both of these ideas feel somewhat clunky. This is partly because my_crate depends on old versions of itself in both ideas. Also, versioning is manual, and I'm not sure how I can reliably detect whether the version must be bumped. Keeping track of which versions of the type can be migrated, and how to do so, is probably a necessary burden.
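
On detecting when a bump is needed: one approach that may help is a "golden bytes" check, where the output for a fixed fixture is recorded at release time and compared against what the current code produces. The sketch below assumes a toy encoder and a hypothetical fixture, both invented for illustration; in a real project the recorded bytes would more likely live in a checked-in file and the check in a unit test.

```rust
/// Hypothetical application type, standing in for the real one.
pub struct MyStruct {
    pub foo: String,
    pub bar: u32,
}

/// Toy format assumed for this sketch:
/// 4-byte LE version, 4-byte LE `bar`, then `foo` as UTF-8.
pub fn encode(value: &MyStruct) -> Vec<u8> {
    let mut out = Vec::new();
    out.extend_from_slice(&1u32.to_le_bytes());
    out.extend_from_slice(&value.bar.to_le_bytes());
    out.extend_from_slice(value.foo.as_bytes());
    out
}

/// Bytes the encoder produced for the fixture when version 1 was released.
pub const GOLDEN_V1: &[u8] = &[1, 0, 0, 0, 7, 0, 0, 0, b'h', b'i'];

/// Returns true while the encoder still produces the recorded bytes.
/// A `false` here means the format changed and the version needs a look.
pub fn format_matches_golden() -> bool {
    let fixture = MyStruct { foo: "hi".to_string(), bar: 7 };
    encode(&fixture).as_slice() == GOLDEN_V1
}
```

This only catches changes that affect the chosen fixture, and it cannot tell whether a change is backward compatible, so the decision to bump stays manual.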

Are there standard solutions for scenarios like these? A pattern, a crate, a serde feature? Are the sketched ideas approaching the mark, or am I completely off target? Happy about any & all feedback & ideas.

I assume these structs represent some form of information exchanged via some kind of API.
If so, each version of that API should have its own implementation of the DTOs (data transfer objects) it needs.
The API should also be designed so that the requester states which version they intend to use via metadata (e.g. HTTP headers).
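
A rough sketch of what that could look like, assuming a hypothetical version header whose value is a plain number, and hypothetical DTO fields. The header handling, type names, and the choice to reject a missing header are all assumptions made for illustration.

```rust
/// Internal application type (hypothetical fields).
pub struct MyStruct {
    pub foo: String,
    pub bar: u32,
}

/// DTO exposed by API version 1.
#[derive(Debug, PartialEq)]
pub struct MyStructV1 {
    pub foo: String,
    pub bar: bool,
}

/// DTO exposed by API version 2.
#[derive(Debug, PartialEq)]
pub struct MyStructV2 {
    pub foo: String,
    pub bar: u32,
}

#[derive(Debug, PartialEq)]
pub enum ApiVersion {
    V1,
    V2,
}

#[derive(Debug, PartialEq)]
pub enum Response {
    V1(MyStructV1),
    V2(MyStructV2),
}

/// Interprets the value of the (hypothetical) version header.
/// A missing header is rejected here rather than silently defaulted.
pub fn parse_api_version(header: Option<&str>) -> Result<ApiVersion, String> {
    match header.map(str::trim) {
        Some("1") => Ok(ApiVersion::V1),
        Some("2") => Ok(ApiVersion::V2),
        Some(other) => Err(format!("unsupported API version: {}", other)),
        None => Err("missing API version header".to_string()),
    }
}

/// Converts the internal type into the DTO of the requested version.
pub fn to_dto(value: &MyStruct, version: ApiVersion) -> Response {
    match version {
        ApiVersion::V1 => Response::V1(MyStructV1 {
            foo: value.foo.clone(),
            bar: value.bar != 0, // lossy: V1 only knew a flag
        }),
        ApiVersion::V2 => Response::V2(MyStructV2 {
            foo: value.foo.clone(),
            bar: value.bar,
        }),
    }
}
```

Each DTO stays frozen once its API version ships, and only `to_dto` has to change when the internal type evolves.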

The way I handle this (theoretically; I haven't yet got a production application doing actual version migration) is with an enum that acts as the “serialization schema”. Each variant represents one version of the format, and serde writes an explicit tag for the variant. The struct (or enum) that the library actually uses has Serialize and Deserialize implementations that delegate to this serialization enum. This way,

  • serde takes care of applying the version tagging scheme and you only have to write the conversion.
  • You're free to completely change what fields the struct has because each of the serialization enum’s variants can be different.
  • No dependency on the old crate version.
  • Your application struct does not have a version field that is only relevant to serialization.

use serde::{Deserialize, Deserializer, Serialize, Serializer};
use std::borrow::Cow;

pub struct MyStruct {
    foo: String,
    bar: u32,
}

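// Serialization schema: one variant per format version. With serde's default
// (externally tagged) enum representation, the variant name is the version tag.
#[derive(Serialize, Deserialize)]
enum MyStructSer<'a> {
    V1 { foo: Cow<'a, str>, bar: bool },
    V2 { foo: Cow<'a, str>, bar: u32 },
}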

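impl Serialize for MyStruct {
    fn serialize<S: Serializer>(&self, serializer: S) -> Result<S::Ok, S::Error> {
        // Destructure without `..` so a newly added field cannot be forgotten here.
        let &Self { ref foo, bar } = self;
        // Always write the newest schema version.
        MyStructSer::V2 {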
            foo: foo.into(),
            bar,
        }
        .serialize(serializer)
    }
}

impl<'de> Deserialize<'de> for MyStruct {
    fn deserialize<D: Deserializer<'de>>(deserializer: D) -> Result<Self, D::Error> {
        Ok(match MyStructSer::deserialize(deserializer)? {
            // Migrate V1 data: the old `bool` becomes 0 or 1.
            MyStructSer::V1 { foo, bar } => MyStruct {
                foo: foo.into(),
                bar: bar.into(),
            },
            MyStructSer::V2 { foo, bar } => MyStruct {
                foo: foo.into(),
                bar,
            },
        })
    }
}

Specific techniques in the code:

  • Because I'm using pattern matching for both conversions, it's not possible to forget to update the impls to handle a new field.
  • The Cow<str> avoids unnecessarily cloning strings during serialization.
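
To illustrate the pattern-matching point on its own, here is a small sketch with a hypothetical `describe` function. It assumes nothing beyond the standard library; the struct fields mirror the example above.

```rust
pub struct MyStruct {
    pub foo: String,
    pub bar: u32,
}

/// Exhaustive destructuring: the pattern names every field and has no `..`.
/// If a field such as `baz` is later added to `MyStruct`, this function
/// stops compiling until the pattern (and the logic) is updated.
pub fn describe(value: &MyStruct) -> String {
    let MyStruct { foo, bar } = value;
    format!("{}:{}", foo, bar)
}

/// Field access, for contrast: this keeps compiling after a field is added,
/// so the new field can be silently left out.
pub fn describe_by_field(value: &MyStruct) -> String {
    format!("{}:{}", value.foo, value.bar)
}
```

Both functions return the same string today; the difference only shows up when the struct changes, which is exactly when a conversion impl is easy to overlook.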

I second @kpreid's solution. In layman's terms, you have to version your schema and have an explicit field for tagging (e.g. version: 1). Then use this field as the discriminant for a tagged enum, so that you can deserialize any version of your schema.

Thanks, that looks like an elegant solution. :)