Serde Conditional Deserialization for Binary Formats (Versioning)

I'm trying to deserialize data in a simple, non-self-describing format into Rust structs. I've implemented a custom Deserializer for this format, and it works great when I deserialize the data into a struct like this one:

#[derive(Serialize, Deserialize)]
pub struct Position {
    x: f32,
    z: f32,
    y: f32,
}   

However, suppose a new version of the format adds a field to this Position struct (a field could also be removed):

#[derive(Serialize, Deserialize)]
pub struct Position {
    x: f32,
    z: f32,
    y: f32,
    is_visible: bool, // This field was added in a new version
}   

But I still need to support data from both versions of Position. The version of the data (known at runtime) can be passed to the Deserializer, but how can the Deserializer know which version each field belongs to (known at compile time)?
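For reference, here is the per-field check written by hand for this one struct, without serde; it is the logic the question wants the Deserializer to apply automatically. This is a minimal sketch under stated assumptions: f32 values are little-endian, a bool is one byte, and is_visible first appears in version 1.13.0.0 (the version used in the answer below). The names read_position, take and IS_VISIBLE_SINCE are hypothetical.

```rust
/// Assumed wire layout: x, z, y as little-endian f32, then `is_visible`
/// as one byte (0 or 1), present only in data from 1.13.0.0 onward.
#[derive(Debug, PartialEq)]
pub struct Position {
    pub x: f32,
    pub z: f32,
    pub y: f32,
    /// `None` when the data predates the version that introduced the field.
    pub is_visible: Option<bool>,
}

/// Assumed: first version whose data contains `is_visible`.
pub const IS_VISIBLE_SINCE: [u16; 4] = [1, 13, 0, 0];

/// Splits `n` bytes off the front of `input`, or returns `None` if it is too short.
fn take<'a>(input: &mut &'a [u8], n: usize) -> Option<&'a [u8]> {
    let data: &'a [u8] = *input;
    if data.len() < n {
        return None;
    }
    let (head, rest) = data.split_at(n);
    *input = rest;
    Some(head)
}

fn read_f32(input: &mut &[u8]) -> Option<f32> {
    let b = take(input, 4)?;
    Some(f32::from_le_bytes([b[0], b[1], b[2], b[3]]))
}

fn read_bool(input: &mut &[u8]) -> Option<bool> {
    match take(input, 1)?[0] {
        0 => Some(false),
        1 => Some(true),
        _ => None, // invalid bool byte
    }
}

/// Reads a `Position` written by a producer using `data_version`.
pub fn read_position(mut input: &[u8], data_version: [u16; 4]) -> Option<Position> {
    let x = read_f32(&mut input)?;
    let z = read_f32(&mut input)?;
    let y = read_f32(&mut input)?;
    // Arrays compare lexicographically, e.g. [1, 12, 2, 0] < [1, 13, 0, 0].
    let is_visible = if data_version >= IS_VISIBLE_SINCE {
        Some(read_bool(&mut input)?)
    } else {
        None
    };
    Some(Position { x, z, y, is_visible })
}
```

Writing this by hand for every struct and every added field is exactly what doesn't scale, which is why the question looks for a way to drive it from the Deserializer.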

I've looked at #[serde(deserialize_with)], but it didn't work because I can't get the needed version information there.

I've also looked at implementing Deserialize manually for Position, and I can get the versions of Position's fields by implementing something like Position::get_version(field_name: &str).

However, I can't figure out how to get the version of the data currently being deserialized: Deserialize::deserialize only has the trait bound Deserializer<'de>, and I can't make that bound stricter by adding another bound, so it doesn't know about my custom Deserializer.

At this point, I'm thinking about passing the version of each field to the Deserializer when instantiating it, but I'm not sure whether that will work or whether there is a better way to go.

You could add the #[serde(default)] attribute to your is_visible field. That'll let it deserialize the is_visible flag if it is present, otherwise it'll use the default (false).

Alternatively, if you want to differentiate between the field not being present and its value being false, you could change its type from bool to Option<bool>.


Check out deku and binrw.


I went with a solution that, while probably not the most idiomatic, can handle an arbitrary number of versions. From my answer to my own question on StackOverflow:

In short:

  • We implement a Version trait that gives the Deserializer the necessary version info
  • The Deserializer has a VersionedSeqAccess (implementing serde::de::SeqAccess) that sets a skip flag when a field is not present in the data's version
  • When the flag is set, deserialize_option produces None for that field and immediately clears the flag

The idea is to implement the following trait for the struct:

pub trait Version {
    /// We must specify the name of the struct so that any of the fields that 
    /// are structs won't confuse the Deserializer
    fn name() -> &'static str;
    fn version() -> VersionInfo;
}

#[derive(Debug, Clone)]
pub enum VersionInfo {
    /// Present in all versions
    All,

    /// Present in this version
    Version([u16; 4]),

    /// Represent Versions of structs
    Struct(&'static [VersionInfo]),

    // we can add other ways of expressing the version like a version range for ex.
}

Here is how it is implemented for the example struct Position. Implementing it by hand like this is error-prone, so it can be improved with a derive macro (see the end of this post):

#[derive(Deserialize)]
pub struct Position {
    x: f32,
    z: f32,
    y: f32,
    // With this solution, versioned fields must be wrapped in Option
    is_visible: Option<bool>,
}

impl Version for Position {
    fn version() -> VersionInfo {
        VersionInfo::Struct(&[
            VersionInfo::All,
            VersionInfo::All,
            VersionInfo::All,
            VersionInfo::Version([1, 13, 0, 0]),
        ])
    }

    fn name() -> &'static str {
        "Position"
    }
}

Now the deserializer is instantiated with the version of the data format we are currently parsing:

pub struct Deserializer<'de> {
    input: &'de [u8],

    /// The version the `Deserializer` expects the data format to be
    de_version: [u16; 4],

    /// Versions of each field (only used when deserializing into a struct)
    version_info: VersionInfo,

    /// Whether to skip deserializing the current item. This flag is set by `VersionedSeqAccess`.
    /// When set, the current item is deserialized to `None`
    skip: bool,

    /// Name of the struct we are deserializing into. We use this to make sure we call the correct
    /// visitor for children of this struct that are also structs
    name: &'static str,
}

pub fn from_slice<'a, T>(input: &'a [u8], de_version: [u16; 4]) -> Result<T, Error>
where
    T: Deserialize<'a> + Version,
{
    let mut deserializer = Deserializer::from_slice(input, de_version, T::version(), T::name());
    let t = T::deserialize(&mut deserializer)?;

    Ok(t)
}

Now that the deserializer has all the information it needs, this is how we define deserialize_struct:

fn deserialize_struct<V>(
    self,
    name: &'static str,
    fields: &'static [&'static str],
    visitor: V,
) -> Result<V::Value, Self::Error>
where
    V: Visitor<'de>,
{
    if name == self.name {
        if let VersionInfo::Struct(version_info) = self.version_info {
            // Make sure the Version impl is at least roughly correct. I use a derive macro
            // to implement Version, so this is not a problem in practice.
            assert_eq!(version_info.len(), fields.len());
            visitor.visit_seq(VersionedSeqAccess::new(self, fields.len(), version_info))
        } else {
            panic!("Struct must always have version info of `Struct` variant")
        }
    } else {
        // This is for child structs of the main struct. We do not support versioning for those
        visitor.visit_seq(SequenceAccess::new(self, fields.len()))
    }
}

Here is how serde::de::SeqAccess will be implemented for VersionedSeqAccess:

struct VersionedSeqAccess<'a, 'de: 'a> {
    de:           &'a mut Deserializer<'de>,
    version_info: &'static [VersionInfo],
    len:          usize,
    curr:         usize,
}

impl<'de, 'a> SeqAccess<'de> for VersionedSeqAccess<'a, 'de> {
    type Error = Error;

    fn next_element_seed<T>(&mut self, seed: T) -> Result<Option<T::Value>, Error>
    where
        T: DeserializeSeed<'de>,
    {
        if self.curr == self.len {
            // We iterated through all fields
            Ok(None)
        } else {
            // Get the version of the current field
            let version = &self.version_info[self.curr];
            self.de.version_info = version.clone();

            // Set the flag if the field is not present in the data's version
            if !is_correct_version(&self.de.de_version, version) {
                self.de.skip = true;
            }

            self.curr += 1;
            seed.deserialize(&mut *self.de).map(Some)
        }
    }
}
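The post calls is_correct_version but doesn't show it. A minimal sketch, assuming Version(v) means "introduced in v and present in every later version" (the enum's own comment only says "Present in this version"); VersionInfo is repeated so the snippet compiles on its own.

```rust
#[derive(Debug, Clone)]
pub enum VersionInfo {
    All,
    Version([u16; 4]),
    Struct(&'static [VersionInfo]),
}

/// Hypothetical: returns true if a field tagged with `version` exists in
/// data produced by `de_version`.
pub fn is_correct_version(de_version: &[u16; 4], version: &VersionInfo) -> bool {
    match version {
        VersionInfo::All => true,
        // [u16; 4] implements Ord lexicographically, so this reads as
        // "the data is at least as new as the version that added the field".
        VersionInfo::Version(since) => de_version >= since,
        // A nested struct field is assumed always present; its own fields
        // are not versioned in this scheme.
        VersionInfo::Struct(_) => true,
    }
}
```

Supporting fields that were removed would need another variant, such as the version range mentioned in the VersionInfo comment, with a matching arm here.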

The final piece of the puzzle is deserialize_option. If the current field is not present in this version of the data format, VersionedSeqAccess has already set the skip flag, so we clear it and produce None without reading any input:

fn deserialize_option<V>(self, visitor: V) -> Result<V::Value, Self::Error>
where
    V: Visitor<'de>,
{
    if self.skip {
        self.skip = false;
        visitor.visit_none()
    } else {
        visitor.visit_some(self)
    }
}
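To see the handshake between VersionedSeqAccess and deserialize_option in isolation, here is a serde-free toy model. It assumes every field is an Option<u8> stored as a single byte; MiniDeserializer, option_u8 and read_fields are hypothetical stand-ins for the real types and only illustrate the flag logic.

```rust
/// Simplified stand-in for the post's `VersionInfo` (no `Struct` variant).
#[derive(Debug, Clone)]
pub enum VersionInfo {
    All,
    Version([u16; 4]),
}

/// Toy deserializer: every field is an `Option<u8>` stored as one byte.
pub struct MiniDeserializer<'de> {
    input: &'de [u8],
    de_version: [u16; 4],
    skip: bool,
}

impl<'de> MiniDeserializer<'de> {
    /// Plays the role of `deserialize_option`. Outer `None` means truncated input.
    fn option_u8(&mut self) -> Option<Option<u8>> {
        if self.skip {
            // Field is absent in this data version: clear the flag, consume nothing.
            self.skip = false;
            return Some(None);
        }
        let input: &'de [u8] = self.input;
        let (&byte, rest) = input.split_first()?;
        self.input = rest;
        Some(Some(byte))
    }
}

/// Plays the role of `VersionedSeqAccess`: sets the flag before each field
/// that is newer than the data, then asks the deserializer for the value.
pub fn read_fields(input: &[u8], de_version: [u16; 4], fields: &[VersionInfo]) -> Option<Vec<Option<u8>>> {
    let mut de = MiniDeserializer { input, de_version, skip: false };
    let mut out = Vec::with_capacity(fields.len());
    for info in fields {
        if let VersionInfo::Version(since) = info {
            // Assumed semantics: the field was introduced in `since`.
            if de.de_version < *since {
                de.skip = true;
            }
        }
        out.push(de.option_u8()?);
    }
    Some(out)
}
```

The key property is that a skipped field consumes no input, so the following fields stay aligned. It also suggests why versioned fields must be Option: in the code shown, only deserialize_option clears the flag, so a flag set on a non-Option field would stay set until the next Option field.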

A lengthy solution, but it works great for my use case, which involves a lot of structs with lots of fields from different versions. Please let me know how I can make this less verbose or otherwise better. I also implemented a derive macro (not shown here) for the Version trait so I can write this:

#[derive(Debug, Clone, EventPrinter, Version)]
pub struct Position {
    x: f32,
    z: f32,
    y: f32,

    #[version([1, 13, 0, 0])]
    is_visible: Option<bool>,
}

With this derive macro, I find that this solution scales well for my use case.
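If maintaining a separate proc-macro crate feels heavy, a declarative macro can generate the same Version impl. This is a sketch of that alternative, not the author's derive macro (which isn't shown): versioned_struct! is a hypothetical name, it only understands a #[version(...)] attribute on fields, and VersionInfo additionally derives PartialEq here so the generated value can be compared.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum VersionInfo {
    All,
    Version([u16; 4]),
    Struct(&'static [VersionInfo]),
}

pub trait Version {
    fn name() -> &'static str;
    fn version() -> VersionInfo;
}

/// Hypothetical macro: declares the struct and implements `Version` for it.
/// Fields tagged `#[version([..])]` get `VersionInfo::Version`, others `VersionInfo::All`.
macro_rules! versioned_struct {
    // Internal rule: version info for an untagged field.
    (@info $field:ident) => { VersionInfo::All };
    // Internal rule: version info for a tagged field.
    (@info $field:ident $ver:expr) => { VersionInfo::Version($ver) };
    (
        $(#[$meta:meta])*
        pub struct $name:ident {
            $( $(#[version($ver:expr)])? $field:ident : $ty:ty ),* $(,)?
        }
    ) => {
        // Struct-level attributes (e.g. derives) are passed through unchanged.
        $(#[$meta])*
        pub struct $name {
            $( pub $field: $ty ),*
        }

        impl Version for $name {
            fn name() -> &'static str {
                stringify!($name)
            }
            fn version() -> VersionInfo {
                // One entry per field, in declaration order.
                VersionInfo::Struct(&[
                    $( versioned_struct!(@info $field $($ver)?) ),*
                ])
            }
        }
    };
}
```

Because the table is generated from the field list, its length always matches the struct's fields, which is what the assert in deserialize_struct checks. The trade-off is that any other field attributes (for example serde's) would need additional matcher rules.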
