Help (de-)serializing weird JSON structure

Hi everyone,

I am pretty new to Rust, but have been developing software for nearly 30 years in other languages. My current task is to support (de-)serialization of a model for a given JSON structure.

The model should look like this:

pub struct Datagram {
  pub header: Option<Header>,
  pub payload: Option<Payload>,
}

But the incoming JSON data structure expects the model to be

pub struct Datagram {
  pub header: Option<Vec<Header>>,
  pub payload: Option<Vec<Payload>>,
}

which is basically unusable. The JSON data looks like this:

{"datagram":[{"header":[...]},{"payload":[...]}]}

The background is that the model is defined via an XSD in this format:

    <xs:complexType name="DatagramType">
        <xs:sequence>
            <xs:element ref="ns_p:header"/>
            <xs:element ref="ns_p:payload"/>
        </xs:sequence>
    </xs:complexType>

The approach I am thinking about is that each struct in this situation requires a custom (de-)serializer. Does anyone have a hint for where I should be looking? Does serde or a serde-related crate already provide some support for this?

One trick I've used with great success is to manually implement deserialize/serialize via a temporary struct.

When serializing, we can use map() and std::slice::from_ref() to go from an Option<&T> to an Option<&[T]> (a reference to one item is always a valid reference to a slice of 1 item).

impl Serialize for Datagram {
    fn serialize<S: Serializer>(&self, ser: S) -> Result<S::Ok, S::Error> {
        #[derive(Serialize)]
        struct Repr<'a> {
            header: Option<&'a [Header]>,
            payload: Option<&'a [Payload]>,
        }

        let repr = Repr {
            header: self.header.as_ref().map(std::slice::from_ref),
            payload: self.payload.as_ref().map(std::slice::from_ref),
        };

        repr.serialize(ser)
    }
}
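
To see the std::slice::from_ref() trick in isolation, here is a minimal std-only sketch. The helper name as_single_slice is made up for illustration and is not part of serde or std.

```rust
// Hypothetical helper: view an optional value as an optional one-element
// slice, without cloning or allocating.
fn as_single_slice<T>(value: &Option<T>) -> Option<&[T]> {
    // as_ref() turns &Option<T> into Option<&T>;
    // from_ref() turns &T into a &[T] of length 1.
    value.as_ref().map(std::slice::from_ref)
}
```

Because the result only borrows from the original Option, this is cheap enough to do on every serialize call.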

The Deserialize implementation is a bit longer because you need to handle the possibility of multiple headers/payloads, but it's the same general idea.

impl<'de> Deserialize<'de> for Datagram {
    fn deserialize<D: Deserializer<'de>>(de: D) -> Result<Self, D::Error> {
        #[derive(Deserialize)]
        struct Repr {
            header: Option<Vec<Header>>,
            payload: Option<Vec<Payload>>,
        }

        let Repr { header, payload } = Repr::deserialize(de)?;
        
        let header = match header {
            Some(mut headers) if headers.len() == 1 => Some(headers.remove(0)),
            Some(_) => todo!("Figure out how you want to handle multiple headers"),
            None => None,
        };

        let payload = match payload {
            Some(mut payloads) if payloads.len() == 1 => Some(payloads.remove(0)),
            Some(_) => todo!("Figure out how you want to handle multiple payloads"),
            None => None,
        };

        Ok(Datagram { header, payload })
    }
}
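
One possible way to fill in the todo!() branches is to report an error whenever the array does not hold exactly one element. This is a minimal sketch, assuming that is the behaviour you want. exactly_one is a hypothetical helper, and it returns a plain String error so the example stays free of external crates.

```rust
// Hypothetical helper: collapse an optional list into an optional single
// item, returning an error instead of panicking when the list does not
// contain exactly one element.
fn exactly_one<T>(items: Option<Vec<T>>, field: &str) -> Result<Option<T>, String> {
    match items {
        // The field was absent (or null) in the input.
        None => Ok(None),
        // Exactly one element: take it out of the Vec.
        Some(mut items) if items.len() == 1 => Ok(items.pop()),
        // Zero or several elements: let the caller decide what to do.
        Some(items) => Err(format!(
            "expected exactly 1 element in `{}`, found {}",
            field,
            items.len()
        )),
    }
}
```

Inside deserialize() the String could be converted into the deserializer's error type with serde::de::Error::custom, e.g. exactly_one(header, "header").map_err(D::Error::custom)? with the serde::de::Error trait in scope.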

Thanks @Michael-F-Bryan

I should have mentioned that there are literally dozens of structs that would need this :frowning: So this approach sadly feels quite cumbersome as each serialize and deserialize implementation requires struct specific code.

Would there maybe be a way to generalize this? Maybe using key-values and checking whether the struct contains a property identical to a key? Sadly I am way too new to the language :frowning:

Is the problem just that certain fields are serialized as an array with a single object instead of using the object?

If so, you can use the #[serde(with = "some::module")] syntax to tell serde, "when (de)serializing this field, use some::module::serialize() and some::module::deserialize()".

#[derive(Serialize, Deserialize)]
pub struct Datagram {
    #[serde(with = "single_element_array")]
    pub header: Option<Header>,
    #[serde(with = "single_element_array")]
    pub payload: Option<Payload>,
}

// Common module that (de)serializes any Option<T> via a single-element array.
mod single_element_array {
    use serde::{Deserialize, Deserializer, Serialize, Serializer};

    // The functions must be `pub` so the derived code in the parent module can call them.
    pub fn serialize<T, S>(value: &Option<T>, ser: S) -> Result<S::Ok, S::Error>
    where
        T: Serialize,
        S: Serializer,
    {
        // A reference to one item is also a valid one-element slice.
        let repr: Option<&[T]> = value.as_ref().map(std::slice::from_ref);
        repr.serialize(ser)
    }

    pub fn deserialize<'de, T, D>(de: D) -> Result<Option<T>, D::Error>
    where
        T: Deserialize<'de>,
        D: Deserializer<'de>,
    {
        // Fails if the array does not contain exactly one element.
        let repr = <Option<[T; 1]>>::deserialize(de)?;

        Ok(repr.map(|[value]| value))
    }
}

You still need to add the #[serde(with = "single_element_array")] attribute to each of these funny fields, but because single_element_array::serialize() and single_element_array::deserialize() are generic you won't need to duplicate the code.

It is not just single fields, it is the whole struct. So the struct should be serialized as an array, with every field being used as an element in the array and the field name as the key.

So with the example struct:

pub struct Datagram {
  pub header: Option<Header>,
  pub payload: Option<Payload>,
}

and an instance of the struct

let datagram: Datagram = ...

the serialization would result in datagram actually being an array with 2 items, the first one containing datagram.header and the second one containing datagram.payload:

{"datagram":[{"header":...},{"payload":...}]}

Hope this helps.

Ah okay, so instead of using an object with key-value pairs (e.g. {"header": ..., "payload": ...}) they store it as an array where each element is a {"key": "value"} pair?

I feel like you should be able to define your own serde::Deserializer where the deserialize_struct() method will know how to handle the arrays... But that sounds like a lot of work.

If you are okay with some runtime overhead, you could always load the document into a loosely-typed serde_json::Value then transform it to be more like a normal object. From there, serde_json::from_value() lets you deserialize the Value to your type.
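
The "transform it to be more like a normal object" step could look roughly like the sketch below. To keep it runnable without external crates, Json is a hypothetical stand-in for serde_json::Value (whose real variants include Array and Object), and to_object_form is a made-up name. It assumes that any non-empty array consisting only of single-key objects is really a struct in disguise, which may not hold for every document.

```rust
// Hypothetical stand-in for serde_json::Value; only the variants needed
// for this sketch are modelled. Objects are kept as ordered key/value pairs.
#[derive(Debug, Clone, PartialEq)]
enum Json {
    Text(String),
    Array(Vec<Json>),
    Object(Vec<(String, Json)>),
}

// Turn [{"header": ...}, {"payload": ...}] into {"header": ..., "payload": ...},
// recursing into nested values. Arrays that are empty, or that contain anything
// other than single-key objects, are kept as arrays.
fn to_object_form(value: Json) -> Json {
    match value {
        Json::Array(items) => {
            let is_field_list = !items.is_empty()
                && items
                    .iter()
                    .all(|item| matches!(item, Json::Object(fields) if fields.len() == 1));
            if is_field_list {
                // Merge the single-key objects into one object.
                // A repeated key is kept as a duplicate entry, not merged.
                let mut merged = Vec::new();
                for item in items {
                    if let Json::Object(fields) = item {
                        for (key, inner) in fields {
                            merged.push((key, to_object_form(inner)));
                        }
                    }
                }
                Json::Object(merged)
            } else {
                Json::Array(items.into_iter().map(to_object_form).collect())
            }
        }
        Json::Object(fields) => Json::Object(
            fields
                .into_iter()
                .map(|(key, inner)| (key, to_object_form(inner)))
                .collect(),
        ),
        other => other,
    }
}
```

With serde_json you would apply the same reshaping to the Value and then hand the result to serde_json::from_value().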

Ah okay, so instead of using an object with key-value pairs (e.g. {"header": ..., "payload": ...} ) they store it as an array where each element is a {"key": "value"} pair?

Correct.

Runtime overhead shouldn't be an issue. And if it turns out to be one, that can be dealt with later. Right now it would be good to have something working and then go from there.

I am not sure I understand what you mean by

then transform it to be more like a normal object.

One thought I had was to deserialize the array, then iterate over each item, check whether its key is part of the struct, and call deserialization for that element. And when serializing, take every public field of the struct, put it into an array, and then serialize the array.

I probably have to dive deeper into how serde works and do that manually.

Thank you so much for the help. I got it working :slight_smile:

I avoided a custom deserializer and used post-processing instead: serialize the model with serde, deserialize the result into a serde_json::Value, walk through it to rearrange the objects into arrays, and serialize it again. The performance hit shouldn't be a problem for a start. If it becomes one, this can still be changed :slight_smile:
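
For the serializing direction, the rearranging step could be sketched like this. Json is a hypothetical stand-in for serde_json::Value so the example needs no external crates, and to_array_form and wrap_root are made-up names. The recursion rule (every object becomes an array of single-key objects, with the root wrapper left as an object) is an assumption based on the example JSON in this thread, not the actual code that was used.

```rust
// Hypothetical stand-in for serde_json::Value; objects are kept as ordered
// key/value pairs so the field order of the struct is preserved.
#[derive(Debug, Clone, PartialEq)]
enum Json {
    Text(String),
    Array(Vec<Json>),
    Object(Vec<(String, Json)>),
}

// Rewrite every object {"a": x, "b": y} as [{"a": x'}, {"b": y'}],
// where x' and y' are the recursively rewritten values.
fn to_array_form(value: Json) -> Json {
    match value {
        Json::Object(fields) => Json::Array(
            fields
                .into_iter()
                .map(|(key, inner)| Json::Object(vec![(key, to_array_form(inner))]))
                .collect(),
        ),
        Json::Array(items) => Json::Array(items.into_iter().map(to_array_form).collect()),
        other => other,
    }
}

// Wrap the rewritten model in the root object, e.g. {"datagram": [...]}.
fn wrap_root(name: &str, model: Json) -> Json {
    Json::Object(vec![(name.to_string(), to_array_form(model))])
}
```

Note that field order matters for an xs:sequence, and serde_json's map type sorts keys alphabetically unless its preserve_order feature is enabled.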
