Help with deserializing JSON to Option<Vec<f64>> containing null

Hello All,

I am trying to Serialize/Deserialize a structure of this type:

#[derive(Deserialize, Default, Clone, Debug, Serialize)]
struct Metrics {
    metric: String,
    #[serde(deserialize_with = "deserialize_vec", serialize_with = "serialize_vec")]
    data: Option<Vec<f64>>,
    metric_type: String,
}

with custom Serializer/Deserializer adapters.
The incoming JSON can take one of these two forms:

let input_1 = r#"{
        "metric": "thread0",
        "data": [ 9.0, 12.0, 16.0, 2.0, null, 7.0, null, null, 8.0 ],
        "metric_type": "thread"
        }"#;
let input_2 = r#"{
            "metric": "thread1",
            "data": null,
            "metric_type": "thread"
            }"#;

Here, either the data field itself can be null, or the vec can contain null values. So far, I have managed to write a custom Serializer/Deserializer for the case where the data field is a plain Vec<f64> (inspired by: Deserialize JSON array and replace null elements with zeros - #2 by nickelc).
I was also able to write a Serializer for data: Option<Vec<f64>> as follows:

fn serialize_vec<S>(
    to_serialize_vec_option: &Option<Vec<f64>>,
    serializer: S,
) -> Result<S::Ok, S::Error>
where
    S: Serializer,
{
    match *to_serialize_vec_option {
        Some(ref value) => {
            let mut temp = Vec::<Option<f64>>::new();
            for i in value {
                if i.is_nan() {
                    temp.push(None);
                } else {
                    temp.push(Some(*i));
                }
            }
            serializer.serialize_some(&temp)
        }
        None => serializer.serialize_none(),
    }
}

But I am having difficulty writing a Deserializer for data: Option<Vec<f64>>, where either the Option itself can be null or the vector can contain null values, which I want to convert to NaN while deserializing. Could anyone guide me on how to write a custom Deserializer function for Option<Vec<f64>> that can parse the two input strings above?

Many thanks in advance to the Rust Community.

Why don't you just

    #[serde(default)]
    data: Vec<Option<f64>>,

?

Because the program I am writing needs to keep the Vec<f64> data in memory for several days. There will be a lot of data, and a Vec<Option<f64>> typically uses twice as much memory as a Vec<f64>, so I do not want to use Vec<Option<f64>>.
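
To make the memory claim concrete, here is a minimal sketch that compares element sizes. The helper names `plain_bytes` and `optional_bytes` are made up for illustration, and the 8 vs 16 byte figures assume a typical 64-bit target where `f64` is 8-byte aligned; they count only element storage, not the Vec header or spare capacity.

```rust
use std::mem::size_of;

/// Bytes of element storage for `n` plain `f64` values.
/// (Hypothetical helper, for illustration only.)
fn plain_bytes(n: usize) -> usize {
    n * size_of::<f64>()
}

/// Bytes of element storage for `n` `Option<f64>` values.
/// `f64` has no spare bit pattern the compiler can use for `None`,
/// so the discriminant is stored separately and padded to the alignment.
fn optional_bytes(n: usize) -> usize {
    n * size_of::<Option<f64>>()
}
```

On such a target, `optional_bytes(n)` comes out as twice `plain_bytes(n)`, which matches the doubling described above.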

Thank you for the suggestion. Vec<Option<f64>> was the obvious first thing to try, but the program ended up consuming a lot of memory. In short, I am writing a program that retains time-series data for a week and then serializes it as a backup. Deserialization happens when data is pushed to the endpoints I have written.

Then just convert the DAO to the type you want to use by implementing the From trait.
Also, how do you want to handle the null values as f64?
Should they default to 0.0 or get thrown out?
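
A minimal sketch of what the From-based conversion could look like. `MetricsDao` is a hypothetical name for the wire-format type (in the real program it would derive Deserialize), and mapping null to NaN is an assumption here; 0.0 or dropping the element would be equally easy to swap in.

```rust
/// Hypothetical wire-format type: the shape serde would deserialize
/// directly, with nullable elements kept as `Option<f64>`.
struct MetricsDao {
    metric: String,
    data: Option<Vec<Option<f64>>>,
    metric_type: String,
}

/// Compact in-memory type that is kept around long term.
struct Metrics {
    metric: String,
    data: Option<Vec<f64>>,
    metric_type: String,
}

impl From<MetricsDao> for Metrics {
    fn from(dao: MetricsDao) -> Self {
        Metrics {
            metric: dao.metric,
            // Assumption: a null element becomes NaN.
            data: dao.data.map(|values| {
                values
                    .into_iter()
                    .map(|v| v.unwrap_or(f64::NAN))
                    .collect()
            }),
            metric_type: dao.metric_type,
        }
    }
}
```

The temporary Vec<Option<f64>> only lives until the conversion finishes, so the larger representation is not what stays in memory afterwards.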

With Vec<f64>, I was successful with this Deserializer function:

fn deserialize_vec<'de, D>(deserializer: D) -> Result<Vec<f64>, D::Error>
where
    D: Deserializer<'de>,
{
    use serde::de::{SeqAccess, Visitor};
    use std::{fmt, marker::PhantomData};

    struct SeqVisitor(PhantomData<f64>);

    impl<'de> Visitor<'de> for SeqVisitor {
        type Value = Vec<f64>;

        fn expecting(&self, fmt: &mut fmt::Formatter<'_>) -> Result<(), fmt::Error> {
            fmt.write_str("default vec")
        }

        fn visit_seq<A: SeqAccess<'de>>(self, mut seq: A) -> Result<Self::Value, A::Error> {
            let mut vec = Vec::<f64>::new();
            // Propagate errors with `?` instead of silently stopping at the first bad element.
            while let Some(elem) = seq.next_element::<Option<f64>>()? {
                if let Some(val) = elem {
                    vec.push(val);
                } else {
                    vec.push(f64::NAN);
                }
            }
            Ok(vec)
        }
    }
    deserializer.deserialize_seq(SeqVisitor(PhantomData))
}

I usually represent nulls in the vector as NaN.

An alternative that uses a custom deserialization function for data and requires fewer allocations:

use serde::{
    de::{self, Deserializer, SeqAccess, Visitor},
    Deserialize,
};

#[derive(Deserialize, Default, Clone, Debug)]
struct Metrics {
    metric: String,
    #[serde(deserialize_with = "deserialize_vec")]
    #[serde(default)]
    data: Option<Vec<f64>>,
    metric_type: String,
}

fn deserialize_vec<'de, D>(deserializer: D) -> Result<Option<Vec<f64>>, D::Error>
where
    D: Deserializer<'de>,
{
    use std::fmt;

    struct V;

    impl<'de> Visitor<'de> for V {
        type Value = Option<Vec<f64>>;

        fn expecting(&self, formatter: &mut fmt::Formatter) -> fmt::Result {
            write!(formatter, "a nullable list with nullable numbers")
        }

        fn visit_none<E>(self) -> Result<Self::Value, E>
        where
            E: de::Error,
        {
            Ok(None)
        }

        fn visit_some<D>(self, deserializer: D) -> Result<Self::Value, D::Error>
        where
            D: Deserializer<'de>,
        {
            deserializer.deserialize_seq(self)
        }

        fn visit_seq<A>(self, mut seq: A) -> Result<Self::Value, A::Error>
        where
            A: SeqAccess<'de>,
        {
            let mut res = Vec::with_capacity(seq.size_hint().unwrap_or(0));

            while let Some(v) = seq.next_element::<Option<f64>>()? {
                let elem = match v {
                    Some(x) => x,
                    None => f64::NAN,
                };

                res.push(elem);
            }

            Ok(Some(res))
        }
    }

    deserializer.deserialize_option(V)
}

fn main() {
    let json1 = r#"{
        "metric": "thread0",
        "data": [ 9.0, 12.0, 16.0, 2.0, null, 7.0, null, null, 8.0 ],
        "metric_type": "thread"
    }"#;

    let metrics: Metrics = serde_json::from_str(json1).unwrap();

    // Didn't bother to make an assert work because NaN != NaN
    println!("{:?}", metrics.data);

    let json2 = r#"{
        "metric": "thread1",
        "data": null,
        "metric_type": "thread"
    }"#;

    let metrics: Metrics = serde_json::from_str(json2).unwrap();

    assert_eq!(metrics.data, None);
}

Playground.
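
If an assert on the first case is wanted anyway, one option is a NaN-aware comparison. This is only a sketch; `eq_with_nan` is a hypothetical helper, not something from serde or the standard library.

```rust
/// Returns true when both slices have the same length and every pair of
/// elements is either equal or both NaN.
/// (Hypothetical helper, for illustration only.)
fn eq_with_nan(a: &[f64], b: &[f64]) -> bool {
    a.len() == b.len()
        && a.iter()
            .zip(b)
            .all(|(x, y)| (x.is_nan() && y.is_nan()) || x == y)
}
```

With that, something like `assert!(eq_with_nan(metrics.data.as_deref().unwrap(), &expected))` could replace the println, where `expected` is a Vec<f64> with f64::NAN in the null positions.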

I think that when we're serializing a struct holding a huge Vec into JSON, the allocation of f64 vs Option<f64> is the least of our problems. There is such a thing as premature optimization.

When converting one struct to another, we need to allocate two vectors though.

Ahh, amazing !
Thank you very much. This is what I needed.

Interesting approach. Learnt something new today ! =)