Skip invalid elements in a sequence with Serde deserialization

Using Serde, I would like to deserialize a sequence of elements, keeping the valid elements and skipping the invalid ones.

I have the following payload:

{
    "nhits": 30,
    "parameters": {
        "dataset": "occupation-parkings-temps-reel",
        "timezone": "UTC",
        "rows": 50,
        "start": 0,
        "format": "json",
        "facet": [
            "etat_descriptif"
        ]
    },
    "records": [
        {
            "datasetid": "occupation-parkings-temps-reel",
            "recordid": "1436c55a76fc7910b5a0336eb74cc0957870a8fd",
            "fields": {
                "nom_parking": "P1 Esplanade - Centre commercial",
                "etat": 1,
                "ident": 27,
                "infousager": "220",
                "idsurfs": "1703_DEP_27",
                "libre": 229,
                "total": 251,
                "etat_descriptif": "Ouvert"
            },
            "record_timestamp": "2020-12-20T12:51:00.704000+00:00"
        },
        {
            "datasetid": "occupation-parkings-temps-reel",
            "recordid": "2b15689c04478fcad8c964a5d9f3c0148eb70126",
            "fields": {
                "etat": 1,
                "ident": 30,
                "infousager": "LIBRE",
                "libre": 719,
                "total": 719,
                "etat_descriptif": "Ouvert"
            },
            "record_timestamp": "2020-12-20T12:51:00.704000+00:00"
        }
    ],
    "facet_groups": [
        {
            "facets": [
                {
                    "count": 28,
                    "path": "Ouvert",
                    "state": "displayed",
                    "name": "Ouvert"
                },
                {
                    "count": 1,
                    "path": "Ferm\u00e9",
                    "state": "displayed",
                    "name": "Ferm\u00e9"
                },
                {
                    "count": 1,
                    "path": "frequentation temps reel indisponible",
                    "state": "displayed",
                    "name": "frequentation temps reel indisponible"
                }
            ],
            "name": "etat_descriptif"
        }
    ]
}

I have the following structs that correspond to it:

use serde::Deserialize;

/// The container for the API response
#[derive(Debug, Deserialize)]
pub struct OpenDataResponse<T> {
    /// The parameters relative to the response
    pub parameters: Parameters,

    /// The parameters relative to the pagination
    #[serde(flatten)]
    pub pagination: Pagination,

    /// The sets of records inside the response
    #[serde(bound(deserialize = "T: Deserialize<'de>"))]
    #[serde(deserialize_with = "deserialize::failable_records")]
    pub records: Vec<Record<T>>,
}

/// A record represents an item of some data
/// with a specific id.
#[derive(Debug, Deserialize)]
pub struct Record<T> {
    /// The identifier of the record
    #[serde(rename(deserialize = "recordid"))]
    pub id: String,

    #[serde(rename(deserialize = "fields"))]
    pub(crate) inner: T,
}

#[derive(Debug, Deserialize)]
pub struct StatusOpenData {
    #[serde(rename(deserialize = "idsurfs"))]
    pub id: String,

    #[serde(rename(deserialize = "nom_parking"))]
    pub name: String,

    #[serde(rename(deserialize = "etat"))]
    pub status: i8,

    #[serde(rename(deserialize = "libre"))]
    pub free: u16,

    pub total: u16,

    #[serde(rename(deserialize = "etat_descriptif"))]
    pub users_info: Option<String>,
}

With these definitions, a `StatusOpenData` element has some required fields. So in the
`records` from the example, the first element is valid and the second is invalid, because it lacks `idsurfs` and `nom_parking`.

I implemented my own deserialization method `deserialize::failable_records` as:

use serde::{Deserialize, Deserializer};

struct FailableDeserialize<T> {
    inner: Option<T>,
}

impl<'de, T: Deserialize<'de>> Deserialize<'de> for FailableDeserialize<T> {
    fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
    where
        D: Deserializer<'de>,
    {
        let value: Option<T> = Deserialize::deserialize(deserializer).ok();
        Ok(FailableDeserialize { inner: value })
    }
}

pub(super) fn failable_records<'de, D, T>(deserializer: D) -> Result<Vec<T>, D::Error>
where
    D: Deserializer<'de>,
    T: Deserialize<'de>,
{
    // Error returned from the line below
    let elements: Vec<FailableDeserialize<T>> = Deserialize::deserialize(deserializer)?;
    let result = elements.into_iter().filter_map(|f| f.inner).collect();
    Ok(result)
}

This failed with some error like: should take errors into account: Error("expected , or ] ",

I do not understand why the error is returned: `let elements: Vec<FailableDeserialize<T>> = Deserialize::deserialize(deserializer)?;` tries to deserialize a sequence of `FailableDeserialize<T>` elements, but this type implements `Deserialize` in a way that can never return an error.

Where am I wrong?

The problem with the `FailableDeserialize` implementation is that it fails to consume all the tokens that belong to one "unit" when deserialization fails partway through a value. For example, `FailableDeserialize<u32>` works for the JSON inputs `1` (obviously) and `""`, but fails for `{}`.
If you want to see how to properly implement this behaviour you can take a look at serde_with::DefaultOnError.

Deserializing `""` works because the JSON deserializer consumes the whole string before calling the `visit_*` method. All of its tokens are therefore consumed even though the type does not match, and the deserializer next sees the `,` in the list.
However, for `{}` the deserialization fails right after the `{`, so the `}` is the next token. The list visitor does not know how to process it and then fails as well.
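
To make the token-consumption problem concrete, here is a toy model written with the standard library only. It is not how serde or serde_json are implemented; `Token`, `Parser`, `naive_element` and `buffered_element` are hypothetical names invented for this sketch. The naive element parser swallows errors the way `.ok()` does in the question, while the buffered one first consumes the whole element and only then tries to interpret it.

```rust
/// Toy token stream; a stand-in for a real JSON tokenizer.
#[derive(Debug, Clone, PartialEq)]
enum Token {
    Num(u32),
    Str(String),
    ObjStart,
    ObjEnd,
    SeqStart,
    SeqEnd,
    Comma,
}

struct Parser {
    tokens: Vec<Token>,
    pos: usize,
}

impl Parser {
    fn new(tokens: Vec<Token>) -> Self {
        Parser { tokens, pos: 0 }
    }

    /// Consumes and returns the next token, if any.
    fn bump(&mut self) -> Option<Token> {
        let token = self.tokens.get(self.pos).cloned();
        if token.is_some() {
            self.pos += 1;
        }
        token
    }

    /// Reads a `u32`. On a type mismatch only ONE token has been consumed:
    /// a string is a single token, but for `{}` the `}` is left behind.
    fn parse_u32(&mut self) -> Result<u32, String> {
        match self.bump() {
            Some(Token::Num(n)) => Ok(n),
            Some(other) => Err(format!("expected a number, found {:?}", other)),
            None => Err("unexpected end of input".to_string()),
        }
    }

    /// Consumes exactly one complete value, whatever its type.
    fn skip_value(&mut self) -> Result<(), String> {
        let mut depth = 0usize;
        loop {
            match self.bump() {
                Some(Token::ObjStart) | Some(Token::SeqStart) => depth += 1,
                Some(Token::ObjEnd) | Some(Token::SeqEnd) if depth > 0 => depth -= 1,
                Some(Token::Num(_)) | Some(Token::Str(_)) => {}
                Some(Token::Comma) if depth > 0 => {}
                _ => return Err("malformed value".to_string()),
            }
            if depth == 0 {
                return Ok(());
            }
        }
    }

    /// Parses `[elem, elem, ...]`, delegating each element to `elem`.
    /// Empty lists are not handled in this sketch.
    fn parse_seq(
        &mut self,
        elem: fn(&mut Parser) -> Option<u32>,
    ) -> Result<Vec<Option<u32>>, String> {
        if self.bump() != Some(Token::SeqStart) {
            return Err("expected [".to_string());
        }
        let mut out = Vec::new();
        loop {
            out.push(elem(self));
            match self.bump() {
                Some(Token::Comma) => continue,
                Some(Token::SeqEnd) => return Ok(out),
                _ => return Err("expected , or ]".to_string()),
            }
        }
    }
}

/// Mirrors the `.ok()` approach from the question: the error is swallowed,
/// but the tokens left over from the failed element stay in the stream.
fn naive_element(p: &mut Parser) -> Option<u32> {
    p.parse_u32().ok()
}

/// Consumes the whole element first, then parses the buffered tokens.
fn buffered_element(p: &mut Parser) -> Option<u32> {
    let start = p.pos;
    p.skip_value().ok()?;
    let mut buffered = Parser::new(p.tokens[start..p.pos].to_vec());
    buffered.parse_u32().ok()
}
```

For the token sequence of `[1, "", {}, 2]`, `parse_seq(naive_element)` fails with `expected , or ]` because the `}` is still in the stream, while `parse_seq(buffered_element)` yields `[Some(1), None, None, Some(2)]`. In real Serde code, the buffering step would typically mean deserializing each element into a self-describing intermediate value (for example `serde_json::Value`) before attempting the typed conversion.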


Your Rust code snippets are unfortunately incomplete. For example, you are missing a definition of Parameters and you didn't mention what you use for T in OpenDataResponse<T>.

I cannot share my complete code, but you gave me a really good hint with the default capabilities.

I used `#[serde(default)]` together with a custom deserialize method that skips the values that have been deserialized to their default values.
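
One possible reading of that filtering step, sketched with the standard library only. It assumes the Serde side has already replaced missing fields with their defaults; `Status` and `drop_defaulted` are hypothetical names for illustration, not the actual code.

```rust
/// Hypothetical, trimmed-down stand-in for `StatusOpenData`.
/// With `#[serde(default)]`, a missing `id` or `name` would deserialize
/// to an empty string instead of raising an error.
#[derive(Debug, Default, PartialEq)]
struct Status {
    id: String,
    name: String,
    free: u16,
    total: u16,
}

/// Keeps only the records whose required fields were really present,
/// assuming an empty string can only come from the default value.
fn drop_defaulted(records: Vec<Status>) -> Vec<Status> {
    records
        .into_iter()
        .filter(|r| !r.id.is_empty() && !r.name.is_empty())
        .collect()
}
```

The trade-off of this design is that a legitimately empty value cannot be told apart from a missing one, so it only fits fields where the default is never valid data.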

Is there any mention on serde.rs or in the serde crate's documentation that visitors should handle those tokens (like `,` or `{}`)?

The Visitor does not have any direct access to the individual tokens. The Visitor is completely agnostic to the concrete format. The Deserializer has the logic of the data format and how to process the tokens. The Visitor instructs the Deserializer what data types are expected.
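
The division of labour described above can be illustrated with a deliberately simplified model. The traits below only borrow Serde's vocabulary; their signatures are much simpler than the real `Visitor` and `Deserializer` traits, and `ToyVisitor`, `ToyDeserializer` and `U32Visitor` are names made up for this sketch.

```rust
/// Format-agnostic side: decides what to do with each kind of value.
/// It never sees quotes, commas or braces.
trait ToyVisitor {
    type Value;
    fn visit_u32(self, v: u32) -> Result<Self::Value, String>;
    fn visit_str(self, v: &str) -> Result<Self::Value, String>;
}

/// Format-specific side: owns the raw input and knows its syntax.
struct ToyDeserializer<'a> {
    input: &'a str,
}

impl<'a> ToyDeserializer<'a> {
    /// Inspects the input, consumes one value and hands it to the visitor.
    fn deserialize_any<V: ToyVisitor>(self, visitor: V) -> Result<V::Value, String> {
        let text = self.input.trim();
        // A quoted string: the quotes are handled here, not by the visitor.
        if let Some(inner) = text.strip_prefix('"').and_then(|s| s.strip_suffix('"')) {
            return visitor.visit_str(inner);
        }
        match text.parse::<u32>() {
            Ok(n) => visitor.visit_u32(n),
            Err(_) => Err(format!("unsupported input: {}", text)),
        }
    }
}

/// A visitor that only accepts unsigned integers.
struct U32Visitor;

impl ToyVisitor for U32Visitor {
    type Value = u32;

    fn visit_u32(self, v: u32) -> Result<u32, String> {
        Ok(v)
    }

    fn visit_str(self, v: &str) -> Result<u32, String> {
        Err(format!("expected a u32, found the string {:?}", v))
    }
}
```

Here `ToyDeserializer { input: "42" }.deserialize_any(U32Visitor)` succeeds, while the input `""` reaches `visit_str` and is rejected there. In both cases the visitor only receives already-parsed values, which is why it has no way to clean up leftover tokens itself.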
