Serde deserialize inherit default from parent

I am building a system that parses a widget tree configuration.

Say I have configuration structs like this:

#[derive(Deserialize)]
pub struct Config {
    pub fg_color: String,
    pub bg_color: String,
    pub widget1: WidgetConfig,
    pub widget2: WidgetConfig,
}

#[derive(Deserialize)]
pub struct WidgetConfig {
    pub fg_color: String,
    pub bg_color: String,
    pub size: u32,
    // Other fields
}

Now when provided a toml configuration like this

fg_color = "black"
bg_color = "white"
[widget1]
fg_color = "blue"
bg_color = "red"
size = 24
[widget2]
size = 24

When deserializing this, widget2 should notice that its colors were not specified, so the global fg_color and bg_color should be used (black and white respectively).

Is there a nice and easy way to do this? I know I can use a separate struct for deserialization filled with optional values and then convert it into a "real struct" with non-optional values. But this seems tedious and full of boilerplate when I have to do it for multiple such parameter groups, where some options are required and some can just be inherited from the parent.

In this example fg_color and bg_color are inheritable, but size is not.

EDIT
The situation becomes even worse when there is further nesting involved and you have to inherit from your immediate parent and not a global ancestor.

Maybe the type_builder crate would help here; it can set a default value when a value is not set.

You'd probably need to write a custom Deserializer if you wanted the defaults to be propagated by serde. A standard Deserializer doesn't know how your inheritance scheme works, and doesn't have a way to track the current parent in order to resolve defaults.

It will probably be easier to just have fields that should use a default be optional, and then you can build up your inheritance chain and resolve the actual values after deserialization.
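A minimal sketch of that "optional fields first, resolve afterwards" approach, using only the standard library. The names RawWidgetConfig, Inherited, and resolve are hypothetical and not from the question; in real code RawWidgetConfig is the struct that would carry #[derive(Deserialize)], which is left out here so the snippet compiles without serde.

```rust
// Hypothetical names throughout: `Inherited`, `RawWidgetConfig`, and `resolve`
// are illustrative only.

/// Values a widget may inherit from its parent.
#[derive(Debug, Clone, PartialEq)]
pub struct Inherited {
    pub fg_color: String,
    pub bg_color: String,
}

/// What the config file may contain. Every field is optional so a derived
/// deserializer would accept a widget that omits its colors.
#[derive(Debug, Default)]
pub struct RawWidgetConfig {
    pub fg_color: Option<String>,
    pub bg_color: Option<String>,
    pub size: Option<u32>,
}

/// Fully resolved widget: no optional fields left.
#[derive(Debug, PartialEq)]
pub struct WidgetConfig {
    pub fg_color: String,
    pub bg_color: String,
    pub size: u32,
}

impl RawWidgetConfig {
    /// Fill the inheritable fields from `parent`. `size` is not inheritable,
    /// so a missing `size` is reported as an error.
    pub fn resolve(self, parent: &Inherited) -> Result<WidgetConfig, String> {
        Ok(WidgetConfig {
            fg_color: self.fg_color.unwrap_or_else(|| parent.fg_color.clone()),
            bg_color: self.bg_color.unwrap_or_else(|| parent.bg_color.clone()),
            size: self.size.ok_or_else(|| "missing field `size`".to_string())?,
        })
    }
}
```

The cost is one extra struct per config type, but all of the inheritance logic lives in a single resolve function that can be unit tested without any config file.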

If you have a lot of types and fields that need to do some kind of default resolution like this, you could also consider writing a proc macro to help avoid boilerplate. (Though, depending on how many cases you need to handle, that can take far more time to write than just dealing with the boilerplate.)

serde has a DeserializeSeed trait specifically for passing state into a deserialize implementation. You can use this to pass your global config state into the deserialization of your widget configs for inheriting the global values. It requires manually implementing the traits yourself, which can be a bit intimidating, but it actually isn't as hard as it looks.

I've put together a Rust playground for your specific example here: Rust Playground. Note that you don't even have to include the global config in your final Config struct if you don't need it after deserialization. This code can also be easily expanded to more config fields by adding the fields to the structs in all the necessary places. It's a bit tedious, but I would say this is the "correct" way to do what you are wanting.

The high-level overview of the above example is: the Config's Deserialize implementation first deserializes the global values into a GlobalWidgetConfig struct. Then it uses that GlobalWidgetConfig as a seed for deserializing the rest of the WidgetConfigs. Each time a WidgetConfig is deserialized, any missing fields are populated from the GlobalWidgetConfig. If you run the code, you should see the correct expected result in the debug output for your toml configuration.

Hope that helps!

Edit: Just read the edit to your message about doing this for multiple layers of nested WidgetConfigs. You can modify what I did to immediately use the parent config. I separated the global state into a GlobalWidgetConfig because your example didn't have every field required to have a global value, but you could easily rework the code to just use a single WidgetConfig, as both the seed and the widget values. How you do this really depends on what you want your toml config schema to be.
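To make the "inherit from the immediate parent" rule concrete, here is a rough std-only sketch of it as plain post-processing, not as a DeserializeSeed implementation. RawNode, Node, Style, and resolve are hypothetical names that do not appear in the playground code. With a seed, the same idea would mean passing each node's resolved values down as the seed for its children.

```rust
// Hypothetical types for illustration only.

#[derive(Debug, Clone, PartialEq)]
pub struct Style {
    pub fg_color: String,
    pub bg_color: String,
}

/// A node as it might come out of a derived deserializer: colors optional,
/// children nested arbitrarily deep.
#[derive(Debug, Default)]
pub struct RawNode {
    pub fg_color: Option<String>,
    pub bg_color: Option<String>,
    pub children: Vec<RawNode>,
}

/// A node after inheritance has been applied.
#[derive(Debug, PartialEq)]
pub struct Node {
    pub style: Style,
    pub children: Vec<Node>,
}

/// Resolve `raw` against its immediate parent's already-resolved style, then
/// hand this node's own resolved style down to its children.
pub fn resolve(raw: RawNode, parent: &Style) -> Node {
    let style = Style {
        fg_color: raw.fg_color.unwrap_or_else(|| parent.fg_color.clone()),
        bg_color: raw.bg_color.unwrap_or_else(|| parent.bg_color.clone()),
    };
    let children = raw
        .children
        .into_iter()
        .map(|child| resolve(child, &style))
        .collect();
    Node { style, children }
}
```

Because each child sees its parent's resolved style rather than the global one, an override set halfway down the tree applies to everything beneath it.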


Hi, I really appreciate the example. But I have never implemented a custom deserializer before, so I have a few questions about your implementation, if you don't mind.

In the DeserializeSeed impl for GlobalWidgetConfig you enumerate the fields like this

#[derive(Deserialize)]
#[serde(field_identifier, rename_all = "snake_case")]
enum Field {
     FgColor,
     BgColor,
     Size,
}

But then you also list out the strings like this

const FIELDS: &[&str] = &[
    "fg_color", "bg_color", "size",
    // Other fields
];

Why are both required? Also, I do not understand the connection with the enum at all.

How do we know that visit_map<V> has a map that has keys equivalent to the enum fields?

Basically

while let Some(key) = map.next_key()? {
    match key {
        Field::FgColor => {
            if fg_color.is_some() {
                return Err(de::Error::duplicate_field("fg_color"));
            }
            fg_color = Some(map.next_value()?);
        }
        Field::BgColor => {
            if bg_color.is_some() {
                return Err(de::Error::duplicate_field("bg_color"));
            }
            bg_color = Some(map.next_value()?);
        }
        Field::Size => {
            if size.is_some() {
                return Err(de::Error::duplicate_field("size"));
            }
            size = Some(map.next_value()?);
        } // Other fields
    }
}

How do we actually know that the key of the map corresponds to the enum fields?

And what does the field_identifier attribute mean? I couldn't find any documentation on it.

I do see an example using this field_identifier attribute in Manually deserialize struct · Serde

But it is still not clear what's going on and how it relates to the map in the visit_map

There is an issue about this missing documentation: container attributes 'field_identifier' and 'variant_identifier' are not documented · Issue #1238 · serde-rs/serde · GitHub

Once I understand what it does, I will try to contribute a PR to document this

EDIT:

There does seem to be a PR that adds some docs to give an explanation on what it is for
https://github.com/serde-rs/serde-rs.github.io/pull/107

Yeah, I'm happy to address any questions you have! Like I said, manual implementations of Deserialize and friends can be intimidating.

The main thing to understand is that serde's deserialization API is written the way it is because it has to accommodate both a generically deserializable type (in our case, Config and GlobalWidgetConfig) and a generic deserializer (which, again, in our case is toml's Deserializer). The deserializable type needs to be able to tell the deserializer how to parse the data, and then the deserializer needs to be able to tell the deserializable type what serde data type it can provide.

In the example code I provided above, our code tells the deserializer that it is expecting a struct, with the call to deserialize_struct(). There are other things you could tell it instead, using any of the Deserializer methods here.

The deserializer (remember, in this case, toml's deserializer, but there are other deserializers we could use too) then uses that information to parse from the input bytes you gave it. It then has to provide that parsed data back to our deserializable type. In order to do that, it needs a way to talk to our type and tell it what data it has and in what form it is. This "talking" is done using the Visitor trait. If you look at the docs of Visitor, you'll see it has a ton of different potential methods for different types that could be used by the deserializer. The deserializer will call one of those methods, and if we implemented that method when defining our visitor, then the visitor will return the deserialized value.

Note that we don't know exactly which method the deserializer will call on our visitor implementation. Usually, you'd want to implement every visitor method that makes sense, so that your type is compatible with as many deserializer implementations as possible. For example, in WidgetConfigVisitor's visitor implementation, I only implemented visit_map, because I know that's the method that toml's deserializer will call. But some deserializers will call other methods from their deserialize_struct method. In the Manually deserialize struct article you linked, they implement both visit_map and visit_seq. This is because some common deserializers, like the one provided by bincode, can't actually deserialize structs as maps because they can't parse keys from their serialized data. When you use #[derive(Deserialize)], you'll actually get multiple visitor methods derived. Also, if you look at serde's source code and check out the implementations for types in the standard library, you'll see multiple visitor methods implemented as well for this same reason.
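A toy model of that handshake may make it easier to picture. This is deliberately not serde's real API: ToyVisitor, map_format, and seq_format are made-up names, and the "formats" just hand over hard-coded data. It only shows why a visitor that implements a single method works with one kind of format and fails with another.

```rust
// Toy stand-ins for illustration; these are NOT serde's traits.

pub trait ToyVisitor: Sized {
    type Value;

    /// Called by formats that know field names (TOML-like).
    fn visit_map(self, _entries: &[(&str, u32)]) -> Result<Self::Value, String> {
        Err("this visitor does not accept a map".to_string())
    }

    /// Called by formats that only know field order (bincode-like).
    fn visit_seq(self, _items: &[u32]) -> Result<Self::Value, String> {
        Err("this visitor does not accept a sequence".to_string())
    }
}

#[derive(Debug, PartialEq)]
pub struct Point {
    pub x: u32,
    pub y: u32,
}

/// Visitor that only understands maps, like the WidgetConfigVisitor above.
pub struct MapOnly;

impl ToyVisitor for MapOnly {
    type Value = Point;

    fn visit_map(self, entries: &[(&str, u32)]) -> Result<Point, String> {
        let (mut x, mut y) = (None, None);
        for (key, value) in entries {
            match *key {
                "x" => x = Some(*value),
                "y" => y = Some(*value),
                other => return Err(format!("unknown field `{}`", other)),
            }
        }
        Ok(Point {
            x: x.ok_or("missing field `x`")?,
            y: y.ok_or("missing field `y`")?,
        })
    }
}

/// Visitor that understands both shapes, like a derived implementation.
pub struct MapOrSeq;

impl ToyVisitor for MapOrSeq {
    type Value = Point;

    fn visit_map(self, entries: &[(&str, u32)]) -> Result<Point, String> {
        MapOnly.visit_map(entries)
    }

    fn visit_seq(self, items: &[u32]) -> Result<Point, String> {
        match items {
            [x, y] => Ok(Point { x: *x, y: *y }),
            _ => Err(format!("expected 2 elements, got {}", items.len())),
        }
    }
}

/// A self-describing "format": it hands the visitor a map.
pub fn map_format<V: ToyVisitor>(visitor: V) -> Result<V::Value, String> {
    visitor.visit_map(&[("x", 1), ("y", 2)])
}

/// A positional "format": it hands the visitor a sequence.
pub fn seq_format<V: ToyVisitor>(visitor: V) -> Result<V::Value, String> {
    visitor.visit_seq(&[1, 2])
}
```

Here seq_format(MapOnly) returns an error while seq_format(MapOrSeq) succeeds, which mirrors the reason the serde docs implement both visit_map and visit_seq.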

So with that foundation of how serde's deserialization works in mind, I'll address your first question: why do we need an enum for the fields of the struct, while also providing a list of field names to the deserializer?

The answer is because the two serve completely different purposes. The enum implements Deserialize itself, calling the deserializer's deserialize_identifier method to deserialize the struct's field name as an identifier. In toml (and many other deserializers), structs are deserialized as maps, with the keys being the field names and the values being the field values. So when we call map.next_key(), we're telling the MapAccess to deserialize the key as our Field type we provided. Then we match on that field value to figure out what field we're deserializing.

The list of string names of the fields is actually part of the Deserializer::deserialize_struct() API. Some deserializers require a list of all field names in order to parse the serialized data correctly. I don't know that toml actually uses it, but I wrote a deserializer, msd, which uses the list to know when to stop deserializing a struct, because the data is flat and there wouldn't be a way to tell otherwise.

Let me know if any part of that doesn't make sense. Deserialization implementations have a lot of moving parts to them.

Now for your other question: what does field_identifier do? I actually got the field_identifier attribute from the same link you found it on. I used it instead of writing out the Deserialize implementation for Field manually. It seems you're right, it is not actually formally documented anywhere. But the basic gist is that it derives a Deserialize implementation for Field that calls deserialize_identifier(), providing a visitor that implements methods commonly used for identifiers (usually visit_str(), visit_bytes(), and other similar ones). In the "Manually deserialize struct" link you can see basically what the derive code produces. If you want to help push for formal documentation of that attribute, that would only help the ecosystem. Not sure why it seems to have been stuck for the past little while.
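As a rough mental model only, the string-to-variant mapping that the derived Field implementation performs can be pictured as a plain FromStr impl. This std-only sketch is an approximation I'm assuming for illustration: the real generated code goes through deserialize_identifier() and a Visitor rather than FromStr, and the error text here is made up.

```rust
use std::str::FromStr;

/// One variant per struct field, as in the thread's `Field` enum.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Field {
    FgColor,
    BgColor,
    Size,
}

/// The separate list of names that gets passed to deserialize_struct().
pub const FIELDS: &[&str] = &["fg_color", "bg_color", "size"];

impl FromStr for Field {
    type Err = String;

    /// Roughly what a visit_str for a snake_case field identifier does:
    /// map the key text to a variant, or reject unknown keys.
    fn from_str(key: &str) -> Result<Self, Self::Err> {
        match key {
            "fg_color" => Ok(Field::FgColor),
            "bg_color" => Ok(Field::BgColor),
            "size" => Ok(Field::Size),
            other => Err(format!(
                "unknown field `{}`, expected one of {:?}",
                other, FIELDS
            )),
        }
    }
}
```

Seen this way, the enum is what turns each map key into something you can match on, while FIELDS is just a hint handed to the deserializer up front.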

I should add, if you want to see what the various #[derive(Deserialize)] code expands to, you can use cargo expand to see the generated code. Rust playground also has the tool built-in, so you can even check it out on the example playground link I posted above.


Wow, thanks a lot for the incredibly detailed answer. I finally understood the way the normal deserialize process works now.

But I am still struggling to apply that to DeserializeSeed

When you write
impl<'de, 'a> DeserializeSeed<'de> for &'a GlobalWidgetConfig but return a WidgetConfig instead, are you saying that you are trying to deserialize a GlobalWidgetConfig into a WidgetConfig?

You call Some(map.next_value_seed(&global_config)?) but what actually happens when you call this instead of just next_value()?

I tried to adapt your example to deserialize a Vector of WidgetConfigs instead of widget1 and widget2 using the single GlobalWidgetConfig as state like this

Rust Playground

But this gives me an error saying: `?` operator cannot convert from `WidgetConfig` to `Vec<WidgetConfig>`

Not quite. I'm saying that I'm trying to deserialize, from the deserializer, into a WidgetConfig using a GlobalWidgetConfig as a seed value. It's essentially the same as a regular Deserialize impl, except it has a state that it can use as a resource. So when I'm deserializing from the deserializer into WidgetConfig, I do everything like you normally would, except if a value isn't provided I can look at the GlobalWidgetConfig seed value and copy data from there.

When you call next_value(), you're asking it to use the Deserialize impl of a type to deserialize into the value. When you call next_value_seed(seed), you're asking it to use the DeserializeSeed impl of a type (call it T) to deserialize into the value specified by <T as DeserializeSeed>::Value.

Under the hood, calls to next_value() actually end up forwarding to next_value_seed(), but that's just an implementation detail. The two do essentially the same thing, except that next_value_seed() uses a seed value to pass state into deserialization.
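Here is a toy version of that relationship, in case it helps. The traits below are simplified stand-ins I made up (the "input" is just a &str), not serde's real ones, and DefaultSize is a hypothetical seed. The one detail borrowed from serde is the trick of using PhantomData<T> as the stateless seed that next_value() forwards with.

```rust
use std::marker::PhantomData;

// Toy stand-ins for illustration; these are NOT serde's traits.

/// Stateless deserialization: the type alone decides how to parse.
pub trait ToyDeserialize: Sized {
    fn deserialize(input: &str) -> Result<Self, String>;
}

/// Stateful deserialization: `self` is the seed, `Value` is what comes out.
pub trait ToyDeserializeSeed {
    type Value;
    fn deserialize(self, input: &str) -> Result<Self::Value, String>;
}

/// The stateless seed: `PhantomData<T>` carries no data and simply defers to
/// `T`'s own `ToyDeserialize` impl.
impl<T: ToyDeserialize> ToyDeserializeSeed for PhantomData<T> {
    type Value = T;

    fn deserialize(self, input: &str) -> Result<T, String> {
        T::deserialize(input)
    }
}

impl ToyDeserialize for u32 {
    fn deserialize(input: &str) -> Result<Self, String> {
        input
            .trim()
            .parse()
            .map_err(|e| format!("invalid u32: {}", e))
    }
}

/// A seed with state: a fallback to use when the input is empty.
pub struct DefaultSize(pub u32);

impl ToyDeserializeSeed for DefaultSize {
    type Value = u32;

    fn deserialize(self, input: &str) -> Result<u32, String> {
        if input.trim().is_empty() {
            Ok(self.0)
        } else {
            <u32 as ToyDeserialize>::deserialize(input)
        }
    }
}

/// Use whatever seed the caller supplies.
pub fn next_value_seed<S: ToyDeserializeSeed>(seed: S, input: &str) -> Result<S::Value, String> {
    seed.deserialize(input)
}

/// The seedless call is just the seeded call with the stateless seed.
pub fn next_value<T: ToyDeserialize>(input: &str) -> Result<T, String> {
    next_value_seed(PhantomData::<T>, input)
}
```

Note that the seed type and the output type are unrelated: DefaultSize produces a u32, in the same way that &GlobalWidgetConfig produces a WidgetConfig.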

You're on the right track. The main issue here is that you're assigning widgets to be an object of type Option<WidgetConfig> with the line:

widgets = Some(map.next_value_seed(&global_config)?);

but your struct creation here expects that type to be Option<Vec<WidgetConfig>>:

Ok(Config {
    global_config,
    widgets: widgets.ok_or_else(|| de::Error::missing_field("widgets"))?,
})

hence the error message. What you need to do is deserialize the value accompanying the Field::Widgets key as a Vec<WidgetConfig>, rather than just a single WidgetConfig.

I should note that the toml input

[[widgets]]
fg_color = "blue"
bg_color = "red"
size = 24
[[widgets]]
size = 24

will actually be deserialized as a key-value pair of a single identifier ("widgets") and a serde seq type, not as a bunch of key-value pairs of "widget" identifiers with serde struct types.

So you need to modify your DeserializeSeed implementation for GlobalWidgetConfig to tell the deserializer it expects to deserialize a seq type, and also change the Value associated type to Vec<WidgetConfig>. Then you just deserialize each value in the sequence as a WidgetConfig and output the result.

I would rewrite the DeserializeSeed impl like this:

impl<'de, 'a> DeserializeSeed<'de> for &'a GlobalWidgetConfig {
    type Value = Vec<WidgetConfig>;

    fn deserialize<D>(self, deserializer: D) -> Result<Self::Value, D::Error>
    where
        D: Deserializer<'de>,
    {
        struct WidgetConfigSeqVisitor<'a>(&'a GlobalWidgetConfig);

        impl<'de, 'a> Visitor<'de> for WidgetConfigSeqVisitor<'a> {
            type Value = Vec<WidgetConfig>;

            fn expecting(&self, formatter: &mut fmt::Formatter) -> fmt::Result {
                formatter.write_str("sequence of WidgetConfigs")
            }

            fn visit_seq<V>(self, mut seq: V) -> Result<Self::Value, V::Error>
            where
                V: SeqAccess<'de>,
            {
                // You can make this a bit more efficient using seq.size_hint()
                // and Vec::with_capacity().
                let mut result = Vec::new();
                while let Some(widget_config) =
                    seq.next_element_seed(WidgetConfigDeserializer(self.0))?
                {
                    result.push(widget_config);
                }
                Ok(result)
            }
        }

        deserializer.deserialize_seq(WidgetConfigSeqVisitor(self))
    }
}

You'll notice I refer to a new type, WidgetConfigDeserializer. You'll still need all of the old logic to actually deserialize a WidgetConfig using the GlobalWidgetConfig as a seed, but since I just wrote this new implementation on GlobalWidgetConfig outputting a Vec<WidgetConfig>, I need to define a separate type to have the DeserializeSeed<Value = WidgetConfig> implementation we used earlier. That ended up looking like this for me (note that it's the exact same logic as before, just implemented on a different type that simply wraps a reference to the GlobalWidgetConfig):

struct WidgetConfigDeserializer<'a>(&'a GlobalWidgetConfig);

// This deserializes using an already-existing `GlobalWidgetConfig` as a base.
impl<'de, 'a> DeserializeSeed<'de> for WidgetConfigDeserializer<'a> {
    type Value = WidgetConfig;

    fn deserialize<D>(self, deserializer: D) -> Result<Self::Value, D::Error>
    where
        D: Deserializer<'de>,
    {
        #[derive(Deserialize)]
        #[serde(field_identifier, rename_all = "snake_case")]
        enum Field {
            FgColor,
            BgColor,
            Size,
            // Other fields
        }

        struct WidgetConfigVisitor<'a>(&'a GlobalWidgetConfig);

        impl<'de, 'a> Visitor<'de> for WidgetConfigVisitor<'a> {
            type Value = WidgetConfig;

            fn expecting(&self, formatter: &mut fmt::Formatter) -> fmt::Result {
                formatter.write_str("struct WidgetConfig")
            }

            fn visit_map<V>(self, mut map: V) -> Result<Self::Value, V::Error>
            where
                V: MapAccess<'de>,
            {
                let mut fg_color = None;
                let mut bg_color = None;
                let mut size = None;
                // Other fields
                while let Some(key) = map.next_key()? {
                    match key {
                        Field::FgColor => {
                            if fg_color.is_some() {
                                return Err(de::Error::duplicate_field("fg_color"));
                            }
                            fg_color = Some(map.next_value()?);
                        }
                        Field::BgColor => {
                            if bg_color.is_some() {
                                return Err(de::Error::duplicate_field("bg_color"));
                            }
                            bg_color = Some(map.next_value()?);
                        }
                        Field::Size => {
                            if size.is_some() {
                                return Err(de::Error::duplicate_field("size"));
                            }
                            size = Some(map.next_value()?);
                        } // Other fields
                    }
                }
                Ok(WidgetConfig {
                    fg_color: fg_color
                        .or_else(|| self.0.fg_color.clone())
                        .ok_or_else(|| de::Error::missing_field("fg_color"))?,
                    bg_color: bg_color
                        .or_else(|| self.0.bg_color.clone())
                        .ok_or_else(|| de::Error::missing_field("bg_color"))?,
                    size: size
                        .or_else(|| self.0.size.clone())
                        .ok_or_else(|| de::Error::missing_field("size"))?,
                    // Other fields
                })
            }
        }

        const FIELDS: &[&str] = &[
            "fg_color", "bg_color", "size",
            // Other fields
        ];
        deserializer.deserialize_struct("WidgetConfig", FIELDS, WidgetConfigVisitor(self.0))
    }
}

So now everything else in your example should work, because the widgets = Some(map.next_value_seed(&global_config)?); line will now assign a value of type Option<Vec<WidgetConfig>> to widgets, which can then be populated into your Config struct. I have a working version of your example here: Rust Playground using the code I mentioned above. It should work with any number of widgets, like you'd expect.

Hope that helps. I tried to provide my reasoning instead of just dumping code at you, because manual Deserialize implementations tend to be very long (interestingly, manual Serialize implementations are not nearly as long). It's no wonder there has been so much effort put in to writing procedural macros to generate this code for us, and it's too bad this more complicated use case can't just be derived by #[derive(Deserialize)].


Thanks a lot for helping me understand DeserializeSeed. I really appreciate you taking the time to provide these long explanations.

But you are right. This is incredibly tedious for more complex use cases.

Consider an extension to the example where we have different kinds of widget. Ideally you would make WidgetConfig an enum containing the different widgets and have a vector of that enum in the config. You can abstract the inheritable style props into a separate struct Style and flatten it when deserializing.

eg:

enum WidgetConfig {
    Clock(ClockWidgetConfig),
    CPU(CPUWidgetConfig),
}

pub struct Style {
    fg_color: String,
    bg_color: String,
}

pub struct OptionalStyle {
    fg_color: Option<String>,
    bg_color: Option<String>,
}


struct ClockWidgetConfig {
    style: Style,
    clock_fmt: String,
}

struct CPUWidgetConfig {
    style: Style,
    cpu_fmt: String,
}

But the problem is that now you have to implement DeserializeSeed manually for each of the widget structs, for the enum, and for the vector of enums as well.

It seems like there is no way to just say "deserialize the other fields normally, and use the state only for the style field to convert OptionalStyle to Style".

EDIT: It would probably be less boilerplate and easier to have two versions of the config struct for each widget variant (one optional, another non-optional) and then, after deserializing the optional versions (we can just derive everywhere), build the non-optional versions.
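A minimal std-only sketch of what that two-version layout could look like for the enum case. RawWidgetConfig, or_inherit, and resolve are hypothetical names, and I used struct-like enum variants to keep it short. In real code the Raw types would be the ones deriving Deserialize; that part is omitted so the snippet compiles without serde.

```rust
// Hypothetical sketch; the `Raw*` names and helper methods are illustrative.

#[derive(Debug, Clone, PartialEq)]
pub struct Style {
    pub fg_color: String,
    pub bg_color: String,
}

#[derive(Debug, Default)]
pub struct OptionalStyle {
    pub fg_color: Option<String>,
    pub bg_color: Option<String>,
}

impl OptionalStyle {
    /// The only place that knows how style inheritance works.
    pub fn or_inherit(self, parent: &Style) -> Style {
        Style {
            fg_color: self.fg_color.unwrap_or_else(|| parent.fg_color.clone()),
            bg_color: self.bg_color.unwrap_or_else(|| parent.bg_color.clone()),
        }
    }
}

/// The "optional" version: what a derived deserializer would produce.
#[derive(Debug)]
pub enum RawWidgetConfig {
    Clock { style: OptionalStyle, clock_fmt: String },
    Cpu { style: OptionalStyle, cpu_fmt: String },
}

/// The "real" version used by the rest of the program.
#[derive(Debug, PartialEq)]
pub enum WidgetConfig {
    Clock { style: Style, clock_fmt: String },
    Cpu { style: Style, cpu_fmt: String },
}

impl RawWidgetConfig {
    /// Convert variant by variant; only the style needs the parent.
    pub fn resolve(self, parent: &Style) -> WidgetConfig {
        match self {
            RawWidgetConfig::Clock { style, clock_fmt } => WidgetConfig::Clock {
                style: style.or_inherit(parent),
                clock_fmt,
            },
            RawWidgetConfig::Cpu { style, cpu_fmt } => WidgetConfig::Cpu {
                style: style.or_inherit(parent),
                cpu_fmt,
            },
        }
    }
}
```

A whole Vec<RawWidgetConfig> can then be converted with raw.into_iter().map(|w| w.resolve(&global)).collect(), so the per-variant boilerplate is one match arm rather than a full DeserializeSeed implementation.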