Serde: (De)Serializing an enum as if it was a map

Hi, dumb question. It's quite possibly answered elsewhere, but I can't find anything.

I have yaml files like so:

point-list:
  - ra: 0.0
    dec: 1.0
    comp_type: point
    flux_type:
      list:
        - freq: 150e6
          i: 1.0
point-power-law:
  - ra: 1.0
    dec: 2.0
    comp_type: point
    flux_type:
      power_law:
        si: -0.8
        fd:
          freq: 150e6
          i: 1.0

The top-level keys are Sources; each one is a vector of SourceComponent, which is defined as:

#[derive(Clone, Debug, PartialEq, Deserialize, Serialize)]
pub struct SourceComponent {
    /// Coordinates struct associated with the component.
    pub radec: RADec,
    /// The type of component.
    pub comp_type: ComponentType,
    /// The flux densities associated with this component.
    pub flux_type: FluxDensityType,
}

FluxDensityType is an enum with two variants visible in the yaml ("list" and "power_law"). However, when deserializing, I don't know how to handle "flux_type" as the marker of "this is actually a FluxDensityType" (and vice versa when serializing). This feels like a simple thing but I can't work it out. It doesn't look like the serde_with package helps either. Please help! TIA.

As far as I can tell there's no serde attribute that will do what you want.

Assuming the yaml format is set and you're free to change the Rust types, I think the easiest thing to do would be to restructure the Rust types a little. Something like the types below should be able to de/serialize the yaml you gave:

#[derive(Clone, Debug, PartialEq, Deserialize, Serialize)]
struct SourceComponent {
    flux_type: FluxDensityType,
    // ....
}

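// untagged: each variant is matched by its field name (list / power_law)
// rather than by a variant-name key.
#[derive(Clone, Debug, PartialEq, Deserialize, Serialize)]
#[serde(untagged)]
enum FluxDensityType {
    List { list: Vec<Item> },
    PowerLaw { power_law: PowerLaw },
}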

#[derive(Clone, Debug, PartialEq, Deserialize, Serialize)]
struct Item {
    i: f64,
    // ....
}

#[derive(Clone, Debug, PartialEq, Deserialize, Serialize)]
struct PowerLaw {
    si: f64,
    // ....
}

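If this is inconvenient to work with on the Rust side, you could have two types: one that matches the yaml, say FluxDensityTypeYaml, and the FluxDensityType you actually want to use. Then implement From<FluxDensityTypeYaml> for FluxDensityType and put the serde attribute #[serde(from = "FluxDensityTypeYaml")] on FluxDensityType. For serializing, the into attribute works the same way in the other direction.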

You could also write a custom deserializer function for FluxDensityType and use the deserialize_with attribute, but this is a little more work.

Thanks @Heliozoa, that's a lot of useful pathways. I was hoping there was some serde magic I'd missed, but alas.

The schema layout is my fault, so I'll weigh up changing the schema with my colleagues tomorrow. I dare say it's worth it because this isn't a language-specific issue.

As for my Rust code, I just realised that this is complicated by some unit conversion and validation. I'm a fan of "parse, don't validate", so I'll think harder about how I'd like to proceed with this. Cheers!

I'm probably reading something wrong, but isn't the YAML structure in question for the flux_type field just the default, externally-tagged, enum representation? What's wrong with this playground, which uses the YAML as-is and simply defines a bunch of types to deserialize it into?

Oh, when looking at the problem I tried serializing the struct, and on serde_yaml = "0.9" it outputs:

point-list:
- ra: 0.0
  dec: 1.0
  comp_type: point
  flux_type: !list
  - freq: 150000000.0
    i: 1.0
point-power-law:
- ra: 1.0
  dec: 2.0
  comp_type: point
  flux_type: !power_law
    si: -0.8
    fd:
      freq: 150000000.0
      i: 1.0

which I assumed was different from the one in the OP, but I guess they are equivalent, at least as far as serde_yaml's de/serialization logic is concerned. On 0.8 (which is used in the playground), it outputs the same yaml as in the OP.

If it's not a problem that the output isn't exactly the same string as in the OP, then this is definitely the way to go, I think. Alternatively, maybe the schema can be changed to use this !variant syntax (apparently called a tag). I haven't seen it before, but it looks like a pretty nice way to differentiate between Rust enum variants.

Sorry for the late reply, I just got back from a month-long break. I just wanted to mention that I got the types working with both of your help; thanks! I didn't end up using YAML tags and kept the schema as is.

I set out to do this work in the hopes that the YAML deserialisation would go much faster; testing shows it's about 7% faster. Not nothing, but I guess YAML just isn't great for speed. Deserialising the same data as JSON is much faster.

For posterity, the actual types are below.

#[derive(Debug, Clone, Default, Serialize, Deserialize)]
#[serde(transparent)]
pub(crate) struct SourceList(IndexMap<String, Source>);

#[derive(Clone, Debug, PartialEq, Serialize, Deserialize)]
#[serde(transparent)]
pub(crate) struct Source {
    #[serde(with = "serde_yaml::with::singleton_map_recursive")]
    pub(crate) components: Vec1<SourceComponent>,
}

#[derive(Clone, Debug, PartialEq, Serialize, Deserialize)]
pub(crate) struct SourceComponent {
    #[serde(flatten)]
    pub(crate) radec: RADec,

    pub(crate) comp_type: ComponentType,

    pub(crate) flux_type: FluxDensityType,
}

#[derive(Clone, Debug, PartialEq, Serialize, Deserialize)]
#[serde(rename_all = "lowercase")]
pub(crate) enum ComponentType {
    Point,

    Gaussian {
        #[serde(serialize_with = "radians_to_arcsecs")]
        #[serde(deserialize_with = "arcsecs_to_radians")]
        maj: f64,

        #[serde(serialize_with = "radians_to_arcsecs")]
        #[serde(deserialize_with = "arcsecs_to_radians")]
        min: f64,

        #[serde(serialize_with = "radians_to_degrees")]
        #[serde(deserialize_with = "degrees_to_radians")]
        pa: f64,
    },

    #[serde(rename = "shapelet")]
    Shapelet {
        #[serde(serialize_with = "radians_to_arcsecs")]
        #[serde(deserialize_with = "arcsecs_to_radians")]
        maj: f64,

        #[serde(serialize_with = "radians_to_arcsecs")]
        #[serde(deserialize_with = "arcsecs_to_radians")]
        min: f64,

        #[serde(serialize_with = "radians_to_degrees")]
        #[serde(deserialize_with = "degrees_to_radians")]
        pa: f64,

        coeffs: Vec<ShapeletCoeff>,
    },
}

#[derive(Clone, Debug, PartialEq, Serialize, Deserialize)]
#[serde(rename_all = "snake_case")]
pub(crate) enum FluxDensityType {
    PowerLaw {
        si: f64,
        fd: FluxDensity,
    },

    CurvedPowerLaw {
        si: f64,
        fd: FluxDensity,
        q: f64,
    },

    List(Vec1<FluxDensity>),
}

I guess the performance differences between text-based formats are relatively small, since they all work quite similarly. If you're willing to give up human readability and writability, and a self-describing format, in exchange for speed, you could give a crate like bincode a try.

Thanks for the tip. I'll consider it if it becomes a great enough issue.
