Elegant way to parse irregular XML file to struct?

Let's say I have an XML document like this:

<groups>
    <group>
        <v name="first">John</v>
        <v name="last">Redcorn</v>
    </group>
    <group>
        <v name="first">Dale</v>
        <v name="last">Gribble</v>
    </group>
</groups>

and I would like to parse that into a Vec<Person>.

Right now I use sxd-xpath to navigate the XML with an xpath and I do something like:


if node.name == "first" {
    current.first = node.value;
} else if node.name == "last" {
    current.last = node.value;
}

but I would prefer to use the struct's field names directly, a bit like Serde does. I would also like something like #[serde(rename = "name")] in case I don't like some of the names in the XML file.

Would it make sense to use Serde for this?

Serde is usually the way to go for deserializing.
Why do you hesitate? Are you after maximum performance or an elegant solution?

Maybe an example will help explain.

So with this code (with serde-xml-rs):

use serde::Deserialize;
use serde_xml_rs::from_reader;

#[derive(Debug, Deserialize)]
struct Data {
    #[serde(rename = "group")]
    groups: Vec<Group>,
}

#[derive(Debug, Deserialize)]
struct Group {
    #[serde(rename = "v")]
    vs: Vec<V>,
}

#[derive(Debug, Deserialize)]
struct V {
    name: String,

    #[serde(rename = "$value")]
    value: String,
}

// `xml` is a &str holding the XML shown above.
let data: Data = from_reader(xml.as_bytes()).unwrap();

I get:

Data {
    groups: [
        Group {
            vs: [
                V {
                    name: "first",
                    value: "John",
                },
                V {
                    name: "last",
                    value: "Redcorn",
                },
            ],
        },
        Group {
            vs: [
                V {
                    name: "first",
                    value: "Dale",
                },
                V {
                    name: "last",
                    value: "Gribble",
                },
            ],
        },
    ],
}

but I want something like this:

[
    Person {
        first: "John",
        last: "Redcorn",
    },
    Person {
        first: "Dale",
        last: "Gribble",
    },
]

so I need to first parse the XML with serde-xml-rs and then do something like:

    let mut list = Vec::new();

    for group in data.groups.into_iter() {
        let mut person = Person::default();

        for v in group.vs {
            if v.name == "first" {
                person.first = v.value;
            } else if v.name == "last" {
                person.last = v.value;
            }
        }

        list.push(person);
    }

But I would prefer not to use "first" and "last" as string literals. I would rather write:

struct Person {
    first: String,
    last: String,
}

and have some function that reads the XML the right way and uses the struct's field names (first and last) directly, like Serde does. Maybe with reflection or something. Or maybe there's a way, when using Serde, to get the struct's field names.

Ah I see. I agree this is cumbersome. This is probably how I would write it:

use serde::Deserialize;
use serde_xml_rs::from_reader;
use serde::de::Error;

#[derive(Debug, Deserialize)]
struct Data {
    #[serde(rename = "group")]
    groups: Vec<Group>,
}

#[derive(Debug)]
struct Group {
    first: String,
    last: String,
}
impl<'de> Deserialize<'de> for Group {
    fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
    where
        D: serde::de::Deserializer<'de>
    {
        #[derive(Deserialize)]
        struct V {
            name: String,
            #[serde(rename = "$value")]
            value: String,
        }

        #[derive(Deserialize)]
        struct GroupHelper {
            #[serde(rename = "v")]
            vs: Vec<V>,
        }

        let helper = GroupHelper::deserialize(deserializer)?;
        let mut first = None;
        let mut last = None;
        for v in helper.vs {
            match v.name.as_str() {
                "first" => first = Some(v.value),
                "last" => last = Some(v.value),
                _ => {} // ignore entries with any other name
            }
        }
        let first = first.ok_or_else(|| Error::missing_field("first"))?;
        let last = last.ok_or_else(|| Error::missing_field("last"))?;
        Ok(Group { first, last })
    }
}

That's a lot better, but wouldn't it be nice to be able to read the struct's field names?

A bit like the following but with a way to set the values:

macro_rules! my_macro {
    (struct $name:ident {
        $($field_name:ident: $field_type:ty,)*
    }) => {
        struct $name {
            $($field_name: $field_type,)*
        }

        impl $name {
            // This is purely an example—not a good one.
            fn get_field_names() -> Vec<&'static str> {
                vec![$(stringify!($field_name)),*]
            }
        }
    }
}

my_macro! {
    struct S {
        a: String,
        b: String,
    }
}

// S::get_field_names() == vec!["a", "b"]

https://stackoverflow.com/a/29986760
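
For what it's worth, the same trick can probably be stretched to set values too. Here is a minimal sketch using only the standard library, assuming every field is a String. The names string_record, set_field and person_from_pairs are made up for illustration and are not part of Serde or any other crate:

```rust
// Hypothetical macro: defines a struct whose fields are all `String`
// and generates name-based helpers from the field identifiers.
macro_rules! string_record {
    (struct $name:ident { $($field:ident),* $(,)? }) => {
        #[derive(Debug, Default, PartialEq)]
        struct $name {
            $($field: String,)*
        }

        impl $name {
            /// Names of all fields, in declaration order.
            fn field_names() -> &'static [&'static str] {
                &[$(stringify!($field)),*]
            }

            /// Sets the field called `name`.
            /// Returns false if the struct has no such field.
            fn set_field(&mut self, name: &str, value: String) -> bool {
                $(
                    if name == stringify!($field) {
                        self.$field = value;
                        return true;
                    }
                )*
                false
            }
        }
    };
}

string_record! {
    struct Person { first, last }
}

/// Builds a `Person` from (name, value) pairs, such as the ones
/// read from the <v name="..."> elements. Unknown names are ignored.
fn person_from_pairs(pairs: &[(&str, &str)]) -> Person {
    let mut person = Person::default();
    for &(name, value) in pairs {
        person.set_field(name, value.to_string());
    }
    person
}
```

With this, person_from_pairs(&[("first", "John"), ("last", "Redcorn")]) fills both fields without "first" or "last" appearing anywhere outside the struct definition. It does not handle renames or non-String fields, so it is more a proof of concept than a replacement for Serde.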

It turns out that enums can be used too:

#[derive(Deserialize)]
#[serde(tag = "name", content = "$value")]
enum V {
    first(String),
    last(String),
}

Based on this I came up with a macro:

macro_rules! define_xml_deserialize {
    ( impl Deserialize for $name:ident { $( $field:ident : $typ:ty ),* $(,)? } ) => {
        impl<'de> serde::Deserialize<'de> for $name {
            fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
            where
                D: serde::de::Deserializer<'de>
            {
                use serde::Deserialize;
                use serde::de::Error;

                #[derive(Deserialize)]
                #[serde(tag = "name", content = "$value")]
                #[allow(non_camel_case_types)]
                enum V {
                    $( $field ( $typ ) ),*
                }

                #[derive(Deserialize)]
                struct Helper {
                    v: Vec<V>,
                }

                let helper = Helper::deserialize(deserializer)?;
                $(
                    let mut $field: Option<$typ> = None;
                )*
                for v in helper.v {
                    match v {
                        $(
                            V::$field(val) => $field = Some(val),
                        )*
                    }
                }
                $(
                    let $field = $field.ok_or_else(||
                        Error::missing_field(stringify!($field)))?;
                )*
                Ok($name { $( $field ),* })
            }
        }
    };
}

#[derive(Debug)]
struct Group {
    first: String,
    last: String,
}

define_xml_deserialize! {
    impl Deserialize for Group {
        first: String,
        last: String
    }
}

The enum allows us to have fields with different types.
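
To illustrate what "different types" buys you, here is roughly what such an enum amounts to when written by hand, without Serde. This is only a standard-library sketch: the age field is hypothetical (it is not in the XML from this thread), and parse_field is a made-up helper, not something Serde generates under that name:

```rust
/// One parsed <v name="...">text</v> entry. Each variant carries its own type.
/// `Age` is a hypothetical field added only to show a non-String payload.
#[derive(Debug, PartialEq)]
enum Field {
    First(String),
    Last(String),
    Age(u32),
}

/// Converts a (name, text) pair into a typed `Field`.
/// Returns an error message for unknown names or unparsable values.
fn parse_field(name: &str, text: &str) -> Result<Field, String> {
    match name {
        "first" => Ok(Field::First(text.to_string())),
        "last" => Ok(Field::Last(text.to_string())),
        "age" => text
            .trim()
            .parse::<u32>()
            .map(Field::Age)
            .map_err(|e| format!("invalid age {:?}: {}", text, e)),
        other => Err(format!("unknown field {:?}", other)),
    }
}
```

The name picks the variant and the variant fixes the payload type, so a struct built from these values can mix String and u32 fields. The macro above gets the same effect by generating one variant per field with that field's type.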
