Extract single field from xml using serde


#1

I’m trying to extract a list of items from an XML document using serde and serde-xml-rs. The XML has the form seen here. I would like to extract the <IdList> from that search result. Because the document is relatively simple, I’ve been able to get the information by converting the result to a string and matching a regex, but I’d like to figure out how to do this with serde. The examples I have found all show how to extract element attributes rather than element text: <Id num="12345"> vs <Id>12345</Id>. What I have tried is:

#[derive(Serialize, Deserialize, Debug)]
struct IdList {
    #[serde(rename="$value")]
    IdList: Vec<Id>
}

#[derive(Serialize, Deserialize, Debug)]
struct Id {
    #[serde(rename="$value")]
    id: String
}

This gives the error

thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Expected token XmlEvent::Characters(s), found StartElement(Id, {"": "", "xml": "http://www.w3.org/XML/1998/namespace", "xmlns": "http://www.w3.org/2000/xmlns/"})'

How should I be handling this?


#2

Your structs are fine, but you can’t feed the entire document to Serde and expect it to start deserializing when it encounters the opening <IdList>. You can read the entire document into a String, find the position of <IdList>, and deserialize the slice starting at that point; Serde will ignore everything after the closing </IdList>.

Proof of concept:

extern crate serde;
#[macro_use]
extern crate serde_derive;
extern crate serde_xml_rs as xml;

use std::env;
use std::error::Error;
use std::fs::File;
use std::io::Read;

#[derive(Deserialize, Debug)]
struct IdList {
    #[serde(rename="$value")]
    idlist: Vec<Id>,
}

#[derive(Deserialize, Debug)]
struct Id {
    #[serde(rename="$value")]
    id: String,
}

fn main() {
    let mut args = env::args();
    args.next().expect("progname");
    if let Some(param) = args.next() {
        match do_des(&param) {
            Ok(idlist) => println!("{:?}", idlist),
            Err(e) => eprintln!("{}", e),
        }
    }
}

fn do_des(filename: &str) -> Result<IdList, Box<dyn Error>> {
    // Read the whole document into memory.
    let mut doc = File::open(filename)?;
    let mut doc_str = String::new();
    doc.read_to_string(&mut doc_str)?;
    // Deserialize only the slice that starts at the opening <IdList> tag.
    if let Some(idl_ix) = doc_str.find("<IdList>") {
        let idlist: IdList = xml::deserialize(doc_str[idl_ix..].as_bytes())?;
        Ok(idlist)
    } else {
        Err("no <IdList> in XML document".into())
    }
}
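
The slicing step can also be tried on its own, without serde. Below is a minimal, std-only sketch; the helper name slice_element and the sample document are made up for illustration, and it assumes the opening tag carries no attributes and that the element is not nested inside itself. Unlike the proof of concept above, it also cuts the slice off at the closing tag.

```rust
/// Returns the part of `doc` from the opening `<tag>` up to and including
/// the first `</tag>` after it, or `None` if either tag is missing.
/// Assumption: the opening tag has no attributes and does not nest in itself.
fn slice_element<'a>(doc: &'a str, tag: &str) -> Option<&'a str> {
    let open = format!("<{}>", tag);
    let close = format!("</{}>", tag);
    // Byte offset of the opening tag in the whole document.
    let start = doc.find(&open)?;
    // Offset of the closing tag, relative to `start`.
    let rel_end = doc[start..].find(&close)?;
    Some(&doc[start..start + rel_end + close.len()])
}

fn main() {
    // Made-up sample document, not the real search result.
    let doc = "<Result><Count>2</Count><IdList><Id>111</Id><Id>222</Id></IdList><Tail/></Result>";
    match slice_element(doc, "IdList") {
        Some(fragment) => println!("{}", fragment),
        None => eprintln!("no <IdList> in XML document"),
    }
}
```

The returned fragment is what would be handed to the deserializer in place of doc_str[idl_ix..].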

#3

Alternatively, you can deserialize the entire XML string into an Element, which is a Rust representation of the XML document without assigning any meaning to it. You can then traverse that to find all Elements of tag "IdList", and then deserialize those into your final type.
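
To illustrate the traversal idea, here is a rough sketch that uses a hand-written Element struct as a stand-in for a generic XML tree. The struct, its fields, and the helpers find_all, ids_of and sample_tree are all hypothetical and are not the API of any particular crate; a real element type will look different, and the tree here is built by hand rather than parsed.

```rust
/// Hypothetical stand-in for a generic XML element node.
#[derive(Debug)]
struct Element {
    name: String,
    text: Option<String>,
    children: Vec<Element>,
}

impl Element {
    fn new(name: &str, text: Option<&str>, children: Vec<Element>) -> Element {
        Element {
            name: name.to_string(),
            text: text.map(|t| t.to_string()),
            children,
        }
    }
}

/// Depth-first walk that collects every element whose tag equals `tag`.
fn find_all<'a>(root: &'a Element, tag: &str, out: &mut Vec<&'a Element>) {
    if root.name == tag {
        out.push(root);
    }
    for child in &root.children {
        find_all(child, tag, out);
    }
}

/// Collects the text of each direct <Id> child of an <IdList> element.
fn ids_of(id_list: &Element) -> Vec<String> {
    id_list
        .children
        .iter()
        .filter(|c| c.name == "Id")
        .filter_map(|c| c.text.clone())
        .collect()
}

/// Builds a made-up tree equivalent to
/// <Result><Count>2</Count><IdList><Id>111</Id><Id>222</Id></IdList></Result>.
fn sample_tree() -> Element {
    Element::new("Result", None, vec![
        Element::new("Count", Some("2"), vec![]),
        Element::new("IdList", None, vec![
            Element::new("Id", Some("111"), vec![]),
            Element::new("Id", Some("222"), vec![]),
        ]),
    ])
}

fn main() {
    let root = sample_tree();
    let mut lists = Vec::new();
    find_all(&root, "IdList", &mut lists);
    for list in lists {
        println!("{:?}", ids_of(list));
    }
}
```

With a real element type, the last step would deserialize each matching element into the IdList struct instead of collecting strings by hand.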


#4

Thanks for both solutions.