Feedback on parsing xml using xml-rs

I've written a simple program to parse an XML file into a vector and would appreciate some feedback. I'm most interested in knowing if this is the most rustian way to implement this. Can it be improved?

This doesn't really answer your question but..., eventually you'll be able to use serde to do this:

// Unstable stuff. You can also use a build script to do this on stable but hopefully we'll
// eventually be able to use stable custom derives.
#![feature(custom_derive, plugin, test)]
#![feature(custom_attribute)]
#![plugin(serde_macros)]

extern crate serde;
extern crate serde_xml;

use std::fs::File;
use std::io::{BufReader, Read};

#[derive(Deserialize)]
struct Book {
    title: String,
    author: String,
}

fn parse_books(filename: &str) -> Vec<Book> {
    let file = File::open(filename).unwrap();
    let file = BufReader::new(file);
    let books: Vec<Book> = serde_xml::de::from_iter(file.bytes()).unwrap();
    books
}

fn main() {
    let books = parse_books("test.xml");
    for book in books {
        println!("{} by {}", book.title, book.author);
    }
}

Unfortunately, this currently doesn't work as serde_xml's support for sequences is very limited.

Indeed not what I was looking for, but still very useful. I'd not heard of serde so thank-you. Would still love some feedback on the code I wrote.

Looks good! I would recommend adding some error handling to your parse_books function. Check out the Rust Book for details. So instead of returning Vec<Book> you could return something like Result<Vec<Book>, BookParseError> and unwrap that in your main function.

Then you can avoid calling unwrap in parse_books and replace your printlns with Err.

You'll have to be patient with error handling in Rust to start with, there can be a bit of ceremony in writing your own error types, but it's worth it.

1 Like

Thank-you. That's very useful feedback. I agree that writing custom error types is ceremonious. It's going to take a while before I fully grok it.

I'm trying the simple case of returning Result<Vec<Book>, String> instead of using my own error type, but I'm stuck with how to factor out the unwrap:

let book = books.last_mut().unwrap();

last_mut() returns an Option and I want to return a Result, so I suppose that ok_or is what I'm looking for; however, when there is a book, I want to set a value on one of its fields. I only want to return if there's an error here.

Here's a more robust variant that:

  1. Uses a state machine.
  2. Handles xml with unknown fields/tags.
  3. Handles errors.
  4. Tells the parser to simplify things for us (see the ParserConfig part).
extern crate xml;

use std::fs::File;
use std::io::BufReader;
use std::error::Error as StdError;

use xml::reader::{EventReader, XmlEvent, Error as XmlError};
use xml::ParserConfig;

#[derive(Default)]
struct Book {
    title: String,
    author: String,
}

#[derive(Eq, PartialEq, Debug)]
enum Error {
    NotABooks,
    NonTextInBookField,
    OnlyBooksInBooks,
    UnknownBookField,
    ExpectedEndOfDocument,
    Xml(XmlError),
}

impl From<XmlError> for Error {
    fn from(e: XmlError) -> Self {
        Error::Xml(e)
    }
}

impl std::fmt::Display for Error {
    fn fmt(&self, f: &mut std::fmt::Formatter) -> std::fmt::Result {
        write!(f, "{}", self.description())
    }
}

impl StdError for Error {
    fn description(&self) -> &str {
        use Error::*;
        match *self {
            NotABooks => "not a list of books",
            NonTextInBookField => "book fields should only contain text",
            OnlyBooksInBooks => "only book tags are allowed in books",
            UnknownBookField => "books may only contain title and author fields",
            ExpectedEndOfDocument => "extra xml after book list",
            Xml(ref e) => e.description(),
        }
    }
    fn cause(&self) -> Option<&StdError> {
        use Error::*;
        match *self {
            Xml(ref e) => Some(e as &StdError),
            _ => None,
        }
    }
}

fn parse_books(filename: &str) -> Result<Vec<Book>, Error> {
    enum BookField {
        Title,
        Author,
    }
    enum State {
        Start,
        End,
        Books,
        Book {
            field: Option<BookField>,
            book: Book,
        },
    }
    let file = File::open(filename).unwrap();
    let file = BufReader::new(file);
    let parser = EventReader::new_with_config(file, ParserConfig {
        trim_whitespace: true,
        cdata_to_characters: true,
        ..Default::default()
    });

    let mut state = State::Start;
    let mut books = Vec::new();
    for e in parser {
        let e = try!(e);
        state = match state {
            State::Start => match e {
                // Always emitted...
                XmlEvent::StartDocument { .. }=> State::Start,
                // This is the actual beginning.
                XmlEvent::StartElement { ref name, .. } if name.local_name == "books" => {
                    State::Books
                },
                _ => return Err(Error::NotABooks),
            },
            State::End => match e {
                XmlEvent::EndDocument => return Ok(books),
                _ => return Err(Error::ExpectedEndOfDocument),
            },
            State::Books => match e {
                XmlEvent::EndElement { .. } => State::End,
                XmlEvent::StartElement { ref name, .. } if &*name.local_name == "book" => State::Book {
                    field: None,
                    book: Book::default(),
                },
                _ => return Err(Error::OnlyBooksInBooks),
            },
            State::Book { field: None, book } => match e {
                XmlEvent::StartElement { name, .. } => match &*name.local_name {
                    "title" => State::Book { field: Some(BookField::Title), book: book },
                    "author" => State::Book { field: Some(BookField::Author), book: book },
                    _ => return Err(Error::UnknownBookField),
                },
                XmlEvent::EndElement { .. } => {
                    books.push(book);
                    State::Books
                },
                _ => State::Book { field: None, book: book },
            },
            State::Book { field: Some(field), mut book } => match e {
                XmlEvent::Characters(s) => {
                    *(match field {
                        BookField::Title => &mut book.title,
                        BookField::Author => &mut book.author,
                    }) = s;
                    State::Book {
                        field: Some(field),
                        book: book,
                    }
                },
                XmlEvent::EndElement { .. } => State::Book { field: None, book: book },
                _ => return Err(Error::NonTextInBookField),
            },
        }
    }

    Ok(books)
}

fn main() {
    let books = parse_books("test.xml").unwrap();
    for book in books {
        println!("{} by {}", book.title, book.author);
    }
}
4 Likes

You can use and_then on your book while it's still an Option to use it only if the book is not none.

I have a quick Playground here: Rust Playground

When I worked in publishing The Wind in the Willows was the test book I always used :slightly_smiling:

1 Like

Very nice. There are a lot of unfamiliar rust concepts in there but that's a good thing. Nice to see how you make a custom error type since the book was a bit confusing for me.

Nice example, thank-you. and_then looks very useful, and it didn't occur to me to add mut to a closure parameter and change the value there. Thanks.