Feedback on parsing xml using xml-rs


#1

I’ve written a simple program to parse an XML file into a vector and would appreciate some feedback. I’m most interested in knowing if this is the most rustian way to implement this. Can it be improved?


#2

This doesn’t really answer your question but…, eventually you’ll be able to use serde to do this:

// Unstable stuff. You can also use a build script to do this on stable but hopefully we'll
// eventually be able to use stable custom derives.
#![feature(custom_derive, plugin, test)]
#![feature(custom_attribute)]
#![plugin(serde_macros)]

extern crate serde;
extern crate serde_xml;

use std::fs::File;
use std::io::{BufReader, Read};

#[derive(Deserialize)]
struct Book {
    title: String,
    author: String,
}

fn parse_books(filename: &str) -> Vec<Book> {
    let file = File::open(filename).unwrap();
    let file = BufReader::new(file);
    let books: Vec<Book> = serde_xml::de::from_iter(file.bytes()).unwrap();
    books
}

fn main() {
    let books = parse_books("test.xml");
    for book in books {
        println!("{} by {}", book.title, book.author);
    }
}

Unfortunately, this currently doesn’t work as serde_xml's support for sequences is very limited.


#3

Indeed not what I was looking for, but still very useful. I’d not heard of serde so thank-you. Would still love some feedback on the code I wrote.


#4

Looks good! I would recommend adding some error handling to your parse_books function. Check out the Rust Book for details. So instead of returning Vec<Book> you could return something like Result<Vec<Book>, BookParseError> and unwrap that in your main function.

Then you can avoid calling unwrap in parse_books and replace your printlns with Err.

You’ll have to be patient with error handling in Rust to start with, there can be a bit of ceremony in writing your own error types, but it’s worth it.


#5

Thank-you. That’s very useful feedback. I agree that writing custom error types is ceremonious. It’s going to take a while before I fully grok it.

I’m trying the simple case of returning Result<Vec<Book>, String> instead of using my own error type, but I’m stuck with how to factor out the unwrap:

let book = books.last_mut().unwrap();

last_mut() returns an Option and I want to return a Result, so I suppose that ok_or is what I’m looking for; however, when there is a book, I want to set a value on one of its fields. I only want to return if there’s an error here.


#6

Here’s a more robust variant that:

  1. Uses a state machine.
  2. Handles xml with unknown fields/tags.
  3. Handles errors.
  4. Tells the parser to simplify things for us (see the ParserConfig part).
extern crate xml;

use std::fs::File;
use std::io::BufReader;
use std::error::Error as StdError;

use xml::reader::{EventReader, XmlEvent, Error as XmlError};
use xml::ParserConfig;

#[derive(Default)]
struct Book {
    title: String,
    author: String,
}

#[derive(Eq, PartialEq, Debug)]
enum Error {
    NotABooks,
    NonTextInBookField,
    OnlyBooksInBooks,
    UnknownBookField,
    ExpectedEndOfDocument,
    Xml(XmlError),
}

impl From<XmlError> for Error {
    fn from(e: XmlError) -> Self {
        Error::Xml(e)
    }
}

impl std::fmt::Display for Error {
    fn fmt(&self, f: &mut std::fmt::Formatter) -> std::fmt::Result {
        write!(f, "{}", self.description())
    }
}

impl StdError for Error {
    fn description(&self) -> &str {
        use Error::*;
        match *self {
            NotABooks => "not a list of books",
            NonTextInBookField => "book fields should only contain text",
            OnlyBooksInBooks => "only book tags are allowed in books",
            UnknownBookField => "books may only contain title and author fields",
            ExpectedEndOfDocument => "extra xml after book list",
            Xml(ref e) => e.description(),
        }
    }
    fn cause(&self) -> Option<&StdError> {
        use Error::*;
        match *self {
            Xml(ref e) => Some(e as &StdError),
            _ => None,
        }
    }
}

fn parse_books(filename: &str) -> Result<Vec<Book>, Error> {
    enum BookField {
        Title,
        Author,
    }
    enum State {
        Start,
        End,
        Books,
        Book {
            field: Option<BookField>,
            book: Book,
        },
    }
    let file = File::open(filename).unwrap();
    let file = BufReader::new(file);
    let parser = EventReader::new_with_config(file, ParserConfig {
        trim_whitespace: true,
        cdata_to_characters: true,
        ..Default::default()
    });

    let mut state = State::Start;
    let mut books = Vec::new();
    for e in parser {
        let e = try!(e);
        state = match state {
            State::Start => match e {
                // Always emitted...
                XmlEvent::StartDocument { .. }=> State::Start,
                // This is the actual beginning.
                XmlEvent::StartElement { ref name, .. } if name.local_name == "books" => {
                    State::Books
                },
                _ => return Err(Error::NotABooks),
            },
            State::End => match e {
                XmlEvent::EndDocument => return Ok(books),
                _ => return Err(Error::ExpectedEndOfDocument),
            },
            State::Books => match e {
                XmlEvent::EndElement { .. } => State::End,
                XmlEvent::StartElement { ref name, .. } if &*name.local_name == "book" => State::Book {
                    field: None,
                    book: Book::default(),
                },
                _ => return Err(Error::OnlyBooksInBooks),
            },
            State::Book { field: None, book } => match e {
                XmlEvent::StartElement { name, .. } => match &*name.local_name {
                    "title" => State::Book { field: Some(BookField::Title), book: book },
                    "author" => State::Book { field: Some(BookField::Author), book: book },
                    _ => return Err(Error::UnknownBookField),
                },
                XmlEvent::EndElement { .. } => {
                    books.push(book);
                    State::Books
                },
                _ => State::Book { field: None, book: book },
            },
            State::Book { field: Some(field), mut book } => match e {
                XmlEvent::Characters(s) => {
                    *(match field {
                        BookField::Title => &mut book.title,
                        BookField::Author => &mut book.author,
                    }) = s;
                    State::Book {
                        field: Some(field),
                        book: book,
                    }
                },
                XmlEvent::EndElement { .. } => State::Book { field: None, book: book },
                _ => return Err(Error::NonTextInBookField),
            },
        }
    }

    Ok(books)
}

fn main() {
    let books = parse_books("test.xml").unwrap();
    for book in books {
        println!("{} by {}", book.title, book.author);
    }
}

#7

You can use and_then on your book while it’s still an Option to use it only if the book is not none.

I have a quick Playground here: http://is.gd/Fg7OFe

When I worked in publishing The Wind in the Willows was the test book I always used :slightly_smiling:


#8

Very nice. There are a lot of unfamiliar rust concepts in there but that’s a good thing. Nice to see how you make a custom error type since the book was a bit confusing for me.


#9

Nice example, thank-you. and_then looks very useful, and it didn’t occur to me to add mut to a closure parameter and change the value there. Thanks.