Announcing an XML SAX interface crate and parser implementation (xml_sax & xml_oxide)

Hi, Rust developers!

I have been implementing an XML SAX parser in pure Rust. It was my first serious Rust project so it was educational as well as enjoyable.

The interesting idea in my parser is that the interface and the implementation live in separate crates.

I work as an enterprise integration consultant. In the Java world, where XML is a first-class citizen, there is one interface and many implementations for XML parsers. I think there is a big overlap between people who use XML daily and people who have been exposed to Java, so I took inspiration from the SAX interface. The decision is similar to Rust's choice of a C-style syntax: it should be easier for users who are already familiar with it.
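To make the interface/implementation split concrete, here is a minimal sketch. All names (`ContentHandler`, `SaxParser`, `ToyParser`, `EventLog`, `log_events`) are made up for illustration and are not the actual xml_sax or xml_oxide API. The toy parser only understands plain tags and text.

```rust
/// Hypothetical event callbacks, loosely modeled on Java's SAX ContentHandler.
/// Illustrative only; not the actual xml_sax API.
pub trait ContentHandler {
    fn start_element(&mut self, name: &str);
    fn end_element(&mut self, name: &str);
    fn characters(&mut self, text: &str);
}

/// Hypothetical parser interface: any implementation can drive any handler.
pub trait SaxParser {
    fn parse<H: ContentHandler>(&mut self, input: &str, handler: &mut H);
}

/// A toy implementation that only understands `<a>`, `</a>` and text.
/// No attributes, comments, entities, or error handling.
pub struct ToyParser;

impl SaxParser for ToyParser {
    fn parse<H: ContentHandler>(&mut self, input: &str, handler: &mut H) {
        let mut rest = input;
        while !rest.is_empty() {
            if let Some(after_lt) = rest.strip_prefix('<') {
                // Tag: everything up to the next '>' (or the end of input).
                let end = after_lt.find('>').unwrap_or(after_lt.len());
                let tag = &after_lt[..end];
                match tag.strip_prefix('/') {
                    Some(name) => handler.end_element(name),
                    None => handler.start_element(tag),
                }
                rest = after_lt.get(end + 1..).unwrap_or("");
            } else {
                // Text: everything up to the next '<'.
                let end = rest.find('<').unwrap_or(rest.len());
                handler.characters(&rest[..end]);
                rest = &rest[end..];
            }
        }
    }
}

/// Example handler that records events; it depends only on the trait.
#[derive(Default)]
pub struct EventLog(pub Vec<String>);

impl ContentHandler for EventLog {
    fn start_element(&mut self, name: &str) {
        self.0.push(format!("start:{}", name));
    }
    fn end_element(&mut self, name: &str) {
        self.0.push(format!("end:{}", name));
    }
    fn characters(&mut self, text: &str) {
        self.0.push(format!("text:{}", text));
    }
}

/// Runs the toy parser over `input` and returns the recorded events.
pub fn log_events(input: &str) -> Vec<String> {
    let mut parser = ToyParser;
    let mut log = EventLog::default();
    parser.parse(input, &mut log);
    log.0
}
```

`EventLog` never mentions `ToyParser`, so swapping in another `SaxParser` implementation would not require touching the handler code.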

I have been able to parse a 3.5 GB XML file on an Amazon micro instance with 1 GB of RAM, and I think it is very fast (in a release build) for common usage. (Benchmarks needed.)

I'm open to your ideas about improving both Rust code and API. I'm also eager to chat and work with XML parser implementors on the SAX interface.

Beware that the libraries are not feature-complete yet.

Here is an example usage:
https://gist.github.com/fatihpense/e3ecc57026abdf3269466c7bf5739c94

SAX interface:
https://crates.io/crates/xml_sax
https://github.com/fatihpense/rust_xml_sax

SAX implementation:
https://crates.io/crates/xml_oxide
https://github.com/fatihpense/rust_xml_oxide

Updated to version v0.0.4 with support for attribute names and values. You can iterate over attributes! The next step should be namespace support. :slight_smile:

The library should work fine with well-formed documents. However, error handling is harder with my parser implementation. I think standardized error codes and messages would be great in the future.

I am using the implementation (xml_oxide) to explore the validity of the interface (xml_sax). I can change the implementation once the interface matures. I am always open to ideas.

Thanks for your support & interest!

https://github.com/fatihpense/rust_xml_oxide/commit/537a7c5c323e6e6824e2ff443108d8554e3a53f9#diff-162609764ad7ec6a263806dd88145c74R19

Hi Rustaceans!
I have updated the library to version v0.0.5.

Changelog:

  1. Internal changes to the parser. The SAX interface hasn't changed, so my unit tests are not affected. :grin:
  2. Minimal RAM usage (it removes chars from memory after tags pass).
  3. High CPU usage (this is not a feature! :roll_eyes: )
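The idea behind the low RAM usage can be sketched roughly like this. It is not xml_oxide's actual code: `count_start_tags` is a made-up function that reads up to the next `>`, inspects that piece, and then drops the bytes. It is deliberately naive (comments, processing instructions, and `>` inside attribute values would all confuse it).

```rust
use std::io::{self, BufRead};

/// Counts start tags in a stream while keeping only unconsumed bytes in memory.
/// Returns (number of start tags, largest buffer length observed).
/// Hypothetical sketch; anything of the form `<...>` that is not `</...>`
/// is treated as a start tag.
pub fn count_start_tags<R: BufRead>(mut reader: R) -> io::Result<(usize, usize)> {
    let mut buf: Vec<u8> = Vec::new();
    let mut tags = 0;
    let mut peak = 0;
    loop {
        // Read up to and including the next '>' so the buffer holds
        // at most one tag plus the text that precedes it.
        let n = reader.read_until(b'>', &mut buf)?;
        if n == 0 {
            break; // end of input
        }
        peak = peak.max(buf.len());
        if let Some(lt) = buf.iter().position(|&b| b == b'<') {
            let is_end_tag = buf.get(lt + 1) == Some(&b'/');
            if buf.ends_with(b">") && !is_end_tag {
                tags += 1;
            }
        }
        // The tag has passed: drop its bytes instead of accumulating the document.
        buf.clear();
    }
    Ok((tags, peak))
}
```

With this approach the buffer never grows past the longest single "text plus tag" chunk, regardless of the total document size.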

The to-do list includes:

  1. Making the SAX interface namespace-aware
  2. Ensuring the SAX interface is beautiful so that it gains community acceptance. I need help & opinions here!
  3. Writing SAX adapter libraries for other XML parsers. (This is one benefit of an interface library.)

In my vision, these libraries could enable code reuse across the community if they gain acceptance.

(any xml parser implementation) => XML SAX Interface => XML DOM Interface(trait + objects?) => XSD,XSLT,WSDL,Mapping libraries/applications

This makes it easier for applications and libraries to change the underlying implementation.
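As a rough illustration of the "SAX => DOM" step in that chain, a tree builder only needs the event stream, not the parser that produced it. The types below (`Event`, `Node`, `build_dom`) are hypothetical and much simpler than a real DOM would be (no attributes, no namespaces).

```rust
/// Hypothetical SAX-style events; not the actual xml_sax types.
pub enum Event<'a> {
    Start(&'a str),
    Text(&'a str),
    End,
}

/// A very small DOM-like tree.
#[derive(Debug, PartialEq)]
pub enum Node {
    Element { name: String, children: Vec<Node> },
    Text(String),
}

/// Builds a tree from a stream of events using a stack of open elements.
/// Returns None if the events are unbalanced or there is no single root.
pub fn build_dom<'a, I>(events: I) -> Option<Node>
where
    I: IntoIterator<Item = Event<'a>>,
{
    let mut stack: Vec<(String, Vec<Node>)> = Vec::new();
    let mut root = None;
    for event in events {
        match event {
            // Open a new element and start collecting its children.
            Event::Start(name) => stack.push((name.to_string(), Vec::new())),
            // Text must belong to some open element.
            Event::Text(text) => stack.last_mut()?.1.push(Node::Text(text.to_string())),
            // Close the innermost element and attach it to its parent.
            Event::End => {
                let (name, children) = stack.pop()?;
                let node = Node::Element { name, children };
                match stack.last_mut() {
                    Some(parent) => parent.1.push(node),
                    None if root.is_none() => root = Some(node),
                    None => return None, // a second root element
                }
            }
        }
    }
    if stack.is_empty() {
        root
    } else {
        None
    }
}
```

Any parser that can emit these events could feed `build_dom`, which is the kind of reuse the interface crate is meant to enable.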

Thank you for reading.
Fatih

Hi Rustaceans!
I have just updated the library to version v0.0.6 and the xml_sax interface library to v0.0.3.

Changelog:

  1. Namespace support, finally!
  2. ContentHandler seems to be fully implemented. Maybe I can implement start/end prefix mapping, but that is a nice-to-have for now. I'm open to ideas, especially about the interface library: the sooner we discuss, the better :smile:
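For readers unfamiliar with what namespace support involves, here is a minimal sketch of prefix resolution with a stack of scopes. It is not how xml_oxide does it; `NamespaceScopes` and its methods are hypothetical, and it skips details such as the predefined `xml` prefix and un-declaring the default namespace with `xmlns=""`.

```rust
use std::collections::HashMap;

/// Tracks in-scope namespace declarations, one map per open element.
/// Hypothetical helper, not part of xml_sax or xml_oxide.
#[derive(Default)]
pub struct NamespaceScopes {
    scopes: Vec<HashMap<String, String>>,
}

impl NamespaceScopes {
    /// Call on a start element with its raw attributes.
    /// Collects `xmlns="..."` (default) and `xmlns:p="..."` declarations.
    pub fn push(&mut self, attributes: &[(&str, &str)]) {
        let mut scope = HashMap::new();
        for &(name, value) in attributes {
            if name == "xmlns" {
                scope.insert(String::new(), value.to_string());
            } else if let Some(prefix) = name.strip_prefix("xmlns:") {
                scope.insert(prefix.to_string(), value.to_string());
            }
        }
        self.scopes.push(scope);
    }

    /// Call on the matching end element.
    pub fn pop(&mut self) {
        self.scopes.pop();
    }

    /// Splits an element's qualified name into (namespace URI, local name).
    /// Innermost declarations win; unprefixed names use the default namespace.
    pub fn resolve<'a>(&self, qname: &'a str) -> (Option<&str>, &'a str) {
        let (prefix, local) = match qname.split_once(':') {
            Some((p, l)) => (p, l),
            None => ("", qname),
        };
        let uri = self
            .scopes
            .iter()
            .rev()
            .find_map(|scope| scope.get(prefix))
            .map(String::as_str);
        (uri, local)
    }
}
```

Start/end prefix mapping events would correspond roughly to the moments when `push` adds a declaration and `pop` removes it.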

To-do list:

  1. Writing SAX adapter libraries for other XML parsers.
  2. Developing new backends using nice parser libraries like nom & pest.
  3. More documentation
  4. Cleaning up the library & trying to apply best practices (learning from the community)

Thank you for reading.
Fatih