Fastest XML parser in Rust

I am trying to parse a 1 GB XML file. xml_oxide claims it can parse 1 GB of XML in 23 seconds. However, when I ran the sample program provided by the library, it took around 4 minutes.

I have an equivalent program written in C# that took just 20 seconds to parse 600 MB of XML.

using System.Diagnostics;
using System.Xml;

// Verbatim string (@"...") so the backslashes in the path are not treated as escapes.
XmlTextReader textReader = new XmlTextReader(@"C:\utils-rust\src\xml\source1.xml");
string[] arr = new string[3];
int ctr = 0;
while (textReader.Read())
{
    switch (textReader.NodeType)
    {
        case XmlNodeType.Element:
            ctr++;
            arr[0] = ctr.ToString();
            arr[1] = textReader.Name;
            arr[2] = textReader.Value;
            Debug.WriteLine(ctr.ToString() + textReader.Name + textReader.Value);
            break;
        case XmlNodeType.Text: // Display the text in each element.
            break;
        case XmlNodeType.EndElement: // Display the end of the element.
            break;
        default:
            break;
    }
}

Crates usually don't outright lie about their performance. A couple of questions arise:

  • Did you compile and run the benchmarked program in release mode?
  • Did you use buffering for reading the file?
  • Does the Rust program perform exactly the same operations as the C# equivalent? (e.g., are you perhaps allocating or copying many strings by accident?)
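
For the first question, one way to guard against accidentally benchmarking a debug build is to check `cfg!(debug_assertions)`, which is on by default for `cargo run` and off for `cargo run --release`. The following is only a minimal sketch: `build_profile` is a hypothetical helper name, and it assumes the default Cargo profiles have not been customised.

```rust
/// Reports which kind of build this binary is, based on `debug_assertions`.
/// Assumption: default Cargo profiles (debug assertions on in dev, off in release).
fn build_profile() -> &'static str {
    if cfg!(debug_assertions) {
        "debug"
    } else {
        "release"
    }
}

fn main() {
    if build_profile() == "debug" {
        // Timings from an unoptimised build say little about a parser's real speed.
        eprintln!("warning: debug build; rerun with `cargo run --release` before timing");
    }
    println!("build profile: {}", build_profile());
}
```

Calling this at the top of a benchmark program makes it harder to compare an unoptimised Rust build against an optimised C# one by mistake.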

This is a cross-post:


Ah, so apparently OP did not run the code in release mode, after all.


I am new to Rust, so I am trying to learn things. I did update my original post on Reddit. I ran this in release mode and I am still not seeing the performance I would expect compared to my C# code.

Hi Mak_88. I have an experimental parser that might be able to help you.

I am getting speeds of about 150MB/s on my MacBook ARM laptop.

Can you send me the XML file in question? I'll do a little test for you.

I'll let you know how fast my parser works. Whether it's bad or good, I'll tell you!

Can you give me a link to a dummy XML file (or that file if you like) of similar complexity and size?

BTW, 150MB/s sounds like a lot, but I did some profiling and about 90% of the time is being tied up in memory management. My parser could achieve perhaps 1GB/s if I REALLY spent some time optimising this.

http://aiweb.cs.washington.edu/research/projects/xmltk/xmldata/

So I created my own XML parser, and that parser can also parse 2 other formats. So it's a parser that can parse 3 formats. I timed each variant, running it 3x per variant, so I'll just give the fastest times for each.

Parsing XML:
Parse: 4.64043s (147.324MB/s)
Parse: 4.88103s (140.062MB/s)
Parse: 3.88754s (175.855MB/s)

I also converted it into another format (Jeebox), that looks nicer. It actually parsed slower, but the file is smaller and easier to read.

Parse: 7.76379s (74.5807MB/s)
Parse: 5.56946s (103.965MB/s)
Parse: 7.24399s (79.9324MB/s)

I then converted the XML to jbin. This is a binary format, but it contains exactly the same data.

Parse: 1.57349s (325.113MB/s)
Parse: 1.68883s (302.909MB/s)
Parse: 1.5878s (322.183MB/s)

All 3 parses used the same parser.

Parse speeds are interesting. Firstly, my "jbin" file is 25% smaller, so the "equivalent" parse speed is higher than it seems. Basically, it's parsing in around 1.5s on my computer, compared to around 3.8s for the XML.

So it's over 2x faster to use this binary format, but the XML parser is quite decent in itself. What do you think? Is this speed useful for you?
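
The arithmetic behind that comparison can be sketched as follows. The file sizes are not stated in the thread; they are inferred here as time multiplied by the reported MB/s (roughly 684 MB for the XML and 512 MB for the jbin), and the function names are hypothetical.

```rust
/// File size in MB implied by a run that took `seconds` at `rate_mb_s` MB/s.
fn implied_size_mb(seconds: f64, rate_mb_s: f64) -> f64 {
    seconds * rate_mb_s
}

/// Throughput in MB/s for `size_mb` megabytes handled in `seconds`.
fn throughput_mb_s(size_mb: f64, seconds: f64) -> f64 {
    size_mb / seconds
}

fn main() {
    // Fastest runs quoted above: (seconds, reported MB/s).
    let (xml_secs, xml_rate) = (3.88754, 175.855);
    let (jbin_secs, jbin_rate) = (1.57349, 325.113);

    // Sizes are inferred from the timings, not given by the poster.
    let xml_mb = implied_size_mb(xml_secs, xml_rate);
    let jbin_mb = implied_size_mb(jbin_secs, jbin_rate);

    // How much smaller the binary file is, as a fraction of the XML size.
    let shrink = 1.0 - jbin_mb / xml_mb;
    // "Equivalent" speed: XML-sized payload divided by the jbin parse time.
    let equivalent = throughput_mb_s(xml_mb, jbin_secs);
    // Wall-clock ratio between the two fastest runs.
    let speedup = xml_secs / jbin_secs;

    println!("xml: {:.1} MB, jbin: {:.1} MB ({:.1}% smaller)", xml_mb, jbin_mb, shrink * 100.0);
    println!("equivalent throughput: {:.1} MB/s, speedup: {:.2}x", equivalent, speedup);
}
```

Measured against the original XML size, the jbin run works out to well over 400 MB/s, which is why the raw 325 MB/s figure understates the gain.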

Yes this would be helpful

What platform are you on?

My lib is written in C, so it would need wrapping for Rust.

Windows

Are you able to wrap a C lib in Rust?

I don't actually write Rust code. So I'd have to install it and learn it :smiley:

I know Rust can wrap C libs quite well.

Otherwise you should look into some faster XML libs.

I don't want to spend a week working to create something that someone might not use, and have it forgotten about by humanity. That's why I'm asking if you can wrap my lib with Rust yourself.

Yes I will do that

You have quite a few options: see the 'xml' search on Lib.rs.

xml_oxide is significantly less popular than most of those, and is not especially well maintained from the looks of it.

I will also note it only claims:

Fast enough for most use cases. It can parse a 1GB XML file(in memory) around 23 seconds

Were you comparing parsing from memory?

Wait, do you mean "Yes, I will try other XML libs" or "Yes, I can try wrapping your lib in Rust"?

I think my lib isn't ready for Windows actually; it's only for Mac/Linux.

You can definitely find some faster XML libs than the one you have. Also, most software tends to run faster on Linux, so you might get a large speed up by simply running your code on Linux.

rapid_xml is supposed to be fast. I would say give that a try.

If you want to switch to Linux and wrap my code in Rust, it is available here: GitHub, gamblevore/speedie (compiler for my language 'Speedie', beta state). But you'll probably need my help in creating the lib. My project creates a bunch of files, one of them is the parsing lib. The rest you won't need.

How do I read it from memory and not from a disk location?

Use std::fs::read to get a Vec<u8> of the file's contents in memory, start timing from there, then parse that data.

Overall, this is sometimes faster for small amounts of data, but once the data gets big enough to care about it will generally end up being slower, so keep in mind this is just for comparing your time to what the crate claims. You're better off buffering by wrapping the file in std::io::BufReader, if you're not already.
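
A rough sketch of both approaches, using only the standard library. `count_angle_brackets` is a hypothetical stand-in for a real parser (it just counts `<` bytes), and the default file name `source1.xml` is borrowed from the C# snippet earlier in the thread; swap in the actual parser call and path.

```rust
use std::fs::{self, File};
use std::io::{self, BufRead, BufReader};
use std::time::{Duration, Instant};

/// Hypothetical stand-in for a real XML parser: counts '<' bytes.
fn count_angle_brackets(data: &[u8]) -> usize {
    data.iter().filter(|&&b| b == b'<').count()
}

/// Loads the whole file into memory first, then times only the in-memory pass.
fn time_in_memory(path: &str) -> io::Result<(usize, Duration)> {
    let data = fs::read(path)?; // disk I/O happens here, outside the timer
    let start = Instant::now();
    let n = count_angle_brackets(&data);
    Ok((n, start.elapsed()))
}

/// Streams the file through a BufReader; disk I/O is included in the timing.
fn time_buffered(path: &str) -> io::Result<(usize, Duration)> {
    let mut reader = BufReader::new(File::open(path)?);
    let start = Instant::now();
    let mut n = 0;
    loop {
        let chunk = reader.fill_buf()?;
        if chunk.is_empty() {
            break; // end of file
        }
        n += count_angle_brackets(chunk);
        let len = chunk.len();
        reader.consume(len); // mark the chunk as processed
    }
    Ok((n, start.elapsed()))
}

fn main() -> io::Result<()> {
    // Path taken from the first command-line argument, if any.
    let path = std::env::args().nth(1).unwrap_or_else(|| "source1.xml".to_string());
    let (n, t) = time_in_memory(&path)?;
    println!("in memory: {} '<' bytes in {:?}", n, t);
    let (n, t) = time_buffered(&path)?;
    println!("buffered:  {} '<' bytes in {:?}", n, t);
    Ok(())
}
```

Only the in-memory timing is comparable to the crate's "in memory" figure; the buffered timing is closer to what the C# XmlTextReader loop measures, since that also reads from disk as it goes.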
