I am trying to parse a 1 GB XML file. xml_oxide claims it can parse 1 GB of XML in 23 seconds. However, when I use the sample program provided by the library, it takes around 4 minutes.
I have an equivalent program written in C# that took just 20 seconds to parse a 600 MB XML file:
using System.Diagnostics;
using System.Xml;

// Verbatim string (@"...") so the backslashes in the Windows path are not read as escape sequences.
XmlTextReader textReader = new XmlTextReader(@"C:\utils-rust\src\xml\source1.xml");
string[] arr = new string[3];
int ctr = 0;
while (textReader.Read())
{
    switch (textReader.NodeType)
    {
        case XmlNodeType.Element:
            {
                // Count each element and record its index, name, and value.
                ctr++;
                arr[0] = ctr.ToString();
                arr[1] = textReader.Name;
                arr[2] = textReader.Value;
                Debug.WriteLine(ctr.ToString() + textReader.Name + textReader.Value);
            }
            break;
        case XmlNodeType.Text: // Display the text in each element.
            break;
        case XmlNodeType.EndElement: // Display the end of the element.
            break;
        default:
            break;
    }
}
I am new to Rust, so I am trying to learn things. I updated my original post on Reddit. I ran this in release mode and I am not seeing the performance I would expect compared to my C# code.
Hi Mak_88. I have an experimental parser that might be able to help you.
I am getting speeds of about 150MB/s on my MacBook ARM laptop.
Can you send me the XML file in question and I'll do a little test for you?
I'll let you know how fast my parser works. Whether it's bad or good, I'll tell you!
Can you give me a link to a dummy XML file (or that file if you like) of similar complexity and size?
BTW... 150MB/s sounds like a lot, but I did some profiling and about 90% of the time is tied up in memory management. My parser could achieve perhaps 1GB/s if I REALLY spent some time optimising this.
So I created my own XML parser, and that parser can also parse 2 other formats. So it's a parser that can parse 3 formats. I timed each variant. I ran it 3x per variant... so I'll just give the fastest time for each.
Parse speeds are interesting. Firstly, my "jbin" file is 25% smaller than the XML, so the "equivalent" parse speed is higher than it seems. Basically the jbin file parses in around 1.5s on my computer, compared to around 3.8s for the XML.
So it's over 2x faster to use this binary format, but the XML parser is quite decent in itself. What do you think? Is this speed useful for you?
I don't actually write Rust code, so I'd have to install it and learn it.
I know Rust can wrap C libs quite well.
Otherwise you should look into some faster XML libs.
I don't want to spend a week working to create something that someone might not use... and have it forgotten about by humanity. That's why I'm asking if you can wrap my lib with Rust yourself?
Wait, do you mean "Yes I will try other XML libs" or "Yes I can try wrapping your lib in Rust"?
I think my lib isn't ready for Windows actually, it's only for Mac/Linux.
You can definitely find some faster XML libs than the one you have. Also, most software tends to run faster on Linux, so you might get a large speed up by simply running your code on Linux.
rapid_xml is supposed to be fast. I would say give that a try.
Use std::fs::read to get a Vec<u8> of the file's contents in memory, start timing from there, then parse that data.
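Here is a rough sketch of that measurement, using only the standard library. The names time_in_memory and count_tag_starts are made up for illustration, and count_tag_starts is just a stand-in that counts '<' bytes; you would swap in the actual call to whatever parser crate you are testing.

```rust
use std::fs;
use std::io;
use std::time::{Duration, Instant};

/// Hypothetical stand-in for the real parser: counts '<' bytes in the buffer.
fn count_tag_starts(data: &[u8]) -> usize {
    data.iter().filter(|&&b| b == b'<').count()
}

/// Loads the whole file into memory, then times only the work on the bytes.
fn time_in_memory(path: &str) -> io::Result<(usize, Duration)> {
    // Disk I/O happens here, before the timer starts.
    let data: Vec<u8> = fs::read(path)?;
    let start = Instant::now();
    let count = count_tag_starts(&data);
    Ok((count, start.elapsed()))
}

fn main() -> io::Result<()> {
    // Pass the XML path as the first command-line argument.
    match std::env::args().nth(1) {
        Some(path) => {
            let (count, elapsed) = time_in_memory(&path)?;
            println!("{} '<' bytes scanned in {:?}", count, elapsed);
        }
        None => eprintln!("usage: <program> <path-to-xml>"),
    }
    Ok(())
}
```

Build it with cargo build --release before timing anything, since debug builds can be many times slower.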
Overall, this is sometimes faster for small amounts of data, but once the data gets big enough to care about, it will generally end up being slower, so keep in mind this is just for comparing your timing against what the crate claims. You're better off buffering by wrapping the file in std::io::BufReader, if you aren't already.
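For the buffered approach, something along these lines should work. Again this is only a sketch: count_tag_starts_streaming is a made-up name, and counting '<' bytes stands in for feeding the chunks to a real streaming parser.

```rust
use std::fs::File;
use std::io::{self, BufRead, BufReader, Read};

/// Hypothetical stand-in for a streaming parser: counts '<' bytes chunk by
/// chunk, so the whole input never has to sit in memory at once.
fn count_tag_starts_streaming<R: Read>(source: R) -> io::Result<usize> {
    let mut reader = BufReader::new(source);
    let mut count = 0;
    loop {
        // fill_buf returns the currently buffered bytes, refilling from the
        // underlying reader when the buffer is empty.
        let chunk = reader.fill_buf()?;
        if chunk.is_empty() {
            break; // end of input
        }
        count += chunk.iter().filter(|&&b| b == b'<').count();
        let used = chunk.len();
        // Mark the bytes as processed so the next fill_buf moves forward.
        reader.consume(used);
    }
    Ok(count)
}

fn main() -> io::Result<()> {
    // Pass the XML path as the first command-line argument.
    match std::env::args().nth(1) {
        Some(path) => {
            let count = count_tag_starts_streaming(File::open(&path)?)?;
            println!("{} '<' bytes found", count);
        }
        None => eprintln!("usage: <program> <path-to-xml>"),
    }
    Ok(())
}
```

BufReader::new uses a small default buffer. BufReader::with_capacity lets you try a larger one, though whether that helps depends on the parser and the disk.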