Quick-xml deserializing big files

I am trying to deserialize big XML files. I am using quick-xml because it seems to be the most popular option.
But I can't find a way to easily stream and deserialize at the same time.

The data structure I am trying to deserialize is quite simple: it is a Vec<Thingy>. So the file looks like:

<bla>
  <thingy attr1=... attr2=... ... />
  <thingy attr1=... attr2=... ... />
  <thingy attr1=... attr2=... ... />
  ...
</bla>

If I use quick_xml::de::from_reader to deserialize the whole file, I run out of memory. The end goal is to populate an SQL database, so I thought I could stream the XML into Thingy values and insert each Thingy into the database as soon as it is parsed.
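Whatever ends up producing the items, one way to keep memory bounded on the database side is to insert them in fixed-size batches instead of collecting a Vec first. This std-only sketch is only an illustration: the `Thingy` fields and the `flush` callback (standing in for something like a multi-row INSERT inside one transaction) are hypothetical placeholders, not part of quick-xml or any database crate.

```rust
/// Hypothetical stand-in for the real `Thingy` record.
#[derive(Debug, Clone, PartialEq)]
pub struct Thingy {
    pub attr1: String,
    pub attr2: String,
}

/// Hands items to `flush` in chunks of at most `batch_size`, so only one
/// batch is held in memory at a time. Returns how many items were flushed,
/// or the first error returned by `flush` (remaining items are not consumed).
pub fn insert_in_batches<T, I, F, E>(items: I, batch_size: usize, mut flush: F) -> Result<usize, E>
where
    I: IntoIterator<Item = T>,
    F: FnMut(&[T]) -> Result<(), E>,
{
    assert!(batch_size > 0, "batch_size must be positive");
    let mut batch = Vec::with_capacity(batch_size);
    let mut total = 0;
    for item in items {
        batch.push(item);
        if batch.len() == batch_size {
            flush(&batch)?; // e.g. one transaction per batch (hypothetical)
            total += batch.len();
            batch.clear(); // reuse the allocation for the next batch
        }
    }
    // Flush the last, possibly partial, batch.
    if !batch.is_empty() {
        flush(&batch)?;
        total += batch.len();
    }
    Ok(total)
}
```

The iterator could be fed lazily by the XML side, so neither the parsed file nor the full list of rows has to exist in memory at once.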

But this turns out to be more complicated than I thought. To stream anything, it seems that I must use an API like read_event_into, which is far lower-level than I am willing to go. It would require much more work (and trouble, maintenance, bugs, etc.) than I would like.

I was hoping I could just get the string "<thingy attr1=... attr2=... ... />" from the event and then call quick_xml::de::from_str on it. It seems a little ugly, but I can live with that. Yet, if I understand the docs correctly, that does not seem to be an option.

Is there a way, with quick_xml or another reasonably mature crate, to easily stream Thingies here? Basically something like SAX, but at the level of "the object you are trying to deserialize" rather than at the node level.

If your XML structure is that predictable, you could just use Reader::from_str and then loop over the reader events:

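use quick_xml::{events::Event, Reader};
let mut reader = Reader::from_str(xml); // `xml: &str` holds the whole document
loop {
    match reader.read_event() {
        Ok(Event::Eof) => break,
        Ok(event) => { /* ... match on `event` here */ }
        Err(e) => panic!("error at position {}: {e:?}", reader.buffer_position()),
    }
}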

My question is about avoiding having to match on events. What you propose does not seem different from read_event_into.

My understanding is that you end up with:

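use quick_xml::events::Event;
loop {
  match reader.read_event() {
    // self-closing <row ... /> elements arrive as Event::Empty, not Event::Start
    Ok(Event::Start(e)) | Ok(Event::Empty(e)) => {
      match e.name().as_ref() {
        b"row" => { /* do something with e.attributes() */ }
        _ => {}
      }
    }
    Ok(Event::Eof) => break,
    Ok(_) => {}
    Err(e) => panic!("error at position {}: {e:?}", reader.buffer_position()),
  }
}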

And then you basically have to deserialize everything yourself: sanitize the data, ensure all the fields are present, etc. In my case I have many files containing many different types, which I have carefully described as serde data structures, and I'd like to rely on those instead of having to deserialize everything manually.

Got it. I thought you still had to manually deserialize your data structures.

I'm not sure if quick-xml is still a good fit for your use case then.
