Deserialize large nested struct without Stack Overflow

Hi folks,

I'm trying to deserialise a very large JSON (from a single String) into a struct with hundreds of nested structs in an async program.

I'm still building out the struct by adding more and more nested structs, and I've hit a stack overflow.

I added some Boxes like this:

pub struct NestedStructWithOtherNestedStructs {
    pub a: Option<Box<Foo>>,
    pub b: Box<Bar>,
}

That helped for a while, but then I started getting stack overflows again.

Could you please describe your strategy for something like this?

I tried to write a trait that works as a wrapper for deserialization into a Box, but that didn't help.

impl ...
    fn my_deser(inp: String) -> Result<Box<Self>, Error>
    where
        Self: serde::de::DeserializeOwned,
    {
        Ok(Box::new(serde_json::from_str(&inp)?))
    }

I experimented with running the deserialization in another thread:

tokio::task::spawn_blocking(move || ...).await

And that worked but I had to start Tokio with a large .thread_stack_size(32 * 1024 * 1024).

I'm not trying to save memory in this program, but this feels wrong, since I'm bumping the stack size for all Tokio worker threads just because of this single deserialization.

Any ideas welcomed ;-). Thank you!

Have you considered reading the JSON string in chunks, i.e. streaming JSON? There are crates like struson that can help. You could gradually initialise your struct on the heap this way.


Thank you @jofas . I'd like to keep the option to parse both JSON and YAML, and this would limit me to JSON only. I'd prefer an option that takes a lot of memory but does the work without changing too much code :wink: .

I think Serde recurses on the call stack. If the JSON is very deeply nested (more than 100 levels or so), Serde might overflow the default stack size, and I can't think of a way to fix that without redesigning Serde in a fairly fundamental way...

Maybe the best thing is to use a single separate worker thread for this specific work. Something like:

use std::sync::mpsc;
use std::thread::JoinHandle;
use tokio::sync::oneshot;

struct WorkerThread {
    sender: mpsc::SyncSender<(String, oneshot::Sender<Result<DeepNestedStruct>>)>,
    handle: JoinHandle<()>,
}

impl WorkerThread {
    fn new() -> Self {
        let (sender, receiver) = mpsc::sync_channel(8);
        let handle = std::thread::Builder::new()
            .stack_size(BIG_STACK)
            .spawn(move || {
                while let Ok((s, answer_sender)) = receiver.recv() {
                    let _ = answer_sender.send(parse(s));
                }
            })
            .unwrap();
        Self { sender, handle }
    }

    /// Get the worker thread to parse `s`.
    async fn parse(&self, s: String) -> Result<DeepNestedStruct> {
        let (answer_sender, answer_receiver) = oneshot::channel();
        self.sender.try_send((s, answer_sender))?;
        answer_receiver.await?
    }
}

This is a common issue with recursive types. Have you tried something like this?


The last release of serde-saphyr has a streaming reader (the read function). If you read multiple documents, it yields them from an iterator one by one, without loading the whole stream into RAM. But you need to partition the input into multiple documents for this to work.


Thank you @yuriy0 . I tried stacker, but I still had the stack overflow.

Thank you @jorendorff . I took your idea and wrote the following function, which I believe is similar to your implementation but works for any closure:

    pub async fn spawn_thread_and_wait<F, T>(f: F, stack: usize) -> T
    where
        F: FnOnce() -> T + Send + 'static,
        T: Send + 'static,
    {
        let (tx, mut rx) = tokio::sync::mpsc::channel(1);
        std::thread::Builder::new()
            .stack_size(stack)
            .spawn(move || {
                let result = f();
                let _ = tx.blocking_send(result);
            })
            .unwrap();
        rx.recv().await.unwrap()
    }

(just an example with unwrap - I plan to implement my own Error type and return a Result)

do you mean the serde_stacker crate? since you mentioned async, did you consider the possibility that it's the synthetic Future of the async function, not serde::Deserialize, that overflows the stack?

if you have a deep .await chain or large async functions, try boxing the future before await-ing it. (although in some cases, Box::new() may not be optimized well...)

I doubt that deserialization of such a struct will take less than the recommended 10-100 microseconds for an async poll. Thus you'd better do this on a separate blocking thread anyway.

I think at this point you would have to post your code and the call stack when it overflows. The stack overflow could be in any number of places - when actually deserializing, when Dropping an intermediate serde_json::Value (which will be deeply nested if the JSON is nested), or in some other unforeseen place.