Stack overflow on debug mode, but not release. Can't figure out solution without increasing stack allocation for release

In full-moon, we currently have a bug where parse_first_of! allocates way too much stack space on debug mode.

macro_rules! parse_first_of {
    ($state:ident, {$($(@#[$meta:meta])? $parser:expr => $constructor:expr,)+}) => ({
        $(
            $(#[$meta])?
            match $parser.parse($state) {
                Ok((state, node)) => return Ok((state, $constructor(node))),
                Err(InternalAstError::NoMatch) => {},
                Err(other) => return Err(other),
            };
        )+

        Err(InternalAstError::NoMatch)
    });
}

You can see how it works in this code:

#[derive(Clone, Debug, Default, PartialEq)]
struct ParseStmt;
define_parser!(ParseStmt, Stmt<'a>, |_, state| parse_first_of!(state, {
    ParseAssignment => Stmt::Assignment,
    ParseFunctionCall => Stmt::FunctionCall,
    ParseDo => Stmt::Do,
    ParseWhile => Stmt::While,
    ParseRepeat => Stmt::Repeat,
    ParseIf => Stmt::If,
    ParseNumericFor => Stmt::NumericFor,
    ParseGenericFor => Stmt::GenericFor,
    ParseFunctionDeclaration => Stmt::FunctionDeclaration,
    ParseLocalFunction => Stmt::LocalFunction,
    ParseLocalAssignment => Stmt::LocalAssignment,
    @#[cfg(feature = "roblox")]
    ParseCompoundAssignment => Stmt::CompoundAssignment,
    @#[cfg(feature = "roblox")]
    ParseExportedTypeDeclaration => Stmt::ExportedTypeDeclaration,
    @#[cfg(feature = "roblox")]
    ParseTypeDeclaration => Stmt::TypeDeclaration,
    @#[cfg(feature = "lua52")]
    ParseGoto => Stmt::Goto,
    @#[cfg(feature = "lua52")]
    ParseLabel => Stmt::Label,
}));

Basically, it goes over every parser until one matches, then returns that value. The problem, however, is that in debug mode, it allocates a ridiculously large amount of stack space, seemingly to allocate every possible result from each parser.

Here is the disassembly of the output code for ParseStmt from cargo asm with all features enabled.

The important part here to check is:

 mov     eax, 202424
 call    __rust_probestack

202,424 bytes allocated to the stack!

In release mode, this same code disassembles to this:

This time:

 mov     eax, 16376
 call    __rust_probestack

...only 16,376 bytes are allocated.

From my understanding of the assembly output, what's happening is that every use of match in debug mode allocates new space for each possible case. The release mode, correctly, reuses the stack space.

I've tried a lot of ways of fixing this, first by splitting them into several different functions, which only seemed to exacerbate the problem. Same with having parse output into a mutable buffer (I thought this would only allocate the stack space for that buffer, then reuse it, but it didn't). The best solution I've found is changing the match to this:

$(#[$meta])?
{
    let parser_result = $parser.parse($state).map(|(state, node)| (state, $constructor(node)));
    if parser_result != Err(InternalAstError::NoMatch) {
        return parser_result;
    }
}

This allocates a lot less stack space in debug mode without increasing release stack allocations:

 mov     eax, 88120
 call    __rust_probestack

...but this is still an extremely large stack cost, and because this parser is recursive, more complex ASTs will still stack overflow, and it gets worse the more nodes we support, which we do often.

I've tried the "z" optimization stuff, but it doesn't seem to be of any help.

1 Like

I took a look at the sizes of the types involved with -Z print-type-sizes:

https://blog.mozilla.org/nnethercote/2018/11/09/how-to-get-the-size-of-rust-types-with-zprint-type-sizes/

The Stmt struct is very large (>2KB!), almost all of which is in multiple TokenReferences (152 bytes).

1 Like

I've written a recursive descent parser a couple of years ago for work, that at some point had to be be deployed to WASM. The WASM deployment requirement only came much later after the parser was already written.

The thing about WASM is, the WASM VMs only have about 0.5 MB of call stack space total, which is laughably small. But recursive descent parsing uses a lot of call stack space.

The solution for this problem? A total rewrite that uses a heap-allocated stack to offload most of the state that used to be stored in the call stack.

2 Likes

This is a really cool trick, thank you!

That's a long-standing bug Match expressions use O(n) stack space with n branches · Issue #34283 · rust-lang/rust · GitHub. Note the top comment -- this bug is one of the reasons why esbuild is written in go rather than rust.

Practically, I'd suggest either to do nothing, or to use stacker stacker — Rust implementation // Lib.rs in the debug.

5 Likes

Ouch! Thanks for the insight. stacker looks a bit spooky, does rust-analyzer use it or something like it?

EDIT: I didn't notice this was an official Rust-organization crate until now, that calms my nerves a bit.