Cargo release took >8h while debug finishes in 25s

I recently implemented some optimizations in my parser combinator library, lip, aimed at reducing excessive box allocations. The technique was inspired by the nom library. All previous tests appear to function properly in both debug and release mode.

I then updated the version of the lip library in another project that depends on it and performed some minor code updates. However, post-update, I've encountered a problem: when I execute cargo build --release, the process seems to get stuck. Even after leaving it running overnight, it took over 8 hours to complete.

This issue was replicated both on stable and nightly Rust on macOS Ventura 13.4.1 (M1) and on stable Rust on Ubuntu 22.04 (ARM) running in a VM. Running cargo clean and clearing the ~/.cargo cache before build had no effect.

The specific changes made to the parser combinator library are documented here. You can inspect the File Changes on lib.rs for more details.

For anyone interested in replicating the issue, you can clone the project and try to compile it as follows:

git clone https://github.com/AlienKevin/wordshk-tools
cargo build --release # stuck

You can also build the tests of the lip parser to see that it doesn't get stuck:

git clone https://github.com/AlienKevin/lip
cargo test --release # finishes very fast

Any suggestions or insights about why the build time has increased so drastically would be greatly appreciated.

It could be a compiler bug, where some specific circumstances lead to exponential compile times. You could look through issues with the word "exponential" in the Rust repo for similar cases and/or make an issue of your own, preferably with a more minimal reproduction: Search ¡ repo:rust-lang/rust exponential ¡ GitHub

From brief testing it seems that the nested sequence calls in parse.rs are the problem. Given the many generics the function defines, nesting them probably leads to some large and complicated types, which are often seen in code with excessive compile times.

4 Likes

Thanks for taking the time to reply! I will definitely look into the sequence function. Just curious, how did you trace the issue to that call site? Did you run some profiling tool?

Since all the lip-related code in wordshk-tools was in parse.rs, I just tried removing/changing bits of code I thought looked like they might be the issue and seeing if it compiled quickly after the changes.

I'm not that knowledgeable when it comes to using a profiler to debug compiler-related issues, but When rustc explodes is one article about it if you're curious

4 Likes

I profiled my project and the log was filled with pages of calls to rustc_trait_selection::traits::project::normalize_with_depth_to, similar to issue mentioned in the "When rustc explodes" article you referenced. I'll try some of the tricks he mentioned in the article to see if it helps. Here's sample from the log:

┐rustc_trait_selection::traits::project::normalize_with_depth_to depth=9, value=<lip::FnParser<_, _, _> as lip::Parser<'_>>
┘
┐rustc_trait_selection::traits::project::normalize_with_depth_to depth=9, value=Binder { value: ProjectionPredicate(AliasTy { substs: [_, (&str, lip::Location, _)], def_id: DefId(2:2933 ~ core[1e4f]::ops::function::FnOnce::Output) }, Term::Ty(lip::ParseResult<'_, _, _>)), bound_vars: [] }
┘
┐rustc_trait_selection::traits::project::normalize_with_depth_to depth=9, value=Binder { value: TraitPredicate(<_ as std::marker::Sized>, polarity:Positive), bound_vars: [] }
┘
┐rustc_trait_selection::traits::project::normalize_with_depth_to depth=9, value=Binder { value: TraitPredicate(<_ as std::marker::Sized>, polarity:Positive), bound_vars: [] }
┘
┐rustc_trait_selection::traits::project::normalize_with_depth_to depth=9, value=Binder { value: TraitPredicate(<_ as std::clone::Clone>, polarity:Positive), bound_vars: [] }
┘
┐rustc_trait_selection::traits::project::normalize_with_depth_to depth=9, value=Binder { value: TraitPredicate(<_ as std::ops::FnMut<(&str, lip::Location, _)>>, polarity:Positive), bound_vars: [] }
┘
┐rustc_trait_selection::traits::project::normalize_with_depth_to depth=9, value=Binder { value: TraitPredicate(<_ as std::marker::Sized>, polarity:Positive), bound_vars: [] }

For those encountering similar issues, try setting RUSTC_LOG to info to see more details:

RUSTC_LOG=info RUSTFLAGS="-Z time-passes" cargo build --release > rustc_log.txt 2>&1

My takeaways:

  1. Sometimes it's okay to box, especially when your traits are getting more complicated
  2. It's probably a bad idea to write your own parser combinator library and use it in your project. Use nom or pest instead and it will save you tons of trouble.
  3. I never imagined that the compiler is part of the problem here. I thought that my rust toolchain was corrupted or that there's some sort of infinite recursion in my code. However, as your project gets more complicated, it might hit some of those less optimized (exponential!) edge cases in rustc.
1 Like

Rust compilation is NP-hard problem thus crazy corner cases are expected.

Thank to everyone's comments, I finally found the bottleneck and its solution:

TLDR:

It seems that the compiler takes a long time to type check an impl Trait used in a loop. So I had to box those instances into Box<dyn Trait> and the release builds finish in 30 seconds. The advantage of this solution is that code changes involved are minimal and you also don't suffer from a tremendous allocation overhead compared to boxing everything.

Full explanation

Setup

Here's a simplified example that illustrates this issue. Let's say I want to define a Parser trait as follows:

pub trait Parser<'a> {
    type Output;
    type State: Clone;

    /// Parse a given input, starting at
    /// a given location and state.
    fn parse(
        &mut self,
        input: &'a str,
        location: Location,
        state: Self::State,
    ) -> ParseResult<'a, Self::Output, Self::State>;

Then, I want closures to implement this Parser trait so I can treat them as parsers. Note that we can't directly implement Parser for FnMut here because the type parameter A and S would be unbounded since associated types Output and State in the Parser trait are unfortunately not visible from the type parameter list. Hence, we create a simple wrapper called FnParser that wraps a closure into a Parser.

#[derive(Copy, Clone)]
pub struct FnParser<F, A, S>(F, PhantomData<A>, PhantomData<S>);

pub fn fn_parser<'a, A, S: Clone, F>(f: F) -> FnParser<F, A, S>
where
    F: FnMut(&'a str, Location, S) -> ParseResult<'a, A, S>,
{
    FnParser(f, PhantomData, PhantomData)
}

impl<'a, A, S: Clone, F> Parser<'a> for FnParser<F, A, S>
where
    F: FnMut(&'a str, Location, S) -> ParseResult<'a, A, S>,
{
    type Output = A;
    type State = S;

    fn parse(&mut self, input: &'a str, location: Location, state: S) -> ParseResult<'a, A, S> {
        (self.0)(input, location, state)
    }
}

Problem

Now, let's write a repeat parser combinator that repeats a given parser n times. Notice how the input parser is used in a while loop and the whole body is wrapped inside an fn_parser closure. This somehow causes a headache for rustc and takes a super long time to compile.

/// Repeat a parser n times
pub fn repeat<'a, A: Clone + 'a, P: 'a, S: Clone + 'a>(
    times: usize,
    mut parser: P,
) -> impl Parser<'a, Output = Vec<A>, State = S>
where
    P: Parser<'a, Output = A, State = S>,
{
    fn_parser(move |mut input, mut location, mut state: S| {
        let mut output = Vec::new();
        let mut committed = false;

        if times == 0 {
            return ParseResult::Ok { ... };
        }

        let mut counter = 0;

        while let ParseResult::Ok { ... } = parser.parse(input, location, state.clone())
        {
            if counter >= times {
                break;
            }
            ...
        }

        ParseResult::Ok { ... }
    })
}

Solution

A solution is to replace the fn_parser with a Box::new so that the compiler can treat the body as a pointer to a closure on the heap instead of a struct containing a closure on the stack. This somehow eases the compilation process.

We first define a BoxedParser type alias to avoid typing the full signature all the time. Note how similar the type of BoxedParser is to FnParser. The only change is that FnMut is now wrapped in a Box:

pub type BoxedParser<'a, A, S> = Box<dyn FnMut(&'a str, Location, S) -> ParseResult<'a, A, S> + 'a>;

impl<'a, A, S: Clone + 'a> Parser<'a> for BoxedParser<'a, A, S> {
    type Output = A;
    type State = S;

    fn parse(&mut self, input: &'a str, location: Location, state: S) -> ParseResult<'a, A, S> {
        (*self)(input, location, state)
    }
}

Now, we can update the repeat combinator to return a BoxedParser instead of an FnParser:

/// Repeat a parser n times
pub fn repeat<'a, A: Clone + 'a, P: 'a, S: Clone + 'a>(
    times: usize,
    mut parser: P,
) -> BoxedParser<'a, Vec<A>, S>  // Replaced impl Parser with BoxedParser
where
    P: Parser<'a, Output = A, State = S>,
{
    // Replaced fn_parser with Box::new
    Box::new(move |mut input, mut location, mut state: S| {
        let mut output = Vec::new();
        let mut committed = false;
        ...

Do this for all the functions that contain calls to impl Parser in a loop and you'll be golden.

P.S. It'll be great if someone can explain more about the intuitions behind the slowdown.

10 Likes

It is almost certainly not the Box that has any effect, but the dyn. Instead of building up a very complex type with static calls through everything, you request dynamic dispatch, so the compiler analyzes, and generates code for, each parser-closure separately, not combined into the whole parser. So you avoid any exponential blowup by keeping the inputs small.

The disadvantage of this is that you will get much less inlining.

(Disclaimer: I haven't read your code except for what's in your message.)

7 Likes

It is almost certainly not the Box that has any effect, but the dyn . Instead of building up a very complex type with static calls through everything, you request dynamic dispatch, so the compiler analyzes, and generates code for, each parser-closure separately , not combined into the whole parser. So you avoid any exponential blowup by keeping the inputs small.

That explanation makes sense. Thanks!

I haven't looked into the code, but boxing may be unnecessary. You could pass &mut dyn Parser into your methods and keep the parser state on stack.

1 Like

That's an interesting suggestions. However, I think my issue might be a bit different. I need to return a parser without boxing it, rather than taking in an unboxed parser. I tried to remove the Box around the dyn FnMut but rustc gives me the following warning to box unsized trait object:

error[E0746]: return type cannot have an unboxed trait object
    --> src/lib.rs:1350:6
     |
1350 | ) -> BoxedParser<'a, Vec<A>, S>
     |      ^^^^^^^^^^^^^^^^^^^^^^^^^^ doesn't have a size known at compile-time
     |
help: box the return type, and wrap all of the returned values in `Box::new`
     |
1350 ~ ) -> Box<BoxedParser<'a, Vec<A>, S>>
1351 | where
1352 |     P: Parser<'a, Output = A, State = S>,
1353 | {
1354 ~     Box::new((move |mut input, mut location, mut state| {
1355 |         let mut output = Vec::new();
   ...
1433 |         }
1434 ~     }))
     |

Following a suggestion from a compiler message, I tried to further define a Sized version of FnMut as a new trait as follows:

trait SizedFnMut<'a, A, S>: FnMut(&'a str, Location, S) -> ParseResult<'a, A, S> + Sized {}

pub type FnMutParser<'a, A, S> = dyn SizedFnMut<'a, A, S>;

impl<'a, A, S: Clone + 'a> Parser<'a> for FnMutParser<'a, A, S> {
    type Output = A;
    type State = S;

    fn parse(&mut self, input: &'a str, location: Location, state: S) -> ParseResult<'a, A, S> {
        (*self)(input, location, state)
    }
}

The above also does not work because I can't implement the Parser trait for a Sized object.

error[E0038]: the trait `SizedFnMut` cannot be made into an object
    --> src/lib.rs:1240:43
     |
1240 | impl<'a, A, S: Clone + 'a> Parser<'a> for FnMutParser<'a, A, S> {
     |                                           ^^^^^^^^^^^^^^^^^^^^^ `SizedFnMut` cannot be made into an object
     |
note: for a trait to be "object safe" it needs to allow building a vtable to allow the call to be resolvable dynamically; for more information visit <https://doc.rust-lang.org/reference/items/traits.html#object-safety>
    --> src/lib.rs:1236:84
     |
1236 | trait SizedFnMut<'a, A, S>: FnMut(&'a str, Location, S) -> ParseResult<'a, A, S> + Sized {}
     |       ---------- this trait cannot be made into an object...                       ^^^^^ ...because it requires `Self: Sized`

Anyways, there might be a way to get this to work but I guess a few heap allocations wouldn't be too bad.

By the way, I encountered a similar exponential time issue when writing Swift code today :joy:. However, Swift provided a nice error message saying that:

The compiler is unable to type-check this expression in reasonable time; try breaking up the expression into distinct sub-expressions

Here's a screenshot of Xcode. The Swift code that causes this error message is not relevant:

I think rustc can also try to give warnings like this when the type-checking takes too long. It would be better to inform the user about which part of the compilation is taking too long instead of just hanging.

Unfortunately, assigning such blame is not particularly simple (halting problem and all that). rustc is structured as a batch compiler which provides all of the compile outputs at once, so any kind of "this is taking a while" note has to bypass that via some bespoke system.

That said, there's now a live warning when const evaluation takes to many evaluation steps (at every power of two over 100,000 per const item, IIRC, though the exact number doesn't matter), so perhaps it's theoretically possible for other phases of the compiler (namely, typeck) to do the same.

Though assigning back to what part of the source resulted in the obligations is still far from simple; the best you're likely to be able to get is pointing at an entire function item, and it might not be strictly "because of" that function item but rather "because of" a function it's calling instead.

If this case isn't a case of exponential blowup due to symbol length (which has happened before), my guess (without any evidence) would be that deep impl Trait composition causes exponential blowup because some amount of obligation caching is failing, resulting in rechecking the entire compositions' obligations for each level (obviously O(n²)). (It could be as simple as exceeding the size of an LRU cache or similar, or as complicated as a change in top-level parameter changing the type identity deep into the composition, or even more subtle.)

1 Like

Unfortunately, assigning such blame is not particularly simple (halting problem and all that). rustc is structured as a batch compiler which provides all of the compile outputs at once, so any kind of "this is taking a while" note has to bypass that via some bespoke system.

I agree. It's probably not easy to trace back to exactly the point that causes the timeout. However, I do think there can be some heuristics in place to give some warnings as to why the compilation is taking so long. Maybe as simple as saying that "compilation has exceeded 5 minutes, you may want to inspect the log of rustc to see if an exponential compilation time is possible", similar to how running tests give you a warning after exceeding 1 minute.

You may want to look into enum dispatch to replace certain dynamic dispatch in critical path sections of your code. As a rule of thumb, you never really want to combine frequently called code with dyn impl.

https://crates.io/crates/enum_dispatch

1 Like

enum_dispatch looks like a wonderful package! I'll give it a try, thanks.