Thank to everyone's comments, I finally found the bottleneck and its solution:
TLDR:
It seems that the compiler takes a long time to type check an impl Trait
used in a loop. So I had to box those instances into Box<dyn Trait>
and the release builds finish in 30 seconds. The advantage of this solution is that code changes involved are minimal and you also don't suffer from a tremendous allocation overhead compared to boxing everything.
Full explanation
Setup
Here's a simplified example that illustrates this issue. Let's say I want to define a Parser
trait as follows:
pub trait Parser<'a> {
type Output;
type State: Clone;
/// Parse a given input, starting at
/// a given location and state.
fn parse(
&mut self,
input: &'a str,
location: Location,
state: Self::State,
) -> ParseResult<'a, Self::Output, Self::State>;
Then, I want closures to implement this Parser
trait so I can treat them as parsers. Note that we can't directly implement Parser
for FnMut
here because the type parameter A
and S
would be unbounded since associated types Output
and State
in the Parser
trait are unfortunately not visible from the type parameter list. Hence, we create a simple wrapper called FnParser
that wraps a closure into a Parser
.
#[derive(Copy, Clone)]
pub struct FnParser<F, A, S>(F, PhantomData<A>, PhantomData<S>);
pub fn fn_parser<'a, A, S: Clone, F>(f: F) -> FnParser<F, A, S>
where
F: FnMut(&'a str, Location, S) -> ParseResult<'a, A, S>,
{
FnParser(f, PhantomData, PhantomData)
}
impl<'a, A, S: Clone, F> Parser<'a> for FnParser<F, A, S>
where
F: FnMut(&'a str, Location, S) -> ParseResult<'a, A, S>,
{
type Output = A;
type State = S;
fn parse(&mut self, input: &'a str, location: Location, state: S) -> ParseResult<'a, A, S> {
(self.0)(input, location, state)
}
}
Problem
Now, let's write a repeat
parser combinator that repeats a given parser n times. Notice how the input parser
is used in a while
loop and the whole body is wrapped inside an fn_parser
closure. This somehow causes a headache for rustc and takes a super long time to compile.
/// Repeat a parser n times
pub fn repeat<'a, A: Clone + 'a, P: 'a, S: Clone + 'a>(
times: usize,
mut parser: P,
) -> impl Parser<'a, Output = Vec<A>, State = S>
where
P: Parser<'a, Output = A, State = S>,
{
fn_parser(move |mut input, mut location, mut state: S| {
let mut output = Vec::new();
let mut committed = false;
if times == 0 {
return ParseResult::Ok { ... };
}
let mut counter = 0;
while let ParseResult::Ok { ... } = parser.parse(input, location, state.clone())
{
if counter >= times {
break;
}
...
}
ParseResult::Ok { ... }
})
}
Solution
A solution is to replace the fn_parser
with a Box::new
so that the compiler can treat the body as a pointer to a closure on the heap instead of a struct containing a closure on the stack. This somehow eases the compilation process.
We first define a BoxedParser
type alias to avoid typing the full signature all the time. Note how similar the type of BoxedParser
is to FnParser
. The only change is that FnMut
is now wrapped in a Box
:
pub type BoxedParser<'a, A, S> = Box<dyn FnMut(&'a str, Location, S) -> ParseResult<'a, A, S> + 'a>;
impl<'a, A, S: Clone + 'a> Parser<'a> for BoxedParser<'a, A, S> {
type Output = A;
type State = S;
fn parse(&mut self, input: &'a str, location: Location, state: S) -> ParseResult<'a, A, S> {
(*self)(input, location, state)
}
}
Now, we can update the repeat
combinator to return a BoxedParser
instead of an FnParser
:
/// Repeat a parser n times
pub fn repeat<'a, A: Clone + 'a, P: 'a, S: Clone + 'a>(
times: usize,
mut parser: P,
) -> BoxedParser<'a, Vec<A>, S> // Replaced impl Parser with BoxedParser
where
P: Parser<'a, Output = A, State = S>,
{
// Replaced fn_parser with Box::new
Box::new(move |mut input, mut location, mut state: S| {
let mut output = Vec::new();
let mut committed = false;
...
Do this for all the functions that contain calls to impl Parser
in a loop and you'll be golden.
P.S. It'll be great if someone can explain more about the intuitions behind the slowdown.