Understanding Type Explosion Solved by Boxing?

I finished following along in this tutorial about writing your own parser combinators, and partway through, there was a bug in the implementation that the author anticipated, and demonstrated a fix for:

error: reached the type-length limit while instantiating `pair::<for<'r> fn(&'r str) -> st...::St
ring, std::string::String)>>`

I am having trouble understanding more about how this happens. The solution is to implement a way to shove portions of the parsers you build up into Boxes, but I am still unclear how the types explode in the first place.

I sort of understand the example regarding how a Box is needed for implementing the type for a linked list, but I don’t see how this parser combinator implementation causes the types to balloon out of control.

I have attempted to get the compiler to print out the types of things by giving it faulty annotations, like this:

let z: i32 = zero_or_more(right(space1(), attribute_pair()))

But the error message does not appear to expose a particularly daunting type:

error[E0308]: mismatched types
   --> src/lib.rs:363:18
    |
363 |     let z: i32 = zero_or_more(right(space1(), attribute_pair()));
    |                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ expected i32, found opaque type
    |
    = note: expected type `i32`
               found type `impl Parser<'_, std::vec::Vec<(std::string::String, std::string::String)>>`

Link to Rust Playground

What can I read to learn more about this problem, and how do I get feedback in the code I’ve written to detect these kinds of type explosions earlier? How do I inspect what I’ve written to see the kind of type_length burden my code is incurring?

This happens because you are trying to create a recursive type, but that isn’t allowed by Rist due to texhinal limitations. So yoh muat Box it and erase the type inside of the Box by using trait objects. This makes some indirection and allows Rust to reason about it.

Type aliases or type hiding with impl Trait does not introduce indirection

There’s two different things here:

  • The example of a Cons list, which the author uses as an analogy. This type is infinite in size without a Box. (i.e. without a Box it stores infinite copies of A; with a Box, it stores an A and a pointer)
  • The actual technique used by the author, which is type erasure through dyn Trait. The Box is only incidental here (dynamic polymorphism requires a pointer type like Box because the size of the type cannot be known).

When you return impl Trait, the compiler still remembers the true type of what is returned; it has to, so that it can compute how large the type is, and so that it can determine whether the type implements Send and Sync. But when it prints the type out, it still only writes impl Trait. That’s why you don’t see the issue.

Here, I’ve taken the parsers used in the blog post and eliminated all instances of impl Trait except in the highest level function, so that we can see fully written out types.

You were looking at attributes in your post.

type Attributes<'a> = ZeroOrMore<Right<'a, Space1, AttributePair<'a>>>;

Expanding AttributePair<'a>

type Attributes<'a> = ZeroOrMore<Right<'a, Space1, Pair<Identifier, Right<'a, MatchLiteral, QuotedString<'a>>>>>;

Expanding QuotedString<'a>:

type Attributes<'a> = ZeroOrMore<Right<'a, Space1, Pair<Identifier, Right<'a, MatchLiteral, Map<Right<'a, MatchLiteral, Left<'a, ZeroOrMore<Pred<AnyChar, fn(&char) -> bool>>, MatchLiteral>>, fn(Vec<char>) -> String>>>>>;

this is already looking ugly and there’s still plenty of things left to expand! In particular, with the way I wrote it, Left and Right mention each of their input types 2 or 3 times, resulting in exponential blowup. (this is partly my fault for trying to avoid having to manually write the return types in them; I don’t think this happens in the original code)

type Left<'a, P1, P2> = Map<Pair<P1, P2>, fn((OutTy<'a, P1>, OutTy<'a, P2>)) -> OutTy<'a, P1>>;
type Right<'a, P1, P2> = Map<Pair<P1, P2>, fn((OutTy<'a, P1>, OutTy<'a, P2>)) -> OutTy<'a, P2>>;
4 Likes

This is great!

I can see how the types explode because of the intermediate combinations of these functions now. Is there a way I could inspect these signatures without rewriting them in this way, as you have done?

Well, part of the reason I had to rewrite it is because, when it was using impl Parser everywhere, the types didn’t have names!

I think there is a general rule of thumb we can state here though. Basically, look at this bit:

type QuotedString<'a> =
    Map<
        Right<'a,
            MatchLiteral,
            Left<'a,
                ZeroOrMore<Pred<AnyChar, fn(&char) -> bool>>,
                MatchLiteral,
            >,
        >,
        fn(Vec<char>) -> String,
    >;

fn quoted_string<'a>() -> QuotedString<'a> {
    map(
        right(
            match_literal("\""),
            left(
                zero_or_more(pred(any_char, |c| *c != '"')),
                match_literal("\""),
            ),
        ),
        |chars| chars.into_iter().collect(),
    )
}

As a general rule of thumb, when working with any form of combinators, if every single function returns impl Trait (or its own dedicated type) and there’s never any type erasure, then the return type of any given function will basically look like that function’s body.

(…except that this is still not the fully expanded type. For that, you basically have to picture that all of your functions are recursively inlined; that’s where it gets really ugly!)

1 Like