How to work around recursive type expansion issues?


#1

I’m working on a parser that parses a packet-based stream. Some of the packets are containers. A container also holds packets. In fact, a container can hold any valid packet. For instance, a compressed data packet could hold a compressed data packet! (This probably isn’t a good idea, but for other containers, this is actually sensible.)

To parse the packets, I’m passing around a reference to an object implementing the std::io::Read trait. Sometimes, I add a filter, like in the compression case. Since the contents of a container are a valid message, after creating the filter, I just call the main deserialize function again.

So, my API looks like:

fn deserialize<T: Read>(bio: &mut T) -> Result<T>;
fn compressed_data_body<T: Read>(bio: &mut T) -> Result<T>;
...

The practical result of this arrangement is that rustc refuses to compile my code:

reached the recursion limit while instantiating `deserialize::<flate2::read::DeflateDecoder<&mut flate2::read::DeflateDecoder<&mut flate2::read::DeflateDecoder<&mut flate2::read::DeflateDecoder<&mut flate2::read::DeflateDecoder<...

This isn’t surprising: deserialize calls compressed_data_body and vice versa, and compressed_data_body can extend it’s generic parameter.

I haven’t figure out how to communicate a base case to the compiler (although I do have one at run-time: the maximum recursion depth is 16-levels), and, unfortunately, I suspect that there is no way to do so.

One way to break this logjam is to Box all of the readers. But, this is ugly and I can’t do any introspection, which is occasionally helpful.

Since the interface is public, I’d like to preserve an idiomatic interface as much as possible.

A small example illustrating my problem is here: https://play.rust-lang.org/?gist=27b4e3f2630ae5d76c9d70f2b97f864a&version=stable

Thanks!


#2

So how do you parse this exactly? The playground example isn’t very sensical since it just keeps layering the deflater (I understand you’re trying to show the error). But say after one deflate call, how do you know what you’re dealing with, a container or a leaf packet?


#3

I’m sorry if the example was too minimal :grinning:. You can imagine that the deserialize function reads in a header and then calls one of the packet processing functions, as appropriate. If the packet is a container, then an appropriate filter is added, and deserialize is called again (as shown in the example code). If the packet is not a container, then the contents of the packet are deserialized into a struct, which is returned.


#4

Ok, gotcha.

So what’s happening here is probably not what you want. You want (I presume) to use a single type (DeflateDecoder) to read through the underlying R:Read, parsing out leaf packets and “stepping into” containers. What you end up with in the current code is a type level recursive type that … doesn’t make sense :slight_smile:, even if it compiled. In other words, you don’t really want a DeflateDecoder<DeflateDecoder<DeflateDecoder<...>>>.

So, maybe something like this instead:

fn compressed_data_body<R: Read>(deflater: flate2::read::DeflateDecoder<R>)
        -> Result<flate2::read::DeflateDecoder<R>, std::io::Error> {    
    let _message = deserialize(deflater);

    unreachable!();
}

fn deserialize<R: Read>(bio: flate2::read::DeflateDecoder<R>) -> Result<flate2::read::DeflateDecoder<R>, std::io::Error> {
    return Ok(compressed_data_body(bio)?);
}

fn main() {
    use std::fs::File;

    let mut f = File::open("foo").unwrap();
    let bio = f.deflate_decode();
    let _ = deserialize(bio);
}

From an API standpoint, you allow the caller to give you R:Read, which you then form the DeflateDecoder<R> off but use that concrete type internally rather than recursively deflate_decode off it (and thus creating that type layer).


#5

Thanks, but I don’t think this is what I want. I’m sorry that I wasn’t clearer.

The file is not compressed, the contents of compressed packets are compressed. Other containers perform other transformations of the input stream, e.g., a decryption packet decrypts its contents (which is an array of packets). These transformations are also realized by pushing a std::io::Read object on the stack Thus, if I have two nested compressed packets, then I really do want two decompressors to run, and then the inner call to deserialize should take aDeflateDecode<DeflateDecode<File>> object. Whereas if an encrypted packet contains a compressed packet, the inner call to deserialize should be DeflateDecode<Decrypt<File>>.


#6

Please check crate seq and the README
The Seq container demonstrates recursive types with boxing and also with pure references. It is important to get the lifetimes correctly, always tell the compiler that the inner objects live as long as the outer objects, or even longer.


#7

Ok, I see (for real now I hope - sorry for being dense). I guess an analog might be how you’d chain iterator or future combinatory, creating a type-based state machine. And that is what you want.

I’m not sure this can be represented recursively at the type level (ie you need boxing to erase the type and stop compiler from going off into the weeds). You have a bound of depth 16 but that’s not static - just a maximum that you could enforce at runtime. But otherwise, the actual nesting is purely runtime - depends on the packet structure.

But, I’d be interested if someone could come up with something purely at the type level. I suppose one approach might be to always generate a 16-level deep type, but injecting “noop” types once a real data packet is found. I doubt that would be pleasant to maintain though :slight_smile:. Maybe there’s a better solution though …


#8

I was afraid that I’d have to resort to type erasure. I hope someone shares an elegant solution! Thanks for taking some time to share your thoughts!


#9

If you are fine with Box-ing, allocating dynamic-memory. The DeflateDecode datatype is simply a list of elments. Each element can be a binary object (Blob) or another deflated collection.

struct DeflateDecode<T> {
    elems: Vec<Decoded<T>>
}

enum Decoded<T> {
    Deflated(Box<DeflateDecode<T>>),
    Blob(T),
}