Flate2 + BufReader Issue (Possible Infinite Loop Bug)


#1

If I do something like:

    let file = OpenOptions::new()
        .read(true)
        .open(&path).unwrap();
    let json_bytes = GzDecoder::new(file);
    let reader = io::BufReader::new(json_bytes);

    let stream = reader.lines().enumerate().map(move |(i,line)| {

This will cause an infinite loop. Now I know that I need to unwrap json_bytes however I would expect the compiler to not accept the Result<> set from GzDecoder in BufReder::new instead of allowing it and the program infinite looping.


#2

GzDecoder::new() returns GzDecoder, not a Result, from what I can tell.

What’s the infinite loop though?


#3

Sorry not a result, but if it does error and you pass that along to BufReader, when you iter through it it will loop forever. I ended up having to do read byte check before moving on after GzDecoder::new.


#4

You mean if it errors while decompressing? That should be propagated to you in the line binding you have in the map closure - line is a Result<String, io::Error>. Are you checking that?


#5

so the rest if my code:

    let stream = reader.lines().enumerate().map(move |(i,line)| {
        line.map_err(|e| LineError(ParserError::IoError(e), i, path.to_str().unwrap().to_owned()))
            .and_then(|record: String| {
                parser.process(&record).map_err(|e| LineError(e, i, path.to_str().unwrap().to_owned()))
            })
    });

#6

That converts the io::Error to LineError and then only calls parser.process(...) if there was no error (because of and_then()). But if every line errors, you’re effectively going to iterate over everything without ever detecting it.

What do you want to happen here? By the sound of it, you’d like to break out of the iteration and return/propagate the error to the caller? If so, maybe iterate over the lines manually:

for (i, line) in reader.lines().enumerate() {
    let line = line.map_err(|e| LineError(ParserError::IoError(e), i, path.to_str().unwrap().to_owned()))?; // bail if it's an error
    ...
}

Edit: should also mention that the snippet you showed is just defining the iterator chain, but not actually iterating over it. So it’s unclear how you’re handling errors when you’re actually traversing the iterator.


#7

But what if I just want to output the error and continue to the next line? That is what this is supposed to do unless I pass it an invalid gzip then it spins out of control. I do not care about returning the error to the caller.


#8

it shouldn’t loop forever…


#9

I’m still unclear on what exactly is causing an infinite loop. What is this byte check you referred to? Is the file corrupt and somehow making it spin? Does it even yield lines to you (with error) or is it “hijacked” and not returning? And if it’s the latter, I’d consider that a bug in flate2.


#10

I can recreate this by proving an empty file named empty.gz and then my app will read the file, ungzip it, and try to iterate through each line. causing it to spin forever. The line error gets outputted, i gets incremented indefinitely.


#11

If I zip up an invalid file but as a valid GZ, everything works fine. I am working on providing a better example, one moment.


#12

This will recreate the issue:


extern crate flate2;

use std::io::{BufRead, BufReader};
use std::path::PathBuf;
use std::fs::OpenOptions;
use flate2::read::GzDecoder;

fn main() {
	let path = PathBuf::from("./empty.gz");

    let file = OpenOptions::new()
        .read(true)
        .open(&path).unwrap();
    let cbjson = GzDecoder::new(file);

    let reader = BufReader::new(cbjson);

    let stream = reader.lines().enumerate().map(move |(i,line)| {
        println!("LINE {} {:?}", i, line);
    });

    for result in stream { }
}


#13
LINE 123213 Err(Error { repr: Kind(Other) })
LINE 123214 Err(Error { repr: Kind(Other) })
LINE 123215 Err(Error { repr: Kind(Other) })
LINE 123216 Err(Error { repr: Kind(Other) })
LINE 123217 Err(Error { repr: Kind(Other) })
LINE 123218 Err(Error { repr: Kind(Other) })
LINE 123219 Err(Error { repr: Kind(Other) })
LINE 123220 Err(Error { repr: Kind(Other) })
LINE 123221 Err(Error { repr: Kind(Other) })
LINE 123222 Err(Error { repr: Kind(Other) })
LINE 123223 Err(Error { repr: Kind(Other) })
LINE 123224 Err(Error { repr: Kind(Other) })
LINE 123225 Err(Error { repr: Kind(Other) })
LINE 123226 Err(Error { repr: Kind(Other) })
LINE 123227 Err(Error { repr: Kind(Other) })
LINE 123228 Err(Error { repr: Kind(Other) })
LINE 123229 Err(Error { repr: Kind(Other) })
LINE 123230 Err(Error { repr: Kind(Other) })

#14

Yeah, I see that. The “culprit” seems to be this block: https://github.com/alexcrichton/flate2-rs/blob/master/src/gz/bufread.rs#L357-L360. The header is used in construction here, and it’s an Err. I’m not sure why the code was written like that. Maybe ask on their github?

In the interim, perhaps you can add the following (or similar) right after getting the cbjson:

cbjson.header().expect("Invalid gz header");

#15

Hmm yea I see that. Let me try to add an expect to the header. Thanks for your speedy reply.


#16

Looks like https://github.com/alexcrichton/flate2-rs/issues/119 made GzDecoder::new return GzDecoder rather than io::Result<GzDecoder> like it did prior to that change. The issue mentions not doing I/O in the constructor, but that’s not actually the case - it merely “hides” the I/O by not returning a Result. It also masks errors such that it’s not possible to discern a malformed header vs corrupt data in the stream.

To me, it seems like a regression in API design and robustness but I’m not overly familiar with the rest of flate2.

That said, @justmike2000, you should bail out of iteration if you get an error of any kind. If the backing file is corrupt in other ways, you shouldn’t expect a line reader or the GzDecoder to behave correctly after the first error.