How to read a gzipped tar archive to `Vec<u8>` without decompressing it?

alisomay · February 9, 2023, 2:21pm

I'm trying to upload a byte vector to cloud storage.

This byte vector should be a compressed archive. To do this, I need to read the compressed archive I created into a `Vec<u8>`. I know that gzipped files do not contain their size, and when I try to read the file normally I don't get all the bytes.

It seems that it only reads the header, because the resulting vector is 10 bytes long.

Example

use std::io::Read;

fn main() {
    // Creates the archive and compresses it.
    let file = std::fs::File::create("example.tar.gz").unwrap();
    let encoder = flate2::write::GzEncoder::new(file, flate2::Compression::default());
    let mut archive = tar::Builder::new(encoder);
    archive.append_dir_all("example_dir", "path/to/example_dir").unwrap();
    archive.finish().unwrap();

    // I see that this does not work, since it reads the wrong length,
    // but I don't know how to achieve it.
    let example_bytes: Vec<u8> = std::fs::read("example.tar.gz").unwrap();
    dbg!(example_bytes.len());
    
    // Corrupt
    std::fs::write("rewritten.tar.gz", example_bytes).unwrap();
}

If I try with BufReader,

    // Assumes `use std::io::{Read, Seek};` is in scope (`rewind` comes from `Seek`).
    let file = std::fs::File::open("example.tar.gz").unwrap();
    let mut file = std::io::BufReader::new(file);
    let mut bytes = Vec::new();
    file.rewind().unwrap();
    file.read_to_end(&mut bytes).unwrap();
    // Corrupt.
    // This time the resulting file is not 10 bytes long,
    // but it is still 392 bytes shorter than the original.
    // The corrupt file ends with the byte sequence
    // FF D3 E5 FF 3B F6 5F A3 F8, in case that means something.
    std::fs::write("rewritten.tar.gz", bytes).unwrap();

Is there a way to get the raw bytes of this compressed archive so I can upload it to cloud storage?

alisomay · February 9, 2023, 2:31pm

Resolved on Stack Overflow.

H2CO3 · February 9, 2023, 2:46pm

This should not be the case. You should be able to read any file fully without further ado. Files do not need to "contain their length". Most files don't – the file system of the OS knows the size of each file.
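
To illustrate the point with the standard library only, here is a minimal sketch. It writes a few bytes through a `BufWriter`, reads the file back while the writer is still alive, and reads it again after the writer has been flushed. The helper name `lengths_before_and_after_flush` and the file name are hypothetical. The `BufWriter` stands in for the gzip encoder from the question, on the assumption that the encoder also holds back data until it is finished.

```rust
use std::fs::{self, File};
use std::io::{self, BufWriter, Write};
use std::path::Path;

/// Writes `data` through a buffered writer and returns the on-disk length
/// observed before and after the writer is flushed.
/// (Hypothetical helper, for illustration only.)
fn lengths_before_and_after_flush(path: &Path, data: &[u8]) -> io::Result<(usize, usize)> {
    let mut writer = BufWriter::new(File::create(path)?);
    writer.write_all(data)?;

    // The writer is still alive and unflushed, so a small payload
    // is still sitting in its in-memory buffer, not on disk.
    let before = fs::read(path)?.len();

    // Push the buffered bytes to the file, then release the writer.
    writer.flush()?;
    drop(writer);

    // `fs::read` now sees everything that was written.
    let after = fs::read(path)?.len();
    Ok((before, after))
}

fn main() -> io::Result<()> {
    let path = std::env::temp_dir().join("flush_demo.bin");
    let (before, after) = lengths_before_and_after_flush(&path, b"hello, archive")?;
    println!("before flush: {before} bytes, after flush: {after} bytes");
    fs::remove_file(&path)?;
    Ok(())
}
```

`std::fs::read` returns whatever is on disk at that moment, so a short result points at the writer rather than the reader. In the question's code the `tar::Builder` that owns the `GzEncoder` is still in scope when the file is read, so the gzip stream has probably not been finalized yet. If I understand the `tar` and `flate2` APIs correctly, calling `archive.into_inner()` and then `.finish()` on the returned encoder (or dropping both before reading) would play the role of `flush` here.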

