Question about ZipWriter

I need to create a zipfile in memory (this will be compiled to WASM and run in the browser).

I have followed the man page and built a small function that uses an in-memory buffer of a certain fixed size.

However, if I put files into it that are larger than the buffer can hold, I get an error (of course!).

But I've found that if I scale the buffer to be large enough to hold the three files I need, the compression goes just fine, returning success at each stage, yet the decompression fails with the error

InvalidArchive("Could not find central directory end")

Specifically, a u8 buffer of length 128 * 1024 is the largest that works.

I have included a complete test.

cargo --version
cargo 1.78.0-nightly (7b7af3077 2024-02-17)

Cargo.toml

[package]
name = "hello_cargo"
version = "0.1.0"
edition = "2021"

[dependencies]
rand = "0.8.5"
zip = "0.6.6"

src/main.rs

// use zip::write;
use rand::{distributions::Alphanumeric, Rng}; 
use std::io::{Read, Write};
use std::process::exit;

fn main() {

	// 1: setup
	let repeat = 64 * 1024;
	let a_data: String = rand::thread_rng().sample_iter(&Alphanumeric).take(repeat).map(char::from).collect();
	let b_data: String = rand::thread_rng().sample_iter(&Alphanumeric).take(repeat).map(char::from).collect();
	let c_data: String = rand::thread_rng().sample_iter(&Alphanumeric).take(repeat).map(char::from).collect();

	// 2: transform
	//
	const BUFF_SIZE:usize = 256 * 1024;  
	let mut buf: [u8; BUFF_SIZE] = [0; BUFF_SIZE];
	{
		let cur = std::io::Cursor::new(&mut buf[..]);
		let options = zip::write::FileOptions::default();
		let mut zip = zip::ZipWriter::new(cur);

		// file A
		let ret = zip.start_file("a.txt", options);
		if ret.is_err() { println!("err A-1 = {:?}", ret.err()); exit(-1); }

		let ret = zip.write(a_data.as_bytes());
		if ret.is_err() { println!("err A-2 = {:?}", ret.err()); exit(-1); }	

		// file B
		let ret = zip.start_file("b.txt", options);
		if ret.is_err() { println!("err B-1 = {:?}", ret.err()); exit(-1); }

		let ret = zip.write(b_data.as_bytes());
		if ret.is_err() { println!("err B-2 = {:?}", ret.err()); exit(-1); }	

		// file C
		let ret = zip.start_file("c.txt", options);
		if ret.is_err() { println!("err C-1 = {:?}", ret.err()); exit(-1); }

		let ret = zip.write(c_data.as_bytes());
		if ret.is_err() { println!("err C-2 = {:?}", ret.err()); exit(-1); }	

		// finish
		let ret = zip.finish();
		if ret.is_err() { println!("err finish = {:?}", ret.err()); exit(-1);} 
	}
	let binary_data = buf.to_vec();
	println!("binary_data.len() = {}", binary_data.len());

	// 3: test
	//
	let reader = std::io::Cursor::new(binary_data);
	let mut zip = zip::ZipArchive::new(reader).unwrap();

	struct Gold {
		filename: String,
		contents: String,
	}
	let gold = [
		Gold {
			filename: "a.txt".to_string(),
			contents: a_data.clone(),
		},
		Gold {
			filename: "b.txt".to_string(),
			contents: b_data.clone(),
		},
		Gold {
			filename: "c.txt".to_string(),
			contents: c_data.clone(),
		},
	];


	
	for (i, g) in gold.iter().enumerate() {
		let mut file = zip.by_index(i).unwrap();
		const BUFFSIZE: usize = 1024 * 1024;
		let mut buff: [u8; BUFFSIZE] = [0; BUFFSIZE];
		let buff_size = file.read(&mut buff).unwrap();
		let buff_str: String = String::from_utf8(buff.to_vec()[0..buff_size].to_vec()).unwrap();

		println!("========== Filename: {:?}", file.name());
		// println!("size as read ={:?} ; buff = '{:?}'", buff_size, buff_str);

		assert_eq!(file.name(), g.filename);
		assert_eq!(buff_str, g.contents);
	}

	
}

command line

cargo run

result

    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.01s
     Running `target/debug/hello_cargo`
binary_data.len() = 262144
thread 'main' panicked at src/main.rs:54:48:
called `Result::unwrap()` on an `Err` value: InvalidArchive("Could not find central directory end")
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

Things I have already tried, without success:

  • set compression method to Stored
  • set compression method to Deflated
  • set compression method to Bzip2
  • implement buffer as array of u8
  • implement buffer as array of u32 (cursor not defined)
  • implement buffer as vec, initialized with 0s
  • implement buffer as vec, sized but not initialized
  • replace start_file() with start_file_aligned(16)
  • use FileOptions::default() without modification

Looks like you're trying to decompress the entire buffer, including the unused parts at the end. You should inspect the cursor after finish and find out how many bytes were written.

You're turning your entire buf array into binary_data, including any trailing NULs that weren't overwritten. The zip reader then can't find the central directory end (CDE) record close enough to the EOF.
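
To illustrate why trailing padding matters, here is a minimal sketch of a backward search for the end-of-central-directory record. It assumes the reader only looks within the last 22 + 65535 bytes, which is the largest span the format allows for the record plus its comment; whether the zip crate uses exactly this window is an assumption on my part. find_eocd and empty_archive are hypothetical helpers, not part of any library.

```rust
/// Signature that starts the end-of-central-directory (EOCD) record, as stored on disk.
const EOCD_SIGNATURE: [u8; 4] = [0x50, 0x4b, 0x05, 0x06];
/// Size of the fixed part of the EOCD record.
const EOCD_MIN_LEN: usize = 22;
/// The comment length field is a u16, so a comment is at most 65535 bytes.
const MAX_COMMENT_LEN: usize = u16::MAX as usize;

/// Hypothetical helper: scan backward from the end of `data` for the EOCD
/// signature, looking only at offsets where a valid record could start.
fn find_eocd(data: &[u8]) -> Option<usize> {
    let last_start = data.len().checked_sub(EOCD_MIN_LEN)?;
    let first_start = data.len().saturating_sub(EOCD_MIN_LEN + MAX_COMMENT_LEN);
    (first_start..=last_start)
        .rev()
        .find(|&pos| data[pos..].starts_with(&EOCD_SIGNATURE))
}

/// An archive with no entries is just a 22-byte EOCD record with zeroed fields.
fn empty_archive() -> Vec<u8> {
    let mut bytes = EOCD_SIGNATURE.to_vec();
    bytes.resize(EOCD_MIN_LEN, 0);
    bytes
}

fn main() {
    let archive = empty_archive();
    println!("exact length: {:?}", find_eocd(&archive));

    // Pad with zeros the way an oversized fixed buffer would.
    let mut padded = archive.clone();
    padded.resize(archive.len() + 100 * 1024, 0);
    println!("100 KiB of padding: {:?}", find_eocd(&padded));
}
```

With the exact-length archive the record is found at offset 0. Once the padding is larger than the search window, the search comes back empty, which matches the situation the error message describes.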

You need to figure out how many bytes were actually written, and then turn that portion into a Vec. You might be able to do this with Cursor (though it will depend on how the implementation works, e.g. assuming the cursor position was at EOF after the call to finish or the like). Or you could use a dynamic buffer that inherently tracks the length.
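
A minimal std-only sketch of the Cursor approach, leaving the zip crate out so it stays self-contained. write_and_trim is a hypothetical helper, and the sketch assumes the writer's last operation was a write at the end of the data (not a backward seek), so that the cursor position equals the number of bytes written.

```rust
use std::io::{Cursor, Write};

/// Write `payload` into a fixed-size buffer and return only the bytes written.
fn write_and_trim(payload: &[u8]) -> std::io::Result<Vec<u8>> {
    let mut buf = [0u8; 64];
    let mut cur = Cursor::new(&mut buf[..]);
    cur.write_all(payload)?;
    // The position is the offset just past the last byte written,
    // provided the writer did not finish with a backward seek.
    let len = cur.position() as usize;
    // Copy only the used prefix, leaving the zero padding behind.
    Ok(buf[..len].to_vec())
}

fn main() -> std::io::Result<()> {
    let trimmed = write_and_trim(b"hello zip")?;
    println!("{} bytes kept out of 64", trimmed.len());
    Ok(())
}
```

In the program from the question, the same idea would mean keeping the cursor that finish hands back instead of discarding it, then slicing buf up to its position before calling to_vec.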

Or if archives are guaranteed not to end in NUL, that's another option (I have no idea if this is the case or not).
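
For what it's worth, the zip format ends with a 2-byte comment length followed by the comment itself, so an archive without a comment ends in at least two NUL bytes. Stripping trailing NULs would therefore damage the final record. A small sketch, using the smallest possible archive (no entries, no comment); both helpers are hypothetical names made up for this example.

```rust
/// Build the 22-byte end-of-central-directory record of an archive with
/// no entries and no comment: the "PK\x05\x06" signature, then zeroed fields.
fn empty_archive() -> Vec<u8> {
    let mut bytes = vec![0x50, 0x4b, 0x05, 0x06];
    bytes.resize(22, 0);
    bytes
}

/// The naive approach under test: drop every trailing zero byte.
fn trim_trailing_nuls(data: &[u8]) -> &[u8] {
    let end = data.iter().rposition(|&b| b != 0).map_or(0, |i| i + 1);
    &data[..end]
}

fn main() {
    let archive = empty_archive();
    let trimmed = trim_trailing_nuls(&archive);
    println!(
        "valid archive: {} bytes, after trimming NULs: {} bytes",
        archive.len(),
        trimmed.len()
    );
}
```

The trimmed result is shorter than the fixed 22-byte record, so it can no longer be parsed. Tracking the written length is the safer route.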

It looks like you may be able to find the EOF in a supported manner using this function.

The Drop implementation of ZipFile ensures that the reader will be correctly positioned after the structure is done.

(Maybe. The comment is too vague to be sure.)

Some other problems:

You need to use write_all. A single write call may accept only part of the data and returns how many bytes it took, so it is meant to be called repeatedly until everything is written, and it is rarely used directly.
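
A short std-only sketch of the difference, using a Cursor over a slice that is too small for the data. single_write and write_everything are hypothetical helpers for illustration.

```rust
use std::io::{Cursor, ErrorKind, Write};

/// Returns how many bytes a single `write` call accepted.
fn single_write(buf: &mut [u8], data: &[u8]) -> std::io::Result<usize> {
    Cursor::new(buf).write(data)
}

/// Runs `write_all` and reports the error kind if it fails.
fn write_everything(buf: &mut [u8], data: &[u8]) -> Result<(), ErrorKind> {
    Cursor::new(buf).write_all(data).map_err(|e| e.kind())
}

fn main() {
    let data = [7u8; 10];
    let mut small = [0u8; 4];
    // `write` succeeds but only takes what fits.
    println!(
        "write accepted {:?} of {} bytes",
        single_write(&mut small, &data),
        data.len()
    );
    // `write_all` keeps calling `write` and fails once no more bytes fit.
    println!("write_all result: {:?}", write_everything(&mut small, &data));
}
```

A bare write reports success even though most of the data was dropped, which is easy to miss if only is_err() is checked. write_all turns the short write into an error.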

Same thing for read: you should use read_to_end instead. You're turning the data into a Vec anyway, and the 1024 * 1024 byte array causes a stack overflow on my machine. All of this can be condensed into std::io::read_to_string.
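
To show why a single read is not enough, here is a sketch with a reader that produces its output in two pieces, standing in for a decompressor that yields data in chunks. chunked_reader, single_read and read_everything are hypothetical names used only in this example.

```rust
use std::io::Read;

/// A reader that yields "hello " and then "world" from two separate sources.
fn chunked_reader() -> impl Read {
    "hello ".as_bytes().chain("world".as_bytes())
}

/// Number of bytes returned by one `read` call into a large buffer.
fn single_read() -> std::io::Result<usize> {
    let mut buf = [0u8; 64];
    chunked_reader().read(&mut buf)
}

/// Everything the reader yields, collected with `std::io::read_to_string`.
fn read_everything() -> std::io::Result<String> {
    std::io::read_to_string(chunked_reader())
}

fn main() -> std::io::Result<()> {
    println!("single read returned {} bytes", single_read()?);
    println!("read_to_string returned {:?}", read_everything()?);
    Ok(())
}
```

The single read stops after the first piece even though the buffer has plenty of room, while read_to_string keeps reading until the reader reports EOF and needs no fixed-size stack buffer.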

Here's my rewrite.

use rand::distributions::{Alphanumeric, DistString};
use std::io::Write;

fn main() {
    // 1: setup
    let repeat = 64 * 1024;
    let mut rng = rand::thread_rng();
    let a_data: String = Alphanumeric.sample_string(&mut rng, repeat);
    let b_data: String = Alphanumeric.sample_string(&mut rng, repeat);
    let c_data: String = Alphanumeric.sample_string(&mut rng, repeat);

    // 2: transform
    //
    const BUFF_SIZE: usize = 256 * 1024;
    let buf = Vec::with_capacity(BUFF_SIZE);
    let buf = {
        let cur = std::io::Cursor::new(buf);
        let options = zip::write::FileOptions::default();
        let mut zip = zip::ZipWriter::new(cur);

        // file A
        zip.start_file("a.txt", options).unwrap();
        zip.write_all(a_data.as_bytes()).unwrap();

        // file B
        zip.start_file("b.txt", options).unwrap();
        zip.write_all(b_data.as_bytes()).unwrap();

        // file C
        zip.start_file("c.txt", options).unwrap();
        zip.write_all(c_data.as_bytes()).unwrap();

        // finish
        let cur = zip.finish().unwrap();
        let len = cur.position();
        let mut buf = cur.into_inner();
        buf.truncate(len as usize); // unsure if this is necessary
        buf
    };

    // 3: test
    //
    let reader = std::io::Cursor::new(buf);
    let mut zip = zip::ZipArchive::new(reader).unwrap();

    struct Gold {
        filename: String,
        contents: String,
    }
    let gold = [
        Gold {
            filename: "a.txt".to_string(),
            contents: a_data,
        },
        Gold {
            filename: "b.txt".to_string(),
            contents: b_data,
        },
        Gold {
            filename: "c.txt".to_string(),
            contents: c_data,
        },
    ];

    for (i, g) in gold.into_iter().enumerate() {
        let mut file = zip.by_index(i).unwrap();
        let buff = std::io::read_to_string(&mut file).unwrap();

        println!("========== Filename: {:?}", file.name());
        // println!("size as read = {:?}; buff = {:?}", buff.len(), buff);

        assert_eq!(file.name(), g.filename);
        assert_eq!(buff, g.contents);
    }
}
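
On the truncate question: a Cursor over a Vec grows the vector as it writes, so the vector's length (not its capacity) already tracks the furthest byte written. The sketch below is std-only and loosely imitates a writer that seeks back to patch a header before finishing. write_with_backpatch is a hypothetical helper, and whether ZipWriter always leaves the cursor at the end after finish is an assumption I have not verified.

```rust
use std::io::{Cursor, Seek, SeekFrom, Write};

/// Write a payload, seek back to patch the first byte, then return to the end.
/// Returns the final cursor position together with the underlying vector.
fn write_with_backpatch(payload: &[u8]) -> std::io::Result<(u64, Vec<u8>)> {
    // Capacity is only a reservation; the vector starts with length 0.
    let mut cur = Cursor::new(Vec::with_capacity(1024));
    cur.write_all(payload)?;
    // Revisit the start, as a writer might do to fill in a header field.
    cur.seek(SeekFrom::Start(0))?;
    cur.write_all(b"#")?;
    // Return to the end before finishing.
    cur.seek(SeekFrom::End(0))?;
    let position = cur.position();
    Ok((position, cur.into_inner()))
}

fn main() -> std::io::Result<()> {
    let (position, buf) = write_with_backpatch(b"0123456789")?;
    println!(
        "position = {position}, len = {}, capacity = {}",
        buf.len(),
        buf.capacity()
    );
    Ok(())
}
```

Position and length agree here, so truncating to the position would change nothing. The truncate call only matters if the writer can finish with the cursor somewhere before the end of the data, and keeping it is harmless either way.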

This worked great; thanks!