What's the best way to get the "prefix" of a zipfile?

I have some code where I want to process a zipfile, and also get access to any data prepended to the file.

The ZipArchive struct in the zip crate has an offset() method that lets me get the position where the prefix ends, and it has an into_inner() method to get the underlying reader. So it's all possible. The problem is that I need the zip object to call offset(), but into_inner() then takes ownership, which means that I can no longer call any other zip methods. So I have to process the prepended data at the end, which is not ideal...

I don't think I can clone the zip object, because the underlying File object isn't cloneable (at least, that's what the error I got when I tried said).

Is there something obvious I'm missing here, or is this really as difficult as I'm finding? There's no read_prepended_data method on the ZipArchive struct, which suggests that the offset method should be sufficient, but I'm not sure how...

Why don't you just rewind and then re-wrap the reader using ZipArchive::new()?

Hmm, do you mean something like

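    use std::io::{Read, Seek};

    let mut zip = zip::ZipArchive::new(reader)?;

    // Offset where the archive data starts, i.e. the length of the prefix
    let pos = zip.offset();
    let mut r = zip.into_inner();

    // Read the prefix; take() on `&mut r` borrows r instead of consuming it
    r.rewind()?;
    let mut prefix = Vec::new();
    (&mut r).take(pos).read_to_end(&mut prefix)?;

    // Rewind and re-wrap the reader
    r.rewind()?;
    let mut zip = zip::ZipArchive::new(r)?;
    // Further processing of the zipfile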

I guess that would work, but wouldn't it parse the file twice, looking for the zip directory etc? That seems pretty wasteful.

It wouldn't read the entire file twice, it would only need to re-read the index/header. Doing the actual reading is likely comparable to, or faster than, the syscall needed to merely open the file.

Hmm, OK, I can live with it I guess. It still seems wasteful when the existing ZipArchive has all that information already calculated. I understand that letting 2 bits of code argue over the value of the file pointer is bad, but I can certainly imagine APIs that would be safe (not least a simple read_prepended_data method) and yet I can't see any way of writing them.

I guess what you could also do is apply interior mutability. Wrap the reader into a newtype containing a RefCell, implement Read and Seek (ZipArchive needs both) for it by calling borrow_mut() and then forwarding to the underlying reader, and give only a shared reference to the ZipArchive. This will allow you to arbitrarily change the file pointer under the feet of the ZipArchive. You can then read the prefix data once you have the offset. Just don't forget to save and restore the file pointer, or else you'll mess up the archive and end up with garbage.
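
For what it's worth, here is a rough sketch of that idea using only the standard library. The names SharedReader and read_prefix are made up for illustration and are not part of the zip crate. It assumes the traits are implemented on `&SharedReader<R>`, so that a shared reference is itself a reader that could be handed to something like zip::ZipArchive::new, while the caller keeps access to the same underlying reader. Being RefCell-based, it is single-threaded only.

```rust
use std::cell::RefCell;
use std::io::{self, Read, Seek, SeekFrom};

/// Hypothetical newtype: lets several users share one underlying reader.
pub struct SharedReader<R>(RefCell<R>);

impl<R> SharedReader<R> {
    pub fn new(inner: R) -> Self {
        SharedReader(RefCell::new(inner))
    }
}

// Implement the traits for a *shared reference*, so `&SharedReader<R>`
// can be passed by value to anything that wants an `R: Read + Seek`.
impl<R: Read> Read for &SharedReader<R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        self.0.borrow_mut().read(buf)
    }
}

impl<R: Seek> Seek for &SharedReader<R> {
    fn seek(&mut self, pos: SeekFrom) -> io::Result<u64> {
        self.0.borrow_mut().seek(pos)
    }
}

/// Hypothetical helper: read the first `len` bytes of the stream, then put
/// the file pointer back where it was so other users are not disturbed.
pub fn read_prefix<R: Read + Seek>(
    shared: &SharedReader<R>,
    len: u64,
) -> io::Result<Vec<u8>> {
    let mut r = shared; // `&SharedReader<R>` is Copy and is Read + Seek
    let saved = r.stream_position()?; // remember the current position
    r.seek(SeekFrom::Start(0))?;
    let mut prefix = Vec::new();
    r.take(len).read_to_end(&mut prefix)?; // `take` gets a copy of the reference
    r.seek(SeekFrom::Start(saved))?; // restore the position
    Ok(prefix)
}
```

With this, the intended usage would be to build the archive from `&shared`, call offset() on it, and pass that value as `len` to read_prefix(&shared, len) while the archive is still alive. I haven't tried that against the zip crate, so treat it as a starting point.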
