Which library is most commonly used to read and write to archive files?

Regardless of the archive type (zip, tar, or even docx), is there a library that can open such files, let me modify any of the files inside, and then write them back to the archive?

I think the library commonly used to read zip and other zip-based files is

for reading and writing

Is it possible to read any text file inside the archive, modify it, and write it back to the archive?

There are several libraries for accessing zip, tar, etc.

Docx is "just" a zip with a funny extension.
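To make that concrete, here is a minimal sketch that checks whether a file starts with the ZIP local file header signature (`PK\x03\x04`), which a .docx normally does. The function names `looks_like_zip` and `file_looks_like_zip` are made up for this example, and the check is only a heuristic: an empty zip archive, for instance, starts with a different signature.

```rust
use std::fs::File;
use std::io::{self, Read};
use std::path::Path;

/// Signature that opens a ZIP local file header: "PK\x03\x04".
const ZIP_LOCAL_HEADER_MAGIC: [u8; 4] = [0x50, 0x4B, 0x03, 0x04];

/// Returns true if `bytes` begins with the ZIP local file header signature.
/// (Hypothetical helper, not part of any crate.)
pub fn looks_like_zip(bytes: &[u8]) -> bool {
    bytes.starts_with(&ZIP_LOCAL_HEADER_MAGIC)
}

/// Reads the first four bytes of the file at `path` and checks them.
pub fn file_looks_like_zip(path: &Path) -> io::Result<bool> {
    let mut magic = [0u8; 4];
    let mut file = File::open(path)?;
    match file.read_exact(&mut magic) {
        Ok(()) => Ok(looks_like_zip(&magic)),
        // A file shorter than four bytes cannot start with the signature.
        Err(e) if e.kind() == io::ErrorKind::UnexpectedEof => Ok(false),
        Err(e) => Err(e),
    }
}
```

Calling `file_looks_like_zip(Path::new("Sample.docx"))` on a typical Word document should return `Ok(true)`, which is why a zip library can open it.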

Most archive formats don't really support random access writes though, or only do so with caveats that files later in the archive may have to be rewritten (if the changed file changed size).

In particular, random access is not really a thing with tar, especially once the tar is wrapped in compression (tar.gz, tar.xz, etc.). In fact, for a compressed tar archive you can't even seek: you have to decompress the entire stream up to the point of interest, and if you want to make changes you have to recompress everything that follows in the file.
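A rough sketch of why tar is sequential: an uncompressed tar is a chain of 512-byte headers, each followed by its data padded to a multiple of 512 bytes, with no index anywhere. The only way to find entry N is to walk past entries 0 to N-1. The code below assumes the classic header layout (name in bytes 0..100, size as octal text in bytes 124..136) and ignores extensions such as long names or binary size encodings; `TarEntry` and `list_tar_entries` are names invented for this example.

```rust
/// One entry found while walking a tar stream (illustrative type).
#[derive(Debug, PartialEq)]
pub struct TarEntry {
    pub name: String,
    /// Byte offset of the entry's data within the archive.
    pub data_offset: usize,
    pub size: usize,
}

const BLOCK: usize = 512;

/// Parses an octal ASCII field, stopping at the first non-octal byte.
fn parse_octal(field: &[u8]) -> usize {
    field
        .iter()
        .skip_while(|&&b| b == b' ')
        .take_while(|&&b| (b'0'..=b'7').contains(&b))
        .fold(0usize, |acc, &b| acc * 8 + usize::from(b - b'0'))
}

/// Walks the headers of an uncompressed tar archive held in memory.
pub fn list_tar_entries(archive: &[u8]) -> Vec<TarEntry> {
    let mut entries = Vec::new();
    let mut pos = 0;
    while pos + BLOCK <= archive.len() {
        let header = &archive[pos..pos + BLOCK];
        // An all-zero block marks the end of the archive.
        if header.iter().all(|&b| b == 0) {
            break;
        }
        let name_len = header[..100].iter().position(|&b| b == 0).unwrap_or(100);
        let name = String::from_utf8_lossy(&header[..name_len]).into_owned();
        let size = parse_octal(&header[124..136]);
        entries.push(TarEntry { name, data_offset: pos + BLOCK, size });
        // No index exists: the next header can only be reached by skipping
        // this entry's data, rounded up to a whole number of blocks.
        pos += BLOCK + (size + BLOCK - 1) / BLOCK * BLOCK;
    }
    entries
}
```

With a compressed tar the situation is worse, because even this skipping requires decompressing every byte that comes before the entry you want.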

I actually just had an issue with one of the libraries, which wouldn't even support docx; that is why I asked in the forum.

Which makes sense considering that tar stands for Tape ARchive, and magnetic tape isn’t exactly known for its random access capabilities.

I assume I have to use these two to read from the archive. Can I then write to a text file and use the write function to write it back to the zip archive?

I believe the way to do it is to extract the whole archive into a temporary directory, make the modification, then create a new zip file (as Vorpal said), but beware of the metadata (timestamps, etc.). In the documentation linked above, all I could find was a way to append to an existing zip file (it is actually one of the examples provided in the crate).

If you look at those discussions, there are clearly a few in-place suggestions, but even modifying the metadata of a file isn't supported yet.

In other languages, there's sometimes support for in-place modification (ZipFile in Python), but they do just that: extract everything and repack, though sometimes they do it in memory rather than on disk—which you can probably do, too.

Another way to deal with that, if you have huge archives and small files, and if you can tolerate some overhead, is to append a new file that replaces the older version. You'll have to check whether that ZipWriter allows you to do that.

A third and ugly way is to launch an external tool that does it for you.

What is an archive?
It is a box into which you put multiple items so that you can handle them all as a single item. And it is not just any box: it is a box that fits its contents perfectly. No part of any item sticks out of it, and there is no empty space inside.

Now think: how could you replace some of the items in this box? If you just want to swap a pair of socks for the same type of socks in a different color, maybe, just maybe, you could do it. It may still need a lot of work and repacking. But if you want to replace the socks with a jacket, you would first need to make the jacket as small as the socks in the box. Can you do that?

If not, you can't replace the socks with the jacket. You will need a new box and will have to move all the items into it, leaving the socks out and putting the jacket in.

You could put the same label on the new box and treat it the same as the previous one, but it will still be a new box.

It is the same with archives. An archive is a box of a fixed size; you can't change its size. But you can repack all the items into a new archive and save it under the same name, replacing the old archive.

There may be libraries that do all this for you so that it looks like they just replace items in the archive, but in truth they create a new archive and save it under the same name.

P.S. Most archives are "factory packaging": it's like buying a package from a shop and unpacking it, after which it is impossible to put the items back into the same package the way they were packed.

Nothing prevents an in-place modification, though. According to the Zip file format specification that library is following, each file is compressed (and optionally encrypted) individually, so there's no common dictionary, nor are several files streamed and compressed together as in other methods. This allows for quick retrieval of individual files.

This is confirmed in the source code, in src/compression.rs:

/// Each file's compression method is stored alongside it, allowing the
/// contents to be read without context.
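As an illustration of "stored alongside it", here is a small sketch that reads the compression method and sizes straight out of a ZIP local file header, using the fixed field offsets from the Zip specification (method at byte 8, sizes at bytes 18 and 22, name length at byte 26, name starting at byte 30). `LocalHeader` and `parse_local_header` are names invented for this example, not the crate's API, and the sketch ignores ZIP64 and the case where the sizes are deferred to a trailing data descriptor (in which case the header stores zeros).

```rust
/// A few fields from a ZIP local file header (illustrative type).
#[derive(Debug, PartialEq)]
pub struct LocalHeader {
    /// 0 = stored (no compression), 8 = deflate.
    pub compression_method: u16,
    pub compressed_size: u32,
    pub uncompressed_size: u32,
    pub file_name: String,
}

fn u16_at(bytes: &[u8], offset: usize) -> u16 {
    u16::from_le_bytes([bytes[offset], bytes[offset + 1]])
}

fn u32_at(bytes: &[u8], offset: usize) -> u32 {
    u32::from_le_bytes([
        bytes[offset],
        bytes[offset + 1],
        bytes[offset + 2],
        bytes[offset + 3],
    ])
}

/// Parses the header that starts at the beginning of `bytes`.
/// Returns None if the signature or the length does not match.
pub fn parse_local_header(bytes: &[u8]) -> Option<LocalHeader> {
    const FIXED_LEN: usize = 30; // size of the fixed part of the header
    if bytes.len() < FIXED_LEN || !bytes.starts_with(&[0x50, 0x4B, 0x03, 0x04]) {
        return None;
    }
    let name_len = usize::from(u16_at(bytes, 26));
    let name = bytes.get(FIXED_LEN..FIXED_LEN + name_len)?;
    Some(LocalHeader {
        compression_method: u16_at(bytes, 8),
        compressed_size: u32_at(bytes, 18),
        uncompressed_size: u32_at(bytes, 22),
        file_name: String::from_utf8_lossy(name).into_owned(),
    })
}
```

Everything needed to decode one member sits in its own header, so a reader never has to look at neighbouring members to extract it.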

It's just that the feature hasn't been implemented yet, maybe in part because it opens a series of questions about how to handle the feature. If the Zip file is relatively small, truncating the Zip, replacing the file, then putting back the tail shouldn't be too problematic, but what if it's a very big Zip? It begs for options like "yes, really do that" or "just append a new version of my file".
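The "truncate, replace, put back the tail" idea can be sketched on a deliberately simplified model: a byte buffer plus a list of member offsets. This is not the real Zip layout (a real implementation would also have to rewrite the local header, the central directory, and the end-of-central-directory offsets), and `Member` and `replace_in_place` are hypothetical names. It only shows where the cost comes from: every byte after the changed member has to move.

```rust
/// Simplified index entry: where one member's bytes live (illustrative).
#[derive(Debug, Clone, PartialEq)]
pub struct Member {
    pub name: String,
    pub offset: usize,
    pub len: usize,
}

/// Replaces the bytes of `members[index]` inside `archive` and shifts the
/// offsets of all later members. Assumes `members` is sorted by offset.
/// Returns the number of tail bytes that had to be moved.
pub fn replace_in_place(
    archive: &mut Vec<u8>,
    members: &mut [Member],
    index: usize,
    new_data: &[u8],
) -> usize {
    let start = members[index].offset;
    let old_len = members[index].len;

    // 1. Set the tail (everything after the old member) aside.
    let tail = archive.split_off(start + old_len);
    // 2. Truncate at the member's start and write the new bytes.
    archive.truncate(start);
    archive.extend_from_slice(new_data);
    // 3. Put the tail back.
    archive.extend_from_slice(&tail);

    // 4. Fix up the index: later members moved by the size difference.
    members[index].len = new_data.len();
    for member in &mut members[index + 1..] {
        member.offset = member.offset - old_len + new_data.len();
    }
    tail.len()
}
```

For a small archive the moved tail is negligible; for a multi-gigabyte one, replacing an early member means rewriting almost the whole file, which is where an "append a new version instead" option starts to look attractive.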

Anyway, it's all working for me now, thanks guys.

Could you please describe your solution, for the benefit of others that have problems and read this thread in the future? You can also mark one of the replies as the solution by checking the box at the bottom of the reply.

Yep, sure. This is my solution:

    use std::fs::File;
    use std::io::{Cursor, Read, Write};

    use zip::write::FileOptions;
    use zip::{ZipArchive, ZipWriter};

    fn main()
    {
        let archive = File::open("Sample.docx").unwrap();
        let mut archive = ZipArchive::new(archive).unwrap();

        // In-memory buffer that receives the new zip archive
        let mut buffer = Cursor::new(Vec::new());
        let mut writer = ZipWriter::new(&mut buffer);

        for i in 0..archive.len()
        {
            let mut file = archive.by_index(i).unwrap();
            let filename = file.enclosed_name().unwrap();
            let filename = filename.to_str().unwrap();

            if filename == "word/document.xml"
            {
                // Read the document body, patch the text, and write the result
                let mut contents = String::new();
                file.read_to_string(&mut contents).unwrap();

                contents = contents.replace("[Old]", "New");

                writer.start_file(filename, FileOptions::<()>::default()).unwrap();
                writer.write_all(contents.as_bytes()).unwrap();
            }
            // Copy every other entry as is (its data is re-compressed with default options)
            else
            {
                writer.start_file(filename, FileOptions::<()>::default()).unwrap();
                std::io::copy(&mut file, &mut writer).unwrap();
            }
        }

        // Finalize the new archive, then save it under a new name
        writer.finish().unwrap();
        std::fs::write("Output.docx", buffer.into_inner()).unwrap();
    }