Write to buffer, read from buffer, clear and repeat?

Hello, I'm trying to create a tar file in memory and stream its contents out in a way that doesn't require holding the whole tar file in memory, since it might be very large. I'm having some trouble working out the logic. What I have right now is:

    let mut buffer: Vec<u8> = Vec::new();
    let mut archive = tar::Builder::new(&mut buffer);
    let mut count = 0usize;

    for entry in glob("**/*.db").unwrap() {
        if let Ok(path) = entry {
            println!("Adding {}", path.display());

            archive.append_path_with_name(path, "test_path.cool.db").unwrap();

            count += 1;

            if count >= 100 {
                break;
            }
        }
    }

    println!("Buffer size: {}", buffer.len()); // Can't do this! "archive" still has buffer borrowed mutably

Since the tar writer needs to borrow the buffer mutably until it's complete, I can't just drop archive. But if I can't drop archive, I'll never be able to clear() the buffer (or read from it, for that matter). I feel like I'm either underthinking or overthinking this. Does anyone have any suggestions? Thanks for the help!

Well, I’m not sure what you mean by “stream”. To a file? Then you don’t need a buffer; you can just put a File into the builder. Also, once you’re done, you can call into_inner() to get your file/buffer back out of the Builder, which also finalizes the archive.

Dropping the Builder also finalizes the tar. After an explicit drop, or if the Builder lives in a smaller scope, you will be able to access the buffer it was mutably borrowing. If you have a Builder<File>, you don’t really have to worry about this, though: the normal implicit drop will finalize your tar and close the file for you.
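
A minimal sketch of both options, using std's BufWriter as a stand-in for tar::Builder so that it runs without the tar crate (an assumption for illustration only: both wrap a writer, do their final write when dropped, and offer into_inner()). The first function confines the mutable borrow to an inner scope; the second gives the writer ownership of the buffer and takes it back with into_inner().

```rust
use std::io::{BufWriter, Write};

/// Writes `data` through a writer that mutably borrows `buffer`, then
/// returns the buffer's length once that borrow has ended.
fn write_then_measure(buffer: &mut Vec<u8>, data: &[u8]) -> usize {
    {
        // The writer holds `&mut *buffer` only inside this inner scope,
        // much like `tar::Builder::new(&mut buffer)` in the question.
        let mut writer = BufWriter::new(&mut *buffer);
        writer.write_all(data).unwrap();
        // Dropping the writer here flushes it and releases the borrow.
    }
    buffer.len()
}

/// Lets the writer own the buffer, then takes it back with `into_inner`.
fn write_via_into_inner(data: &[u8]) -> Vec<u8> {
    let mut writer = BufWriter::new(Vec::new());
    writer.write_all(data).unwrap();
    // `into_inner` flushes and hands back the owned inner writer.
    writer.into_inner().unwrap()
}

fn main() {
    let mut buffer: Vec<u8> = Vec::new();
    let len = write_then_measure(&mut buffer, b"some tar bytes");
    println!("Buffer size: {}", len);
    // The borrow has ended, so the buffer can be read and cleared again.
    buffer.clear();

    let owned = write_via_into_inner(b"more bytes");
    println!("Owned buffer size: {}", owned.len());
}
```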

If you meant something else by “stream”, like over a network, those libraries usually offer some interface that implements Write too, so you can put it into the Builder as well.

I do mean streaming over a network, and it seems like writing a struct that implements Write and does the network streaming itself might be what I need to avoid this dual ownership. I'm using rusoto_s3, which doesn't provide a high-level way to stream S3 objects, but I think I can write one without too much trouble (knock on wood).
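
The "write to buffer, read from buffer, clear and repeat" loop from the title can live inside such a Write implementation. Below is a minimal std-only sketch, assuming an upload API that accepts data in separately sent parts (as S3 multipart uploads do). ChunkedUploader, part_size and the upload callback are hypothetical names, not rusoto_s3 API; the callback is where a real network request would go. Because the struct owns its buffer, tar::Builder::new(uploader) can own the uploader outright, so there is no second borrow to fight.

```rust
use std::io::{self, Write};

/// Hypothetical writer that collects bytes into fixed-size parts and hands
/// each finished part to an `upload` callback.
struct ChunkedUploader<F>
where
    F: FnMut(&[u8]) -> io::Result<()>,
{
    part: Vec<u8>,
    part_size: usize,
    upload: F,
}

impl<F> ChunkedUploader<F>
where
    F: FnMut(&[u8]) -> io::Result<()>,
{
    fn new(part_size: usize, upload: F) -> Self {
        assert!(part_size > 0, "part_size must be non-zero");
        ChunkedUploader { part: Vec::with_capacity(part_size), part_size, upload }
    }

    /// Uploads whatever is left as the final (possibly short) part.
    fn finish(mut self) -> io::Result<()> {
        if !self.part.is_empty() {
            (self.upload)(&self.part)?;
        }
        Ok(())
    }
}

impl<F> Write for ChunkedUploader<F>
where
    F: FnMut(&[u8]) -> io::Result<()>,
{
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        if self.part.len() == self.part_size {
            // The current part is full: upload it, then clear the buffer
            // and reuse its allocation for the next part.
            (self.upload)(&self.part)?;
            self.part.clear();
        }
        // Accept only as many bytes as fit in the current part;
        // `write_all` will call us again with the rest.
        let n = (self.part_size - self.part.len()).min(buf.len());
        self.part.extend_from_slice(&buf[..n]);
        Ok(n)
    }

    fn flush(&mut self) -> io::Result<()> {
        // Short parts are only sent by `finish`, because multipart-style
        // APIs may require every part except the last to meet a minimum size.
        Ok(())
    }
}

fn main() -> io::Result<()> {
    let mut uploader = ChunkedUploader::new(8, |part: &[u8]| -> io::Result<()> {
        // A real implementation would send `part` over the network here.
        println!("uploading {} bytes", part.len());
        Ok(())
    });
    uploader.write_all(b"twenty bytes of data")?;
    uploader.finish()
}
```

With the tar crate, the builder's into_inner() would return the uploader once the archive is finished, and finish() then sends the last part.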

If you do that, you might want to wrap your Write-implementing struct in a BufWriter (which can be created with whatever buffer size you like). That way you can avoid making too many small requests, for example when appending many small files, or when the tar writer issues tiny write calls for padding, headers, etc.
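
A small std-only sketch of that effect: CountingWriter is a hypothetical stand-in for the network writer that only counts the write calls reaching it. A thousand 1-byte writes through a 64-byte BufWriter arrive at the inner writer as a handful of larger calls. In real use the capacity would be much larger, e.g. the size of one upload request.

```rust
use std::io::{self, BufWriter, Write};

/// Stand-in for a network-backed writer: it only records how many `write`
/// calls reach it and how many bytes they carried.
#[derive(Debug, Default)]
struct CountingWriter {
    calls: usize,
    bytes: usize,
}

impl Write for CountingWriter {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.calls += 1;
        self.bytes += buf.len();
        Ok(buf.len())
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

/// Performs `n` tiny 1-byte writes through a BufWriter with the given
/// capacity and returns (calls, bytes) as seen by the inner writer.
fn tiny_writes_through_bufwriter(n: usize, capacity: usize) -> (usize, usize) {
    let mut writer = BufWriter::with_capacity(capacity, CountingWriter::default());
    for _ in 0..n {
        // Many small writes, like headers or padding between entries.
        writer.write_all(&[0u8]).unwrap();
    }
    // `into_inner` flushes the remaining buffered bytes and returns the
    // inner writer; unlike a plain drop, it reports a failed flush.
    let inner = writer.into_inner().unwrap();
    (inner.calls, inner.bytes)
}

fn main() {
    let (calls, bytes) = tiny_writes_through_bufwriter(1000, 64);
    println!("{} bytes arrived in {} write calls", bytes, calls);
}
```

Prefer into_inner() or an explicit flush() at the end over relying on drop, since BufWriter's Drop flushes but discards any error.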

Sure, I got that working and I think it's good enough for now. Now I'm trying to thread the compression, and I've hit another problem that is probably fairly simple to solve: I'm basically just trying to get a thread pool working. Here's a minimal reproduction of the issue:

https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=2623a77d4a421c1c5a9cc80e85b4b181

    fn main() {
        let mut threads = Vec::new();

        for x in 1..100 {
            threads.push(std::thread::spawn(|| {
                std::thread::sleep(std::time::Duration::from_secs(1));
            }));

            if threads.len() >= 5 {
                for i in threads.iter() {
                    i.join();
                }

                threads.clear();
            }
        }
    }

Apparently because JoinHandle can't be cloned, I can't push to and iterate over a list of threads like that. Is there a smarter way I should be doing this? Thanks!

EDIT: Thanks to @Kixiron on the Rust community Discord: apparently I can just use threads.drain(..) without the clear(), and that works fine. I'm actually not sure why it works, if anyone has a clue, but I'm happy it works at all.

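That works because threads.drain(..) is a moving iterator, unlike threads.iter(), which only provides references. JoinHandle::join() takes self by value, so it needs an owned handle rather than a reference; drain(..) hands out owned handles and leaves the Vec empty, which is why the clear() is no longer needed.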
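
Putting that together, a minimal std-only sketch of the batched join. It adds two things beyond the reproduction: each thread returns a value (x * 2, purely illustrative) so join() has something to hand back, and the last partial batch is joined after the loop. With 1..100 (99 items) and batches of 5, the original loop would otherwise never join the last 4 threads. Note this is fixed-size batching rather than a real thread pool: a new batch only starts once the slowest thread of the previous one has finished.

```rust
use std::thread;
use std::time::Duration;

/// Runs one thread per input value, at most `batch` at a time, and returns
/// each thread's result (here just `x * 2`) in spawn order.
fn run_in_batches(inputs: &[u64], batch: usize) -> Vec<u64> {
    let mut handles = Vec::new();
    let mut results = Vec::new();

    for &x in inputs {
        // `move` copies `x` into the thread instead of borrowing it.
        handles.push(thread::spawn(move || {
            thread::sleep(Duration::from_millis(10));
            x * 2
        }));

        if handles.len() >= batch {
            // `drain(..)` yields each JoinHandle by value, which `join(self)`
            // requires, and leaves `handles` empty, so no `clear()` is needed.
            for handle in handles.drain(..) {
                results.push(handle.join().expect("worker thread panicked"));
            }
        }
    }

    // Join the final, partially filled batch as well.
    for handle in handles.drain(..) {
        results.push(handle.join().expect("worker thread panicked"));
    }
    results
}

fn main() {
    let inputs: Vec<u64> = (1..100).collect();
    let results = run_in_batches(&inputs, 5);
    println!("joined {} threads", results.len());
}
```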

