Vectorized I/O from sequential output

So, I have a pipeline of sequential input --> parallel processing --> sequential output, because I may not be able to fit the whole input in memory.

pariter::scope(|scope| {
    infileiter
        .map()
        .for_each(|x: String| (&outfile).write_all(x.as_bytes()).unwrap())
})
.expect("pariter trouble");

Because the iterator can produce hundreds of thousands or millions of items, I would like to replace write_all with a little bit of chunking so that I don't issue an I/O call for every item it yields.

I gave BufWriter a go but it requires mut, and I can't use that within the closure.
It would be nice to be able to use write_vectored or write_all_vectored.
For vectored write, there are 2 problems to solve:
1 - Convert String to IoSlice
2 - take like 10,000 Strings (or up to some mem limit I can set) and write them out.

Is there any existing pattern for doing this?
Or is something wild possible, like shipping each item into a channel whose receiving end fills up a buffer and writes it out?
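For what it's worth, the channel idea can be sketched with only the standard library. This is a minimal sketch, not a pariter-specific recipe: `spawn_writer`, `queue_len` and `buf_bytes` are names made up for illustration. A bounded `sync_channel` gives back-pressure (senders block when the queue is full), and a `BufWriter` on the receiving thread does the chunking.

```rust
use std::io::{self, BufWriter, Write};
use std::sync::mpsc::{sync_channel, SyncSender};
use std::thread::{self, JoinHandle};

/// Hypothetical helper: spawns a thread that owns the output and writes
/// every String it receives through a BufWriter.
/// `queue_len` bounds how many items may wait in the channel;
/// `buf_bytes` is the BufWriter capacity, i.e. the chunk size for real writes.
fn spawn_writer<W>(
    out: W,
    queue_len: usize,
    buf_bytes: usize,
) -> (SyncSender<String>, JoinHandle<io::Result<W>>)
where
    W: Write + Send + 'static,
{
    let (tx, rx) = sync_channel::<String>(queue_len);
    let handle = thread::spawn(move || -> io::Result<W> {
        let mut writer = BufWriter::with_capacity(buf_bytes, out);
        // The loop ends once every sender has been dropped.
        for item in rx {
            writer.write_all(item.as_bytes())?;
        }
        writer.flush()?;
        // Hand the underlying writer back to the caller.
        writer.into_inner().map_err(|e| e.into_error())
    });
    (tx, handle)
}
```

The producing side would call `tx.send(x).unwrap()` for each item, then drop `tx` and `join` the handle so that the final flush and any write error are observed.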

Thanks in advance for your time & guidance.

Why not?

Can you share more of your code? The following works for me:

use std::io::BufWriter;
use std::io::Write;

fn main() {
    let mut buf_writer = BufWriter::new(Vec::<u8>::new()); // the Vec could also be a file

    vec![String::new()] // infileiter
        .into_iter()
        .map(|x| x) // do some mapping
        .for_each(|x: String| buf_writer.write_all(x.as_bytes()).unwrap());

    // Flush explicitly: errors from the implicit flush on drop are ignored.
    buf_writer.flush().unwrap();
}

@H2CO3, @RobinH

Sorry, I've updated the first post with the relevant bit (at the start).

I only see two lines added to the code. It would be useful to have some explanation as to why you think you can't use mut in a closure.

My mistake again.

I had it like this in my code

.for_each(|x: String| (&outfile).write_all(x.as_bytes()).unwrap())

whereas in the example at the top, I put it as

.for_each(|x: String| outfile.write_all(x.as_bytes()).unwrap())

... I'm unable to reproduce the error now :sweat_smile:.
This works ok.

.for_each(|x: String| outbuf.write_all(x.as_bytes()).unwrap())
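As an aside, BufWriter already provides the chunking asked for above, and its capacity acts as the memory limit. Here is a small sketch that makes this visible by counting how often the underlying writer is actually called. `CountingWriter` and `write_buffered` are hypothetical names introduced only for this illustration.

```rust
use std::io::{self, BufWriter, Write};

/// Hypothetical wrapper that counts calls to `write` on the inner writer.
struct CountingWriter<W> {
    inner: W,
    calls: usize,
}

impl<W: Write> Write for CountingWriter<W> {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.calls += 1;
        self.inner.write(buf)
    }

    fn flush(&mut self) -> io::Result<()> {
        self.inner.flush()
    }
}

/// Writes all items through a BufWriter with the given capacity and
/// returns how many `write` calls reached the underlying writer.
fn write_buffered<W: Write>(
    out: W,
    items: impl Iterator<Item = String>,
    capacity: usize,
) -> io::Result<usize> {
    let counting = CountingWriter { inner: out, calls: 0 };
    let mut writer = BufWriter::with_capacity(capacity, counting);
    for item in items {
        writer.write_all(item.as_bytes())?;
    }
    writer.flush()?;
    let counting = writer.into_inner().map_err(|e| e.into_error())?;
    Ok(counting.calls)
}
```

With 10,000 short lines and a 64 KiB capacity, the count should come out far below 10,000, since the inner writer is only touched when the buffer fills up or is flushed.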

Any suggestions on how to vectorize it?
I have this, but it's vectorizing each item/line, which isn't that helpful.

.for_each(|x| { _ = (&output_json).write_vectored(&[IoSlice::new(x.as_bytes())]).expect("hey now!") })

&output_json is just a file here, as I was trying the above before I could get BufWriter working.
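One way to vectorize a whole batch instead of a single item might look like the sketch below. `write_batch` and `MAX_SLICES` are hypothetical names. The cap of 1024 slices per call is an assumption, chosen because operating systems typically limit how many buffers one vectored write accepts. A vectored write may also be partial, so the loop tracks how far it got and retries with the rest. (`write_all_vectored` would do this bookkeeping, but as far as I know it is not available on stable Rust.)

```rust
use std::io::{self, ErrorKind, IoSlice, Write};

/// Assumed cap on the number of slices passed to one vectored write.
const MAX_SLICES: usize = 1024;

/// Hypothetical helper: writes every string in `batch`, using as few
/// vectored writes as the writer allows.
fn write_batch<W: Write>(out: &mut W, batch: &[String]) -> io::Result<()> {
    // Drop empty strings so that a result of 0 always means "no progress".
    let bufs: Vec<&[u8]> = batch
        .iter()
        .map(|s| s.as_bytes())
        .filter(|b| !b.is_empty())
        .collect();

    let mut idx = 0usize; // first buffer that is not fully written yet
    let mut off = 0usize; // bytes of bufs[idx] that are already written
    while idx < bufs.len() {
        let end = (idx + MAX_SLICES).min(bufs.len());
        let mut slices = Vec::with_capacity(end - idx);
        slices.push(IoSlice::new(&bufs[idx][off..]));
        slices.extend(bufs[idx + 1..end].iter().map(|b| IoSlice::new(b)));

        let mut written = match out.write_vectored(&slices) {
            Ok(0) => return Err(io::Error::new(ErrorKind::WriteZero, "failed to write batch")),
            Ok(n) => n,
            Err(e) if e.kind() == ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        };

        // Advance past every buffer that was written completely.
        while idx < bufs.len() {
            let remaining = bufs[idx].len() - off;
            if written < remaining {
                off += written;
                break;
            }
            written -= remaining;
            idx += 1;
            off = 0;
        }
    }
    Ok(())
}
```

The for_each closure would then push each item into a Vec<String>, call `write_batch` once the Vec holds around 10,000 items (or some byte budget), clear it, and call `write_batch` one last time after the iterator is exhausted.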

Any thoughts on this, please?

