Is there a way to make a BufWriter to std output mutable inside a closure?

I am writing a program that writes its output to standard output wrapped in a BufWriter. This is currently how I have the printing step of my code working:

    //  PRINTING OUTPUT
    //  Create handle and BufWriter for writing
    let handle = std::io::stdout();
    let mut buf = BufWriter::new(handle);
    //  Iterate over a DashMap of (key, value) pairs  
    fasta_hash.into_iter().par_bridge().for_each(|(k, values)| {
        //  Convert k-mer bytes to str
        let key: &str = str::from_utf8(&k).unwrap();
        writeln!(buf, ">{}\n{}", values.len(), key).expect("Unable to write data");
    });
    buf.flush().unwrap();

I was wondering whether it's possible to speed this up by iterating over the map in parallel with Rayon's ParallelBridge, as in the code above. As expected, that fails to compile with error[E0596]: cannot borrow `buf` as mutable, as it is a captured variable in a `Fn` closure.

How might I make the BufWriter mutable so I could run this process in parallel?

Since this closure may be run on multiple threads in parallel, you'd need to wrap your BufWriter in a Mutex or other lock to ensure the writes don't interfere with each other:

    let handle = std::io::stdout();
    let buf = Mutex::new(BufWriter::new(handle));

    fasta_hash.into_iter().par_bridge().for_each(|(k, values)| {
        let key: &str = str::from_utf8(&k).unwrap();
        writeln!(buf.lock().unwrap(), ">{}\n{}", values.len(), key).expect("Unable to write data");
    });

However, this will probably eliminate any benefit of using a parallel iterator here, since the writing will be entirely serialized. It's probably simpler and faster to skip the locking and use a non-parallel loop. Parallelism generally helps for CPU-bound computations, and not so much for I/O (especially I/O that is all going to a single destination).
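A minimal sketch of the non-parallel version, for comparison. The function name `write_records` and the record type `(Vec<u8>, Vec<u32>)` are assumptions made for illustration; the original `fasta_hash` is a DashMap whose exact key and value types are not shown.

```rust
use std::io::{self, BufWriter, Write};

/// Writes one FASTA-style record per (k-mer, values) pair to `out`.
/// Hypothetical helper: the record type stands in for the DashMap entries.
fn write_records<W: Write>(out: W, records: &[(Vec<u8>, Vec<u32>)]) -> io::Result<()> {
    // A single owner of the writer, so no Mutex is needed.
    let mut buf = BufWriter::new(out);
    for (k, values) in records {
        // Convert k-mer bytes to str
        let key = std::str::from_utf8(k).expect("k-mer is not valid UTF-8");
        writeln!(buf, ">{}\n{}", values.len(), key)?;
    }
    // Flush explicitly so that write errors are reported instead of lost on drop.
    buf.flush()
}
```

Calling it as `write_records(std::io::stdout().lock(), &records)` also takes the stdout lock once up front instead of once per write.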


When trying to speed things up by using parallelism, what you want to do is perform independent steps in parallel and serialize the steps which depend on the same resource.

In this case, the actual printing all depends on a shared BufWriter<Stdout> and as @mbrubeck points out, trying to write to stdout "in parallel" actually means all but one thread will be blocked waiting to be given access to stdout.

With that in mind, what I would do is generate the text to be printed in parallel, then print it sequentially. You might do that by collecting into a Vec<String> and printing it with a for-loop, or by creating a channel and sending each message down that channel while another thread reads from the channel and prints each one.
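Here is a rough sketch of the channel variant. To keep it free of dependencies it uses plain std threads rather than Rayon, which is a substitution on my part; with Rayon, something like `for_each_with` and a cloned sender could play the same role. The function name, the `workers` parameter, and the record type `(Vec<u8>, Vec<u32>)` are all hypothetical.

```rust
use std::io::{self, BufWriter, Write};
use std::sync::mpsc;
use std::thread;

/// Formats records on up to `workers` threads and writes them from one thread.
/// Hypothetical helper: the record type stands in for the DashMap entries.
fn format_parallel_write_serial<W: Write>(
    out: W,
    records: &[(Vec<u8>, Vec<u32>)],
    workers: usize,
) -> io::Result<()> {
    let (tx, rx) = mpsc::channel::<String>();
    // Split the input into roughly equal chunks, one per worker (never zero-sized).
    let workers = workers.max(1);
    let chunk_len = ((records.len() + workers - 1) / workers).max(1);

    thread::scope(|s| {
        for chunk in records.chunks(chunk_len) {
            let tx = tx.clone();
            s.spawn(move || {
                for (k, values) in chunk {
                    let key = std::str::from_utf8(k).expect("k-mer is not valid UTF-8");
                    // Sending only fails if the writer has already stopped; ignore it then.
                    let _ = tx.send(format!(">{}\n{}\n", values.len(), key));
                }
            });
        }
        // Drop the original sender so the receive loop ends once all workers finish.
        drop(tx);

        // Only this thread touches the writer, so no lock is required.
        let mut buf = BufWriter::new(out);
        for record in rx {
            buf.write_all(record.as_bytes())?;
        }
        buf.flush()
    })
}
```

Records from different chunks arrive in whatever order the workers produce them, so the output order is not deterministic. Whether this beats the plain loop depends on how expensive the formatting step is compared with the write itself.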

Amdahl's law is pretty relevant here:

Amdahl's law is often used in parallel computing to predict the theoretical speedup when using multiple processors. For example, if a program needs 20 hours to complete using a single thread, but a one-hour portion of the program cannot be parallelized, therefore only the remaining 19 hours (p = 0.95) of execution time can be parallelized, then regardless of how many threads are devoted to a parallelized execution of this program, the minimum execution time cannot be less than one hour.
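To put numbers on the quoted example, here is a small sketch of the usual form of Amdahl's law, speedup = 1 / ((1 - p) + p / n), where p is the parallelizable fraction and n the number of threads. The function names are my own.

```rust
/// Theoretical speedup under Amdahl's law for a parallel fraction `p` on `n` threads.
fn amdahl_speedup(p: f64, n: f64) -> f64 {
    1.0 / ((1.0 - p) + p / n)
}

/// Estimated run time, given the single-threaded run time `serial_hours`.
fn parallel_hours(serial_hours: f64, p: f64, n: f64) -> f64 {
    serial_hours / amdahl_speedup(p, n)
}
```

For the 20-hour program with p = 0.95, `parallel_hours(20.0, 0.95, 8.0)` comes out at about 3.4 hours, and no thread count brings it below the one serial hour. In the printing case the serial fraction is the write to stdout, so that part sets the floor.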

