Efficiently saving a lot of images

I am working on a map generation program that creates about half a million PNG images. The structure is roughly equivalent to this benchmark (except that the real program is also multithreaded):

use criterion::{black_box, criterion_group, criterion_main, Criterion};
use image::{ImageFormat, RgbaImage};

fn spam() {
    std::fs::create_dir_all("junk").unwrap();

    for i in 0..100 {
        let img = black_box(RgbaImage::new(256, 256)); // actual code omitted

        // images are made 16 at a time
        for s in 0..16 {
            let filename = format!("junk/{}_{}.png", i, s);

            if black_box(true) {
                img.save_with_format(filename, ImageFormat::Png).expect("Error saving.");
            }
        }
    }
}

fn criterion_benchmark(c: &mut Criterion) {
    c.bench_function("spam", |b| b.iter(|| spam()));
}

criterion_group! {
    name = benches;
    // This can be any expression that returns a `Criterion` object.
    config = Criterion::default().significance_level(0.1).sample_size(10);
    targets = criterion_benchmark
}
criterion_main!(benches);

It turns out that the program spends about 1/3 of its time creating the images and 2/3 saving them. Is there something I can do to speed this up?

What's the disk throughput you're seeing? Are you actually at 100% CPU? What's the bottleneck? Are you running a scan-on-close virus scanner?
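One way to start answering the bottleneck question is to time the encoding step and the file write separately. The following is only a minimal sketch using the standard library: `time_phases` and `PhaseTimes` are hypothetical names, and the closure passed in is a stand-in for whatever produces the encoded PNG bytes in the real program.

```rust
use std::fs;
use std::io;
use std::path::Path;
use std::time::{Duration, Instant};

/// Time spent producing the bytes versus handing them to the file system.
struct PhaseTimes {
    encode: Duration,
    write: Duration,
    bytes: usize,
}

/// Runs `encode` (a stand-in for PNG encoding into memory), then writes the
/// result to `path`, timing the two phases separately.
fn time_phases<F>(encode: F, path: &Path) -> io::Result<PhaseTimes>
where
    F: FnOnce() -> Vec<u8>,
{
    let start = Instant::now();
    let bytes = encode();
    let encode_time = start.elapsed();

    let start = Instant::now();
    // Note: this returns once the OS has accepted the data, which may be
    // before it physically reaches the disk.
    fs::write(path, &bytes)?;
    let write_time = start.elapsed();

    Ok(PhaseTimes {
        encode: encode_time,
        write: write_time,
        bytes: bytes.len(),
    })
}

fn main() -> io::Result<()> {
    let path = std::env::temp_dir().join("phase_timing_demo.bin");
    // Placeholder "encoder": a zeroed buffer the size of a raw 256x256 RGBA image.
    let t = time_phases(|| vec![0u8; 256 * 256 * 4], &path)?;
    println!(
        "encode: {:?}, write: {:?}, {} bytes",
        t.encode, t.write, t.bytes
    );
    fs::remove_file(&path)?;
    Ok(())
}
```

If the encode phase dominates, the work is CPU-bound and more threads or cheaper compression should help; if the write phase dominates, the disk or something hooked into file writes (such as a virus scanner) is the more likely culprit.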

I expect that saving the images is CPU-bound and spends most of its time in PNG encoding. This should benefit a lot from making it parallel using rayon:

use rayon::prelude::*;

// Outer loop: each image is generated on some thread of rayon's pool.
(0..100).into_par_iter().for_each(|i| {
    let img = black_box(RgbaImage::new(256, 256)); // actual code omitted

    // Inner loop: the 16 saves share `img` by reference and also run in parallel.
    (0..16).into_par_iter().for_each(|s| {
        let filename = format!("junk/{}_{}.png", i, s);
        if black_box(true) {
            img.save_with_format(filename, ImageFormat::Png).expect("Error saving.");
        }
    });
});

Update: On my laptop, with 4 logical cores and 2 physical cores, this reduces the run time from 2.3 seconds to 1.0 seconds. The improvement should be much greater on a many-core processor.

spam                    time:   [978.93 ms 1.0073 s 1.0610 s]                 
                        change: [-57.316% -55.772% -52.350%] (p = 0.00 < 0.10)
                        Performance has improved.

You could possibly save some CPU time at the expense of disk space by constructing your own PNG encoder (for example, with the png crate) and calling encoder.set_compression(Compression::Fast).

The CPU usage is around 80% (so each core is idle about 1/5 of the time). I already disabled antivirus scans in the output folder (which had been using up 40% or so before that).

@mbrubeck the program is already multithreaded. I'll give the encoder thing a try, although it's probably not worth the tradeoff (the users of the map need to fetch the images from the server, so I'd rather have smaller files).