I am working on a map generation program that creates about half a million PNG images. Its structure is roughly equivalent to this benchmark, except that the real program is also multithreaded:
use criterion::{black_box, criterion_group, criterion_main, Criterion};
use image::{ImageFormat, RgbaImage};

fn spam() {
    std::fs::create_dir_all("junk").unwrap();
    for i in 0..100 {
        let img = black_box(RgbaImage::new(256, 256)); // actual code omitted
        // images are made 16 at a time
        for s in 0..16 {
            let filename = format!("junk/{}_{}.png", i, s);
            if black_box(true) {
                img.save_with_format(filename, ImageFormat::Png).expect("Error saving.");
            }
        }
    }
}

fn criterion_benchmark(c: &mut Criterion) {
    c.bench_function("spam", |b| b.iter(|| spam()));
}

criterion_group! {
    name = benches;
    // This can be any expression that returns a `Criterion` object.
    config = Criterion::default().significance_level(0.1).sample_size(10);
    targets = criterion_benchmark
}
criterion_main!(benches);
As it turns out, it spends about 1/3 of the time actually creating images, and 2/3 saving them. Is there something I can do to speed this up?
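For anyone who wants to reproduce a split like this, here is a minimal sketch of one way to measure it using only the standard library. The names `timed` and `PhaseTimes` are made up for illustration and are not part of the original program; the sketch only accumulates wall-clock time per phase and is not a substitute for a real profiler.

```rust
use std::time::{Duration, Instant};

/// Runs `f` once and returns its result together with the elapsed wall-clock time.
/// (Hypothetical helper, not taken from the original program.)
fn timed<T>(f: impl FnOnce() -> T) -> (T, Duration) {
    let start = Instant::now();
    let out = f();
    (out, start.elapsed())
}

/// Accumulated wall-clock time per phase (hypothetical helper type).
#[derive(Default, Debug)]
struct PhaseTimes {
    generate: Duration,
    save: Duration,
}

impl PhaseTimes {
    /// Fraction of the measured time that was spent saving, between 0.0 and 1.0.
    fn save_fraction(&self) -> f64 {
        let total = (self.generate + self.save).as_secs_f64();
        if total == 0.0 {
            0.0
        } else {
            self.save.as_secs_f64() / total
        }
    }
}
```

In the loop above, one would wrap the image creation in `timed(...)` and add the duration to `generate`, then do the same for the `save_with_format` call and `save`. In a multithreaded program each thread needs its own `PhaseTimes` (or a lock around a shared one), and the per-thread durations add up to more than the elapsed wall-clock time.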
I expect that saving the images is CPU-bound, with most of the time spent in PNG encoding. If so, it should benefit a lot from being parallelized with rayon:
use rayon::prelude::*;

(0..100).into_par_iter().for_each(|i| {
    let img = black_box(RgbaImage::new(256, 256)); // actual code omitted
    (0..16).into_par_iter().for_each(|s| {
        let filename = format!("junk/{}_{}.png", i, s);
        if black_box(true) {
            img.save_with_format(filename, ImageFormat::Png).expect("Error saving.");
        }
    });
});
Update: On my laptop, with 4 logical cores and 2 physical cores, this reduces the run time from 2.3 seconds to 1.0 seconds. The improvement should be much greater on a many-core processor.
spam    time:   [978.93 ms 1.0073 s 1.0610 s]
        change: [-57.316% -55.772% -52.350%] (p = 0.00 < 0.10)
        Performance has improved.
The CPU usage is around 80% (so each core is idle about a fifth of the time). I already disabled antivirus scans in the output folder (which had been using up 40% or so).
@mbrubeck The program is already multithreaded. I'll give the encoder thing a try, although it's probably not worth the tradeoff: the users of the map have to fetch the images from the server, so I'd rather keep the file size small.
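The roughly 80% CPU usage mentioned above could mean that the worker threads sometimes block on file writes, although nothing in this thread confirms that. If so, one thing to try is to encode into memory on the worker threads and hand the finished bytes to a single dedicated writer thread. The sketch below shows that shape using only the standard library. `encode_stub` is a placeholder for the real PNG encoding, `encode_then_write` and its parameters are invented for illustration, and the queue bound of 64 is an arbitrary choice.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::sync::mpsc;
use std::thread;

/// Stand-in for PNG encoding: CPU-only work that produces the bytes of one file.
/// The real program would encode the image into an in-memory buffer here.
fn encode_stub(i: u32, s: u32) -> Vec<u8> {
    format!("tile {}_{}", i, s).into_bytes()
}

/// Encodes on `workers` threads and writes every file from one dedicated thread.
/// Returns the number of files written. (Hypothetical function, for illustration.)
fn encode_then_write(dir: &Path, tiles: u32, per_tile: u32, workers: u32) -> io::Result<usize> {
    fs::create_dir_all(dir)?;
    // Bounded channel: encoders block once 64 buffers are queued, which caps memory use.
    let (tx, rx) = mpsc::sync_channel::<(PathBuf, Vec<u8>)>(64);

    // The writer thread does all file I/O, so encoder threads never wait on the disk.
    let writer = thread::spawn(move || -> io::Result<usize> {
        let mut written = 0;
        for (path, bytes) in rx {
            fs::write(&path, &bytes)?;
            written += 1;
        }
        Ok(written)
    });

    let encoders: Vec<_> = (0..workers)
        .map(|w| {
            let tx = tx.clone();
            let dir = dir.to_path_buf();
            thread::spawn(move || {
                // Worker `w` handles tiles w, w + workers, w + 2 * workers, ...
                for i in (w..tiles).step_by(workers as usize) {
                    for s in 0..per_tile {
                        let bytes = encode_stub(i, s);
                        let path = dir.join(format!("{}_{}.png", i, s));
                        // A send error means the writer stopped early; stop encoding too.
                        if tx.send((path, bytes)).is_err() {
                            return;
                        }
                    }
                }
            })
        })
        .collect();
    drop(tx); // the writer's loop ends once every sender has been dropped

    for e in encoders {
        e.join().expect("encoder thread panicked");
    }
    writer.join().expect("writer thread panicked")
}
```

A call such as `encode_then_write(Path::new("junk"), 100, 16, 4)` would mirror the 100 × 16 layout of the benchmark. Whether this helps at all depends on where the idle time actually comes from, so it is worth benchmarking against the plain rayon version before adopting it.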