I am trying to convert an RGBA image buffer to an RGB image buffer, using the Rust image crate.
Basically, I am converting ImageBuffer<Rgba<u8>, Vec<u8>> to ImageBuffer<Rgb<u8>, Vec<u8>>.
I have several options like:
Create a new buffer and loop through every pixel to discard the alpha channel.
Convert the old buffer to a DynamicImage, use the to_rgb8() method, and convert back to a new image buffer.
Both options seem to affect performance a lot.
So does anyone have experience with this and can maybe shed light on a good method to maximize performance? Thank you.
I don't know which option is the fastest, nor do I have time to write benchmarks right now, but here's a simple implementation you can compare yours against. It's basically your option 1, but it operates directly on bytes instead of pixels.
use image::{Rgb, Rgba};

fn rgba8_to_rgb8(input: image::ImageBuffer<Rgba<u8>, Vec<u8>>) -> image::ImageBuffer<Rgb<u8>, Vec<u8>> {
    let width = input.width() as usize;
    let height = input.height() as usize;

    // Get the raw image data as a vector
    let input: &Vec<u8> = input.as_raw();

    // Allocate a new buffer for the RGB image, 3 bytes per pixel
    let mut output_data = vec![0u8; width * height * 3];

    let mut i = 0;
    // Iterate through 4-byte chunks of the image data (RGBA bytes)
    for chunk in input.chunks(4) {
        // ... and copy each of them to output, leaving out the A byte
        output_data[i..i + 3].copy_from_slice(&chunk[0..3]);
        i += 3;
    }

    // Construct a new image
    image::ImageBuffer::from_raw(width as u32, height as u32, output_data).unwrap()
}
Assuming ImageBuffer just stores the pixel data as contiguous bytes in memory, converting RGBARGBARGBA... to RGBRGBRGB... is not a very cheap operation. You need to read and write a lot of memory to do this, especially if your image is large, and it's not just a contiguous copy. It may be possible to optimize the conversion somewhat by using vector instructions if your platform supports them (e.g. _mm_shuffle_epi8 in core::arch::x86_64 on x86).
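To make the vector-instruction idea concrete, here is a minimal sketch that works on the raw byte slices rather than on ImageBuffer itself. The function names (strip_alpha, strip_alpha_ssse3, strip_alpha_scalar) are made up for illustration, and the code assumes the input length is a multiple of 4. It shuffles four RGBA pixels at a time with _mm_shuffle_epi8 when SSSE3 is detected at runtime and falls back to a plain loop otherwise. I have not benchmarked it, so treat it as a starting point rather than a guaranteed speedup.

```rust
/// Scalar fallback: copy R, G, B from every 4-byte RGBA pixel.
fn strip_alpha_scalar(rgba: &[u8], rgb: &mut [u8]) {
    for (dst, src) in rgb.chunks_exact_mut(3).zip(rgba.chunks_exact(4)) {
        dst.copy_from_slice(&src[..3]);
    }
}

/// SSSE3 path (hypothetical helper): 4 pixels (16 bytes) in, 12 useful bytes out.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "ssse3")]
unsafe fn strip_alpha_ssse3(rgba: &[u8], rgb: &mut [u8]) {
    use std::arch::x86_64::*;

    let pixels = rgba.len() / 4;
    assert!(rgb.len() >= pixels * 3);
    let mut p = 0;

    unsafe {
        // Pick bytes R,G,B of each pixel; -1 zeroes the last 4 output lanes.
        let mask = _mm_setr_epi8(0, 1, 2, 4, 5, 6, 8, 9, 10, 12, 13, 14, -1, -1, -1, -1);

        // Each store writes 16 bytes (12 useful, 4 overwritten by the next
        // iteration), so stop while a full 16-byte store still fits in `rgb`.
        while p * 3 + 16 <= pixels * 3 {
            let v = _mm_loadu_si128(rgba.as_ptr().add(p * 4) as *const __m128i);
            let packed = _mm_shuffle_epi8(v, mask);
            _mm_storeu_si128(rgb.as_mut_ptr().add(p * 3) as *mut __m128i, packed);
            p += 4;
        }
    }

    // Handle the remaining pixels with the scalar loop.
    strip_alpha_scalar(&rgba[p * 4..], &mut rgb[p * 3..]);
}

/// Convert raw RGBA8 bytes to raw RGB8 bytes.
/// Assumption: `rgba.len()` is a multiple of 4.
pub fn strip_alpha(rgba: &[u8]) -> Vec<u8> {
    assert!(rgba.len() % 4 == 0, "input must be whole RGBA pixels");
    let mut rgb = vec![0u8; rgba.len() / 4 * 3];

    #[cfg(target_arch = "x86_64")]
    {
        if std::is_x86_feature_detected!("ssse3") {
            // Safe to call: the CPU feature was just checked at runtime.
            unsafe { strip_alpha_ssse3(rgba, &mut rgb) };
            return rgb;
        }
    }

    strip_alpha_scalar(rgba, &mut rgb);
    rgb
}
```

The resulting Vec<u8> can be passed to ImageBuffer::from_raw together with the original width and height, the same way the byte-level answer above builds its output image.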
It is probably slightly faster to assign the data this way, instead of manually tracking the index:
// Iterate through 4-byte chunks of the image data (RGBA bytes)
for (output, chunk) in output_data.chunks_exact_mut(3).zip(input.chunks_exact(4)) {
    // ... and copy each of them to output, leaving out the A byte
    output.copy_from_slice(&chunk[0..3]);
}
However, I'm slightly disappointed. A loop using pixels_mut() and pixels(), instead of operating on the raw sample slice, should be equivalent, as it internally also defers to chunks_exact{,_mut}. But for one reason or another, this fact does not seem to be exploited for optimization by LLVM. And there is no consistent SIMD use, as alluded to by @jethrogb.
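Since nobody in this thread posted numbers, here is a rough timing harness for comparing the two byte-level loops from the answers. It uses only the standard library and plain byte slices, so it measures the copy loops themselves and not ImageBuffer. The names strip_alpha_indexed, strip_alpha_zipped and time_it are hypothetical helpers, not part of the image crate. This is only a sketch; a proper benchmark tool would give more trustworthy results.

```rust
use std::hint::black_box;
use std::time::Instant;

/// Index-tracking loop, as in the first answer.
fn strip_alpha_indexed(rgba: &[u8]) -> Vec<u8> {
    let mut out = vec![0u8; rgba.len() / 4 * 3];
    let mut i = 0;
    for chunk in rgba.chunks(4) {
        out[i..i + 3].copy_from_slice(&chunk[..3]);
        i += 3;
    }
    out
}

/// Zipped chunks_exact loop, as in the second answer.
fn strip_alpha_zipped(rgba: &[u8]) -> Vec<u8> {
    let mut out = vec![0u8; rgba.len() / 4 * 3];
    for (dst, src) in out.chunks_exact_mut(3).zip(rgba.chunks_exact(4)) {
        dst.copy_from_slice(&src[..3]);
    }
    out
}

/// Average wall-clock seconds per call of `f` over `runs` calls.
/// black_box keeps the optimizer from removing the work.
fn time_it(f: fn(&[u8]) -> Vec<u8>, rgba: &[u8], runs: u32) -> f64 {
    let start = Instant::now();
    for _ in 0..runs {
        black_box(f(black_box(rgba)));
    }
    start.elapsed().as_secs_f64() / f64::from(runs)
}
```

To use it, fill a Vec<u8> with something like 1920 * 1080 * 4 bytes, call time_it(strip_alpha_indexed, &data, 20) and time_it(strip_alpha_zipped, &data, 20), and compare the results. Build with --release, since debug builds say little about what LLVM does with either loop.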