Image crate: Quick way to convert ImageBuffer<Rgba<u8>,Vec<u8>> to ImageBuffer<Rgb<u8>,Vec<u8>>

I am trying to convert an RGBA image buffer to an RGB image buffer, as used in the Rust image crate.
Basically, I am converting ImageBuffer<Rgba<u8>, Vec<u8>> to ImageBuffer<Rgb<u8>, Vec<u8>>.
I see a couple of options:

  1. Create a new buffer and loop through every pixel to discard the alpha channel.
  2. Convert the old buffer to a DynamicImage and use the to_rgb8() method to get a new image buffer.

Both options seem to affect performance a lot.
Does anyone have experience with this and can maybe shed light on a good method to maximize performance? Thank you.

I strongly doubt there is any way that is faster than your first option.

I don't know which option is the fastest, nor do I have time to write benchmarks right now, but here's a simple implementation you can compare yours against. It's basically your option 1, but it operates directly on bytes instead of pixels.

use image::{Rgb, Rgba};

fn rgba8_to_rgb8(input: image::ImageBuffer<Rgba<u8>, Vec<u8>>) -> image::ImageBuffer<Rgb<u8>, Vec<u8>> {
    let width = input.width() as usize;
    let height = input.height() as usize;
    // Get the raw image data as a vector
    let input: &Vec<u8> = input.as_raw();
    // Allocate a new buffer for the RGB image, 3 bytes per pixel
    let mut output_data = vec![0u8; width * height * 3];
    let mut i = 0;
    // Iterate through 4-byte chunks of the image data (RGBA bytes)
    for chunk in input.chunks(4) {
        // ... and copy each of them to output, leaving out the A byte
        output_data[i..i + 3].copy_from_slice(&chunk[0..3]);
        i += 3;
    }
    // Construct a new image
    image::ImageBuffer::from_raw(width as u32, height as u32, output_data).unwrap()
}

Assuming ImageBuffer just stores the pixel data as contiguous bytes in memory, converting RGBARGBARGBA... to RGBRGBRGB... is not a very cheap operation. You need to read and write a lot of memory to do this, especially if your image is large, and it's not just a contiguous copy. It may be possible to optimize the conversion somewhat by using vector instructions if your platform supports those (e.g. core::arch::x86_64::_mm_shuffle_epi8 - Rust on x86).

It is probably slightly faster to assign the data this way, instead of manually tracking the index:

    // Iterate through 4-byte chunks of the image data (RGBA bytes)
    for (output, chunk) in {
        output_data.chunks_exact_mut(3).zip(input.chunks_exact(4))
    } {
        // ... and copy each of them to output, leaving out the A byte
        output.copy_from_slice(&chunk[0..3]);
    }
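
Since nobody in the thread has posted numbers, here is a rough timing sketch for comparing the two loop styles (manual index versus zipped chunks). It works on plain byte slices instead of ImageBuffer so it needs no dependencies. The names strip_alpha_indexed, strip_alpha_zipped and time_runs are made up for this illustration and are not part of the image crate, and a wall-clock loop like this is only a crude stand-in for a proper benchmark harness.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Variant 1 (hypothetical name): track the output index by hand.
fn strip_alpha_indexed(rgba: &[u8]) -> Vec<u8> {
    let mut out = vec![0u8; rgba.len() / 4 * 3];
    let mut i = 0;
    // chunks_exact(4) skips any trailing partial pixel
    for chunk in rgba.chunks_exact(4) {
        out[i..i + 3].copy_from_slice(&chunk[..3]);
        i += 3;
    }
    out
}

/// Variant 2 (hypothetical name): zip 3-byte output chunks with 4-byte input chunks.
fn strip_alpha_zipped(rgba: &[u8]) -> Vec<u8> {
    let mut out = vec![0u8; rgba.len() / 4 * 3];
    for (dst, src) in out.chunks_exact_mut(3).zip(rgba.chunks_exact(4)) {
        dst.copy_from_slice(&src[..3]);
    }
    out
}

/// Hypothetical helper: total wall-clock time for `runs` calls of `f`.
/// black_box keeps the optimizer from discarding the work.
fn time_runs(f: fn(&[u8]) -> Vec<u8>, rgba: &[u8], runs: u32) -> Duration {
    let start = Instant::now();
    for _ in 0..runs {
        black_box(f(black_box(rgba)));
    }
    start.elapsed()
}
```

To try it, fill a Vec<u8> with width * height * 4 bytes, call time_runs(strip_alpha_indexed, &data, 100) and time_runs(strip_alpha_zipped, &data, 100), and compare the durations. Build with --release, otherwise the numbers say little.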


However, I'm slightly disappointed. The loop using pixels_mut() and pixels(), instead of operating on the raw sample slice, should be equivalent, as it internally also defers to chunks_exact{,_mut}. But for one reason or another, LLVM does not seem to exploit this fact for optimization. And there is no consistent SIMD use, as alluded to by @jethrogb.
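
For anyone who wants to try the explicit vector-instruction route, here is a minimal sketch of what it could look like with _mm_shuffle_epi8 on raw bytes. It assumes x86_64 with SSSE3 detected at runtime and falls back to the plain chunk loop everywhere else. The function names are invented for this example, it has not been benchmarked, and whether it actually beats the compiler's output would need measuring.

```rust
/// Scalar fallback: copy R, G, B of each 4-byte pixel and skip A.
fn strip_alpha_scalar(rgba: &[u8], rgb: &mut [u8]) {
    for (dst, src) in rgb.chunks_exact_mut(3).zip(rgba.chunks_exact(4)) {
        dst.copy_from_slice(&src[..3]);
    }
}

/// SSSE3 path (hypothetical name): repack 4 RGBA pixels into 12 RGB bytes per step.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "ssse3")]
unsafe fn strip_alpha_ssse3(rgba: &[u8], rgb: &mut [u8]) {
    use std::arch::x86_64::*;
    let pixels = rgba.len() / 4;
    let mut p = 0;
    unsafe {
        // Select bytes 0,1,2 / 4,5,6 / 8,9,10 / 12,13,14; -1 zeroes the lane.
        let mask = _mm_setr_epi8(0, 1, 2, 4, 5, 6, 8, 9, 10, 12, 13, 14, -1, -1, -1, -1);
        // Each store writes 16 bytes (12 useful), so only continue while
        // 16 input bytes and 16 output bytes are still in bounds.
        while p + 4 <= pixels && p * 3 + 16 <= rgb.len() {
            let v = _mm_loadu_si128(rgba.as_ptr().add(p * 4) as *const __m128i);
            let packed = _mm_shuffle_epi8(v, mask);
            _mm_storeu_si128(rgb.as_mut_ptr().add(p * 3) as *mut __m128i, packed);
            p += 4;
        }
    }
    // The remaining pixels overwrite the 4 padding bytes of the last store.
    strip_alpha_scalar(&rgba[p * 4..], &mut rgb[p * 3..]);
}

/// Entry point (hypothetical name): picks the SSSE3 path when available.
pub fn strip_alpha(rgba: &[u8]) -> Vec<u8> {
    assert!(rgba.len() % 4 == 0, "input must be whole RGBA pixels");
    let mut rgb = vec![0u8; rgba.len() / 4 * 3];
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("ssse3") {
            // Safe to call: the required CPU feature was just detected.
            unsafe { strip_alpha_ssse3(rgba, &mut rgb) };
            return rgb;
        }
    }
    strip_alpha_scalar(rgba, &mut rgb);
    rgb
}
```

With the image crate, the bytes from as_raw() could be passed to strip_alpha and the result handed to ImageBuffer::from_raw together with the original width and height, the same way the byte-based version earlier in the thread builds its output.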