Why is this code so slow?

Hello Everyone, I am new to Rust.
I was making a photomosaic creator. (image made of smaller images as mosaic tiles).
I am not sure why this program runs slow (i know dev built runs slower than release one). It is way too slow.
Any help is really appreciated. Thanks.

Basic Idea:

  1. Loop over all the images in the given directory (via glob expression) and store their path and average rgb value.
  2. Create an empty image (tiles will be pasted here)
  3. Loop over the image by 50x50 tile. Get the average of the region and replace it (paste on empty image) with the closest matching tile from the 1st step.

What can I do to improve it. Is there a better way to do this.

The code used:

use image::DynamicImage;
use image::GenericImageView;
use image::Pixel;
// use image::Rgb;
use glob::glob;

struct Tile {
    path: String,
    avg_rgb: Vec<f32>,
}

pub struct Mosaic {
    img: String,
    tiles_dir: String,
    tiles: Vec<Tile>,
    tile_width: u32,
    tile_height: u32,
}

impl Mosaic {
    pub fn new(img: String, tiles_dir: String) -> Self {
        Self {
            img,
            tiles_dir,
            tiles: vec![],
            tile_width: 50,
            tile_height: 50,
        }
    }

    fn distance(&self, rgb1: &Vec<f32>, rgb2: &Vec<f32>) -> f32 {
        // returns distance between 2 rgb values
        let r: f32 = rgb1[0] - rgb2[0];
        let g: f32 = rgb1[1] - rgb2[1];
        let b: f32 = rgb1[2] - rgb2[2];

        let d: f32 = r.powf(2.0) + g.powf(2.0) + b.powf(2.0);
        d.sqrt()
    }

    fn get_avg_rgb(&self, img: &DynamicImage) -> Vec<f32> {
        let mut avg_r: f32 = 0.0;
        let mut avg_g: f32 = 0.0;
        let mut avg_b: f32 = 0.0;

        let width = img.width();
        let height = img.height();
        let size = (width * height) as f32;

        for w in 0..width {
            for h in 0..height {
                let rgb = img.get_pixel(w, h).to_rgb();
                avg_r += rgb[0] as f32 / size;
                avg_g += rgb[1] as f32 / size;
                avg_b += rgb[2] as f32 / size;
            }
        }

        vec![avg_r, avg_g, avg_b]
    }

    fn get_closest(&self, rgb: &Vec<f32>) -> usize {
        let mut max = f32::MAX;
        let mut index = 0;
        for (i, tile) in self.tiles.iter().enumerate() {
            let d = self.distance(rgb, &tile.avg_rgb);
            if max >= d {
                max = d;
                index = i;
            }
        }
        index
    }

    fn create_tiles(&mut self) {
        let glob_expr = self.tiles_dir.to_owned() + "/*.*g";
        for path in glob(&glob_expr).expect("Can't read Directory") {
            let img_path = path.unwrap().to_str().unwrap().to_string();
            let img = image::open(&img_path).unwrap();
            self.tiles.push(Tile {
                path: img_path,
                avg_rgb: self.get_avg_rgb(&img),
            })
        }
    }

    fn create_mosaic(&mut self) {
        self.create_tiles();
        if self.tiles.len() == 0 {
            println!(
                "Tiles directory '{}' has no images",
                self.tiles_dir
            );
            return;
        }

        println!("{} tiles created", self.tiles.len());

        let n: u32 = 20; // for better output ; n = 5
        let mut base_img = image::open(&self.img).unwrap().thumbnail(100, 100); //for better output, (500, 500)
        let (width, height) = base_img.dimensions();

        let new_width: u32 = width * self.tile_width / n;
        let new_height: u32 = height * self.tile_height / n;
        let mut new_img = DynamicImage::new_rgb8(new_width, new_height);

        for w in (0..width).step_by(n as usize) {
            for h in (0..height).step_by(n as usize) {
                let piece = base_img.crop(w, h, n, n);
                let piece_rgb = self.get_avg_rgb(&piece);
                let index = self.get_closest(&piece_rgb);
                let tile_img = image::open(&self.tiles[index].path)
                    .unwrap()
                    .thumbnail(self.tile_width, self.tile_height);
                let x = w * self.tile_width / n;
                let y = h * self.tile_height / n;
                image::imageops::overlay(&mut new_img, &tile_img, x.into(), y.into());
            }
        }

        match new_img.save("final.png") {
            Ok(_t) => {
                println!("Mosaic created")
            }
            Err(e) => {
                println!("Error occured {}", e)
            }
        }
    }

    pub fn create(&mut self) {
        self.create_mosaic();
    }
}

Just to be sure: are you starting the program using cargo run --release?
Only with the release flag does the compiler optimize the code for performance.

No i am just running via cargo run. its takes more than 10 minute for it to create final.png if n = 5 and thumbnail size is (500, 500). I thinks this is too much slow even for unoptimized version

Create a flamegraph and find out where time is consumed.

It is quite possible for the unoptimised version to be 10 or 100 times slower than the --release build.

What timing do you get in release?

Apparently its quite fast with --release flag.
with n = 10 and thumbnail size (500, 500).
188 tiles and base_img size is (800x800)

unoptimized time is over 20 minutes.
with release flag it only takes 30sec.

So there is nothing to worry about. Right?

If 30 seconds is ok for your use case its fine.

Without --release it's expected to be 20 or 50 times slower. In the default debug mode Rust doesn't care about speed at all, and even does things that are intentionally slow and inefficient in order to make compilation quicker and debugging easier.

If you need more speed during development, you can customize opt-level in Cargo.toml:

https://doc.rust-lang.org/cargo/reference/profiles.html

This may be useful to make image and imageproc faster:

[profile.dev.package."*"]
opt-level = 2
4 Likes

Debug builds do seem to be slow in Rust. It's not something I am going to worry about.

If the 30sec is a worry for you it's time to start profiling. You might want to think about parallelizing it to make use of all the cores on your machine.

Thanks man!!

Thats interesting. thanks for sharing. I will look into it.

My understanding of what the code is doing is this:

  • You have an original image 800x800.
  • You scale it to 500x500
  • Read 188 tile images and compute their averages
  • Split the 500x500 image into 2500 10x10 squares.
  • For each square:
    • find the best tile image
    • re-read the best tile
    • scale it to 10x10
    • paste into the square

In the last step, you're reading a tile image and scaling it 2500 times.

Instead, only read the 188 tile images once, and scale each one to 10x10, so that you don't have to re-read them again and scale them so many times.

Some other details that probably matter less for performance:

  • You are computing pixels as 3-element Vec<f32>. This is not the best type for this. Use image::Rgb<f32>.
  • You're iterating over individual pixels in DynamicImage and calling get_pixel(w, h) for each, which invokes some machinery to figure out what kind of image it is, etc, for each pixel. Instead, iterate over pixels in bulk with something like:
for pixel in img.to_rgb32f().pixels() {
}
  • Instead of dividing each pixel by size, compute the sum of pixels, and divide the sum by size.
10 Likes

Thanks!! I will try that.

This really helped. Now it only takes 2s to run (n=10, thumbnail=500x500, 188 tiles, 800x800 base img size).
Thanks again Everyone.

updated code:

use glob::glob;
use image::imageops::overlay;
use image::imageops::FilterType;
use image::DynamicImage;
use image::GenericImageView;
use image::Rgb;

struct Tile {
    avg_rgb: Rgb<f32>,
    img: DynamicImage,
}

pub struct Mosaic {
    img: String,
    tiles_dir: String,
    tiles: Vec<Tile>,
    tile_width: u32,
    tile_height: u32,
}

impl Mosaic {
    pub fn new(img: String, tiles_dir: String) -> Self {
        Self {
            img,
            tiles_dir,
            tiles: vec![],
            tile_width: 50,
            tile_height: 50,
        }
    }

    fn distance(&self, rgb1: &Rgb<f32>, rgb2: &Rgb<f32>) -> f32 {
        // returns distance between 2 rgb values
        // (like 3d coordinates, r = sqrt(x^2 + y^2 + z^2))
        let r: f32 = rgb1[0] - rgb2[0];
        let g: f32 = rgb1[1] - rgb2[1];
        let b: f32 = rgb1[2] - rgb2[2];

        let d: f32 = r.powf(2.0) + g.powf(2.0) + b.powf(2.0);
        d.sqrt()
    }

    fn get_avg_rgb(&self, img: &DynamicImage) -> Rgb<f32> {
        let mut avg_r: f32 = 0.0;
        let mut avg_g: f32 = 0.0;
        let mut avg_b: f32 = 0.0;

        let width = img.width();
        let height = img.height();
        let size = (width * height) as f32;

        for pixel in img.to_rgb32f().pixels() {
            avg_r += pixel[0] as f32 / size;
            avg_g += pixel[1] as f32 / size;
            avg_b += pixel[2] as f32 / size;
        }

        Rgb::from([avg_r, avg_g, avg_b])
    }

    fn get_closest(&self, rgb: &Rgb<f32>) -> usize {
        let mut max = f32::MAX;
        let mut index = 0;
        for (i, tile) in self.tiles.iter().enumerate() {
            let d = self.distance(rgb, &tile.avg_rgb);
            if max >= d {
                max = d;
                index = i;
            }
        }
        index
    }

    fn create_tiles(&mut self) {
        let glob_expr = self.tiles_dir.to_owned() + "/*.*g";
        for path in glob(&glob_expr).expect("Can't read Directory") {
            let img_path = path.unwrap().to_str().unwrap().to_string();
            let img = image::open(&img_path).unwrap().resize_to_fill(
                self.tile_width,
                self.tile_height,
                FilterType::Nearest,
            );
            self.tiles.push(Tile {
                avg_rgb: self.get_avg_rgb(&img),
                img: img,
            })
        }
    }

    fn create_mosaic(&mut self) {
        self.create_tiles();
        if self.tiles.len() == 0 {
            println!("Tiles directory '{}' has no images", self.tiles_dir);
            return;
        }

        println!("{} tiles created", self.tiles.len());

        let n: u32 = 10;
        let mut base_img = image::open(&self.img).unwrap().thumbnail(500, 500);
        let (width, height) = base_img.dimensions();

        let new_width: u32 = width * self.tile_width / n;
        let new_height: u32 = height * self.tile_height / n;
        let mut new_img = DynamicImage::new_rgb8(new_width, new_height);

        for w in (0..width).step_by(n as usize) {
            for h in (0..height).step_by(n as usize) {
                let piece = base_img.crop(w, h, n, n);
                let piece_rgb = self.get_avg_rgb(&piece);
                let i = self.get_closest(&piece_rgb);
                let x = w * self.tile_width / n;
                let y = h * self.tile_height / n;
                overlay(&mut new_img, &self.tiles[i].img, x.into(), y.into());
            }
        }

        match new_img.save("final.png") {
            Ok(_t) => {
                println!("Mosaic created")
            }
            Err(e) => {
                println!("Error occured {}", e)
            }
        }
    }

    pub fn create(&mut self) {
        self.create_mosaic();
    }
}

```
3 Likes

Wow, 20 minutes down to 2 seconds!

Just go to show that a little thinking about what one is doing can have huge benefits.

In this case I guess the lesson to learn is that it is good to cache ones results rather than recalculate the thing over and over. Not always true but likely is for anything complex. And assuming one has the memory available to cache those results.

1 Like