2D blur the Rust way

I'd like to implement a simple blur, that is, one that averages 3x3 pixels, in a single pass and without transposition.

What would be the best way to implement that in Rust? I'm looking for a solution that prevents the dreaded off-by-one and out-of-bounds errors, but also doesn't needlessly do bounds checks for every pixel.

In C it goes like this:

special_case_for_first_row();
for each row {
    special_case_for_first_pixel();
    for each pixel except first and last {
        // FAST PATH
    }
    special_case_for_last_pixel();
}
special_case_for_last_row();

In Rust, with iterators that feel "1D", that's a bit awkward.

I imagine with iterators it would go something like this:

    // chain iterators to duplicate first/last row
    std::iter::once(&vec[0..width])
        .chain(vec.chunks(width))
        .chain(std::iter::once(&vec[vec.len() - width..]))
        .windows(3)
        .flat_map(|rows| {
            // here should be more chaining to duplicate the first/last pixel in each row
            rows[0].windows(3).zip(rows[1].windows(3).zip(rows[2].windows(3)))
                .map(|(a, (b, c))| {
                    a[0] + a[1] + a[2] +
                    b[0] + b[1] + b[2] +
                    c[0] + c[1] + c[2]
                })
        });

but that doesn't actually work, because .windows() exists only on slices, not on iterators. And I'm not sure I can expect the compiler to figure out how to produce an optimized inner loop for this, without any bounds checks and with a running average.
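
The closest I can get is to collect the row slices into a Vec first, so that slice::windows() applies to the rows as well as to the pixels. A rough sketch (untested, and it simply skips the edge rows/pixels instead of duplicating them, so the output is smaller than the input):

fn blur_interior(pixels: &[u32], width: usize) -> Vec<u32> {
    // Materialize the rows so that slice::windows() can be used on them.
    // Assumes pixels.len() == width * height.
    let rows: Vec<&[u32]> = pixels.chunks(width).collect();
    rows.windows(3)
        .flat_map(|rows| {
            rows[0].windows(3)
                .zip(rows[1].windows(3))
                .zip(rows[2].windows(3))
                .map(|((a, b), c)| {
                    a[0] + a[1] + a[2] +
                    b[0] + b[1] + b[2] +
                    c[0] + c[1] + c[2]
                })
        })
        .collect()
}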

Are iterators even a good tool for this problem? Should I write the equivalent of the C loops instead? Or would writing a custom iterator help?


In my opinion, this really obfuscates the code. Why not go the C way?

I was hoping for another solution, somewhere in between, because the C solution isn't great either.

While the C way looks easy to follow, it's actually very tricky to get right: you have to get all the special cases right (e.g. if width < 3), get all the partial sums right (off-by-ones are very easy to make and hard to notice), and the summing/averaging logic ends up scattered across 9 places (a pain if I wanted to extend it to a convolution/fancier blur kernel).

Is defining something like

fn neighbours<T: Copy + Default>(data: &[Vec<T>], pixel_coordinate: (usize, usize)) -> [[T; 3]; 3]

out of the question for performance reasons?

I always prefer to write a helper function for enumerating neighbors to avoid corner cases in the corners, but I usually do this in Python :)
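
For concreteness, a rough sketch of the kind of helper I mean, in Rust (clamping at the image border rather than padding with T::default(), so the Default bound isn't actually used here):

fn neighbours<T: Copy>(data: &[Vec<T>], (x, y): (usize, usize)) -> [[T; 3]; 3] {
    let h = data.len();
    let w = data[0].len();
    let mut out = [[data[y][x]; 3]; 3];
    for dy in 0..3 {
        for dx in 0..3 {
            // Clamp the neighbour coordinate to the image border.
            let ny = (y + dy).saturating_sub(1).min(h - 1);
            let nx = (x + dx).saturating_sub(1).min(w - 1);
            out[dy][dx] = data[ny][nx];
        }
    }
    out
}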

The helper done this way wouldn't be able to avoid bounds checks. Maybe an iterator returning [&[T; 3]; 3] could work performance-wise (it'd transmute the inner arrays in the fast case and copy in the special cases).

I have the very same problem, and I also believe there should be a windows() method on iterators to solve this more elegantly.

As an alternative, a clamped_windows() method ([0, 0, 1], [0, 1, 2], [1, 2, 3], ..., [n-3, n-2, n-1], [n-2, n-1, n-1]) or a truncated_windows() method ([0, 1], [0, 1, 2], [1, 2, 3], ..., [n-3, n-2, n-1], [n-2, n-1]) would be very handy.
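
To make the idea concrete, here is roughly what the clamped variant could look like for slices (a sketch with a made-up name, yielding owned [T; 3] triples rather than subslices to avoid lifetime trouble):

fn clamped_windows_3<'a, T: Copy>(s: &'a [T]) -> impl Iterator<Item = [T; 3]> + 'a {
    // Assumes a non-empty slice; clamps the left/right neighbour at the ends.
    let last = s.len() - 1;
    (0..s.len()).map(move |i| [s[i.saturating_sub(1)], s[i], s[(i + 1).min(last)]])
}

For [0, 1, 2, 3] this would yield [0, 0, 1], [0, 1, 2], [1, 2, 3], [2, 3, 3].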

The best workaround for my purposes seems to be a hybrid approach (edge cases not implemented yet). I'm using a Vec of arrays as a 2D field:

let mut front_field: Vec<[Sample; WIDTH]> = Vec::with_capacity(HEIGHT);

The outer loop calls a row handler, always passing three rows (as a slice, as an array, or as separate arguments). The outer loop could handle the edge cases by duplicating the first and/or last row (as you've already mentioned). Performance is not an issue at this level.

For the inner loop I used a classic indexed loop, as I had to iterate over multiple arrays simultaneously (zip-iterating them was a bit slower and less readable). The resulting assembly was well optimized and didn't contain any bounds checks. I don't know whether this works as well with slices rather than the fixed-size arrays I used (zip-iterating might be the better choice for slices).
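
Roughly, the shape is something like this (a stripped-down, untested sketch with made-up names; Sample is just u32 here, and the edge handling shown is the part I haven't actually implemented yet):

const WIDTH: usize = 640;
type Sample = u32;

// Row handler: three full input rows in, one output row out. With fixed-size
// arrays and the 1..WIDTH-1 range the compiler can usually elide the bounds
// checks in the inner loop.
fn blur_row(above: &[Sample; WIDTH], row: &[Sample; WIDTH], below: &[Sample; WIDTH],
            out: &mut [Sample; WIDTH]) {
    // In-row edge cases, "the C way": duplicate the first/last sample.
    out[0] = 2 * above[0] + above[1] + 2 * row[0] + row[1] + 2 * below[0] + below[1];
    for x in 1..WIDTH - 1 {
        out[x] = above[x - 1] + above[x] + above[x + 1]
               + row[x - 1]   + row[x]   + row[x + 1]
               + below[x - 1] + below[x] + below[x + 1];
    }
    out[WIDTH - 1] = above[WIDTH - 2] + 2 * above[WIDTH - 1]
                   + row[WIDTH - 2]   + 2 * row[WIDTH - 1]
                   + below[WIDTH - 2] + 2 * below[WIDTH - 1];
}

// Outer loop: duplicate the first/last row at the vertical edges.
fn blur_field(field: &[[Sample; WIDTH]], out: &mut [[Sample; WIDTH]]) {
    let last = field.len() - 1;
    for y in 0..field.len() {
        blur_row(&field[y.saturating_sub(1)], &field[y],
                 &field[(y + 1).min(last)], &mut out[y]);
    }
}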

The edge cases within a row are still solved "the C way", as in the sketch above; I'm still looking for a better solution there.

Nonetheless, maybe this helps a bit.

I'm trying to write a Windows iterator that returns a slice in the normal case and a slice into a temporary array in the edge cases, but the lifetimes for it are complicated :/

From what I've read, I can't have an iterator return a reference into itself:

struct Windows<'a, T: 'a> {
    input: &'a [T],
    tmp: [T; 3],
}

impl<'a, T> Iterator for Windows<'a, T> {
    type Item = &'a [T];

    fn next(&mut self) -> Option<&'a [T]> {
       if special { return Some(&self.tmp) } // Nope :( (borrows self, not 'a)
       else { Some(self.input) }
    }
}

I've also tried:

pub struct Windows3<'input, 'tmp, T: 'tmp + 'input> {
    input: &'input [T],
    tmp: &'tmp mut [T; 3],
}

impl<'input, 'tmp, T> Iterator for Windows3<'input, 'tmp, T>
    where T: Clone, T: 'input + 'tmp, 'input: 'tmp
{
    type Item = &'tmp [T];

    fn next(&mut self) -> Option<Self::Item> {
        if special {
            return Some(self.tmp);
        } else {
            return Some(self.input);
        }
    }
}
But none of these pass:

cannot infer an appropriate lifetime for automatic coercion due to conflicting requirements

I've also tried creating a 3rd lifetime that the other two are defined to outlive, but that doesn't help either.

Your iterator with the &mut tmp pointer would be unsound: you could end up mutating the value behind an immutable reference you've already handed out.

Have you considered making the item type of the iterator just [T; 3]? That dodges all the tricky lifetime issues, as long as T is Copy.

I find the 'C' way much easier to read. As soon as you start zipping this and unzipping that, I find things quickly get unreadable and unmaintainable. It's what I call "write-only code" :) I am sure everyone knows debugging is generally harder than writing code, so if we use all our intellectual capability to write code, we will be too stupid to debug it.

I did some informal research on this, presenting algorithms written in different ways to different people and seeing how long it took them to understand what the code was doing. I posted an example on LtU a while back too. The clear majority of people, including functional programming advocates, found the imperative loops easier to read and understand.

My theory is that the functional code involves hidden types. The type returned from zip, for example, has to be computed from the types of its arguments; the triple zip in the original sketch, for instance, yields items of type (&[u32], (&[u32], &[u32])), which never appears in the source. Given a sequence of such operations, that invisible type computation becomes harder and it is easier to make mistakes. In the imperative code there is often a mutable container with a visible type signature, whose type does not change throughout the whole algorithm. Therefore my advice, for readability and maintainability, is to minimise the 'hidden' types in code. Try to change the algorithm as little as possible from its canonical pseudocode description.

I realise this goes against the fashion in Rust, but I encourage people to test this out for themselves and decide what is more important: following the fashion, or readability and maintainability. I would also recommend not seeing imperative loops as un-rustic and not creating a false dichotomy between the 'C' way and the 'Rust' way. There are algorithms, and writing them in a clear, readable and maintainable way is the most important thing.


Yes, indeed. I was thinking in the context of .map(), where such borrowing could have been done safely, but I forgot iterators are less restricted than that.

This builds:

// Clamped 1-D pass: calls f once per output element, duplicating the first
// and last input elements at the edges so f always sees three values.
fn convolution_3<I, O, F>(input: &[I], output: &mut [O], mut f: F)
where I: Copy, F: FnMut([I; 3], &mut O) {
    // First element: duplicate input[0] on the left.
    f([input[0], input[0], input[1]], &mut output[0]);

    // Interior: full 3-element windows, no special cases.
    for (window, out) in input.windows(3).zip(&mut output[1..]) {
        f([window[0], window[1], window[2]], out);
    }

    // Last element: duplicate input[last] on the right.
    let last = input.len() - 1;
    f([input[last - 1], input[last], input[last]], &mut output[last]);
}

fn triple_convolution_3<I, O, F>(input_rows: [&[I]; 3], output_row: &mut [O], mut f: F)
where I: Copy, F: FnMut([[I; 3]; 3], &mut O) {
    f([[input_rows[0][0], input_rows[0][0], input_rows[0][1]],
       [input_rows[1][0], input_rows[1][0], input_rows[1][1]],
       [input_rows[2][0], input_rows[2][0], input_rows[2][1]]], &mut output_row[0]);

    for (((r0, r1), r2), out) in input_rows[0].windows(3)
        .zip(input_rows[1].windows(3))
        .zip(input_rows[2].windows(3))
        .zip(&mut output_row[1..])
    {
        f([[r0[0], r0[1], r0[2]],
           [r1[0], r1[1], r1[2]],
           [r2[0], r2[1], r2[2]]], out);
    }

    let last = input_rows[0].len() - 1;
    f([[input_rows[0][last - 1], input_rows[0][last], input_rows[0][last]],
       [input_rows[1][last - 1], input_rows[1][last], input_rows[1][last]],
       [input_rows[2][last - 1], input_rows[2][last], input_rows[2][last]]], &mut output_row[last]);
}

pub fn blur(input: &[&[u32]], output: &mut [&mut [u32]]) {
    convolution_3(input, output, |rows, out| {
        triple_convolution_3(rows, out, |square, out| {
            *out = square[0][0] + square[0][1] + square[0][2] +
                   square[1][0] + square[1][1] + square[1][2] +
                   square[2][0] + square[2][1] + square[2][2];
        })
    })
}

I was hoping to use convolution_3 in both dimensions, but the triple zip has to be there in the middle.

Of course the kernel size (3) is very much hard-coded here, but it might be possible to make it a parameter, with more loops.
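
For instance, something along these lines could make the size a runtime parameter (a rough, untested sketch; it passes &[I] windows instead of fixed-size arrays and clamps through a temporary buffer at the edges):

// Clamped 1-D convolution with a runtime (odd) kernel size k.
// f always receives exactly k samples; output.len() must equal input.len().
fn convolution_k<I: Copy, O, F: FnMut(&[I], &mut O)>(
    input: &[I], output: &mut [O], k: usize, mut f: F,
) {
    let half = k / 2;
    let last = input.len() - 1;
    let mut tmp = Vec::with_capacity(k);
    for (i, out) in output.iter_mut().enumerate() {
        if i >= half && i + half <= last {
            // Fast path: the window lies entirely inside the slice.
            f(&input[i - half..=i + half], out);
        } else {
            // Edge path: build the window with clamped indices.
            tmp.clear();
            for j in 0..k {
                tmp.push(input[(i + j).saturating_sub(half).min(last)]);
            }
            f(&tmp, out);
        }
    }
}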


I think my point here is: do use iterators and iterator combinators, but it doesn't all have to be a single expression. For example, don't hesitate to write a function and call it twice instead of using Iterator::chain.

As expected, external iterators compose better than internal ones.

(Again, this builds but I haven’t tried it.)

struct Convolution3<'a, T: 'a> {
    slice: &'a [T],
    state: Convolution3State<'a, T>,
}

enum Convolution3State<'a, T: 'a> {
    BeforeFirst,
    Middle(::std::slice::Windows<'a, T>),
    Done,
}

impl<'a, T: Copy> Convolution3<'a, T> {
    fn new(slice: &'a [T]) -> Self {
        Convolution3 {
            slice: slice,
            state: Convolution3State::BeforeFirst,
        }
    }
}


impl<'a, T: Copy> Iterator for Convolution3<'a, T> {
    type Item = [T; 3];

    fn next(&mut self) -> Option<Self::Item> {
        match self.state {
            Convolution3State::BeforeFirst => {
                self.state = Convolution3State::Middle(self.slice.windows(3));
                return Some([self.slice[0], self.slice[0], self.slice[1]])
            }
            Convolution3State::Middle(ref mut windows) => {
                if let Some(window) = windows.next() {
                    return Some([window[0], window[1], window[2]])
                }
            }
            Convolution3State::Done => return None
        }
        // windows.next() returned None
        self.state = Convolution3State::Done;

        let last = self.slice.len() - 1;
        Some([self.slice[last - 1], self.slice[last], self.slice[last]])
    }
}


pub fn blur(input: &[&[u32]], output: &mut [&mut [u32]]) {
    for (in_rows, out_row) in Convolution3::new(input).zip(output) {
        for (((r0, r1), r2), out) in
            Convolution3::new(in_rows[0])
            .zip(Convolution3::new(in_rows[1]))
            .zip(Convolution3::new(in_rows[2]))
            .zip(&mut **out_row)
        {
            *out = r0[0] + r0[1] + r0[2] +
                   r1[0] + r1[1] + r1[2] +
                   r2[0] + r2[1] + r2[2];
        }
    }
}

Yeah, I’m cheating here by Copying T values. Luckily both &[_] and u32 are Copy, so it works out in this example.