How to solve reshape with `into_shape` error after `from_shape_ptr`?

In Rust I'm trying to implement a 2D convolution (e.g. image blurring) of a 256x256 greyscale image with a 3x3 kernel, using the ndarray crate for all matrix operations. I'm vectorising the algorithm to speed it up and parallelise it, as described in https://towardsdatascience.com/how-are-convolutions-actually-performed-under-the-hood-226523ce7fbf

For a smaller example, a convolution with a 2x2 kernel, I need to rewrite:

[[ 1, 2, 3, 4],
 [ 5, 6, 7, 8],
 [ 9,10,11,12]]

type:

ndarray::ArrayBase<ndarray::OwnedRepr<i32>, ndarray::Dim<[usize; 2]>>

to:

[[ 1, 2, 3, 5, 6, 7],
 [ 2, 3, 4, 6, 7, 8],
 [ 5, 6, 7, 9,10,11],
 [ 6, 7, 8,10,11,12]];

With `from_shape_ptr`:

use ndarray::{ArrayView, ShapeBuilder};

let ptr = image.as_ptr();
// Bind the view outside the unsafe block so it can be used afterwards.
let view = unsafe { ArrayView::from_shape_ptr((6, 2, 2).strides((1, 4, 1)), ptr) };

I get:

[[[1, 2],
  [5, 6]],

 [[2, 3],
  [6, 7]],

 [[3, 4],
  [7, 8]],

 [[4, 5],
  [8, 9]],

 [[5, 6],
  [9, 10]],

 [[6, 7],
  [10, 11]]]
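The wrap-around in that output becomes clear if you compute the flat offsets by hand: with strides (1, 4, 1), element (i, j, k) of the view lives at flat offset i·1 + j·4 + k·1 in the row-major image buffer. A plain-Rust sketch (no ndarray) that reproduces the windows above:

```rust
fn main() {
    // The 3x4 image from above, flattened row-major.
    let image = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12];
    // Shape (6, 2, 2) with strides (1, 4, 1): offset = i*1 + j*4 + k*1.
    let (s0, s1, s2) = (1usize, 4, 1);
    for i in 0..6 {
        let window: Vec<i32> = (0..2)
            .flat_map(|j| (0..2).map(move |k| image[i * s0 + j * s1 + k * s2]))
            .collect();
        // window 3 reads offsets 3, 4, 7, 8 -> [4, 5, 8, 9]: it wraps
        // across the row boundary of the original image.
        println!("window {}: {:?}", i, window);
    }
}
```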

I was hoping to use into_shape to get the 6x4 matrix from above:

let review = view.into_shape((6,4)).unwrap();

Alas the into_shape fails with the following error:

thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: ShapeError/IncompatibleLayout: incompatible memory layout'

So the question is: how can I reshape the output of `from_shape_ptr` from (6,2,2) to (6,4), flattening the inner two dimensions?

Is this right? It looks like you are doing im2col, but the output, even as 6x2x2, looks wrong.

Quoting the `into_shape` docs (ndarray::ArrayBase):

Transform the array into `shape`; any shape with the same number of elements is accepted, but the source array or view must be in standard or column-major (Fortran) layout.
Errors if the shapes don't have the same number of elements.
Errors if the input array is not c- or f-contiguous.

I suspect the most likely explanation is that the 2x2 part of 6x(2x2) is not contiguous, i.e. its 4 elements are not adjacent in memory.

Could you use `ArrayBase::strides` to print out the strides? That would test the hypothesis above.
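To illustrate the contiguity argument: for a C-contiguous array, the strides must be the running products of the trailing dimensions. A small hand-rolled check (a sketch in plain Rust, not ndarray's actual internal test, which also handles size-1 axes and f-order):

```rust
// Returns true if `strides` are the row-major (C-contiguous) strides for `shape`.
fn is_c_contiguous(shape: &[usize], strides: &[usize]) -> bool {
    let mut expected = 1;
    for (&dim, &stride) in shape.iter().zip(strides).rev() {
        if stride != expected {
            return false;
        }
        expected *= dim;
    }
    true
}

fn main() {
    // C-contiguous strides for shape (6, 2, 2) would be (4, 2, 1)...
    assert!(is_c_contiguous(&[6, 2, 2], &[4, 2, 1]));
    // ...but the windowed view has strides (1, 4, 1), so into_shape
    // has no contiguous buffer to reinterpret.
    assert!(!is_c_contiguous(&[6, 2, 2], &[1, 4, 1]));
}
```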

Thanks for your quick reply!

You are indeed right that the window [[4,5],[8,9]] is wrong; I totally missed that. And your remark about the 2x2 part not being contiguous is spot on, and indeed the reason `into_shape` will not work.

Because of this I need to rethink the use of from_shape_ptr and maybe go for the windows solution:

let mut data: Vec<i32> = Vec::new();
for win in image.windows([2, 2]) {
    data.extend(win.iter()); // copy each 2x2 window into the flat buffer
}
let im2col = Array2::from_shape_vec((6, 4), data).unwrap();
let kernel_flatten = Array::from_vec(kernel.into_raw_vec());
let conv = &im2col.dot(&kernel_flatten);
let result = conv.view().into_shape((2,3)).unwrap();
println!("{:?}", result);

with a simple identity kernel [[1,0],[0,0]] gives:

[[1, 2, 3],
 [5, 6, 7]]
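For comparison, the same im2col plus dot product can be written in plain Rust without ndarray (the `im2col` helper here is just illustrative):

```rust
// im2col for a (rows x cols) image with a (kh x kw) kernel: each output
// row is one flattened window, in row-major window order.
fn im2col(image: &[i32], cols: usize, kh: usize, kw: usize) -> Vec<Vec<i32>> {
    let rows = image.len() / cols;
    let mut out = Vec::new();
    for r in 0..=rows - kh {
        for c in 0..=cols - kw {
            let mut win = Vec::with_capacity(kh * kw);
            for dr in 0..kh {
                for dc in 0..kw {
                    win.push(image[(r + dr) * cols + (c + dc)]);
                }
            }
            out.push(win);
        }
    }
    out
}

fn main() {
    let image = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12];
    let kernel = [1, 0, 0, 0]; // flattened 2x2 identity kernel [[1,0],[0,0]]
    let conv: Vec<i32> = im2col(&image, 4, 2, 2)
        .iter()
        .map(|win| win.iter().zip(&kernel).map(|(a, b)| a * b).sum())
        .collect();
    println!("{:?}", conv); // the 2x3 result, flattened: [1, 2, 3, 5, 6, 7]
}
```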

But I'm afraid this windows solution is expensive because of the extra for-loop and the mem-copy inside the loop.

Could there be another way to create the right view?

Since you're using `from_shape_ptr`, you have the full possibilities of ndarray's layout model available (excluding negative strides, but those are not needed here). `into_shape`'s limitations thus shouldn't matter, even if we are working on improving them.

Can you create the array you need with `from_shape_ptr`? If you can't, it's likely not an array view that ndarray can represent.

Initially the output of `from_shape_ptr` looked perfect for creating a sliding window (of the kernel size) over my image. But, as @zeroexcuses pointed out, it also (as it should) produces [[4,5],[8,9]], which is not a valid window.

I agree with you that `from_shape_ptr` cannot create the view I need for my problem. Therefore I'm now looking at `windows` as a possible solution, although this results in an extra loop and extra memory (compared to a view solution).

Are you doing something deep learning related? Are you throwing a batch of convolution filters at it?

Suppose we start with an nxn image and run M kxk convolutions at the same time; then we end up with an sgemm between:

im2col matrix: (n-k+1)*(n-k+1) by k*k
kernel matrix: k*k by M

where M here is often something like 32, 64, 128, ...

So, at this point, the sgemm between the two matrices above is likely to completely dominate whatever time it takes to construct the im2col matrix, and we want to do this extra copying because having the right memory layout will speed up the sgemm.
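To put rough numbers on this, take n = 256, k = 3, M = 64 (arbitrary but typical values): the sgemm does about 2·M times more arithmetic than the number of elements copied to build the im2col matrix.

```rust
fn main() {
    let (n, k, m) = (256usize, 3, 64);
    let windows = (n - k + 1) * (n - k + 1);       // 254 * 254 = 64_516 windows
    let copies = windows * k * k;                  // elements copied for im2col
    let flops = 2 * windows * k * k * m;           // multiply-adds in the sgemm
    println!("im2col copies: {}", copies);         // 580_644
    println!("sgemm flops:   {}", flops);          // 74_322_432
    println!("flops/copy:    {}", flops / copies); // 128 = 2 * M
}
```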

I'm using the convolution for image processing as a test case for Rust. I want to use WASM to speed up a web interface that is currently written entirely in JavaScript. Furthermore, I'm looking into the possibility of using Rust code on our cluster for image processing.

I did time the different parts of the code, and you are right that the extra for-loop is not very expensive. Actually, I ended up implementing the convolution with FFT using rustfft, which gave an immense speedup for large images.

Thanks again for all the help and suggestions!