Hi there,

I am doing some matrix multiplications here. They take an awfully long time, and I have no idea whether I am doing something wrong, so as a rookie I am here asking for a little help.

Here is part of my code, in which `matrix_b` and `matrix_a` are two dense matrices of size 500,000 × 300 and of type `ArrayBase` (from `ndarray`), with `f32` elements. To avoid running out of memory, I do the multiplication in batches.

```
for n in (0..limit).step_by(batch_size) {
    // `n` already advances by `batch_size`, so the batch ends at `n + batch_size`, clamped to `limit`.
    let offset = (n + batch_size).min(limit);
    let batch = matrix_b.slice(s![n..offset, ..]); // This does not take long
    let similarity_scores = batch.dot(&matrix_a.t()); // This takes very long... like more than 8 hours
    //... some more code here, but it doesn't take long
}
```
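
To check the batch boundaries separately from the multiplication, here is a minimal sketch using only the standard library. The helper name `batch_ranges` is hypothetical (it is not part of `ndarray`), and it assumes the loop variable already advances by `batch_size`, so each batch ends at `start + batch_size`, clamped to `limit`.

```rust
/// Hypothetical helper: returns the half-open row ranges `(start, end)`
/// that a batched loop over `0..limit` would visit.
fn batch_ranges(limit: usize, batch_size: usize) -> Vec<(usize, usize)> {
    assert!(batch_size > 0, "batch_size must be non-zero");
    (0..limit)
        .step_by(batch_size)
        // `start` already moves in steps of `batch_size`, so the end of the
        // batch is `start + batch_size`, clamped so it never exceeds `limit`.
        .map(|start| (start, (start + batch_size).min(limit)))
        .collect()
}
```

Calling `batch_ranges(10, 4)` gives three batches covering rows 0..4, 4..8 and 8..10. If the end were instead computed as `start * batch_size + batch_size`, most batches would be clamped to `limit` and become far larger than `batch_size` rows.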

Is it normal for the calculation to take this long?
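
For a sense of scale, here is a rough back-of-the-envelope sketch of the total work involved. The function names are hypothetical, and the throughput figure passed to `estimated_seconds` is purely an assumption, not a measurement of any machine or of `ndarray`.

```rust
/// Rough cost of multiplying an (m x k) matrix by a (k x n) matrix.
/// Returns (floating-point operations, bytes needed to hold an f32 result).
fn matmul_cost(m: u64, k: u64, n: u64) -> (u64, u64) {
    // Each of the m * n output elements needs k multiplies and about k adds.
    let flops = 2 * m * k * n;
    // Each f32 output element occupies 4 bytes.
    let result_bytes = m * n * 4;
    (flops, result_bytes)
}

/// Estimated run time in seconds at an ASSUMED sustained throughput,
/// given in GFLOP/s (billions of floating-point operations per second).
fn estimated_seconds(flops: u64, gflops_per_sec: f64) -> f64 {
    flops as f64 / (gflops_per_sec * 1e9)
}
```

For the full product of a 500,000 × 300 matrix with a 300 × 500,000 one, `matmul_cost(500_000, 300, 500_000)` gives 1.5 × 10^14 floating-point operations and 10^12 bytes (about 1 TB) for the complete `f32` result. At an assumed 10 GFLOP/s, that is roughly 15,000 seconds, a little over four hours, so a run time measured in hours is not implausible for a problem of this size.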