Dot product performance issue

Hi there,

I am doing some matrix multiplications. They take an awfully long time, and I can't tell whether I'm doing something wrong, so I, a rookie, am here asking for a little help :sweat_smile:

Here is part of my code. matrix_b and matrix_a are two dense matrices of size 500,000 * 300, of type ArrayBase (from ndarray), with f32 elements. To avoid running out of memory, I do the multiplication in batches.

for n in (0..limit).step_by(batch_size) {

    let mut offset = n * batch_size + batch_size;

    if offset > limit{
        offset = limit;
    }

    let batch = matrix_b.slice(s![n..offset, ..]); // This does not take long

    let similarity_scores = batch.dot(&matrix_a.t()); // This takes very long... like more than 8 hours
            
    // ... some more code here, but it doesn't take long
}

Is it normal for this calculation to take so long?

I'm not an ndarray user, but some things here look a bit strange, assuming that ndarray slicing has the same semantics as standard Rust arrays. Here's my understanding of the code:

// n will be 0, batch_size, 2*batch_size... and so on until above "limit"
for n in (0..limit).step_by(batch_size) {
    // Therefore, this will be 0 * batch_size + batch_size, 
    //                         batch_size * batch_size + batch_size
    //                         2*batch_size * batch_size + batch_size...
    // ...are you sure you meant that, and not "n + batch_size"?
    let mut offset = n * batch_size + batch_size;

    // Assuming that batch_size is, like, 1000 or something, 
    // and that "limit" is the size of the matrix (500*1000),
    // this will saturate to "limit" on the second iteration.
    // because 1000 * 1000 + 1000 > 500 * 1000
    // (btw, `offset.min(limit)` does this clamp in one call)
    if offset > limit{
        offset = limit;
    }

    // Therefore, this slice will contain most of the matrix, not a slice
    // of size "batch_size" as you likely intended.
    let batch = matrix_b.slice(s![n..offset, ..]);

    // ...which is likely why this takes a long time
    let similarity_scores = batch.dot(&matrix_a.t());

    // ...
}
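
To make the fix concrete, here is a minimal sketch of the batch boundaries using `n + batch_size`, written against plain `usize` ranges so it runs without ndarray. The helper names `batch_ranges`, `original_ranges`, and `total_rows` are made up for illustration and are not part of ndarray. `original_ranges` reproduces the `n * batch_size + batch_size` arithmetic only so the two versions can be compared by how many rows they end up multiplying.

```rust
/// Row ranges as (probably) intended: each batch covers at most `batch_size` rows.
/// Panics if `batch_size` is 0, because `step_by(0)` panics.
fn batch_ranges(limit: usize, batch_size: usize) -> impl Iterator<Item = (usize, usize)> {
    (0..limit)
        .step_by(batch_size)
        // End of the batch is start + batch_size, clamped to the matrix height.
        .map(move |start| (start, (start + batch_size).min(limit)))
}

/// Row ranges produced by the posted offset `n * batch_size + batch_size`,
/// kept here only to compare the amount of work.
fn original_ranges(limit: usize, batch_size: usize) -> impl Iterator<Item = (usize, usize)> {
    (0..limit)
        .step_by(batch_size)
        .map(move |n| (n, (n * batch_size + batch_size).min(limit)))
}

/// Total number of rows sliced (and therefore multiplied) across all batches.
fn total_rows(ranges: impl Iterator<Item = (usize, usize)>) -> usize {
    ranges.map(|(start, end)| end - start).sum()
}
```

Assuming batch_size is 1000 and limit is 500,000 (the same guess as above), `total_rows(batch_ranges(500_000, 1_000))` is 500,000, while `total_rows(original_ranges(500_000, 1_000))` is 124,751,000, so roughly 250 times more rows go through `dot`. In the loop, each `(start, end)` pair would take the place of `n` and `offset` in `matrix_b.slice(s![start..end, ..])`.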