Dot product performance issue

naruhodo · October 4, 2019, 12:04pm

Hi there,

I am doing some matrix multiplications, and they take an awfully long time. I can't tell whether I'm doing something wrong, so, as a rookie, I'm here asking for a little help.

Here is part of my code. matrix_b and matrix_a are two dense matrices of size 500,000 × 300, of type ArrayBase (from ndarray), with f32 elements. To avoid running out of memory, I do the multiplication in batches.

for n in (0..limit).step_by(batch_size) {

    let mut offset = n * batch_size + batch_size;

    if offset > limit{
        offset = limit;
    }

    let batch = matrix_b.slice(s![n..offset, ..]); // This does not take long

    let similarity_scores = batch.dot(&matrix_a.t()); // This takes very long... like more than 8 hours
            
    // ... some more code here, but it doesn't take long

Is it normal for the calculation to take this long?

HadrienG · October 4, 2019, 12:25pm

I'm not an ndarray user, but some things here look a bit strange assuming that it has the same semantics as standard Rust arrays. Here's my understanding of the code:

// n will be 0, batch_size, 2*batch_size... and so on until above "limit"
for n in (0..limit).step_by(batch_size) {
    // Therefore, this will be 0 * batch_size + batch_size, 
    //                         batch_size * batch_size + batch_size
    //                         2*batch_size * batch_size + batch_size...
    // ...are you sure you meant that, and not "n + batch_size"?
    let mut offset = n * batch_size + batch_size;

    // Assuming that batch_size is, like, 1000 or something, 
    // and that "limit" is the size of the matrix (500*1000),
    // this will saturate to "limit" on the second iteration,
    // because 1000 * 1000 + 1000 > 500 * 1000
    // (btw, you may want to look up saturating_add)
    if offset > limit{
        offset = limit;
    }

    // Therefore, this slice will contain most of the matrix, not a slice
    // of size "batch_size" as you likely intended.
    let batch = matrix_b.slice(s![n..offset, ..]);

    // ...which is likely why this takes a long time
    let similarity_scores = batch.dot(&matrix_a.t());
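
To make the effect of that offset concrete, here is a minimal sketch in plain Rust (no ndarray, so it runs standalone). The helper names `buggy_range`, `fixed_range` and `total_rows` are made up for illustration, and the batch size of 1000 is only the guess from the comments above, not a value from the original code. It compares the row ranges produced by `n * batch_size + batch_size` with those produced by `n + batch_size`, and adds up how many rows get multiplied over the whole loop.

```rust
/// Row range for the batch starting at row `n`, computed the way the
/// original loop does it: `n * batch_size + batch_size`, clamped to `limit`.
fn buggy_range(n: usize, batch_size: usize, limit: usize) -> (usize, usize) {
    let end = (n * batch_size + batch_size).min(limit);
    (n, end)
}

/// Row range with the suggested fix: `n + batch_size`, clamped to `limit`.
fn fixed_range(n: usize, batch_size: usize, limit: usize) -> (usize, usize) {
    let end = (n + batch_size).min(limit);
    (n, end)
}

/// Total number of rows fed to the dot product over the whole loop.
/// `range` selects which of the two offset computations to use.
/// Note: `step_by` panics if `batch_size` is 0.
fn total_rows(
    limit: usize,
    batch_size: usize,
    range: fn(usize, usize, usize) -> (usize, usize),
) -> usize {
    (0..limit)
        .step_by(batch_size)
        .map(|n| {
            let (start, end) = range(n, batch_size, limit);
            end - start
        })
        .sum()
}
```

With `limit = 500_000` and the assumed `batch_size = 1000`, `total_rows(500_000, 1000, fixed_range)` visits each row once, while `total_rows(500_000, 1000, buggy_range)` visits roughly 250 times as many rows, because every iteration after the first slices from `n` all the way to `limit`. In the ndarray loop, the fixed pair would be used as `matrix_b.slice(s![start..end, ..])`.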
