I am doing some matrix multiplications, and they take an awfully long time. I can't tell whether I am doing something wrong, so as a beginner I am asking for a little help.
Here is part of my code. matrix_b and matrix_a are two dense matrices of size 500,000 * 300, of type ArrayBase (from ndarray), with f32 elements. To avoid running out of memory, I do the multiplication in batches.
for n in (0..limit).step_by(batch_size) {
    let mut offset = n * batch_size + batch_size;
    if offset > limit {
        offset = limit;
    }
    let batch = matrix_b.slice(s![n..offset, ..]); // This does not take long
    let similarity_scores = batch.dot(&matrix_a.t()); // This takes very long... like more than 8 hours
    //... some more code here but they don't take long
Is it normal for the calculation to take this long?
I'm not an ndarray user, but some things here look a bit strange assuming that it has the same semantics as standard Rust arrays. Here's my understanding of the code:
// n will be 0, batch_size, 2*batch_size... and so on, as long as it is below "limit"
for n in (0..limit).step_by(batch_size) {
    // Therefore, this will be 0 * batch_size + batch_size,
    // batch_size * batch_size + batch_size,
    // 2*batch_size * batch_size + batch_size...
    // ...are you sure you meant that, and not "n + batch_size"?
    let mut offset = n * batch_size + batch_size;
    // Assuming that batch_size is, like, 1000 or something,
    // and that "limit" is the number of rows in the matrix (500 * 1000),
    // this will be clamped to "limit" from the second iteration on,
    // because 1000 * 1000 + 1000 > 500 * 1000
    // (btw, "(n + batch_size).min(limit)" does the clamping in one expression;
    // saturating_add only protects against integer overflow)
    if offset > limit {
        offset = limit;
    }
    // Therefore, from the second iteration on, this slice runs from row n
    // to the end of the matrix, not over "batch_size" rows as you likely intended.
    let batch = matrix_b.slice(s![n..offset, ..]);
    // ...which is likely why this takes a long time
    let similarity_scores = batch.dot(&matrix_a.t());
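To put rough numbers on this, here is a minimal sketch that only does the index arithmetic, using the standard library and no ndarray. The function names (ranges_as_posted, ranges_fixed, total_rows) are made up for illustration, and batch_size = 1000 is still my guess rather than something stated in the question.

```rust
/// Row ranges as the loop in the question computes them:
/// `offset = n * batch_size + batch_size`, clamped to `limit`.
fn ranges_as_posted(limit: usize, batch_size: usize) -> Vec<(usize, usize)> {
    (0..limit)
        .step_by(batch_size)
        .map(|n| (n, (n * batch_size + batch_size).min(limit)))
        .collect()
}

/// Row ranges with the suggested fix: `n` is already a row index,
/// so the batch ends at `n + batch_size`, clamped to `limit`.
fn ranges_fixed(limit: usize, batch_size: usize) -> Vec<(usize, usize)> {
    (0..limit)
        .step_by(batch_size)
        .map(|n| (n, (n + batch_size).min(limit)))
        .collect()
}

/// Total number of rows that get multiplied, summed over all batches.
fn total_rows(ranges: &[(usize, usize)]) -> usize {
    ranges.iter().map(|&(start, end)| end - start).sum()
}

fn main() {
    // 500,000 rows as in the question; batch_size is an assumed value.
    let (limit, batch_size) = (500_000, 1_000);
    let posted = ranges_as_posted(limit, batch_size);
    let fixed = ranges_fixed(limit, batch_size);

    println!("second batch as posted: {:?}", posted[1]);
    println!("second batch fixed:     {:?}", fixed[1]);
    println!("rows multiplied as posted: {}", total_rows(&posted));
    println!("rows multiplied fixed:     {}", total_rows(&fixed));
}
```

Under those assumptions, the loop as posted multiplies about 250 times as many rows as intended (124,751,000 instead of 500,000), and each product after the first has up to 499,000 rows instead of 1,000. In the real loop the change would be a single line, something like let offset = (n + batch_size).min(limit);. ndarray also appears to provide axis_chunks_iter for walking an array in fixed-size chunks along an axis, which might remove the index arithmetic altogether, but I have not tried it here.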