Speed difference - Method call vs Function call

Hello,

While benchmarking, I noticed that calling a function was much faster (about 20x) than calling a method that does exactly the same thing, in my case a vector-matrix multiplication.

What is the reason for this difference, and what can I do to get the method call to be as fast as the function call?

Here is the code for the function

Function Implementation
fn call_function(
    weights: &[[i16; EMBD_SIZE]; LAYER_SIZE],
    input: &[i16; EMBD_SIZE],
    output: &mut [i16; LAYER_SIZE],
) {
    for i in 0..LAYER_SIZE {
        for j in 0..EMBD_SIZE {
            output[i] += input[j] * weights[i][j];
        }
    }
}

And here is the same thing using a struct:

Method Implementation
pub struct LinearTransform {
    weights: [[i16; EMBD_SIZE]; LAYER_SIZE],
    pub output: [i16; LAYER_SIZE],
}

impl LinearTransform {
    pub fn new() -> Self {
        Self {
            weights: [[0; EMBD_SIZE]; LAYER_SIZE],
            output: [0; LAYER_SIZE],
        }
    }
    
    pub fn forward(&mut self, input: &[i16; EMBD_SIZE]) {
        for i in 0..LAYER_SIZE {
            for j in 0..EMBD_SIZE {
                self.output[i] += input[j] * self.weights[i][j]
            }
        }
    }
}

The timing for 1 million iterations, with a 512 x (512, 32) Matrix product is:

Method time: 6.903119005s
Function time: 315.652135ms

Code for timing:

Timing
use std::time::Instant;

fn main() {
    let mut transform = LinearTransform::new();
    let input = [0; EMBD_SIZE];
    let mut result = 0;
    let start = Instant::now();
    let mut k = 0;
    while k < 1_000_000 {
        transform.forward(&input);
        result += transform.output[0];
        k += 1;
    }
    println!("{}", result);
    let duration = start.elapsed();
    println!("Method time: {:?}", duration);

    let x = [0; EMBD_SIZE];
    let w = [[0; EMBD_SIZE]; LAYER_SIZE];
    let mut y = [0; LAYER_SIZE];
    let start = Instant::now();
    let mut k = 0;
    while k < 1_000_000 {
        call_function(&w, &x, &mut y);
        k += 1;
    }
    println!("{}", y[0]);
    let duration = start.elapsed();
    println!("Function time: {:?}", duration);
}

Thanks !

Try running the function case before the method case and see whether the result changes. Performance measurements are sensitive to details like this, so try a dedicated benchmarking library such as Criterion.

https://docs.rs/criterion/0.3.4/criterion/
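Independently of the benchmarking library, one thing worth guarding against is the optimizer exploiting constant, all-zero inputs. Below is a minimal std-only sketch using `std::hint::black_box`. The helper name `time_function`, the small array sizes, and the non-zero test data are assumptions made for illustration; whether constant folding plays any role in the numbers above is not something this sketch establishes.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

// Illustrative sizes only, not the sizes used in the thread.
const EMBD_SIZE: usize = 4;
const LAYER_SIZE: usize = 3;

fn call_function(
    weights: &[[i16; EMBD_SIZE]; LAYER_SIZE],
    input: &[i16; EMBD_SIZE],
    output: &mut [i16; LAYER_SIZE],
) {
    for i in 0..LAYER_SIZE {
        for j in 0..EMBD_SIZE {
            output[i] += input[j] * weights[i][j];
        }
    }
}

/// Hypothetical helper: times `iterations` calls and returns the elapsed
/// time together with the first output element of the last call.
fn time_function(iterations: u32) -> (Duration, i16) {
    // Non-zero data, so the products are not trivially zero.
    let w = [[1i16; EMBD_SIZE]; LAYER_SIZE];
    let x = [1i16; EMBD_SIZE];
    let mut last = 0;
    let start = Instant::now();
    for _ in 0..iterations {
        let mut y = [0i16; LAYER_SIZE];
        // black_box hides the inputs from the optimizer and forces the
        // result to be treated as used.
        call_function(black_box(&w), black_box(&x), &mut y);
        last = black_box(y)[0];
    }
    (start.elapsed(), last)
}
```

Calling `time_function(1_000_000)` and printing the returned duration gives a rough figure that can be compared against the Criterion output.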

Thanks for the answer. I already tried both of the things you suggested, and the speed difference remains the same.

Criterion output:

Function                time:   [502.92 ns 504.18 ns 505.37 ns]                       
Found 13 outliers among 100 measurements (13.00%)
  11 (11.00%) low mild
  2 (2.00%) high severe

Method                  time:   [7.8638 us 7.9226 us 7.9818 us]                    
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe

I found what the issue was, but I'm not sure I understand why.

Apparently, the compiler wasn't so happy about the &mut self. Replacing it with &self and passing the output as a function parameter makes the method as fast as the function call.

impl LinearTransform {
    pub fn forward(
        &self,
        input: &[i16; EMBD_SIZE],
        output: &mut [i16; LAYER_SIZE],
    ) {
        for i in 0..LAYER_SIZE {
            for j in 0..EMBD_SIZE {
                output[i] += input[j] * self.weights[i][j];
            }
        }
    }
}

I'm still interested in a workaround that keeps the output as a struct field somehow :slightly_smiling_face:

Looking at the generated assembly on rust.godbolt.org, it seems like the "meat" of call_function compiles to a repeating sequence of movzx, imul, add, whereas in LinearTransform::forward there is an extra mov in every repetition. I don't know exactly what to make of that -- based on what @lzanini found maybe it's the result of aliasing considerations? -- but someone else probably does.
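One possible way to keep `output` as a struct field is to have the method borrow the two fields separately and delegate to the free function. This is only a sketch: the `with_weights` constructor and the small sizes are assumptions added for illustration, and whether this form actually compiles to the faster loop has not been measured here.

```rust
// Illustrative sizes only, not the sizes used in the thread.
const EMBD_SIZE: usize = 4;
const LAYER_SIZE: usize = 3;

fn call_function(
    weights: &[[i16; EMBD_SIZE]; LAYER_SIZE],
    input: &[i16; EMBD_SIZE],
    output: &mut [i16; LAYER_SIZE],
) {
    for i in 0..LAYER_SIZE {
        for j in 0..EMBD_SIZE {
            output[i] += input[j] * weights[i][j];
        }
    }
}

pub struct LinearTransform {
    weights: [[i16; EMBD_SIZE]; LAYER_SIZE],
    pub output: [i16; LAYER_SIZE],
}

impl LinearTransform {
    /// Hypothetical constructor, added so the example can use non-zero weights.
    pub fn with_weights(weights: [[i16; EMBD_SIZE]; LAYER_SIZE]) -> Self {
        Self { weights, output: [0; LAYER_SIZE] }
    }

    pub fn forward(&mut self, input: &[i16; EMBD_SIZE]) {
        // The borrow checker accepts a shared borrow of one field alongside
        // a mutable borrow of another, so the free function sees two
        // separate references instead of going through `self` each time.
        call_function(&self.weights, input, &mut self.output);
    }
}
```

The caller still reads the result from `transform.output`, as in the original timing loop.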


Storing output in a local variable first and then assigning it back to self seems to be as fast as, or faster than, the function version. No idea why.

    pub fn forward(&mut self, input: &[i16; EMBD_SIZE]) {
        let mut output = [0; LAYER_SIZE];
        for i in 0..LAYER_SIZE {
            for j in 0..EMBD_SIZE {
                output[i] += input[j] * self.weights[i][j];
            }
        }
        self.output = output;
    }
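Note that the version above starts from a zeroed local array, so each call overwrites `self.output`, whereas the original method accumulated into it across calls. A variant that keeps the accumulating behaviour while still summing into a local could look like the sketch below. It uses one scalar accumulator per row and writes each output element once; the `with_weights` constructor and the small sizes are assumptions for illustration, and its speed has not been benchmarked here.

```rust
// Illustrative sizes only, not the sizes used in the thread.
const EMBD_SIZE: usize = 4;
const LAYER_SIZE: usize = 3;

pub struct LinearTransform {
    weights: [[i16; EMBD_SIZE]; LAYER_SIZE],
    pub output: [i16; LAYER_SIZE],
}

impl LinearTransform {
    /// Hypothetical constructor, added so the example can use non-zero weights.
    pub fn with_weights(weights: [[i16; EMBD_SIZE]; LAYER_SIZE]) -> Self {
        Self { weights, output: [0; LAYER_SIZE] }
    }

    pub fn forward(&mut self, input: &[i16; EMBD_SIZE]) {
        // Walk the output elements and weight rows in lockstep.
        for (out, row) in self.output.iter_mut().zip(self.weights.iter()) {
            // Sum the dot product in a local scalar, so the inner loop
            // never stores through `self`.
            let mut acc: i16 = 0;
            for (x, w) in input.iter().zip(row.iter()) {
                acc += x * w;
            }
            // Single write per output element; `+=` keeps the original
            // accumulate-across-calls behaviour.
            *out += acc;
        }
    }
}
```

Replacing `*out += acc` with `*out = acc` gives the overwrite behaviour of the local-array version instead.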

:sweat_smile: Indeed, I get a 20% performance improvement in Criterion by using a local variable.

Method                  time:   [374.48 ns 374.89 ns 375.33 ns]                   
                        change: [-21.318% -20.868% -20.434%] (p = 0.00 < 0.05)
                        Performance has improved.

Very odd. Thanks anyway !
