Speed difference - Method call vs Function call

lzanini · February 1, 2021, 2:32am

Hello,

While benchmarking, I noticed that calling a function was about 20x faster than calling a method that does exactly the same thing, in my case a vector-matrix multiplication.

What is the reason for this difference, and what can I do to get the method call to be as fast as the function call?

Here is the code for the function

Function Implementation

fn call_function(
    weights: &[[i16; EMBD_SIZE]; LAYER_SIZE],
    input: &[i16; EMBD_SIZE],
    output: &mut [i16; LAYER_SIZE],
) {
    for i in 0..LAYER_SIZE {
        for j in 0..EMBD_SIZE {
            output[i] += input[j] * weights[i][j];
        }
    }
}

And here is the same thing using a struct:

Method Implementation

pub struct LinearTransform {
    weights: [[i16; EMBD_SIZE]; LAYER_SIZE],
    pub output: [i16; LAYER_SIZE],
}

impl LinearTransform {
    pub fn new() -> Self {
        Self {
            weights: [[0; EMBD_SIZE]; LAYER_SIZE],
            output: [0; LAYER_SIZE],
        }
    }
    
    pub fn forward(&mut self, input: &[i16; EMBD_SIZE]) {
        for i in 0..LAYER_SIZE {
            for j in 0..EMBD_SIZE {
                self.output[i] += input[j] * self.weights[i][j]
            }
        }
    }
}

The timings for 1 million iterations of a 512 x (512, 32) vector-matrix product are:

Method time: 6.903119005s
Function time: 315.652135ms

Code for timing:

Timing

use std::time::Instant;

fn main() {
    let mut transform = LinearTransform::new();
    let input = [0; EMBD_SIZE];
    let mut result = 0;
    let start = Instant::now();
    let mut k = 0;
    while k < 1_000_000 {
        transform.forward(&input);
        result += transform.output[0];
        k += 1;
    }
    let duration = start.elapsed(); // stop the clock before printing
    println!("{}", result);
    println!("Method time: {:?}", duration);

    let x = [0; EMBD_SIZE];
    let w = [[0; EMBD_SIZE]; LAYER_SIZE];
    let mut y = [0; LAYER_SIZE];
    let start = Instant::now();
    let mut k = 0;
    while k < 1_000_000 {
        call_function(&w, &x, &mut y);
        k += 1;
    }
    let duration = start.elapsed(); // stop the clock before printing
    println!("{}", y[0]);
    println!("Function time: {:?}", duration);
}

Thanks!

Hyeonu · February 1, 2021, 2:35am

Try testing the function case before the method case and see whether the result changes. Performance measurements are sensitive, so try a dedicated benchmarking library such as Criterion.
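One thing worth ruling out in a hand-rolled timing loop is the optimizer seeing through constant inputs. The sketch below is a std-only illustration of that idea, not a replacement for Criterion. It assumes `std::hint::black_box` is available (recent stable Rust) and that `EMBD_SIZE = 512` and `LAYER_SIZE = 32`, read off the sizes quoted in the question. `time_function_case` is a hypothetical helper name.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

// Assumed sizes, read off the "512 x (512, 32)" product in the question.
const EMBD_SIZE: usize = 512;
const LAYER_SIZE: usize = 32;

fn call_function(
    weights: &[[i16; EMBD_SIZE]; LAYER_SIZE],
    input: &[i16; EMBD_SIZE],
    output: &mut [i16; LAYER_SIZE],
) {
    for i in 0..LAYER_SIZE {
        for j in 0..EMBD_SIZE {
            output[i] += input[j] * weights[i][j];
        }
    }
}

/// Hypothetical helper: times `iters` calls of `call_function` on zeroed
/// inputs and returns the elapsed time together with `y[0]`.
fn time_function_case(iters: u32) -> (Duration, i16) {
    let x = [0i16; EMBD_SIZE];
    let w = [[0i16; EMBD_SIZE]; LAYER_SIZE];
    let mut y = [0i16; LAYER_SIZE];

    let start = Instant::now();
    for _ in 0..iters {
        // black_box asks the optimizer to treat these values as opaque,
        // so it should not assume the inputs are all zero.
        call_function(black_box(&w), black_box(&x), black_box(&mut y));
    }
    // Stop the clock before doing anything else with the result.
    let elapsed = start.elapsed();
    (elapsed, y[0])
}
```

Calling `time_function_case(1_000_000)` and printing both parts of the returned tuple mirrors the hand-written loop from the question; the method case can be wrapped the same way.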

lzanini · February 1, 2021, 2:42am

Thanks for the answer. I already tried both of the things you suggested, and the speed difference remains the same.

Criterion output:

Function                time:   [502.92 ns 504.18 ns 505.37 ns]                       
Found 13 outliers among 100 measurements (13.00%)
  11 (11.00%) low mild
  2 (2.00%) high severe

Method                  time:   [7.8638 us 7.9226 us 7.9818 us]                    
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe

lzanini · February 1, 2021, 2:59am

I found what the issue was, but I'm not sure I understand why.

Apparently, the compiler wasn't so happy about the &mut self. Replacing it with &self and passing the output as a parameter makes the method as fast as the function call.

impl LinearTransform {
    pub fn forward(
        &self,
        input: &[i16; EMBD_SIZE],
        output: &mut [i16; LAYER_SIZE],
    ) {
        for i in 0..LAYER_SIZE {
            for j in 0..EMBD_SIZE {
                output[i] += input[j] * self.weights[i][j];
            }
        }
    }
}

I'm still interested in a workaround that keeps the output as a struct field, though.
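One option to try, given that the free function is the fast case, is to keep `output` as a field and have `forward` hand the two fields to a free helper as separate borrows. This is a minimal sketch, not a measured fix. The sizes (512 and 32) are assumed from the question, and `accumulate` and `with_weights` are hypothetical names introduced for illustration. Whether it recovers the speed may depend on the compiler version and on inlining, so it needs benchmarking.

```rust
// Assumed sizes, read off the "512 x (512, 32)" product in the question.
const EMBD_SIZE: usize = 512;
const LAYER_SIZE: usize = 32;

pub struct LinearTransform {
    weights: [[i16; EMBD_SIZE]; LAYER_SIZE],
    pub output: [i16; LAYER_SIZE],
}

// Hypothetical free helper with the same body as `call_function`.
// `#[inline(never)]` is only a hint; the assumption is that keeping the
// call boundary lets the helper be compiled like the standalone function.
// Try it with and without the attribute and measure.
#[inline(never)]
fn accumulate(
    weights: &[[i16; EMBD_SIZE]; LAYER_SIZE],
    input: &[i16; EMBD_SIZE],
    output: &mut [i16; LAYER_SIZE],
) {
    for i in 0..LAYER_SIZE {
        for j in 0..EMBD_SIZE {
            output[i] += input[j] * weights[i][j];
        }
    }
}

impl LinearTransform {
    /// Hypothetical constructor, added so the example can be exercised.
    pub fn with_weights(weights: [[i16; EMBD_SIZE]; LAYER_SIZE]) -> Self {
        Self {
            weights,
            output: [0; LAYER_SIZE],
        }
    }

    pub fn forward(&mut self, input: &[i16; EMBD_SIZE]) {
        // Borrow the two fields separately: shared for `weights`,
        // mutable for `output`. The borrow checker accepts this because
        // the fields are disjoint.
        accumulate(&self.weights, input, &mut self.output);
    }
}
```

The call site stays the same as in the original timing code: `transform.forward(&input)` followed by a read of `transform.output[0]`.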

cole-miller · February 1, 2021, 3:01am

Looking at the generated assembly on rust.godbolt.org, it seems like the "meat" of call_function compiles to a repeating sequence of movzx, imul, add, whereas in LinearTransform::forward there is an extra mov in every repetition. I don't know exactly what to make of that -- based on what @lzanini found maybe it's the result of aliasing considerations? -- but someone else probably does.

dupdrop · February 1, 2021, 3:07am

Storing the output in a local variable first and then assigning it back to self seems to be as fast as, or faster than, the function version. No idea why.

    pub fn forward(&mut self, input: &[i16; EMBD_SIZE]) {
        let mut output = [0; LAYER_SIZE];
        for i in 0..LAYER_SIZE {
            for j in 0..EMBD_SIZE {
                output[i] += input[j] * self.weights[i][j];
            }
        }
        self.output = output;
    }
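Note that the snippet above starts from a zeroed local array, so each call overwrites `self.output`, whereas the original `forward` accumulated into it across calls. If the accumulating behaviour matters, one variant is to seed the local from the field. This is a minimal sketch that has not been benchmarked. The sizes (512 and 32) are assumed from the question, and `with_weights` is a hypothetical constructor added for illustration.

```rust
// Assumed sizes, read off the "512 x (512, 32)" product in the question.
const EMBD_SIZE: usize = 512;
const LAYER_SIZE: usize = 32;

pub struct LinearTransform {
    weights: [[i16; EMBD_SIZE]; LAYER_SIZE],
    pub output: [i16; LAYER_SIZE],
}

impl LinearTransform {
    /// Hypothetical constructor, added so the example can be exercised.
    pub fn with_weights(weights: [[i16; EMBD_SIZE]; LAYER_SIZE]) -> Self {
        Self {
            weights,
            output: [0; LAYER_SIZE],
        }
    }

    pub fn forward(&mut self, input: &[i16; EMBD_SIZE]) {
        // Copy the current totals into a local array (arrays of i16 are
        // Copy), so the inner loop writes to the local rather than to self.
        let mut output = self.output;
        for i in 0..LAYER_SIZE {
            for j in 0..EMBD_SIZE {
                output[i] += input[j] * self.weights[i][j];
            }
        }
        // Write the updated totals back once, after the loops.
        self.output = output;
    }
}
```

With all-ones weights and input, two consecutive calls leave `output[0]` at 1024 here, whereas the zero-initialised version would leave it at 512.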

lzanini · February 1, 2021, 3:11am

Indeed, I get a 20% performance improvement in Criterion by creating a local variable.

Method                  time:   [374.48 ns 374.89 ns 375.33 ns]                   
                        change: [-21.318% -20.868% -20.434%] (p = 0.00 < 0.05)
                        Performance has improved.

Very odd. Thanks anyway!

