# Speed difference: method call vs function call

Hello,

While benchmarking, I noticed that calling a function was much faster (about 20x) than calling a method that does exactly the same thing, in my case a vector-matrix multiplication.

What is the reason for this difference, and what can I do to get the method call to be as fast as the function call?

Here is the code for the function:

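```rust
// Sizes inferred from the benchmark below: a 512-element input times a (512, 32) matrix.
const EMBD_SIZE: usize = 512;
const LAYER_SIZE: usize = 32;

fn call_function(
    weights: &[[i16; EMBD_SIZE]; LAYER_SIZE],
    input: &[i16; EMBD_SIZE],
    output: &mut [i16; LAYER_SIZE],
) {
    for i in 0..LAYER_SIZE {
        for j in 0..EMBD_SIZE {
            output[i] += input[j] * weights[i][j];
        }
    }
}
```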

And here is the same computation written as a method on a `struct`:

```rust
pub struct LinearTransform {
    weights: [[i16; EMBD_SIZE]; LAYER_SIZE],
    pub output: [i16; LAYER_SIZE],
}

impl LinearTransform {
    pub fn new() -> Self {
        Self {
            weights: [[0; EMBD_SIZE]; LAYER_SIZE],
            output: [0; LAYER_SIZE],
        }
    }

    pub fn forward(&mut self, input: &[i16; EMBD_SIZE]) {
        for i in 0..LAYER_SIZE {
            for j in 0..EMBD_SIZE {
                self.output[i] += input[j] * self.weights[i][j];
            }
        }
    }
}
```
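Before comparing timings, it can be worth confirming that the two versions really compute the same thing. The following is a minimal sketch (not from the original post) that fills the weights and input with small non-zero values, chosen here so that no row sum can overflow `i16`, and checks that `call_function` and `LinearTransform::forward` agree. The sizes `EMBD_SIZE = 512` and `LAYER_SIZE = 32` are inferred from the "512 x (512, 32)" description in the thread.

```rust
// Sizes inferred from the thread's "512 x (512, 32)" description.
const EMBD_SIZE: usize = 512;
const LAYER_SIZE: usize = 32;

fn call_function(
    weights: &[[i16; EMBD_SIZE]; LAYER_SIZE],
    input: &[i16; EMBD_SIZE],
    output: &mut [i16; LAYER_SIZE],
) {
    for i in 0..LAYER_SIZE {
        for j in 0..EMBD_SIZE {
            output[i] += input[j] * weights[i][j];
        }
    }
}

pub struct LinearTransform {
    weights: [[i16; EMBD_SIZE]; LAYER_SIZE],
    pub output: [i16; LAYER_SIZE],
}

impl LinearTransform {
    pub fn forward(&mut self, input: &[i16; EMBD_SIZE]) {
        for i in 0..LAYER_SIZE {
            for j in 0..EMBD_SIZE {
                self.output[i] += input[j] * self.weights[i][j];
            }
        }
    }
}

fn main() {
    // Small non-zero test data: |w| <= 3 and |x| <= 2, so each row sum stays
    // within +/- 3072 and cannot overflow i16, even in a debug build.
    let mut weights = [[0i16; EMBD_SIZE]; LAYER_SIZE];
    for i in 0..LAYER_SIZE {
        for j in 0..EMBD_SIZE {
            weights[i][j] = ((i + j) % 7) as i16 - 3;
        }
    }
    let mut input = [0i16; EMBD_SIZE];
    for j in 0..EMBD_SIZE {
        input[j] = (j % 5) as i16 - 2;
    }

    // Reference result from the free function.
    let mut expected = [0i16; LAYER_SIZE];
    call_function(&weights, &input, &mut expected);

    // Same data through the method.
    let mut transform = LinearTransform { weights, output: [0; LAYER_SIZE] };
    transform.forward(&input);

    assert_eq!(transform.output, expected);
    println!("both versions agree, first rows: {:?}", &expected[..4]);
}
```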

The timings for 1 million iterations of a 512 x (512, 32) vector-matrix product are:

```
Method time: 6.903119005s
Function time: 315.652135ms
```

Code for timing:

```rust
use std::time::Instant;

fn main() {
    // Method version: `forward` accumulates into `transform.output`.
    let mut transform = LinearTransform::new();
    let input = [0; EMBD_SIZE];
    let mut result: i64 = 0;
    let start = Instant::now();
    for _ in 0..1_000_000 {
        transform.forward(&input);
        // Read every output element so the loop's result is used.
        result += transform.output.iter().map(|&v| v as i64).sum::<i64>();
    }
    let duration = start.elapsed();
    println!("{}", result);
    println!("Method time: {:?}", duration);

    // Function version: same computation through `call_function`.
    let x = [0; EMBD_SIZE];
    let w = [[0; EMBD_SIZE]; LAYER_SIZE];
    let mut y = [0; LAYER_SIZE];
    let start = Instant::now();
    for _ in 0..1_000_000 {
        call_function(&w, &x, &mut y);
    }
    let duration = start.elapsed();
    // Arrays implement `Debug` but not `Display`, hence `{:?}`.
    println!("{:?}", y);
    println!("Function time: {:?}", duration);
}
```

Thanks!

Try timing the function case before the method case and see if the result changes. Benchmarks are sensitive to this kind of thing, so try a dedicated benchmarking library such as Criterion.

https://docs.rs/criterion/0.3.4/criterion/
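For a quick check without pulling in Criterion, a std-only harness can at least hide the inputs from the optimizer. The sketch below is an illustration, not how Criterion works internally: `time_per_call` is a hypothetical helper, and it relies on `std::hint::black_box` (stable since Rust 1.66) to hide the input values and mark the output as used. The all-zero arrays in the timing code above are visible to the compiler, which it might exploit in ways that make the two loops incomparable.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

// Sizes inferred from the "512 x (512, 32)" description in the thread.
const EMBD_SIZE: usize = 512;
const LAYER_SIZE: usize = 32;

fn call_function(
    weights: &[[i16; EMBD_SIZE]; LAYER_SIZE],
    input: &[i16; EMBD_SIZE],
    output: &mut [i16; LAYER_SIZE],
) {
    for i in 0..LAYER_SIZE {
        for j in 0..EMBD_SIZE {
            output[i] += input[j] * weights[i][j];
        }
    }
}

/// Hypothetical helper: runs `f` `iters` times after a short warm-up and
/// returns the mean wall-clock time per call. `iters` must be non-zero.
fn time_per_call<F: FnMut()>(iters: u32, mut f: F) -> Duration {
    // Warm-up: about 10% extra calls that are not timed.
    for _ in 0..iters / 10 {
        f();
    }
    let start = Instant::now();
    for _ in 0..iters {
        f();
    }
    start.elapsed() / iters
}

fn main() {
    let w = [[0i16; EMBD_SIZE]; LAYER_SIZE];
    let x = [0i16; EMBD_SIZE];

    let per_call = time_per_call(10_000, || {
        // Fresh output each call; black_box hides the input values from the
        // optimizer and forces the result to be treated as used.
        let mut y = [0i16; LAYER_SIZE];
        call_function(black_box(&w), black_box(&x), &mut y);
        black_box(y);
    });
    println!("Function: {:?} per call", per_call);
}
```

The same closure shape can wrap `transform.forward(black_box(&input))` for the method, so both versions are measured by identical loops.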

Thanks for the answer. I had already tried both of the things you suggest, and the speed difference remains the same.

Criterion output:

```
Function                time:   [502.92 ns 504.18 ns 505.37 ns]
Found 13 outliers among 100 measurements (13.00%)
  11 (11.00%) low mild
  2 (2.00%) high severe

Method                  time:   [7.8638 us 7.9226 us 7.9818 us]
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe
```

I found what the issue was, but I'm not sure I understand why.

Apparently the compiler didn't like the `&mut self`. Replacing it with `&self` and passing the output as a parameter makes the method as fast as the function call:

```rust
impl LinearTransform {
    pub fn forward(
        &self,
        input: &[i16; EMBD_SIZE],
        output: &mut [i16; LAYER_SIZE],
    ) {
        for i in 0..LAYER_SIZE {
            for j in 0..EMBD_SIZE {
                output[i] += input[j] * self.weights[i][j];
            }
        }
    }
}
```

I'm still interested in a workaround that keeps the output as a struct field somehow.

Looking at the generated assembly on rust.godbolt.org, it seems that the core of `call_function` compiles to a repeating sequence of `movzx`, `imul`, `add`, whereas in `LinearTransform::forward` there is an extra `mov` in every repetition. I don't know exactly what to make of that (based on what @lzanini found, maybe it's the result of aliasing considerations?), but someone else probably does.

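On keeping `output` as a struct field: one possible workaround, sketched here and not benchmarked, is to keep `forward(&mut self, ...)` but have it borrow the two fields separately and hand them to `call_function`. The borrow checker accepts `&self.weights` and `&mut self.output` at the same time because they are disjoint fields. Whether the optimizer then produces the same loop as for the free function is something to confirm with Criterion or on godbolt rather than assume.

```rust
// Sizes inferred from the thread's "512 x (512, 32)" description.
const EMBD_SIZE: usize = 512;
const LAYER_SIZE: usize = 32;

fn call_function(
    weights: &[[i16; EMBD_SIZE]; LAYER_SIZE],
    input: &[i16; EMBD_SIZE],
    output: &mut [i16; LAYER_SIZE],
) {
    for i in 0..LAYER_SIZE {
        for j in 0..EMBD_SIZE {
            output[i] += input[j] * weights[i][j];
        }
    }
}

pub struct LinearTransform {
    weights: [[i16; EMBD_SIZE]; LAYER_SIZE],
    pub output: [i16; LAYER_SIZE],
}

impl LinearTransform {
    pub fn new() -> Self {
        Self {
            weights: [[0; EMBD_SIZE]; LAYER_SIZE],
            output: [0; LAYER_SIZE],
        }
    }

    /// Same signature and `+=` behaviour as the original method, but the loop
    /// runs inside `call_function` on two separate field borrows.
    pub fn forward(&mut self, input: &[i16; EMBD_SIZE]) {
        call_function(&self.weights, input, &mut self.output);
    }
}

fn main() {
    let mut t = LinearTransform::new();
    t.weights[0][0] = 3;
    t.forward(&[2; EMBD_SIZE]);
    println!("{:?}", &t.output[..2]); // prints [6, 0]
}
```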

Storing the output in a local variable first and then assigning it back to `self` seems to be as fast as, or faster than, the function version. No idea why.

```rust
pub fn forward(&mut self, input: &[i16; EMBD_SIZE]) {
    let mut output = [0; LAYER_SIZE];
    for i in 0..LAYER_SIZE {
        for j in 0..EMBD_SIZE {
            output[i] += input[j] * self.weights[i][j];
        }
    }
    self.output = output;
}
```

Indeed, I get a 20% performance improvement in Criterion by using a local variable:

```
Method                  time:   [374.48 ns 374.89 ns 375.33 ns]
                        change: [-21.318% -20.868% -20.434%] (p = 0.00 < 0.05)
                        Performance has improved.
```

Very odd. Thanks anyway!
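A related variant, offered as an untested sketch rather than a measured result, accumulates each row's dot product in a local scalar, so the inner loop only reads memory and `self.output[i]` is written once per row. Unlike the local-array version above, which overwrites `self.output`, this keeps the original `+=` behaviour of adding into the existing output. It follows the same reasoning as the local-array version (keep the running sum out of `self` while the inner loop runs); whether it matches those timings would need to be measured.

```rust
// Sizes inferred from the thread's "512 x (512, 32)" description.
const EMBD_SIZE: usize = 512;
const LAYER_SIZE: usize = 32;

pub struct LinearTransform {
    weights: [[i16; EMBD_SIZE]; LAYER_SIZE],
    pub output: [i16; LAYER_SIZE],
}

impl LinearTransform {
    pub fn new() -> Self {
        Self {
            weights: [[0; EMBD_SIZE]; LAYER_SIZE],
            output: [0; LAYER_SIZE],
        }
    }

    /// Adds `weights * input` into `self.output`, like the original method,
    /// but keeps each row's running sum in a local variable.
    pub fn forward(&mut self, input: &[i16; EMBD_SIZE]) {
        for i in 0..LAYER_SIZE {
            let row = &self.weights[i];
            let mut acc: i16 = 0;
            for j in 0..EMBD_SIZE {
                acc += input[j] * row[j];
            }
            // One write to the struct per row instead of one per product.
            self.output[i] += acc;
        }
    }
}

fn main() {
    let mut t = LinearTransform::new();
    t.weights[2] = [1; EMBD_SIZE];
    t.forward(&[1; EMBD_SIZE]);
    println!("{}", t.output[2]); // prints 512
}
```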
