Why is my Rust code 10 times slower than Python with Numba?

Hi everyone, I am a beginner at Rust lang.

Because I heard that Rust performs better than garbage-collected languages, I tried to rewrite a tiny Python tool in Rust as practice.

In this tool, there is a simple function which multiplies every element of one list with every element of another and records the maximum result. The Python code is shown below:

import time
from numba import njit

@njit
def findMax(a,b):
    print(len(a),len(b))
    maxone=0
    for i in a:
        for j in b:
            if i*j>maxone:
                maxone=i*j

    return maxone

a=[0.5 for i in range(20000)]
b=[0.4 for j in range(20000)]

t0=time.time()
s=findMax(a,b)
t1=time.time()
print(str(t1-t0)+"s res:"+str(s))

As you can see, I use Numba to compile this function to LLVM code to speed it up. The result is:

0.7270545959472656s res:0.2

Without Numba compilation, the result is:

16.186195135116577s res:0.2

You can see that Python with Numba is about 20 times faster than the plain Python code.

I learned that Rust also compiles its code through LLVM, so I expected this function to run faster in Rust:

use std::time::{SystemTime, UNIX_EPOCH};

fn simple_mul(va:Vec<f64>, vb:Vec<f64>) ->f64{
    println!("{},{}",va.len(),vb.len());
    let mut maxone=0.0;
    for i in va.iter(){
        for j in vb.iter(){
            if i*j>maxone{
                maxone=i*j;
            }
        }
    }
    return maxone;
}

fn main() {
    let va=vec![0.5;20000];
    let vb=vec![0.4;20000];
    let t0=SystemTime::now().duration_since(UNIX_EPOCH)
        .expect("Time went backwards");
    let res= simple_mul(va, vb);
    let t1=SystemTime::now().duration_since(UNIX_EPOCH)
        .expect("Time went backwards");
    print!("{}s,ans {}",t1.as_secs_f64()-t0.as_secs_f64(),res);
}

But the testing result is:

8.995279312133789s,ans 0.2

This result confuses me: the function written in Rust is ten times slower than Python with Numba, and there is no apparent speed advantage over the plain Python code, either.

I tried filling the test lists (vectors) with random numbers (using the rand crate), and using arrays instead of vectors in Rust, but the result stayed the same.

Is there any mistake with my Rust code? Thanks for the help!

The first thing to check is always: did you compile with the release flag?

cargo run --release
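
If you are not sure which profile your IDE actually builds, one option is to let the program report it. This is only a sketch: `build_profile` is a name made up for this example, and it relies on `cfg!(debug_assertions)` being on in Cargo's default dev profile and off in the default release profile (a custom profile in Cargo.toml can change that).

```rust
/// Returns "debug" when debug assertions are enabled (the default for
/// `cargo run`) and "release" otherwise (the default for `--release`).
/// Treat the result as a hint, since profiles can be customized.
fn build_profile() -> &'static str {
    if cfg!(debug_assertions) {
        "debug"
    } else {
        "release"
    }
}

fn main() {
    println!("built in {} mode", build_profile());
    if build_profile() == "debug" {
        // Unoptimized builds can be many times slower, so warn before timing.
        eprintln!("warning: timings from an unoptimized build are not meaningful");
    }
}
```

Printing this once at startup makes a misconfigured build step visible before you start comparing numbers.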


I made your code a bit more idiomatic.

use std::time::{SystemTime, UNIX_EPOCH};

fn simple_mul(va: &[f64], vb: &[f64]) -> f64 {
    println!("{},{}", va.len(), vb.len());
    let mut maxone = 0.0;
    for i in va {
        for j in vb {
            if i * j > maxone {
                maxone = i * j;
            }
        }
    }
    maxone
}

fn main() {
    let va = vec![0.5; 20000];
    let vb = vec![0.4; 20000];
    let t0 = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .expect("Time went backwards");
    let res = simple_mul(&va, &vb);
    let t1 = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .expect("Time went backwards");
    print!("{}s,ans {}", t1.as_secs_f64() - t0.as_secs_f64(), res);
}

In particular,

  • Rust supports trailing expressions (without a semicolon) in blocks and function bodies, which are used as the return value. Hence return maxone; in the last line of a function becomes just maxone.
  • You weren’t really doing anything with the va: Vec<f64> and vb: Vec<f64> arguments other than reading them and (implicitly) dropping them at the end. In such a case you’d commonly pass the argument by (shared) reference instead of by value. This would mean something like va: &Vec<f64>, and calling the function becomes simple_mul(&va, &vb). However, in the case of Vec the general practice is to avoid &Vec<...> and instead use &[...], that is, a slice. As you can see in the code above, a reference to a vector, like the expression &va, can be implicitly converted into a slice.
  • You can iterate over a variable va: Vec by using a for loop like you did: for i in va.iter(). This is a read-only iteration. The method iter() is mainly useful for chaining Iterator-specific methods; for a for loop you could just write for i in &va to get the same thing. In the code above, va is already a shared/immutable reference, so you don’t need (and in fact can’t have) the additional &, so it becomes for i in va.
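
To make the points above concrete, here is a small sketch of a slice-taking function being called with a borrowed Vec, an array, and a sub-range. The name `max_product` and the sample values are invented for this illustration; the loop body is the same as in the code above.

```rust
/// Takes slices, so it accepts borrowed vectors, arrays, and sub-ranges.
fn max_product(va: &[f64], vb: &[f64]) -> f64 {
    let mut maxone = 0.0;
    for i in va {
        // `i` and `j` are `&f64`; multiplying two `&f64` yields an `f64`.
        for j in vb {
            if i * j > maxone {
                maxone = i * j;
            }
        }
    }
    maxone // trailing expression, no `return` needed
}

fn main() {
    let va = vec![0.5, 2.0, 1.5];
    let arr = [0.4, 3.0];

    // `&Vec<f64>` is converted to `&[f64]` at the call site.
    let from_vec = max_product(&va, &va);
    // A reference to an array is converted as well.
    let from_array = max_product(&va, &arr);
    // Sub-slices work too.
    let from_range = max_product(&va[..1], &arr[..1]);

    println!("{} {} {}", from_vec, from_array, from_range);
    // `va` is still usable here because it was only borrowed.
    println!("{}", va.len());
}
```

With a `Vec<f64>` parameter, the first call would have moved `va` into the function and the later uses would not compile.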

If you want to go more functional/declarative (and use the itertools crate) then you might want to write something like

use itertools::iproduct;
fn simple_mul(va: &[f64], vb: &[f64]) -> f64 {
    println!("{},{}", va.len(), vb.len());
    iproduct!(va, vb).map(|(i, j)| i * j).fold(0.0, f64::max)
}

Also, replacing the SystemTime dance with

let start = std::time::Instant::now();
// ...
println!("{:?}", start.elapsed());

is probably cleaner,

and you don't specifically need itertools here:

// Don't use this: zip pairs elements by index instead of taking all combinations.
fn simple_mul(va: &[f64], vb: &[f64]) -> f64 {
    println!("{:?}", (va.len(), vb.len()));
    va.into_iter()
        .zip(vb.into_iter())
        .map(|(i, j)| i * j)
        .fold(0.0, f64::max)
}

Nope, it’s not supposed to be a zip.

Without itertools it would have to be something like

fn simple_mul(va: &[f64], vb: &[f64]) -> f64 {
    println!("{},{}", va.len(), vb.len());
    va.iter()
        .flat_map(|i| vb.iter().map(move |j| i * j))
        .fold(0.0, f64::max)
}
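
For anyone wondering why zip gives a different answer, here is a small comparison on inputs where the two disagree. The function names `max_zip` and `max_all_pairs` and the sample values are made up for this sketch.

```rust
/// Pairs elements position by position: (a0, b0), (a1, b1), ...
fn max_zip(va: &[f64], vb: &[f64]) -> f64 {
    va.iter().zip(vb).map(|(i, j)| i * j).fold(0.0, f64::max)
}

/// Visits every combination (ai, bj), like the nested loops do.
fn max_all_pairs(va: &[f64], vb: &[f64]) -> f64 {
    va.iter()
        .flat_map(|i| vb.iter().map(move |j| i * j))
        .fold(0.0, f64::max)
}

fn main() {
    let va = [1.0, 2.0, 3.0];
    let vb = [4.0, 0.5, 1.0];
    // zip only sees 1*4, 2*0.5 and 3*1.
    println!("zip: {}", max_zip(&va, &vb));
    // The all-pairs version also sees 3*4.
    println!("all pairs: {}", max_all_pairs(&va, &vb));
}
```

Here the zip version returns 4 while the all-pairs version returns 12. With the constant vectors from the original post both would return 0.2, which is why the mix-up is easy to miss.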

Ah yes, I missed that it was all combinations.

I am not familiar with Numba, but why do you compare Python's s=findMax(a,b) with Rust's simple_mul(va, vb)? Do they calculate the same thing?

You’re aware that the original post contains the source code for both findMax and simple_mul, aren’t you?

def findMax(a,b):
    print(len(a),len(b))
    maxone=0
    for i in a:
        for j in b:
            if i*j>maxone:
                maxone=i*j

    return maxone

fn simple_mul(va:Vec<f64>, vb:Vec<f64>) ->f64{
    println!("{},{}",va.len(),vb.len());
    let mut maxone=0.0;
    for i in va.iter(){
        for j in vb.iter(){
            if i*j>maxone{
                maxone=i*j;
            }
        }
    }
    return maxone;
}

I don’t know Numba either, but from the OP’s description it sounds like it’s just a way to compile/optimize Python code better. Its website says

Numba translates Python functions to optimized machine code at runtime using the industry-standard LLVM compiler library. Numba-compiled numerical algorithms in Python can approach the speeds of C or FORTRAN.


Ouch. I am sorry, I missed the original function implementation. The different names confused me while reading, and I thought findMax was a built-in Numba function.

I am now at my computer and wanted to see for myself, so I fired up your program, first in debug mode:

% cargo run 
    Finished dev [unoptimized + debuginfo] target(s) in 0.57s
     Running `target/debug/numba-rs`
20000,20000
12.677551984786987s,ans 0.2%

12 seconds, which is about 40% longer than the 9 seconds on your computer. Undeterred, I tried release mode:

% cargo run --release
    Finished release [optimized] target(s) in 0.03s
     Running `target/release/numba-rs`
20000,20000
0.48468995094299316s,ans 0.2%

0.48 seconds, which is 33% less than your Python-with-Numba version.
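
If you want to repeat this comparison yourself, here is a rough harness using the Instant approach suggested earlier in the thread. Two things are my own additions and not from the original program: the vector length `N` is reduced so that a debug build finishes quickly (set it to 20_000 to match the thread), and `std::hint::black_box` is used as a best-effort guard against the optimizer precomputing the result.

```rust
use std::hint::black_box;
use std::time::Instant;

// Reduced size for quick runs; the thread used 20_000.
const N: usize = 5_000;

fn simple_mul(va: &[f64], vb: &[f64]) -> f64 {
    let mut maxone = 0.0;
    for i in va {
        for j in vb {
            if i * j > maxone {
                maxone = i * j;
            }
        }
    }
    maxone
}

fn main() {
    let va = vec![0.5; N];
    let vb = vec![0.4; N];

    let start = Instant::now();
    // black_box is only a hint to the optimizer, not a guarantee.
    let res = simple_mul(black_box(va.as_slice()), black_box(vb.as_slice()));
    let elapsed = start.elapsed();

    println!("{:?}, ans {}", elapsed, res);
}
```

Run it once with cargo run and once with cargo run --release and compare the two printed durations.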

Oh, thanks! I found where the problem is: there is a configuration issue with the "build" step in my IDE, which only builds the debug version of the program.
