String similarity with strsim-rs + rayon vs Python: what am I doing wrong?

Hello dear Rustaceans,

I'm new to Rust and I would like to rewrite some string comparison code from Python to Rust.

Basically, I have two lists of strings, and for each element of the first list I want to find
the most similar string in the second list, together with the similarity value. The Python code,
using list comprehensions, is quite straightforward:

import Levenshtein

names_a = ['Johny', 'Jake']
names_b = ['John', 'Jack', 'Martin']

def find_max(a, names_b):
    sim = [Levenshtein.jaro_winkler(a, b) for b in names_b]
    ix = sim.index(max(sim))
    return (a, names_b[ix], sim[ix])

result = [find_max(a, names_b) for a in names_a]

The function Levenshtein.jaro_winkler is implemented in C.

I've tried to rewrite the code in Rust in a naive way using strsim-rs and rayon, but it seems much slower than the Python version (with the C extension), which doesn't even use parallelism.
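For readers without the repo at hand, the Python above maps to Rust roughly as follows. This is a sequential, dependency-free sketch and not the code from the repo: `similarity` is a placeholder parameter where a real metric such as strsim's Jaro-Winkler function would go, and `toy_similarity` is a made-up stand-in so the example compiles on its own.

```rust
/// Hypothetical stand-in metric: fraction of positions holding the same
/// character, relative to the longer string. NOT Jaro-Winkler.
fn toy_similarity(a: &str, b: &str) -> f64 {
    let longest = a.chars().count().max(b.chars().count());
    if longest == 0 {
        return 1.0; // two empty strings are treated as identical
    }
    let matches = a.chars().zip(b.chars()).filter(|(x, y)| x == y).count();
    matches as f64 / longest as f64
}

/// Returns the entry of `names_b` most similar to `a`, plus its score,
/// or `None` if `names_b` is empty.
fn find_max<'b, F>(a: &str, names_b: &[&'b str], similarity: F) -> Option<(&'b str, f64)>
where
    F: Fn(&str, &str) -> f64,
{
    let mut best: Option<(&'b str, f64)> = None;
    for &b in names_b {
        let score = similarity(a, b);
        // Strict `>` keeps the first maximum, like sim.index(max(sim)) in Python.
        if best.map_or(true, |(_, s)| score > s) {
            best = Some((b, score));
        }
    }
    best
}
```

Calling `find_max("Johny", &["John", "Jack", "Martin"], toy_similarity)` picks "John" with this toy metric. The outer loop over `names_a` is the part one would typically hand to rayon.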

What am I doing wrong? Can you point me in the right direction?
In the GitHub repo I included the Python code and the data for my benchmarks.

Thank you for any suggestions

cargo run runs in debug mode. You want cargo run --release in order to enable optimizations.

Add the --release flag to compile with optimizations, like this:

hyperfine -s basic -L n 10,100,1000,10000,100000,1000000 'cargo run --release data/names-1.txt data/names-{n}.txt'
hyperfine -s basic -L n 10,100,1000,10000 'cargo run --release data/names-1000.txt data/names-{n}.txt'

Thanks, it's way faster now. I did use cargo build --release to compile it, but I didn't realize I also had to use cargo run --release to actually run the optimized build.

Note that cargo run will automatically build your code if needed, so you can just call cargo run --release without calling cargo build --release first.
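If you are ever unsure which kind of build a benchmark is actually running, one rough check is to print it from the program itself. The sketch below relies on `debug_assertions`, which is on by default in Cargo's dev profile and off by default in the release profile. The function name `build_profile` is made up for illustration, and a custom profile can override the setting, so treat this as a heuristic.

```rust
/// Hypothetical helper: guesses the build kind from `debug_assertions`.
/// Default Cargo profiles: dev => enabled, release => disabled.
fn build_profile() -> &'static str {
    if cfg!(debug_assertions) {
        "debug"
    } else {
        "release"
    }
}
```

Adding something like `eprintln!("running a {} build", build_profile());` at the top of `main` makes an accidental debug-mode benchmark easy to spot.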
