Python code faster than the equivalent in Rust

Hey guys! I'm new to Rust and I'm doing some experiments to familiarize myself with the language. I wrote code in Python and in Rust that reads 999,999 integers from a file, sorts them, and reverses them. But my Python code runs in 1.8 seconds, while the Rust code (compiled with the --release flag) runs in 2.9 seconds. Any tips on how to speed up this code? I tried adding buffers for the file reads and for stdout, but to no avail.

Python code:

import time
from io import StringIO

st = time.time()

text_stream = StringIO()
f = open("random_numbers.txt", "r")
lines = f.readlines()
numbers = list(map(lambda x : int(x.strip()), lines))
numbers.sort()
numbers.reverse()

for n in numbers:
    text_stream.write(str(n)+"\n")
f.close() 
print(text_stream.getvalue()) 
text_stream.close()

et = time.time()
elapsed_time = et - st
print('Execution time:', elapsed_time, 'seconds')

Rust code:

use std::fs::File;
use std::io::{stdout, Write};
use std::io::BufReader;
use std::io::prelude::*;
use std::time::Instant;

fn main() -> std::io::Result<()> {
    let start = Instant::now();
    let file = File::open("random_numbers.txt")?;
    let mut buf_reader = BufReader::new(file);
    let mut contents = String::new();
    buf_reader.read_to_string(&mut contents)?;
    let mut numbers = contents
        .lines()
        .map(|x| {
            x.parse::<u32>()
                .expect("Not find a u32 number")
        })
        .collect::<Vec<u32>>();
    numbers.sort();
    numbers.reverse();
    let mut lock = stdout().lock();
    for n in numbers {
        writeln!(lock,"{}", n).unwrap();
    }
    let duration = start.elapsed();
    println!("Time elapsed in expensive_function() is: {:?}", duration);
    Ok(())
}

In your Python code, f.readlines() reads one line at a time from the file, while in the Rust code, you read the whole file in at once with buf_reader.read_to_string(). Try using BufRead::lines() instead and see how that goes.

Time each part. You should be working out where the difference comes from yourself rather than asking others.
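
To make "time each part" concrete, here is a minimal sketch that times parsing and sorting separately. The helper names timed and parse_numbers are made up for illustration, and a small in-memory string stands in for the contents of random_numbers.txt.

```rust
use std::time::{Duration, Instant};

/// Runs `f` and returns its result together with the elapsed wall-clock time.
/// (`timed` is a hypothetical helper, not part of the original program.)
fn timed<T>(f: impl FnOnce() -> T) -> (T, Duration) {
    let start = Instant::now();
    let out = f();
    (out, start.elapsed())
}

/// Parses one unsigned integer per line, as the original program does.
fn parse_numbers(contents: &str) -> Vec<u32> {
    contents
        .lines()
        .map(|x| x.parse::<u32>().expect("not a u32"))
        .collect()
}

fn main() {
    // Small in-memory stand-in for the contents of random_numbers.txt.
    let contents = "3\n1\n2\n";

    let (mut numbers, parse_time) = timed(|| parse_numbers(contents));
    let ((), sort_time) = timed(|| {
        numbers.sort();
        numbers.reverse();
    });

    // Report on stderr so the timings do not mix with the numbers on stdout.
    eprintln!("parse: {:?}, sort: {:?}", parse_time, sort_time);
    println!("{:?}", numbers);
}
```

Wrapping the file read and the output loop in timed the same way would show which phase accounts for the gap.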

(I haven't timed it myself, so this is just a guess.)
The 999,998 extra system calls from write will have an impact.
The code below is untested, so it may be buggy. (It also reuses the memory already allocated for contents.)

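// Needs `use std::fmt::Write as _;` so that writeln! can target a String.
contents.clear(); // reuse the buffer that held the input file
for n in numbers {
    writeln!(contents,"{}", n).unwrap();
}
// Write the whole output at once instead of line by line.
write!(lock,"{}", contents).unwrap();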

That's not true. From the Python documentation:

If you want to read all the lines of a file in a list you can also use list(f) or f.readlines() .

buf_reader.read_to_string(&mut contents)?; is wasteful and unnecessary, as it's just copying data twice (due to a pointless buffer) or more (due to lack of preallocated capacity in the string). Use contents = std::fs::read_to_string(), which achieves the same thing, but with less code and less copying.
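
A minimal sketch of that suggestion. The helper name read_numbers and the sample file in the temp directory are assumptions, added only so the example runs on its own; the real program would pass "random_numbers.txt".

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Reads the whole file into one String and parses one u32 per line.
/// (`read_numbers` is a hypothetical helper name.)
fn read_numbers(path: &Path) -> io::Result<Vec<u32>> {
    // fs::read_to_string opens the file and reads all of it in one call;
    // no BufReader is needed when the whole file is wanted anyway.
    let contents = fs::read_to_string(path)?;
    let numbers = contents
        .lines()
        .map(|x| x.parse::<u32>().expect("not a u32"))
        .collect();
    Ok(numbers)
}

fn main() -> io::Result<()> {
    // Write a tiny sample file so the sketch is self-contained.
    let path = std::env::temp_dir().join("read_numbers_sample.txt");
    fs::write(&path, "10\n20\n30\n")?;

    let numbers = read_numbers(&path)?;
    println!("{:?}", numbers);

    fs::remove_file(&path)?;
    Ok(())
}
```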

You can use numbers.sort_unstable() for a slight speed boost. sort + reverse is needlessly doing work twice. It could be numbers.sort_unstable_by_key(|n| std::cmp::Reverse(*n)).
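
As a small runnable sketch of the single descending sort (sort_descending is a made-up wrapper name; the call inside it is the one suggested above):

```rust
use std::cmp::Reverse;

/// Sorts in descending order with one sort call and no separate reverse.
fn sort_descending(numbers: &mut [u32]) {
    // Reverse flips the comparison, so the largest value sorts first.
    numbers.sort_unstable_by_key(|n| Reverse(*n));
}

fn main() {
    let mut numbers = vec![5, 1, 4, 1, 3];
    sort_descending(&mut numbers);
    println!("{:?}", numbers); // prints [5, 4, 3, 1, 1]
}
```

For plain integers an unstable sort gives the same result as a stable one, since equal values are indistinguishable.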

You're including file I/O in your measurement. Are you running these programs enough times in different order to eliminate effects of disk caches? The program that runs first will actually read the file from disk, and the program that you run after will most likely get random_numbers.txt from a RAM cache.

Try wrapping lock in BufWriter.
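
Something along these lines, as a sketch. The function write_numbers is a hypothetical helper, made generic over the writer so it can be checked against an in-memory buffer as well as stdout.

```rust
use std::io::{self, BufWriter, Write};

/// Writes one number per line to any writer.
fn write_numbers<W: Write>(out: W, numbers: &[u32]) -> io::Result<()> {
    // BufWriter collects the small writes in memory and passes them on in large chunks.
    let mut out = BufWriter::new(out);
    for n in numbers {
        writeln!(out, "{}", n)?;
    }
    // Flush explicitly so a write error is reported instead of being lost on drop.
    out.flush()
}

fn main() -> io::Result<()> {
    let numbers = [3, 2, 1];
    write_numbers(io::stdout().lock(), &numbers)
}
```

Stdout is line-buffered by default, so without the BufWriter each writeln! ends up as its own write to the terminal or pipe.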
