Hi, I'm new to Rust and wanted to get an idea of how it performs on simple I/O tasks. I don't know much about C++ or Ada either, but I figured I'd put together a simple file-processing task that was still interesting.
The task: read in a single CSV that's about 1.3 million lines long and find the maximum value of a fixed numerical column.
In this case I downloaded some forex historical data from HistData.com.
I decided I would pick the second column arbitrarily. This corresponds to the bid value of a currency at a given timestamp. Finding the maximum bid value of a currency in a given month seems kind of interesting, but is simple enough to not require much code.
I did basic, naive implementations in Rust, C++ and Ada. This is running on a mid-2015 MacBook Pro.
The Rust code:
    use std::io::{BufReader, BufRead};
    use std::fs::File;

    fn main() {
        let file = File::open("DAT_ASCII_EURUSD_T_201809.csv").unwrap();
        let mut bid_max: f32 = 0.0;
        for line in BufReader::new(file).lines() {
            let unwrapped = line.unwrap();
            // The bid is the 8 characters starting at offset 19 (second column).
            let bid_string = unwrapped.chars().skip(19).take(8).collect::<String>();
            let bid: f32 = bid_string.parse().unwrap();
            if bid > bid_max {
                bid_max = bid;
            }
        }
        println!("{}", bid_max);
    }
The C++:
    #include <fstream>
    #include <string>
    #include <iostream>

    int main(int argc, char** argv) {
        std::string line;
        std::ifstream infile("DAT_ASCII_EURUSD_T_201809.csv");
        float bid_max = 0.0f;
        if (infile) {
            while (getline(infile, line)) {
                // The bid is the 8 characters starting at offset 19 (second column).
                auto bid_str = line.substr(19, 8);
                float bid = std::stof(bid_str);
                if (bid > bid_max) {
                    bid_max = bid;
                }
            }
        }
        std::cout << bid_max << '\n';
        infile.close();
    }
And the Ada implementation:
    with Ada.Text_IO; use Ada.Text_IO;
    with Ada.Float_Text_IO; use Ada.Float_Text_IO;

    procedure Line_By_Line is
       File : File_Type;
       --  Ada strings are 1-indexed here, so 20 .. 27 is the same
       --  8-character field as offset 19, length 8 in Rust and C++.
       Bid_Start_Index : constant := 20;
       Bid_End_Index : constant := 27;
       Max_Bid : Float := 0.0;
    begin
       Open (File => File,
             Mode => In_File,
             Name => "DAT_ASCII_EURUSD_T_201809.csv");
       while not End_Of_File (File) loop
          declare
             Line : String := Get_Line (File);
             Bid_String : String := Line (Bid_Start_Index .. Bid_End_Index);
             Bid_Value : Float := Float'Value (Bid_String);
          begin
             if Bid_Value > Max_Bid then
                Max_Bid := Bid_Value;
             end if;
          end;
       end loop;
       Put (Max_Bid);
       Put_Line ("");
       Close (File);
    end Line_By_Line;
So, on my laptop, using Rust 1.30 nightly, this runs in about 0.44s, just using `cargo build --release` and then running `time ./target/release/line_by_line_rs`.
The C++ version, compiled with `g++ -march=native -Ofast -std=c++11` and using the clang version built into Mac OS X (`clang --version` says `Apple LLVM version 10.0.0 (clang-1000.10.44.2)`), takes about 0.53s.
Ada, built with the latest GNAT version for Mac, takes about the same amount of time as C++.
I'm just running `time` on the executables a few times to get an approximate average, nothing super scientific.
I was surprised to see Rust run faster than C++ (which uses LLVM) and Ada (which uses gcc). Are the C++/Ada versions doing work the Rust version is not?
I'm sure that if you wanted to make this task run as fast as possible, you could do it in a much more complex way (e.g. having a file-reader thread that puts lines onto a queue that a processor thread then consumes, reading the file in parallel, etc.). In that case, it might be possible to do low-level I/O in C++ that would make it much faster than Rust. What I'm interested in is what results I will see as someone with limited time: how fast will the simplest possible version run? How fast will it run after a cursory profiling and maybe an hour or two of work? This is different from typical benchmarks that assume you are freaking Yoda and don't care how ugly your code gets or how long it takes.
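For anyone curious what the "reader thread plus queue" idea might look like, here is a rough, unmeasured sketch in Rust. The names `parse_bid` and `max_bid_pipelined` are made up for illustration, the sample lines are an assumption about the file layout (8-character bid starting at byte offset 19), and unlike the code above it skips malformed lines instead of panicking. Sending one `String` per line through a channel has its own overhead, so this may well be slower than the simple loop.

```rust
use std::io::{BufRead, Cursor};
use std::sync::mpsc;
use std::thread;

/// Hypothetical helper: parse the bid from the fixed-width field
/// (byte offset 19, 8 bytes). Returns None for short or malformed lines.
fn parse_bid(line: &str) -> Option<f32> {
    line.get(19..27)?.parse().ok()
}

/// Hypothetical pipeline: a reader thread pushes lines onto a bounded
/// channel, and the calling thread consumes them and tracks the maximum.
fn max_bid_pipelined<R: BufRead + Send + 'static>(reader: R) -> f32 {
    // Bounded queue so the reader cannot run arbitrarily far ahead.
    let (tx, rx) = mpsc::sync_channel::<String>(1024);

    let producer = thread::spawn(move || {
        for line in reader.lines() {
            let line = line.expect("failed to read line");
            if tx.send(line).is_err() {
                break; // consumer went away
            }
        }
        // tx is dropped here, which ends the consumer's loop below.
    });

    let mut bid_max: f32 = 0.0;
    for line in rx {
        if let Some(bid) = parse_bid(&line) {
            if bid > bid_max {
                bid_max = bid;
            }
        }
    }
    producer.join().expect("reader thread panicked");
    bid_max
}

fn main() {
    // Assumed sample data in the same fixed-width layout as the CSV.
    let sample = "20180903 000000123,1.160010,1.160520,0\n\
                  20180903 000001456,1.162500,1.162900,0\n\
                  20180903 000002789,1.159900,1.160300,0\n";
    println!("{}", max_bid_pipelined(Cursor::new(sample)));
}
```

To run it against the real file, pass `BufReader::new(File::open(...).unwrap())` in place of the `Cursor`.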
I think this simple example, though imprecise and not super meaningful, is still interesting and says something about the outcomes that people who are not programming gods with unlimited time will see for real-world tasks.
I figured I'd share. Thanks!
Edit: replies show that the above code can definitely be improved:
- The C++ code is possibly doing needless allocation. C++17's `from_chars` or possibly `string_view` could probably improve this in a simple way.
- The Rust code is also doing allocation that can be avoided, depending on how this code would need to be used. Changing the Rust substring extraction line to `let bid_string = &unwrapped[19..19+8];` cuts runtime approximately in half.
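
For reference, here is roughly what the Rust version looks like with that one-line change applied. Pulling the loop into a function called `max_bid` is my own restructuring so it can be tried on in-memory data, and `main` reports a missing file instead of unwrapping. Note that the slice indexes bytes rather than chars, so it assumes ASCII lines of at least 27 bytes and will panic otherwise.

```rust
use std::fs::File;
use std::io::{BufRead, BufReader};

/// Returns the maximum bid found in the fixed-width field at
/// byte offset 19, length 8, of every line.
fn max_bid<R: BufRead>(reader: R) -> f32 {
    let mut bid_max: f32 = 0.0;
    for line in reader.lines() {
        let unwrapped = line.unwrap();
        // Borrow the field from the line instead of collecting a new String.
        let bid_string = &unwrapped[19..19 + 8];
        let bid: f32 = bid_string.parse().unwrap();
        if bid > bid_max {
            bid_max = bid;
        }
    }
    bid_max
}

fn main() {
    let path = "DAT_ASCII_EURUSD_T_201809.csv";
    match File::open(path) {
        Ok(file) => println!("{}", max_bid(BufReader::new(file))),
        Err(err) => eprintln!("could not open {}: {}", path, err),
    }
}
```

Each line is still read into a freshly allocated `String` by `lines()`, so there is probably more to gain by reusing a single buffer, but I haven't measured that.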