Rust faster than C++ and Ada 2012 on a simple file processing benchmark

Hi, I'm new to Rust and wanted to get an idea of how it performs on simple I/O tasks. I'm a noob at Rust and don't know much C++ or Ada either, but I figured I'd put together a simple file processing task that was still interesting.

The task: read in a single CSV that's about 1.3 million lines long and find the maximum value of a fixed numerical column.

In this case I downloaded some forex historical data from HistData.com (their "Download Free Forex Historical Data" page).

I arbitrarily picked the second column, which corresponds to the bid price of the currency pair at a given timestamp. Finding the maximum bid in a given month seems kind of interesting, but is simple enough not to require much code.

I did basic, naive implementations in Rust, C++ and Ada. This is running on a mid-2015 MacBook Pro.

The Rust code:

use std::io::{BufReader, BufRead};
use std::fs::File;

fn main() {
    let file = File::open("DAT_ASCII_EURUSD_T_201809.csv").unwrap();
    let mut bid_max: f32 = 0.0;
    for line in BufReader::new(file).lines() {
        let unwrapped = line.unwrap();
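        // The bid column occupies characters 19..27 of each line.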
        let bid_string = unwrapped.chars().skip(19).take(8).collect::<String>();
        let bid: f32 = bid_string.parse().unwrap();

        if bid > bid_max {
            bid_max = bid;
        }
    }

    println!("{}", bid_max);
}

The C++:

#include <fstream>
#include <string>
#include <iostream>
 
int main() {
   std::string line;
   std::ifstream infile("DAT_ASCII_EURUSD_T_201809.csv") ;
   float bid_max = 0.0f;
   if (infile) {
      while (getline(infile, line)) {
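          // The bid column occupies characters 19..27 of each line.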
          auto bid_str = line.substr(19, 8);
          float bid = std::stof(bid_str);
          if (bid > bid_max) {
              bid_max = bid;
          }
      }
   }
   
   std::cout << bid_max << '\n';
   infile.close();
}

And the Ada implementation:

with Ada.Text_IO;  use Ada.Text_IO;
with Ada.Float_Text_IO; use Ada.Float_Text_IO;

procedure Line_By_Line is
   File : File_Type;
   Bid_Start_Index : constant := 20;
   Bid_End_Index : constant := 27;
   Max_Bid : Float := 0.0;
begin
   Open (File => File,
         Mode => In_File,
         Name => "DAT_ASCII_EURUSD_T_201809.csv");
   while not End_Of_File (File) loop
      declare
         Line: String := Get_Line (File);
         Bid_String: String := Line (Bid_Start_Index .. Bid_End_Index);
         Bid_Value: Float := Float'Value (Bid_String);
      begin
         if Bid_Value > Max_Bid then
             Max_Bid := Bid_Value;
         end if;
      end;
   end loop;
 
   Put(Max_Bid);
   Put_Line("");
   Close (File);
end Line_By_Line;

So, on my laptop, using Rust 1.30 nightly, this runs in about 0.44s; I just built with cargo build --release and ran time ./target/release/line_by_line_rs.

The C++ version takes about 0.53s, compiled with g++ -march=native -Ofast -std=c++11 using the Clang built into macOS (g++ is an alias for Apple Clang there; clang --version says Apple LLVM version 10.0.0 (clang-1000.10.44.2)).

The Ada version, built with the latest GNAT release for Mac, takes about the same amount of time as the C++.

I'm just running time on each executable a few times to get an approximate average; nothing super scientific.

I was surprised to see Rust run faster than C++ (which here compiles through the same LLVM backend) and Ada (which uses GCC). Are the C++/Ada versions doing work the Rust version is not?

I'm sure that if you wanted to make this task run as fast as possible, you could do it in a much more complex way (e.g. a file-reader thread that puts lines onto a queue for a processing thread, reading the file in parallel, etc.). In that case, it might be possible to do low-level I/O in C++ that would make it much faster than Rust. What I'm interested in is the results someone with limited time will see: how fast does the simplest possible version run? How fast does it run after cursory profiling and maybe an hour or two of work? That's different from typical benchmarks, which assume you are freaking Yoda and don't care how ugly your code gets or how long it takes.
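(Just to illustrate what I mean by "more complex", here's a rough, untested sketch of the reader-thread idea, using std::thread and an mpsc channel, with the same file and parsing as above:)

use std::fs::File;
use std::io::{BufRead, BufReader};
use std::sync::mpsc;
use std::thread;

fn main() {
    let (tx, rx) = mpsc::channel::<String>();

    // Reader thread: pulls lines off disk and queues them up.
    let reader = thread::spawn(move || {
        let file = File::open("DAT_ASCII_EURUSD_T_201809.csv").unwrap();
        for line in BufReader::new(file).lines() {
            tx.send(line.unwrap()).unwrap();
        }
        // tx is dropped here, which closes the channel and ends the loop below.
    });

    // Main thread: parses each queued line and tracks the running maximum.
    let mut bid_max: f32 = 0.0;
    for line in rx {
        let bid: f32 = line.chars().skip(19).take(8).collect::<String>().parse().unwrap();
        if bid > bid_max {
            bid_max = bid;
        }
    }
    reader.join().unwrap();
    println!("{}", bid_max);
}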

I think this simple example, though imprecise and not super meaningful, is still interesting and says something about the outcomes people who are not programming gods with unlimited time will see on real-world tasks.

I figured I'd share. Thanks!

Edit: replies show that the above code can definitely be improved:

  • The C++ code is doing possibly needless allocation. C++17's std::from_chars, or possibly std::string_view, could probably fix this in a simple way.
  • The Rust code is also doing allocation that can be avoided, depending on how the code would need to be used. Changing the Rust substring-extraction line to let bid_string = &unwrapped[19..19+8]; cuts the runtime approximately in half.

Just a small note: a better comparison would use std::from_chars in the C++ instead of std::stof(line.substr(...)), to avoid a useless std::string allocation.

The function is part of C++17, and I am not sure which standard libraries implement it yet, but I think it should be considered for a proper comparison.


The Rust code seems to be doing more work than the C++ code. The C++ code just heap-allocates a copy, as @dodomorandi points out; the Rust code not only heap-allocates a copy but performs the copy by decoding UTF-8 to UTF-32 (char by char) and then re-encoding UTF-32 back to UTF-8.

Additionally, the Rust version checks the whole file for UTF-8 validity but the C++ version does not.
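To make that concrete, here's a hypothetical variant that reads raw bytes with read_until: the file as a whole is never UTF-8-validated or re-encoded, and only the eight bid bytes are interpreted as text. (Same file name and column offsets as above; I haven't benchmarked it.)

use std::fs::File;
use std::io::{BufRead, BufReader};

fn main() {
    let file = File::open("DAT_ASCII_EURUSD_T_201809.csv").unwrap();
    let mut reader = BufReader::new(file);
    let mut buf = Vec::new(); // reused byte buffer: one allocation total
    let mut bid_max: f32 = 0.0;
    while reader.read_until(b'\n', &mut buf).unwrap() > 0 {
        // Only the 8 bid bytes are checked and parsed as UTF-8 text.
        let bid: f32 = std::str::from_utf8(&buf[19..19 + 8])
            .unwrap()
            .parse()
            .unwrap();
        if bid > bid_max {
            bid_max = bid;
        }
        buf.clear();
    }
    println!("{}", bid_max);
}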


Not to throw him under a bus, but @BurntSushi knows a lot more about benchmarking than I do.


Is there a way to avoid that UTF-8 -> UTF-32 -> UTF-8 conversion? Or the validity check you mentioned?

It's pretty amazing that Rust can process a couple million lines a second single-threaded even though I've apparently done something rather inefficient.

What you have written is good code that can deal with UTF-8.

If you know the input is only valid ASCII (either by making sure by hand or by validating it in your program), then each character is one byte long, so you can index into the string directly: let bid_string = &unwrapped[19..19+8];. This halves the runtime on my machine. (Note: if you are mistaken and the text is not all ASCII and you slice into the middle of a multi-byte character, the program will panic.)
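If you can't fully trust the ASCII assumption, a tiny hypothetical helper using str::get avoids the panic:

// str::get returns None instead of panicking if the range would split a
// multi-byte character or run past the end of the line.
fn extract_bid(line: &str) -> Option<f32> {
    line.get(19..19 + 8)?.parse().ok()
}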

I know that cache "warmth" (i.e. how much of the data your program will touch is already cached, whether in the CPU caches or the OS page cache, at the start of the benchmark) is a big factor that can make your results meaningless.

I think the standard way to deal with this is to "warm up" the cache by running your software 100+ times before starting the timer.

You're right, the benchmarking here is heavily flawed and the code above does needless allocation; the way I've implemented these, the maximum comparison can't be vectorized either, etc.

I think the only valid takeaway from this exercise is that Rust and Ada are viable alternatives to C++ for simple bulk I/O tasks like the CSV example I picked here.

I actually think it's okay to compare code with needless allocation and so on! You made it pretty clear that this was a comparison of naive programs implementing the same behavior, and I think that's a pretty valuable thing to compare between languages: it shows which languages idiomatically encourage writing performant code.

But I would be interested to know how the results would change if you changed your procedure to run each program ~100 times as a "warmup" before doing each actual benchmark.
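(In case it helps, here's a rough sketch of that procedure as a small Rust harness; the binary path is just an example and would change per language:)

use std::process::Command;
use std::time::Instant;

fn main() {
    // Hypothetical path; point this at whichever binary you're measuring.
    let bin = "./target/release/line_by_line_rs";

    // Warmup passes: run untimed so the data is hot in the OS cache.
    for _ in 0..100 {
        Command::new(bin).output().unwrap();
    }

    // Timed pass.
    let start = Instant::now();
    Command::new(bin).output().unwrap();
    println!("{:?}", start.elapsed());
}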


I'd be interested in a benchmark that uses BurntSushi's csv crate, because that's closer to how I would actually do it. There are many ways to use the crate, including some very fast ones; it's also SIMD-friendly if you compile for your native CPU.
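To sketch what that might look like (untested, and assuming the default comma delimiter, no header row, and the bid in field index 1):

use std::error::Error;

fn main() -> Result<(), Box<dyn Error>> {
    let mut rdr = csv::ReaderBuilder::new()
        .has_headers(false)
        .from_path("DAT_ASCII_EURUSD_T_201809.csv")?;
    let mut bid_max: f32 = 0.0;
    // ByteRecord plus read_byte_record skips UTF-8 validation and reuses one buffer.
    let mut record = csv::ByteRecord::new();
    while rdr.read_byte_record(&mut record)? {
        let bid: f32 = std::str::from_utf8(&record[1])?.parse()?;
        if bid > bid_max {
            bid_max = bid;
        }
    }
    println!("{}", bid_max);
    Ok(())
}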
