Faster Command Line Tools in D/Rust


#1

There’s an interesting thread about an article on optimizing small command line tools written in the D language:

So I’ve written a Rust version of the code:


#2

I’ve added a faster version: http://codepad.org/eo1aRfSc

It could be a little faster than the last D version. Timings are welcome.


#3

I ran your first version and found it was just a tiny bit faster than the final D version from the article. Both came in at about 2.5 seconds on my machine. I did make a couple of attempts to improve your version.

I cheated by reading the whole thing into memory first. This allowed me to use &str instead of String for the keys and eliminated all the related allocations. That saved about a tenth of a second. Then I threw rayon at the problem and that saved a full second! Of course, reading the whole thing into memory first is not appropriate for a command line tool like this, because presumably it would have to work with much larger files. I could imagine a conditional that reads the whole thing into memory if it’s less than 1 GB or so, though.
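
Roughly the approach, as a sketch (not the exact code I ran; the file name and the column indices are assumptions): read the whole file into one String, borrow &str keys from that buffer, and let rayon fold per-thread maps that get merged at the end.

    extern crate rayon;

    use std::collections::HashMap;
    use std::fs;
    use rayon::prelude::*;

    fn main() {
        // Read the whole file up front so the map can borrow &str keys
        // from this buffer instead of allocating a String per line.
        let data = fs::read_to_string("ngrams.tsv").expect("cannot read file");

        let sums: HashMap<&str, u64> = data
            .par_lines()
            .filter(|line| !line.is_empty())
            // Each rayon worker folds lines into its own map...
            .fold(HashMap::new, |mut map, line| {
                let mut fields = line.split('\t');
                // Assumed layout: key in column 1, value in column 2 (0-based).
                let key = fields.nth(1).unwrap_or("");
                let value: u64 = fields.next().and_then(|v| v.parse().ok()).unwrap_or(0);
                *map.entry(key).or_insert(0) += value;
                map
            })
            // ...and the per-worker maps are merged at the end.
            .reduce(HashMap::new, |mut a, b| {
                for (k, v) in b {
                    *a.entry(k).or_insert(0) += v;
                }
                a
            });

        if let Some((k, v)) = sums.iter().max_by_key(|&(_, &v)| v) {
            println!("max_key: {} sum: {}", k, v);
        }
    }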


#4

Thank you. Could you please time my improved version too? :slight_smile:

(And then if you want you could also show your timings in that Reddit thread).

With a Map API that helps ( https://github.com/rust-lang/rfcs/pull/1769 ) this program should be a little faster. Probably there are ways to improve the column assignment too; currently it’s just an enumerated loop with two dumb ifs inside…
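
For reference, this is the pattern in question (a minimal sketch; that the linked RFC addresses this allocation is my assumption): with the current entry API the key is taken by value, so a String is allocated for every line even when the key already exists in the map.

    use std::collections::HashMap;

    // Sketch of today's pattern: entry() takes the key by value, so a String
    // is allocated on every call, even when the key is already present.
    fn add_line(sums: &mut HashMap<String, u64>, key: &str, value: u64) {
        *sums.entry(key.to_string()).or_insert(0) += value;
    }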

I agree.


#5

More blog posts and threads: https://www.reddit.com/r/programming/comments/6dct6e/faster_command_line_tools_in_nim/


#6

I can time your new one later this evening. It seems my machine is rejecting ssh requests at the moment. :unamused:

EDIT:

My numbers from yesterday:
Leonardo Rust / cargo rustc --release --bin rust_orig -- -C target-cpu=native
max_key: 2006 sum: 22569013

real 0m2.698s
user 0m2.606s
sys 0m0.083s

Dv4 / ldc2 -release -O max_column_sum_by_key_v4b.d
max_key: 2006 sum: 22569013

real 0m2.700s
user 0m2.588s
sys 0m0.095s


#7

It’s too bad that what I consider idiomatic code (https://gist.github.com/gregkatz/39ee2be6585c81874f83ca7712fc54f5) takes about a full second longer than yours.


#8

Do you mean compared to the first version?


#9

Yes, compared to your first version.


#10

Not too surprising, as the linked code does more work. For instance, it supports quoted fields (as in RFC 4180), so a delimiter inside quotes must not be treated as a field separator.


#11

I created this version:


I think it is faster mainly for these reasons:

  1. I used a memory-mapped file, instead of read_to_end on a File.

  2. I used a primitive integer as the key, instead of a string or a vector of bytes.

  3. I used custom-made field splitting code.

  4. I used custom-made string-to-integer parsing.

In addition, I used ordermap::OrderMap instead of std::collections::HashMap. I added these dependencies in Cargo.toml:

    memmap = "*"
    num = "*"

I build with:

cargo build --release

On my computer it takes a minimum of 590 ms, instead of a minimum of 1083 ms with the last version by leonardo.
I left the code that loads the file into a vector (instead of memory-mapping it) and the code that uses HashMap instead of OrderMap as comments. With HashMap the minimum time becomes 758 ms; reading the whole file into memory beforehand, it becomes 728 ms; with both HashMap and a plain file read, it becomes 826 ms.
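
A rough sketch of the combination described above (memory-mapped input, integer keys, hand-rolled splitting and parsing). The memmap call shown, the file name, and the column indices are assumptions, and overflow handling is omitted; the real version also uses OrderMap rather than HashMap.

    extern crate memmap;

    use std::collections::HashMap;
    use std::fs::File;
    use memmap::Mmap;

    const KEY_FIELD: usize = 1;   // assumed 0-based column of the key
    const VALUE_FIELD: usize = 2; // assumed 0-based column of the value

    // Minimal hand-rolled ASCII-digits-to-u64 parser (no sign or overflow handling).
    fn parse_u64(field: &[u8]) -> u64 {
        field.iter().fold(0, |acc, &b| acc * 10 + u64::from(b - b'0'))
    }

    fn main() {
        let file = File::open("ngrams.tsv").expect("cannot open file");
        // Mapping the file avoids copying it into a Vec<u8> first.
        let mmap = unsafe { Mmap::map(&file).expect("cannot mmap file") };
        let data: &[u8] = &mmap;

        let mut sums: HashMap<u64, u64> = HashMap::new();
        for line in data.split(|&b| b == b'\n').filter(|l| !l.is_empty()) {
            let (mut key, mut value) = (0u64, 0u64);
            // Custom field splitting plus integer parsing, all on raw bytes.
            for (i, field) in line.split(|&b| b == b'\t').enumerate() {
                if i == KEY_FIELD {
                    key = parse_u64(field);
                } else if i == VALUE_FIELD {
                    value = parse_u64(field);
                }
            }
            *sums.entry(key).or_insert(0) += value;
        }

        if let Some((k, v)) = sums.iter().max_by_key(|&(_, &v)| v) {
            println!("max_key: {} sum: {}", k, v);
        }
    }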


#12

590 ms is very nice, but I think the key-as-string is part of this benchmark…

Edit: I’ve seen a performance improvement (540 ms on my PC) in your code if I add this to the Cargo.toml:

[profile.release]
opt-level = 3
lto = true
panic = "abort"

Using your memory-mapped file in my second version, the run-time is 620 ms (keys are still string-like):
http://codepad.org/ihXAaJzx


#13

OK, but then isn’t it bizarre that all the 10512769 keys of the test file are integer numbers?

I already had that. Of course my computer is not as fast as yours.

With this code, in my computer I get a minimum of 727 ms after ten runs.


#14

By itself a benchmark does nothing useful, so every benchmark has some artificial constraints that give it a purpose. I think this benchmark is meant to have string-like keys, perhaps because it’s supposed to work on other input files too, ones that have a column of alphanumeric fields. So every benchmark needs someone to make partially arbitrary decisions about where to draw the line between a fair solution and an unfair one. I think a fair solution to this problem has string-like keys; your solution is based on a different assumption. Both are equally valid until someone defines what’s fair for this benchmark.

In my third version I’ve also kept the code short because I think it’s more reasonable this way :slight_smile: But your solution is interesting and fast.

This seems to accept solutions in other languages too: