Faster Command Line Tools in D/Rust

leonardo · May 25, 2017, 12:11pm

There's an interesting thread about an article about optimizing small software tools written in D language:

So I've written a Rust version of the code:

leonardo · May 26, 2017, 11:20am

I've added a faster version: Plain Text code - 74 lines - codepad

It could be a little faster than the last D version. Timings are welcome.

gregwtmtno · May 26, 2017, 11:30am

I ran your first version and found it was just a tiny tiny bit faster than the final D version from the article. Both came in at about 2.5 seconds on my machine. I did make a couple of attempts to improve your version.

I cheated by reading the whole thing to memory first. This allowed me to use &str instead of String for the keys and eliminated all the related allocations. This saved about a tenth of a second. Then I threw rayon at the problem and that saved a full second! Of course, reading the whole thing to memory first is not appropriate for a command line tool like this because presumably it would have to work with much larger files. I could imagine a conditional that reads the whole thing to memory if it's less than 1GB or so though.

leonardo · May 26, 2017, 11:35am

Thank you. Could you please time my improved version too?

(And then if you want you could also show your timings in that Reddit thread).

With a Map API that helps ( Extend entry API to work on borrowed keys. by cristicbz · Pull Request #1769 · rust-lang/rfcs · GitHub ) this program should be a little faster. Probably there are ways to improve the column assignment too, currently it's just an enumerated loop with two dumb ifs inside...

I agree.

leonardo · May 26, 2017, 11:38am

More blog posts and threads: https://www.reddit.com/r/programming/comments/6dct6e/faster_command_line_tools_in_nim/

gregwtmtno · May 26, 2017, 11:39am

I can time your new one later this evening. It seems my machine is rejecting ssh requests at the moment.

EDIT:

My numbers from yesterday:
Leonardo Rust / cargo rustc --release --bin rust_orig -- -C target-cpu=native
max_key: 2006 sum: 22569013

real 0m2.698s
user 0m2.606s
sys 0m0.083s

Dv4 / ldc2 -release -O max_column_sum_by_key_v4b.d
max_key: 2006 sum: 22569013

real 0m2.700s
user 0m2.588s
sys 0m0.095s

gregwtmtno · May 26, 2017, 2:36pm

It's too bad that what I consider idiomatic code (https://gist.github.com/gregkatz/39ee2be6585c81874f83ca7712fc54f5) takes about a full second longer than than yours.

leonardo · May 26, 2017, 5:05pm

Do you mean compared to the first version?

gregwtmtno · May 26, 2017, 5:36pm

Yes, compared to your first version.

xfix · May 27, 2017, 8:43am

Not too surprising, as linked code does more work. For instance, it allows quoting fields (as in RFC 4180).

carlomilanesi · May 27, 2017, 10:35pm

I created this version:
https://gist.github.com/carlomilanesi/083b12112f860a33c8ae78e7f97c47ec
I think it is faster, mainly because of this:

I used a memory-mapped file, instead of using read_to_end of File.
I used a primitive integer number as a key, instead of a string or a vector of bytes.
I used custom-made field splitting code.
I used custom-made string-to-integer parsing.
In addition, I used ordermap::OrderMap instead of std::collections::HashMap.
I added as dependencies in Cargo.toml:

memmap = ""
num = ""

I build with:

cargo build --release

With my computer, it takes a minimum of 590 ms, instead of a minimum of 1083 ms with the last version by leonardo.
I left as comments the code for loading the file in a vector, instead of using a memory-mapped file, and the code using HashMap instead of OrderMap. Using HashMap the minimum time becomes 758 ms. Reading the whole file beforehand, the minimum time becomes 728 ms. Using both HashMap and File, the minimum time becomes 826 ms.

leonardo · May 27, 2017, 10:55pm

590 ms is very nice, but I think the key-as-string is part of this benchmark...

Edit: I've seen a performance improvement (540 ms on my PC) in your code if I add this to the Cargo.toml:

[profile.release]
opt-level = 3
lto = true
panic = "abort"

Using your memory mapped file in my second version the run-time is 620 ms (keys are still string-like):
http://codepad.org/ihXAaJzx

carlomilanesi · May 28, 2017, 12:45pm

OK, but then isn't it bizarre that all the 10512769 keys of the test file are integer numbers?

I had already that. Of course my computer is not as fast as yours.

With this code, in my computer I get a minimum of 727 ms after ten runs.

leonardo · May 28, 2017, 1:15pm

In the end a benchmark does nothing useful. So every benchmark has some artificial constraints that give it a purpose. I think this benchmark is meant to have string-like keys. Perhaps because it's supposed to work on other input files too that have a column of alphanumeric fields. So for every benchmark there's a need for a person that takes partially arbitrary decisions regarding where to draw a line between a fair solution and an unfair one. I think a fair solution of this problem has string-like keys. Your solution is based on a different assumption. Both are equally valid until someone defines what's fair for this benchmark.

In my third version I've also kept the code short because I think it's more reasonable this way But your solution is interesting and fast.

This seems to accept solutions in other languages too:

Topic		Replies	Views
HashMap performance	25	29239	September 24, 2016
How to improve rust performance to 1.4 seconds? help	31	2445	June 16, 2023
Little Brainfuck interpreter in D/Rust for a benchmark	22	4243	December 29, 2015
A blazingly fast concurrent ordered map announcements	22	1596	February 26, 2026
Success story: new Rustacean beating C perf in first week	12	6149	June 6, 2017

Faster Command Line Tools in D/Rust

Related topics