Blog: Rust Faster


#1

Wherein Veedrac, teXitoi and I set out to speed up some Rust entries to the benchmarks game.


#2

Nice. But that benchmarks game looks to me more like using the hackiest code possible to achieve the greatest possible speed. By that logic, nothing would be able to beat assembler. Does this confirm that assembler is the coolest language of all time?


#3

This “shootout” is a good measure of “if I am prepared to sacrifice anything for performance, how bad would the solution look, and how would it do compared to C?”.

It is not a good measure of “how well does idiomatic code perform?”.


#4

Exactly. Also note that assembly is not benchmarked (there are no Regexp implementations in assembly that I know of, fwiw). Most of the benchmarks are about some part of available libraries, e.g. regexps, hash maps, fast concurrency primitives, etc.


#5

Kind of surprised there’s no… you know… benchmark numbers? Some screenshots of their graphs would have been nice. Or at least explain whether we’re now faster than Haskell/C in these or what.


#6

The article was about the techniques we used to make it faster. You can go into the PRs for benchmark numbers (which are, however, to be taken with a grain of salt, as none of us has a machine that is even remotely comparable to the benchmarksgame server).

I’m considering copying the benchmark numbers into the article, but I don’t have too much time today, so perhaps later.


#7

Is the benchmarks game site already using the new versions?


#8

Unfortunately not. The fasta benchmark is running 3 times as fast as the current one on my machine (and thus has a comfortable lead on both 1 and 4 cores). It has been submitted, but is pending review. fasta_redux gets roughly the same speed (but teXitoi hasn’t even submitted it to the benchmarksgame tracker yet). The others depend quite a lot on the machine, so I’m very interested in the results from the benchmarksgame site.


#9

This “shootout” is a good measure of “if I am prepared to sacrifice anything for performance, how bad would the solution look, and how would it do compared to C?”.

I’d disagree with your sentiment at least. The shootout is a pretty great measure of the peak performance of a language, and I think the maintainer mostly does a decent job of weeding out truly unintelligible code - at least, I don’t have much trouble comprehending most of the submissions in the languages I understand.

As a language that is supposed to be ‘blazing fast’, whether Rust can achieve the sort of peak performance that C can is actually really important to us. The shootout has certainly shown up at least a couple of legitimate (and substantial) performance issues with the Rust standard library. I’ve seen the community dismiss the shootout as only important for marketing purposes a couple of times, and I think that’s a mistake.


#10

(which are however to be taken with a grain of salt, as none of us has a machine that is even remotely comparable to Isaac Gouy’s server).

Yeah, that one is interesting. When I rewrote fasta to make it multi-threaded a while back, my version was fast when given multiple cores or just a single one. I tested on my MBP and on a single-core Linode. When submitted to the benchmarks game I found that the multi-core results were roughly as I expected, but the single-core ones were drastically slower. Really not sure why, and didn’t have time to work it out!


#11

Well, I’ve seen pretty terrible ones that are as far as possible from what an idiomatic program would look like, for example in Haskell.

But you are right, I oversimplified my statement. The lower-level the language, the less this tendency to replace idiomatic code matters since you’re probably using a lower level language to get performance anyway.


#12

The shootout is a pretty great measure of the peak performance of a language

I would strongly disagree with that sentiment in the general case.

Bottom up (since the worst offenders are now first),

  • binary-trees is silly since it measures allocation speed for a case that simply doesn’t exist in real code;
  • thread-ring is basically insane, since nobody ever bottlenecks like that;
  • chameneos-redux’s C++ implementation is ridiculous. The C is not so ridiculous, but you still have the problem that basically every language in the top few spots does something completely different;
  • pidigits tests whether you have bindings to GMP;
  • regex-dna tests a regex engine on a small subset of cases (arguably the first half-acceptable benchmark);
  • k-nucleotide tests who has the best hash table for this particular silly scheme, and they don’t all even do the same thing (eg. Scala precompacts, like my new Rust version);
  • mandelbrot is kind’a OK;
  • reverse-complement would be kind’a OK if not for a few hacky implementations (like the Rust);
  • spectral-norm is kind’a OK;
  • Haskell basically cheats fasta (which is why I copied it);
  • meteor-contest is too short to mean anything at all;
  • fannkuch-redux is probably kind’a OK;
  • n-body is kind’a OK.

So maybe 5/13 are acceptable, and I’d still only use 4 of those. I think if you look at mandelbrot, spectral-norm, fannkuch-redux and n-body, you can argue the benches are a reasonable measure of peak performance. However, these cases are all too small and simple to be really convincing, and the comparison isn’t particularly fair either (where’s NumPy for Python?).

For what it’s worth, the process of optimizing things to the point of no return is bound to surface unwanted bottlenecks, regardless of the attributes of the thing being optimized.

(Largely I just dislike people taking these benchmarks to represent any real measure of language speed, since it’s a common claim on the interwebz.)
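
To make the “precompacts” remark on k-nucleotide concrete: since there are only four nucleotides, a k-mer can be packed into an integer at 2 bits per base, so the hash table keys are plain `u64`s instead of byte strings. The following is only a rough sketch of that idea, not the code of any actual submission. The names `encode`, `pack` and `count_kmers` are made up for illustration, and it assumes the input contains only A/C/G/T and that k is at most 32.

```rust
use std::collections::HashMap;

/// Map a nucleotide byte to a 2-bit code; None for anything that is not A/C/G/T.
fn encode(base: u8) -> Option<u64> {
    match base.to_ascii_uppercase() {
        b'A' => Some(0),
        b'C' => Some(1),
        b'G' => Some(2),
        b'T' => Some(3),
        _ => None,
    }
}

/// Pack a whole k-mer (at most 32 bases) into a u64, first base in the highest bits.
fn pack(kmer: &[u8]) -> u64 {
    kmer.iter()
        .fold(0u64, |acc, &b| (acc << 2) | encode(b).expect("non-ACGT byte"))
}

/// Count every k-mer in `seq`, keyed by its packed 2-bit representation.
/// The key is updated incrementally: shift in the new base, mask off the oldest.
fn count_kmers(seq: &[u8], k: usize) -> HashMap<u64, u32> {
    assert!((1..=32).contains(&k), "k must fit into a u64 at 2 bits per base");
    let mask = if k == 32 { u64::MAX } else { (1u64 << (2 * k)) - 1 };
    let mut counts = HashMap::new();
    let mut key = 0u64;
    for (i, &b) in seq.iter().enumerate() {
        let code = encode(b).expect("non-ACGT byte");
        key = ((key << 2) | code) & mask;
        // Only start counting once a full window of k bases has been seen.
        if i + 1 >= k {
            *counts.entry(key).or_insert(0) += 1;
        }
    }
    counts
}
```

Looking up a fragment is then something like `count_kmers(seq, 2).get(&pack(b"GT"))`. Whether this counts as “the same thing” as hashing the raw bytes is exactly the kind of question that makes the entries hard to compare.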


#13

I’d just like to point out that the benchmark game web site provides plenty of words of caution on interpreting the data: http://benchmarksgame.alioth.debian.org/dont-jump-to-conclusions.html


#14

The PR value is what matters here. People are going to brag about the fastest languages on the benchmark game, regardless of whether the benchmarks represent anything. Having Rust on top there scores more points. :smiling_imp:


#15

Nobody actually listens. It’s always “how come Rust is only half the speed of C on the benchmark game?”


#16

That’s why Rust should be on top there. People should complain about C not being as fast as Rust :grinning:


#17

While this could happen, I don’t think it will unless we spend some serious effort on improving the benchmarks (as Veedrac currently does). C has enjoyed some decades of compiler research, and its compilers are very good at optimizing the living hell out of just about everything you throw at them. The current winning C/C++ entries are also clearly the work of performance experts.

So in the benchmarks where Rust is at the top, this is due to

  • even more cleverness on behalf of the benchmark writers (e.g. see Veedrac’s fasta)
  • better implementation of some library (e.g. BurntSushi’s regex)
  • LLVM doing a great job and actually using some of Rust’s guarantees (i.e. we got lucky)

Edit: I’d also like to add that we would probably do even better if a nightly build of Rust were benchmarked. As it stands, we lack e.g. intrinsics, compiler plugins and some optimizations.


#18

I think that’s an over-harsh assessment (although my own is probably over-generous :slight_smile: ). It seems like 6-7 of the benchmarks are mostly decent (with some caveats) by your points. I agree that (for example) Haskell’s fasta implementation is essentially cheating, but that’s perhaps an oversight on the part of the maintainer that could be corrected by pointing it out.

When I look at the benchmarks game what I see is ‘how fast can I make this language go when I really need to push it?’ I don’t think that’s a great assessment of speed of idiomatic code, but I do still think it’s ‘real’, and important to look at.


#19

I don’t think Veedrac’s assessment is too harsh. Even the site itself disclaims that the results are fit for any purpose (Edit: besides being a damn good starting point for performance discussions). Obviously the results don’t translate into real-world application performance. As I wrote on the blog, the results say more about the cleverness and performance expertise of the people writing the entries than about the relative performance of programming languages.

Even so, this benchmark arms race is a) a fun game and b) good PR for Rust, and that’s why we’re playing.


#20

I don’t mean to be overly argumentative, so I’ll end my bickering on this reply, but I think the overall results speak both to the cleverness of the coders and to the language’s capacity to be bent to such clever solutions. There’s a reason that Java is typically slower on these tests than C is, and it’s not simply that all the Java coders submitting to the benchmarks game are idiots. It’s that while Java is pretty damn fast for large real-world applications, it has a much lower capacity for serious optimisation than C does.