Review of slides for talk about Rust performance

Hi colleagues.

I am preparing slides on the performance aspects of Rust and would be grateful if you could review them and share feedback. My colleague Zakhar Akimov and I have been preparing this talk for quite some time, so hopefully the content is not too boring.

The slides themselves are here: rust-slides/EN.pdf at main · yugr/rust-slides · GitHub

The GitHub - yugr/rust-slides · GitHub repo contains our analysis (links to analyzed materials, code examples, LLVM plugins, prooflinks, benchmarking framework, etc.). In GitHub - yugr/rust-private: Empowering everyone to build reliable and efficient software. · GitHub you can find the compiler patches which we used to disable various runtime checks, etc. (they are linked from the presentation as well).

Although we do like the language, the talk is neutral on Rust - we tried to just share our findings without any bias.

Best, yugr

On slide 2, under “Disadvantages” it says “Forced Initialisation”.

You don’t have to initialise newly allocated memory, so I was wondering what this refers to.

Some non-technical observations:

  • Who is the target audience of this talk? The slides are mostly in English, but with words in Cyrillic sprinkled all over them.
  • I stopped skimming the slides at slide no. 90. How long is this talk gonna be? I usually plan ~1 minute per slide when I prepare a talk, and I usually fall asleep during technical talks > 1h.

Who is the target audience of this talk?
The slides are mostly in English, but with words in Cyrillic sprinkled all over them.

Thanks for spotting this; hopefully the updated version is English-only now. The original slides were in Russian (you can find the RU.pdf file in the repo).

How long is this talk gonna be? I usually plan ~1 minute per slide when I prepare a talk,
and I usually fall asleep during technical talks > 1h.

I will trim the final slides to fit my slot, but I wanted to make the full version available as well. Towards the end I actually mention that some less important features remain uncovered )

I hope later slides (12-13 and 58-74) will answer that question.

There is an outstanding issue I came across where the compiler initialises MaybeUninit when it is not needed: `MaybeUninit::uninit` zero initializes large uninit array besides small other field · Issue #152541 · rust-lang/rust · GitHub
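For readers unfamiliar with the pattern, here is a minimal sketch of the kind of code the issue is about (the type and field names are mine, not from the issue):

```rust
use std::mem::MaybeUninit;

// Hypothetical type mirroring the pattern from the issue: a small
// initialized field next to a large buffer that should stay uninitialized.
struct Packet {
    len: usize,
    buf: MaybeUninit<[u8; 4096]>,
}

fn make_packet() -> Packet {
    // One would expect no memset to be emitted for `buf` here, but the
    // linked issue shows the compiler may zero-initialize it anyway.
    Packet { len: 0, buf: MaybeUninit::uninit() }
}

fn main() {
    let mut p = make_packet();
    // SAFETY: we write the first byte before reading it back.
    unsafe {
        (*p.buf.as_mut_ptr())[0] = 42;
        assert_eq!((*p.buf.as_mut_ptr())[0], 42);
    }
    assert_eq!(p.len, 0);
}
```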


Disclaimer: I find it quite hard to review slides without hearing the accompanying talk, so take this with a grain of salt. I'll also comment as I read through.

It's always tricky to talk about advantages/disadvantages, because context matters; the same thing can be either an advantage or a disadvantage :slight_smile: I think it is better to talk about performance in terms of axes of trade-offs and fundamental limits. The limits of Rust, C, and C++ are essentially the same; what differs are the defaults and the convenience and time required to achieve certain performance goals.

More conservative standard library

Kinda interested in what you mean by this one! I have certainly run into situations where the stdlib was too high-level and didn't allow me to tune things for maximum performance, at which point I had to switch to a third-party library or a manual solution. I'm not sure how this could be solved in any language's stdlib though.

• Inconvenient support for self-referential data structures (e.g. graphs)
• Signed integer overflows are allowed

You can use raw pointers and unchecked_add in std::intrinsics - Rust. So the inconvenience axis is "having to use more characters to write unsafe" (Rust) vs "making it easier to cause UB by accident" (C, C++).
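As a concrete sketch of the "more characters" side, overflow-check-free addition can be written with the stable `i32::unchecked_add` method (stable since Rust 1.79; the wrapper function name is mine):

```rust
fn add_unchecked(a: i32, b: i32) -> i32 {
    // SAFETY: the caller must guarantee that a + b cannot overflow i32;
    // overflow here is undefined behavior, like signed overflow in C/C++.
    unsafe { a.unchecked_add(b) }
}

fn main() {
    assert_eq!(add_unchecked(2, 40), 42);
}
```

The `unsafe` block is the extra ceremony; in C or C++ the same operation is spelled `a + b` and the UB risk is invisible at the call site.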

That being said, it is likely not the goal of your talk to dispute these bullet points, so I'll stop there :slight_smile:

• Rust has no options to disable safety checks

I find this a bit misleading; there is nothing in surface Rust that prevents you from doing anything you want without any runtime checks, as long as you use enough unsafe (you can't easily turn off all compile-time checks though), modulo maybe some missed optimizations or compiler bugs.

Ok, I see that you mention that with unsafe everything is possible (I agree).

Still, I'm not sure if it makes sense to talk about using typical/idiomatic Rust code when we talk about using only the standard library. IMO the single biggest performance benefit of Rust over C++ is that you can very easily include libraries, which allows you to get to a certain performance level faster, without spending time debugging UB, learning how to build a library or understanding how to use it correctly. It also allows you to experiment with many different optimization approaches quickly by just swapping libraries.

• Uses more conservative algorithms in stdlib

I'm honestly not sure how this is possible, but the conservative sorting algorithm in Rust's stdlib, which even checks for invalid orderings by default, is still faster than C++'s stdlib sort implementations :slight_smile:

Slide 39 is very interesting. If I read it right, the biggest win was 6%? That's pretty awesome for the smartness of LLVM and shows the low cost of bounds checks in these scenarios. I would be interested in the performance of rustc itself while compiling, if bounds checks were removed from it! The unused-write analysis is also very interesting. I tried to do something similar at the Rust (not LLVM) level some time ago; similar statistics could be useful to detect useless initializations (but also useless Drops etc.) to optimize the compiler.
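As an aside, a small sketch of why idiomatic iterator code often sidesteps the bounds-check question entirely (the function names are mine):

```rust
// Two ways to sum a slice: indexed access nominally carries a bounds
// check per iteration unless LLVM can prove it away; the iterator
// version avoids bounds checks by construction.
fn sum_indexed(v: &[u64]) -> u64 {
    let mut s = 0;
    for i in 0..v.len() {
        s += v[i]; // bounds check, usually elided by LLVM in this shape
    }
    s
}

fn sum_iter(v: &[u64]) -> u64 {
    v.iter().sum() // no bounds checks emitted
}

fn main() {
    let v: Vec<u64> = (1..=10).collect();
    assert_eq!(sum_indexed(&v), 55);
    assert_eq!(sum_iter(&v), 55);
}
```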

Re slide 109: I think it would be very interesting to see some numbers/benchmarks that would confirm that making signed integer overflow UB is actually important for perf. "in practice" (i.e. not on SPEC2000, though it would be interesting to see the effect even there :slight_smile: ). If I'm not mistaken, that's slide 112? Though Disable mandatory overflow and other arithmetic checks. · yugr/rust-private@858aaa7 · GitHub seems "too weak"; surely that's not all the cases of checked arithmetic in the stdlib. It would be better to modify rustc to tell LLVM to treat all signed integer overflows as undefined, rather than to change stdlib code.

In general, I love the idea of modifying the compiler to produce performance baselines, though I worry that some of the modifications might not be going all the way or showing what you wanted to demonstrate, simply because it is tricky in general and it's quite difficult to compare between rustc and clang; they hold LLVM in quite different ways sometimes.

The talk sounds very interesting and detailed; you clearly did your research! It would be a great learning resource even as a website or a blog post. Really, this is a seriously impressive piece of work. I would love to hear this talk, or at least a recording of it :slight_smile: The examples look very interesting; I think that for the right audience, it might be a talk that teaches them many useful new things.

One additional nitpick: I probably don't understand the charts very well. The differences in timings don't seem large enough to require a logarithmic Y axis, and I don't understand what "% change" means. Does a bar going up mean slower or faster? I thought it was the former, but then on slide 146 you claim that almost everything is up to 5% slower when you disable all checks? That sounds weird :slight_smile:


Disclaimer: I find it quite hard to review slides without hearing the accompanied talk, so take this with a grain of salt.

Right. The talk will have its own disadvantages though (apart from it being in Russian, I'll have to remove at least 20% of the slides to fit the slot).

It's always tricky to talk about advantages/disadvantages, because context matters;
the same thing can be either an advantage or a disadvantage
I think it is better to talk about performance in terms of axes of trade-offs and fundamental limits.
The limits of Rust, C, and C++ are essentially the same;
what differs are the defaults and the convenience and time required to achieve certain performance goals.

I totally agree, and I try to touch on this in the intro section. Perhaps it could be done more strongly, but that also risks sparking overly heated debates (my talk will be at a local C++ conference)...

More conservative standard library

Kinda interested in what you mean by this one!
I have certainly run into situations where the stdlib was too high-level and
didn't allow me to tune things for maximum performance,
at which point I had to switch to a third-party library or a manual solution.
I'm not sure how this could be solved in any language's stdlib though.
...

Uses more conservative algorithms in stdlib

I'm honestly not sure how this is possible, but the conservative sorting algorithm in Rust's stdlib,
which even checks for invalid orderings by default, is still faster than C++'s stdlib sort implementations

I'm aware of several cases where the stdlib chooses reliability/safety over performance (they are also listed in the "Standard library" section towards the end of the slides):

  • sorting algorithm being resistant to bad comparators (yes, it still manages to be faster than the STL)
  • DoS-resistant default hashing algorithm (SipHash)
  • crypto-secure PRNG
  • UTF-8 checks in string containers

and "conservative" here means just those. But I think you are right; it sounds too broad and categorical. I will rephrase.
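Note that at least the hashing default is easy to opt out of per-container via the `BuildHasher` type parameter of `HashMap`. A sketch with a deliberately simplistic (and DoS-unsafe) FNV-style hasher; real code would pull in a vetted crate such as fxhash:

```rust
use std::collections::HashMap;
use std::hash::{BuildHasherDefault, Hasher};

// Toy FNV-style hasher, only to show how SipHash can be swapped out.
// It is fast but trivially collidable, so it gives up DoS resistance.
#[derive(Default)]
struct FnvLikeHasher(u64);

impl Hasher for FnvLikeHasher {
    fn finish(&self) -> u64 {
        self.0
    }
    fn write(&mut self, bytes: &[u8]) {
        for &b in bytes {
            self.0 ^= b as u64;
            self.0 = self.0.wrapping_mul(0x100000001b3);
        }
    }
}

fn main() {
    // Same HashMap API, different hashing trade-off.
    let mut m: HashMap<&str, i32, BuildHasherDefault<FnvLikeHasher>> =
        HashMap::default();
    m.insert("a", 1);
    assert_eq!(m["a"], 1);
}
```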

You can use raw pointers and unchecked_add in std::intrinsics - Rust.
So the inconvenience axis is "having to use more characters to write unsafe" (Rust) vs "making it easier to cause UB by accident" (C, C++).

True, but I don't have enough experience to argue with people about when and how much unsafe is enough, whether using unsafe defeats the purpose of a safe language, and all the other clichés you usually read on forums :slight_smile: So in these slides I just focused on idiomatic code, in the hope that this will make it easier to prepare better and more precise talks later.

Rust has no options to disable safety checks

I find this a bit misleading; there is nothing in surface Rust that prevents you from doing anything you want without any runtime checks,
as long as you use enough unsafe (you can't easily turn off all compile-time checks though), modulo maybe some missed optimizations or compiler bugs.

Oh, I should have worded this better. I only meant that the compiler does not provide big-hammer flags to disable safety checks globally (which is understandable and good), so we had to disable them via manual patches.

Still, I'm not sure if it makes sense to talk about using typical/idiomatic Rust code when we talk about using only the standard library.
IMO the single biggest performance benefit of Rust over C++ is that you can very easily include libraries,
which allows you to get to a certain performance level faster, without spending time debugging UB, learning how to build a library or
understanding how to use it correctly. It also allows you to experiment with many different optimization approaches quickly by just swapping libraries.

Agreed, I will add this to the slides and stress it during the talk.

Slide 39 is very interesting. If I read it right, the biggest win was 6%?
That's pretty awesome for the smartness of LLVM and shows low cost of bound checks in these scenarios.
Would be interested in the performance of rustc itself while compiling, if bound checks are removed from it!

This part should be taken with a grain of salt because this number is just a geomean across the benchmarks within each project. So 1% could mean either that all benchmarks improved by 1% or that 10% of the benchmarks improved by 10%. This is a rather crude metric, but I'm not sure how else to condense hundreds of benchmarks into a single number :confused:

I planned to mention this when showing the first graph, but now I see that I should add an explicit slide with an explanation.
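To make the aggregation concrete, this is the kind of geomean computed over per-benchmark speedup ratios (a sketch; the helper is mine, not from the actual benchmarking framework):

```rust
// Geometric mean of per-benchmark speedup ratios (1.0 = no change).
fn geomean(ratios: &[f64]) -> f64 {
    let log_sum: f64 = ratios.iter().map(|r| r.ln()).sum();
    (log_sum / ratios.len() as f64).exp()
}

fn main() {
    // Ten benchmarks: one improves by 10%, nine are unchanged.
    let mut r = vec![1.10];
    r.extend(std::iter::repeat(1.0).take(9));
    // The geomean reports roughly a 1% overall improvement, which is why
    // a single project-level number hides the per-benchmark distribution.
    assert!((geomean(&r) - 1.0096).abs() < 0.001);
}
```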

The unused write analysis is also very interesting. I tried to do something similar on the Rust (not LLVM) level some time ago;
similar statistics could be useful to detect useless initializations (but also useless Drops etc.) to optimize the compiler.

I was silly enough to implement my own plugin instead of reusing the dead-stores tool by Krister Walfridsson (Krister Walfridsson’s old blog: Watching for software inefficiencies with Valgrind) :slight_smile:

Re slide 109: I think it would be very interesting to see some numbers/benchmarks that would confirm that making signed integer overflow UB is actually important for perf.
"in practice" (i.e. not on SPEC2000, though it would be interesting to see the effect even there :slight_smile: ).
If I'm not mistaken, that's slide 112?

No, slide 112 is just about disabling default checks (division by zero, saturating float casts, etc.).

I do have data for overflow UB (1-1.5% average improvement for rav1e and rustc-runtime-benchmarks, no changes in other projects), but for some reason I didn't add the graph to the slides (probably I should?).

Though Disable mandatory overflow and other arithmetic checks. · yugr/rust-private@858aaa7 · GitHub seems "too weak",
surely that's not all the cases of checked arithmetics in the stdlib;
it would be better to modify rustc to tell LLVM to just treat all signed integer overflows as undefined, rather than to change stdlib code.

Yes, finding all those cases in the stdlib was tedious. For some reason I was afraid to do this globally in LLVM (it's been half a year now, so it's hard to remember the exact reason).

In general, I love the idea of modifying the compiler to produce performance baselines,
though I worry that some of the modifications might not be going all the way or
showing what you wanted to demonstrate

I agree that this is possible, and I'm open to criticism (I may not have enough time to fix all the issues and re-collect measurements before the talk, but I'll at least mention them in the slides).

it's quite difficult to compare between rustc and clang
they hold LLVM in quite different ways sometimes.

Could you give some specific examples? This could be relevant for the talk.

The talk sounds very interesting and detailed, you clearly did your research!
It would be a great learning resource even as a website or a blog post.

Thank you! My hope is that it will help people write more performant Rust and also serve as a basis for more complete or nuanced talks on performance or C/C++ comparisons.

One additional nitpick: I probably don't understand the charts very well.
The differences in timings don't seem so large as to require a logarithmic Y axis

Oh, yes, will fix.

and I don't understand what is "% change". Does "bar goes up" mean it's slower or faster?

Bars going up mean faster (we compare against baseline Rust).

I thought it's the former, but then on slide 146 you claim that almost everything is up to 5% slower when you disable all checks? That sounds weird

Hm, maybe you checked an older version of the PDF? All benchmarks on slide 146 in the latest version (rust-slides/EN.pdf at 4e728f432ab91b4096fa755d6b1f1fd0bb2c5950 · yugr/rust-slides · GitHub) seem to improve (modulo some minor noise).

it would be better to modify rustc to tell LLVM to just treat all signed integer overflows as undefined, rather than to change stdlib code.

Yes, finding all those cases in the stdlib was tedious. For some reason I was afraid to do this globally in LLVM (it's been half a year now, so it's hard to remember the exact reason).

Ok, now I remember. Additive operations in LLVM are signless, so e.g. it's not possible to distinguish a signed add (which could get nsw) from an unsigned one.