This is a perfect real-world example of what I am trying to better understand about Rust. Rust is great for many things, but not necessarily great for everything. Thanks for sharing this!
Note that, by themselves, bounds checks are almost free, as they are trivially predicted never-taken branches. What is slow is missing auto-vectorization opportunities. See this lecture, slides 6-18 (video, in case the auto-generated subtitles are of any help). The TL;DR is that for indirect summation there's zero slowdown from bounds checks, while for direct summation there's a 2x slowdown due to missed vectorization.
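To make the distinction concrete, here is a minimal sketch (my own, not taken from the slides): when the loop bound is not provably tied to the slice length, each access keeps its bounds check and the possible panic mid-loop blocks auto-vectorization, whereas slicing up front moves the single check out of the loop.

fn sum_first_n(xs: &[i32], n: usize) -> i32 {
    let mut total = 0;
    for i in 0..n {
        total += xs[i]; // bounds check stays: i < xs.len() is not known here
    }
    total
}

fn sum_first_n_sliced(xs: &[i32], n: usize) -> i32 {
    // One range check when taking the sub-slice, then a check-free, vectorizable loop.
    xs[..n].iter().sum()
}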
Thanks a lot for sharing this! Are there any references you used for creating these slides that you can share? This looks like some great content.
Well, if that kind of story interests you, maybe the resources I found while we were evaluating will be of interest to you too.
They're not about numerical computing, but since they all deal with translating fast, idiomatic C code into fast, idiomatic Rust code, they provide comparison material on what 'works' in each language.
- Carols10cents has a great talk about porting the Zopfli decompressor from C to Rust. She recorded perf all along the process, from sneaking in that first extern "C" Rust function all the way to the 100% Rusty version.
- MiniZ port from C to Rust: GitHub - Frommi/miniz_oxide: Rust replacement for miniz
- Federico's Blog (co-founder of Gnome):
  - oxidising librsvg2, without any of the dependers noticing: https://people.gnome.org/~federico/blog/tag/librsvg.html
  - he recently took over maintainership of bzip2, and is beginning to experiment with oxidations there too: https://people.gnome.org/~federico/blog/tag/bzip2.html
The slides are on GitHub: GitHub - matklad/rust-course. The content is mostly "stuff that I just know", with the primary reference being the sources of the Rust standard library. I can also recommend the Programming Rust book.
I think this is quite an unfair comparison, as it basically compares bounds-checked indexing vs "trust-the-programmer" indexing. It just happens that Rust defaults to bounds-checked and C++ defaults to not. You'd get reversed results when comparing C++'s .at() and Rust's get_unchecked().
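For reference, here is a rough sketch of the two modes on the Rust side (function names are mine): plain indexing is checked like C++'s .at(), while get_unchecked skips the check like C++'s operator[] and is undefined behaviour if the index is out of range.

fn checked(xs: &[u8], i: usize) -> u8 {
    xs[i] // panics if i >= xs.len()
}

fn unchecked(xs: &[u8], i: usize) -> u8 {
    // Caller must guarantee i < xs.len(); otherwise this is undefined behaviour.
    unsafe { *xs.get_unchecked(i) }
}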
So the interesting question could be: is C/Fortran-style numerical computing (which happens to do a lot of indexing) compatible with Rust's safety guarantees?
I'd say there are three cases here:
- The indexing-heavy algorithm is unavoidable for performance and can't be rewritten into more idiomatic Rust (e.g. using iterators). So to uphold the safety guarantees you need to either:
  - Convince the compiler to eliminate the bounds checks (with tricks like assert or creating pre-checked slices, as described in this thread), or
  - Use unsafe and prove to yourself that your accesses are within bounds (note that this is the default mode in C++!).
- The indexing-heavy algorithm translates nicely to Rust style, e.g. iterators. Both a safety and a perf win!
- The indexing-heavy algorithm is actually suboptimal (because CPUs don't really like arbitrary indexing), and tweaking it to better match Rust's style will actually benefit performance.
  - For example (assuming big matrices), matrix multiplication could be changed to the following algorithm (see the sketch after this list):
    - Create a transposed variant of matrix B (this is O(n²), so it's not as perf-sensitive).
    - Now you can perform the matrix multiplication with a single dot product per output element, expressed as .iter() + .zip() + .map() + .sum(), which vectorizes nicely.
  - A naive implementation of Smith-Waterman or edit distance (or any similar dynamic-programming algorithm) uses array indexing, but if one changes the order of operations from top-down to diagonal, the main loop can be expressed with iterator operations and vectorized, thus beating the naive indexed version even with no bounds checks.
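Here is a hedged sketch of the transposed-matrix idea from the list above (the function name and the row-major layout are my assumptions): transpose B once, then compute each output element as a dot product over two contiguous rows, with no indexing in the hot loop.

fn matmul_transposed(a: &[f64], b: &[f64], n: usize) -> Vec<f64> {
    // Transpose B (row-major, n x n). O(n^2), outside the hot loop.
    let mut bt = vec![0.0; n * n];
    for i in 0..n {
        for j in 0..n {
            bt[j * n + i] = b[i * n + j];
        }
    }

    let mut c = vec![0.0; n * n];
    for i in 0..n {
        let a_row = &a[i * n..(i + 1) * n];
        for j in 0..n {
            let b_col = &bt[j * n..(j + 1) * n];
            // Contiguous dot product: iter + zip + map + sum, no index in the inner loop.
            c[i * n + j] = a_row.iter().zip(b_col).map(|(x, y)| x * y).sum();
        }
    }
    c
}

Whether the dot product actually vectorizes still depends on the optimizer and on floating-point semantics, so it's worth checking the generated code.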
I think the conclusion could be that either you don't care about the last drop of performance and can accept the cost (or the missed optimization opportunities) of bounds checks, or you do care about the last drop of performance and can spend some time to get it (be it by tweaking the algorithm, making safety proofs for LLVM/yourself, or just searching for a library that has already done it).
let len = vec.len();
let slice = &vec[0..len];
let mut sum = 0;
for i in 0..len {
    sum += slice[i]; // no bounds check: the re-slice above ties len to slice.len()
}
I remember some blog posts mentioned that using an assertion removes the bounds checks as well. Maybe the following works?
debug_assert!(x.len() == y.len());
for i in 0..x.len() {
...
}
debug_assert! doesn't, because it will be removed in release mode (when optimizations are enabled). assert! might. Unfortunately, it all depends on the optimizer, so it's best to double-check with https://rust.godbolt.org
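For example, this is roughly the kind of function worth pasting into godbolt (the function is my own sketch, not from the thread): the assert! ties the two lengths to the loop bound, which lets the optimizer drop the per-element checks, whereas debug_assert! compiles away in release builds and leaves no such hint.

pub fn add_assign(x: &mut [f32], y: &[f32]) {
    assert!(x.len() == y.len());
    for i in 0..x.len() {
        x[i] += y[i]; // with the assert above, both bounds checks can be elided
    }
}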