This is a perfect real-world example of what I am trying to better understand about Rust. Rust is great for many things, but not necessarily great for everything. Thanks for sharing this!
Note that, by themselves, bounds checks are almost free, as they are trivially predicted never-taken branches. What is slow is missing auto-vectorization opportunities. See this lecture, slides 6-18 (video, in case the auto-generated subtitles are of any help). The TL;DR is that for indirect summation there's zero slowdown from bounds checks, while for direct summation there's a 2x slowdown due to missed vectorization.
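To make the distinction concrete, here is a minimal sketch (my own, not taken from the slides): when the loop bound is not provably tied to the slice length, each access keeps its bounds check and the possible panic mid-loop blocks auto-vectorization, whereas slicing up front moves the single check out of the loop.

fn sum_first_n(xs: &[i32], n: usize) -> i32 {
    let mut total = 0;
    for i in 0..n {
        total += xs[i]; // bounds check stays: i < xs.len() is not known here
    }
    total
}

fn sum_first_n_sliced(xs: &[i32], n: usize) -> i32 {
    // One range check when taking the sub-slice, then a check-free, vectorizable loop.
    xs[..n].iter().sum()
}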
Thanks a lot for sharing this! Are there any references you used for creating these slides that you can share? This looks like some great content.
Well, if that kind of story interests you, maybe the resources I found while we were evaluating will be of interest to you too.
They're not about numerical computing, but since they all deal with translating fast, idiomatic C code into fast, idiomatic Rust code, they provide comparison material on what 'works' in each language.
- Carols10cents has a great talk about porting the Zopfli decompressor from C to Rust. She recorded perf all along the process, from sneaking in that first extern "C" Rust function all the way to the 100% Rusty version.
- MiniZ port from C to Rust: GitHub - Frommi/miniz_oxide: Rust replacement for miniz
- Federico's Blog (co-founder of Gnome):
  - oxidising librsvg2, without any of the dependers noticing: https://people.gnome.org/~federico/blog/tag/librsvg.html
  - he recently took over maintainership of bzip2, and is beginning to experiment with oxidations there too: https://people.gnome.org/~federico/blog/tag/bzip2.html
The slides are on GitHub: GitHub - matklad/rust-course. The content is mostly "stuff that I just know", with the primary reference being the sources of the Rust standard library. I can also recommend the Programming Rust book.
I think this is quite an unfair comparison, as it basically compares bounds-checked indexing vs "trust-the-programmer" indexing. It just happens that Rust defaults to bounds-checked and C++ defaults to not. You'd get reversed results when comparing C++'s .at() and Rust's get_unchecked().
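For reference, here is a rough sketch of the two modes on the Rust side (function names are mine): plain indexing is checked like C++'s .at(), while get_unchecked skips the check like C++'s operator[] and is undefined behaviour if the index is out of range.

fn checked(xs: &[u8], i: usize) -> u8 {
    xs[i] // panics if i >= xs.len()
}

fn unchecked(xs: &[u8], i: usize) -> u8 {
    // Caller must guarantee i < xs.len(); otherwise this is undefined behaviour.
    unsafe { *xs.get_unchecked(i) }
}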
So the interesting question could be: is C/Fortran-style numerical computing (which happens to do a lot of indexing) compatible with Rust's safety guarantees?
I'd say there are three cases here:
- The indexing-heavy algorithm is unavoidable for performance and can't be rewritten into more idiomatic Rust (e.g. using iterators). So to uphold the safety guarantees you need to either:
  - Convince the compiler to eliminate the bounds checks (with tricks like assert or creating pre-checked slices, as described in this thread), or
  - Use unsafe and prove to yourself that your accesses are within bounds (note that this is the default mode in C++!).
- The indexing-heavy algorithm translates nicely to Rust style, e.g. iterators. Both a safety and a perf win!
- The indexing-heavy algorithm is actually suboptimal (because CPUs don't really like arbitrary indexing), and tweaking it to better match Rust's style will actually benefit performance.
  - For example (assuming big matrices), matrix multiplication could be changed to the following algorithm (see the sketch after this list):
    - Create a transposed variant of matrix B (this is O(n²), so it's not as perf-sensitive).
    - Now you can perform the matrix multiplication with a single dot product per output element, expressed as .iter() + .zip() + .map() + .sum(), which vectorizes nicely.
  - A naive implementation of Smith-Waterman or edit distance (or any similar dynamic-programming algorithm) uses array indexing, but if one changes the order of operations from top-down to diagonal, the main loop can be expressed with iterator operations and vectorized, thus beating the naive indexed version even with no bounds checks.
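Here is a hedged sketch of the transposed-matrix idea from the list above (the function name and the row-major layout are my assumptions): transpose B once, then compute each output element as a dot product over two contiguous rows, with no indexing in the hot loop.

fn matmul_transposed(a: &[f64], b: &[f64], n: usize) -> Vec<f64> {
    // Transpose B (row-major, n x n). O(n^2), outside the hot loop.
    let mut bt = vec![0.0; n * n];
    for i in 0..n {
        for j in 0..n {
            bt[j * n + i] = b[i * n + j];
        }
    }

    let mut c = vec![0.0; n * n];
    for i in 0..n {
        let a_row = &a[i * n..(i + 1) * n];
        for j in 0..n {
            let b_col = &bt[j * n..(j + 1) * n];
            // Contiguous dot product: iter + zip + map + sum, no index in the inner loop.
            c[i * n + j] = a_row.iter().zip(b_col).map(|(x, y)| x * y).sum();
        }
    }
    c
}

Whether the dot product actually vectorizes still depends on the optimizer and on floating-point semantics, so it's worth checking the generated code.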
I think the conclusion could be that either you don't care about the last drop of performance and can accept the cost (or the missed optimization opportunities) of bounds checks, or you do care about the last drop of performance and can spend some time to get it (be it by tweaking the algorithm, making safety proofs for LLVM/yourself, or just searching for a library that has already done it).
let len = vec.len();
let slice = &vec[0..len];
let mut sum = 0;
for i in 0..len {
    sum += slice[i]; // no bounds check: the re-slice above ties len to slice.len()
}
I remember some blog posts mentioned that using an assertion removes the bounds checks as well. Maybe the following works?
debug_assert!(x.len() == y.len());
for i in 0..x.len() {
...
}
debug_assert! doesn't, because it will be removed in release mode (when optimizations are enabled). assert! might. Unfortunately, it all depends on the optimizer, so it's best to double-check with https://rust.godbolt.org
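For example, this is roughly the kind of function worth pasting into godbolt (the function is my own sketch, not from the thread): the assert! ties the two lengths to the loop bound, which lets the optimizer drop the per-element checks, whereas debug_assert! compiles away in release builds and leaves no such hint.

pub fn add_assign(x: &mut [f32], y: &[f32]) {
    assert!(x.len() == y.len());
    for i in 0..x.len() {
        x[i] += y[i]; // with the assert above, both bounds checks can be elided
    }
}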