Update in March 2017
Rust now implements .zip()
specially for a couple of core iterators, including slice iterators (iter
and iter_mut
) as well as a few adaptors of those (map
, cloned
, zip
, maybe more). For these iterators zip
will now provide satisfactory loop optimization, for example allowing for auto-vectorization of certain loops.
The kind of exploration done below the line is still relevant for special cases but would need to be updated for the current state of Rust. Happy hacking -bluss
Me and @dotdash have been looking into the performance of different ways to “zip” iterators. Like you probably know, the preferred way to iterate two slices or vectors, should be like this:
fn iterate_slices_with_zip<T>(xs: &[T], ys: &[T]) {
for (x, y) in xs.iter().zip(ys) {
/* loop body here */
}
}
.zip()
ignores elements past the end of the shortest of the two inputs.
However, the Zip
iterator adaptor does not optimize as well as we’d like. It needs to check first iterator if it has reached the end, then check the second iterator if it has reached the end, and then it can yield the next element. What we need is that the optimizer realizes when these double checks are redundant.
I don’t know if we are going to be able to resolve it in geneneral implementation. With specialization, if available in the future, we will at least be able to address it for the common case of iterating two slices in parallel.
If we “brute force” the problem it looks like this: use unsafe!
fn iterate_slices_counted_unchecked<T>(xs: &[T], ys: &[T]) {
let len = cmp::min(xs.len(), ys.len());
for i in 0..len {
unsafe {
let x = &*xs.get_unchecked(i);
let y = &*ys.get_unchecked(i);
}
}
}
The unchecked loop does indeed turn into what looks like a loop without redundant overhead.
What happens if we try to not use unsafe, shouldn't the optimizer still be able to elide the bounds checks?
fn iterate_slices_counted<T>(xs: &[T], ys: &[T]) {
let len = cmp::min(xs.len(), ys.len());
for i in 0..len {
let x = &xs[i];
let y = &ys[i];
}
}
It turns out it doesn't! For some reason, the information that i < xs.len() && i < ys.len()
does not reach the loop body.
We found a workaround! Be abundantly clear:
fn iterate_slices_counted_2<T>(xs: &[T], ys: &[T]) {
let len = cmp::min(xs.len(), ys.len());
let xs = &xs[..len];
let ys = &ys[..len];
for i in 0..len {
let x = &xs[i];
let y = &ys[i];
}
}
Now it works! iterate_slices_counted_2
elides the bounds checks and does perform just as well as iterate_slices_counted_unchecked
. All it took was saying it just the right way.
Sigh, optimizing compilers are mercurial beasts..
-
A few developers are looking into our llvm optimization passes, which may have a big impact and make it infer rust idioms a bit better, and elide for example bounds checks in more places.
-
.zip()
will hopefully improve, but the outlook for that is unclear. -
I published a safe wrapper around
iterate_slices_counted_unchecked
, it is calledZipSlices
in itertools. If you have critical lock step loops, you might consider profiling and adoptingZipSlices
or just the the solutions mentioned here if needed. -
(Before the counted version of
ZipSlices
, I had a version with three pointers: start of slice 1, end of slice 1, start of slice 2. Llvm did not appreciate this as well as it did the counted loop!)
Performance evaluations are as of rustc 1.3.0-nightly (20f421cd5 2015-07-06)