Question about optimizer and iterators

let my_vec: Vec<i32> = /* ... */;

let len = my_vec.len();
let count = my_vec.iter().count();

let len_minus_1 = my_vec.len() - 1;
let count_skip = my_vec.iter().skip(1).count();
let count_minus_1 = my_vec.iter().count() - 1;

len and count will always have the same value; likewise for len_minus_1, count_skip, and count_minus_1.

In terms of performance, would those operations be optimized to the same assembly? In theory they can be, but is rustc + LLVM smart enough today? Can I rely on this assumption?

More specifically, my input is always going to be a Vec (so the iterator will be an ExactSizeIterator), but the iterator pipeline can be split across multiple functions in different modules of the same crate. The pipeline doesn't contain any side effects. The data contained in the Vec can be larger objects than i32.

For example, would the following code still be completely optimized away?

mod A {
    pub struct MyVec(Vec<i32>);
    impl MyVec {
        // The public API only mentions Iterator, not ExactSizeIterator
        pub fn iter(&self) -> impl Iterator<Item = i32> + '_ {
            self.0.iter().copied()
        }
    }
}

mod B {
    use super::A::MyVec;

    trait GetSize {
        fn size_minus_one(&self) -> usize;
    }

    impl GetSize for MyVec {
        fn size_minus_one(&self) -> usize {
            // I can't access `self.0.len() - 1` directly,
            // so I'm using an indirect way
            self.iter().skip(1).count()
        }
    }
}

I could check this specific example in Compiler Explorer, but I would like to know whether these optimizations can be relied on. I don't want my code to become stupidly slow because the optimizer missed optimizations here, just because the architecture I'm using was too complicated for it.

Sometimes the optimizer is fantastically smart, but these are not examples of that. Vec::iter() returns a slice::Iter, and its implementation of Iterator includes an override of count() that simply returns the length. Likewise, Skip overrides count() by using nth() to skip items, then just taking the inner count(), which is again just the length for slices.
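For example, a quick runtime sanity check of those equivalences (this only shows they produce the same values, not that they compile to identical assembly):

fn main() {
    let my_vec: Vec<i32> = (0..1_000).collect();

    // count() on slice::Iter just returns the remaining length, and
    // Skip::count() skips via nth() before delegating to the inner count().
    assert_eq!(my_vec.len(), my_vec.iter().count());
    assert_eq!(my_vec.len() - 1, my_vec.iter().skip(1).count());
    assert_eq!(my_vec.len() - 1, my_vec.iter().count() - 1);
}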

Is there some kind of methodology for knowing when you can expect the optimizer to be good? For example, I know that iterators are just as transparent to the optimizer as the same operations written with a for loop. But I know that from experience and from reading blog posts; I don't know how to test and validate that kind of hypothesis myself.

I think it's just experience you gain over time, because there's never any guarantee that the optimizer will do what you want. I would focus first on writing code that's easy to understand, and over time you'll build an intuition for what is both understandable and performant. In the rare case, you may need to sacrifice readability for performance, but this should be carefully justified by profiling.

Yes, measure the performance of your code.

That does not sound very satisfactory, but it's as true of C and C++ as it is of Rust, and no doubt of other languages as well. Modern optimizers perform amazing feats, but it can be hard to know whether the code you have written defeats them or helps them.

I have been struggling with this recently in attempting to get Rust to perform as well as equivalent C functions. Many people made suggestions here as to how to write optimal Rust code. Very few of them were right :(. The way I found out was to use cargo bench and other techniques to actually measure how they performed.
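As a rough illustration (not the exact benchmarks from those posts), a minimal cargo bench setup using the criterion crate might look like this:

use criterion::{black_box, criterion_group, criterion_main, Criterion};

fn bench_counts(c: &mut Criterion) {
    let my_vec: Vec<i32> = (0..10_000).collect();

    // The "direct" length read, which we hope the iterator version matches.
    c.bench_function("len_minus_1", |b| {
        b.iter(|| black_box(&my_vec).len() - 1)
    });

    // The iterator pipeline from the original question.
    c.bench_function("skip_count", |b| {
        b.iter(|| black_box(&my_vec).iter().skip(1).count())
    });
}

criterion_group!(benches, bench_counts);
criterion_main!(benches);

(This would live under benches/ with criterion as a dev-dependency and harness = false in Cargo.toml.)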

You might like to see how this was done:

I could not agree more. I would prioritize code clarity over performance unless it was found there was actually a performance problem.

Oddly enough, for both of the examples I linked above, the code that turned out to be the most performant was also the clearest to read :)

That should be "UNTIL it is found that there actually IS a performance problem." As Hoare wrote and Knuth popularized decades ago, and which is still true today, "Premature optimization is the root of all evil."

Significant optimizations often require reconceptualization of the algorithm and/or the data representation. In general, micro-optimization is not warranted until measurement shows that an area is a "hot path" that necessitates improvement.

Err... you are giving me a headache. Is that not equivalent to what I wrote?

I'm tempted to revise it to:

Prioritize code clarity over performance unless it is found there is actually a performance problem.

But, meh.

My question was definitely not about premature optimization (think “quicksort versus merge sort”), but about premature pessimization (“Is my bubble sort going to be optimized away, or should I change the architecture a bit so I can use std::sort directly?”). My current architecture, if not optimized away, is O(n), while a better one would be O(1) with a small constant.

The issue here is that it's really fragile. If my current code is considered "hard" for the optimizer, then any change, anywhere, could suddenly stop the optimizer from being able to optimize it away. It's not a question about pure performance, but more about “is this architecture OK”. And thanks for the links.
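To be concrete, the O(1) alternative I have in mind is just exposing the length (or an ExactSizeIterator) from the wrapper directly; a minimal sketch, assuming I'm free to change MyVec's public API:

mod A {
    pub struct MyVec(Vec<i32>);

    impl MyVec {
        // Exposing the length directly makes the size query O(1)
        // without relying on the optimizer at all.
        pub fn len(&self) -> usize {
            self.0.len()
        }

        // Alternatively, advertise ExactSizeIterator so callers can get
        // the cheap len() even through the iterator API.
        pub fn iter(&self) -> impl ExactSizeIterator<Item = i32> + '_ {
            self.0.iter().copied()
        }
    }
}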
