Really strange output difference between 2 loops

Hi,
I was experimenting with a throwaway piece of code to see how inclusive and exclusive ranges are treated differently.
Godbolt test

(I lazily reused the default function names, "square").

The assembly difference looks really strange to me. The code does the exact same useless thing, but the assembly efficiency varies a lot.
I don't care whether the difference is measurable in real life; I'm focusing on the generated output and don't understand why it's so different.

Is it a stabilization issue?

Thanks guys!

..= is known to be less efficient. It has to handle the case where the start and end bounds are equal, which should yield a single value. For .. this would just be an empty range, while ..= needs an additional bool field to distinguish a range with a single element remaining from an exhausted range. It is not possible to just increment the end of the range by one and then handle it identically to .., because the increment can overflow.
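To make the points above concrete, here is a minimal sketch. The helper names count_inclusive and count_exclusive are made up for illustration and are not from the original Godbolt test. It shows the equal-bounds case, the overflow that prevents rewriting a..=b as a..b+1, and the fact that the inclusive range type carries extra state.

```rust
use std::mem::size_of;
use std::ops::{Range, RangeInclusive};

/// Hypothetical helper: counts the items yielded by an inclusive u8 range.
fn count_inclusive(start: u8, end: u8) -> usize {
    (start..=end).count()
}

/// Hypothetical helper: counts the items yielded by an exclusive u8 range.
fn count_exclusive(start: u8, end: u8) -> usize {
    (start..end).count()
}

fn main() {
    // Equal bounds: the exclusive range is empty, the inclusive one yields one value.
    println!("5..5   yields {} items", count_exclusive(5, 5));
    println!("5..=5  yields {} items", count_inclusive(5, 5));

    // Only the inclusive form can cover the whole u8 domain,
    // because the exclusive end would have to be 256, which is not a u8.
    println!("0..=255 yields {} items", count_inclusive(0, u8::MAX));
    println!("0..255  yields {} items", count_exclusive(0, u8::MAX));

    // "Just add 1 to the end" fails exactly at the maximum value.
    println!("u8::MAX.checked_add(1) = {:?}", u8::MAX.checked_add(1));

    // The inclusive range stores extra state next to start and end.
    println!("Range<u32>:          {} bytes", size_of::<Range<u32>>());
    println!("RangeInclusive<u32>: {} bytes", size_of::<RangeInclusive<u32>>());
}
```

The exact sizes are an implementation detail of the standard library, so treat the printed numbers as an observation and not as a guarantee.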

Yes, but my impression was that in this case the loop with the inclusive range ..= generates more compact code. I would have expected the opposite, even though compact code is not always fast code. The generated code for the first test, the exclusive one with .., is quite verbose.

I was going to say something similar, then decided I wasn't competent. It might be because square2 is called second, so it doesn't need to save registers [Edit: no, that isn't it]. But trying to figure something like this out is rather difficult!

What do you mean by "stabilization", and why would it matter here?

Who said it would not be measurable?
You should measure if you want to know whether it is measurable or not.

My benchmark for 100k items gives me this:

square_bench: 124.876ms ± 25.961
square2_bench: 156.820ms ± 41.331

That means the one with ..= is about 25% slower, which is what we expected because ..= is slower in general, as @bjorn3 explained.
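For anyone who wants to measure this themselves, here is a rough sketch of a timing harness. It is not the benchmark that produced the numbers above: the functions squares_exclusive and squares_inclusive and the helper time_it are stand-ins invented for illustration, since the original code is only in the Godbolt link. A real benchmark would be better served by a proper harness with warm-up and statistics.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Hypothetical stand-in: sums i*i over the exclusive range 0..n.
fn squares_exclusive(n: u64) -> u64 {
    let mut total = 0u64;
    for i in 0..n {
        total = total.wrapping_add(i.wrapping_mul(i));
    }
    total
}

/// Hypothetical stand-in: sums i*i over the inclusive range 0..=n.
fn squares_inclusive(n: u64) -> u64 {
    let mut total = 0u64;
    for i in 0..=n {
        total = total.wrapping_add(i.wrapping_mul(i));
    }
    total
}

/// Runs `f` `reps` times and returns the total elapsed wall-clock time.
/// black_box keeps the optimizer from discarding the result.
fn time_it<F: FnMut() -> u64>(mut f: F, reps: u32) -> Duration {
    let start = Instant::now();
    for _ in 0..reps {
        black_box(f());
    }
    start.elapsed()
}

fn main() {
    let n = black_box(100_000u64);
    // Both calls cover the same values 0, 1, ..., n-1.
    let exclusive = time_it(|| squares_exclusive(n), 1_000);
    let inclusive = time_it(|| squares_inclusive(n - 1), 1_000);
    println!("exclusive: {:?}", exclusive);
    println!("inclusive: {:?}", inclusive);
}
```

Build with optimizations (for example cargo run --release), otherwise the comparison says little about the assembly discussed here.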

So the code is longer because the compiler applies more optimizations to make it run faster.

The first version looks like it speeds up the loop by handling i and i+1 at once; I did not run gdb to confirm or refute that, though.

I think you mean the case where the upper bound is equal to the largest possible value (regardless of the lower bound), which needs to be a special case for ..=, but works like any other case for the exclusive range ..
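A hand-written loop makes that special case visible. This is only an illustration of the problem, not how the standard library implements RangeInclusive; the function names are made up. The exclusive loop can use a plain i < end test, while the inclusive loop has to check for the end before incrementing, otherwise i += 1 would overflow when end is the maximum value.

```rust
/// Hypothetical manual equivalent of `for i in start..end`.
fn sum_exclusive_manual(start: u8, end: u8) -> u32 {
    let mut total = 0u32;
    let mut i = start;
    // i < end implies i < u8::MAX, so the increment can never overflow.
    while i < end {
        total += u32::from(i);
        i += 1;
    }
    total
}

/// Hypothetical manual equivalent of `for i in start..=end`.
fn sum_inclusive_manual(start: u8, end: u8) -> u32 {
    let mut total = 0u32;
    if start > end {
        return total; // empty range
    }
    let mut i = start;
    loop {
        total += u32::from(i);
        // Must test before incrementing: if end == u8::MAX,
        // `i += 1` on the last element would overflow.
        if i == end {
            break;
        }
        i += 1;
    }
    total
}

fn main() {
    // The inclusive loop handles an upper bound of u8::MAX correctly.
    let manual = sum_inclusive_manual(0, u8::MAX);
    let builtin: u32 = (0..=u8::MAX).map(u32::from).sum();
    println!("manual = {manual}, builtin = {builtin}");
    println!("exclusive 0..255 = {}", sum_exclusive_manual(0, u8::MAX));
}
```

The extra "am I on the last element" check is the kind of bookkeeping that the exclusive loop does not need.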