No. This depends very much on what the loop is doing or, put more directly, on what the LLVM optimizer does to the loop. If all goes to plan (and, as you can see, without .step_by() it does for simple loops), the for loop machinery can lead to better code.
In particular, look at the generated assembly for for_loop:
cmpq %fs:112, %rsp
ja .LBB0_2
movabsq $8, %r10
movabsq $0, %r11
callq __morestack
pushq %rax
movl $1, (%rsp)
leaq (%rsp), %rax
movl (%rsp), %eax
testl %eax, %eax
js .LBB0_9
movabsq $4294967296000000, %rcx
movabsq $-4294967296, %r8
leaq 4(%rsp), %rsi
jmp .LBB0_4
movl $1, 4(%rsp)
movq %rcx, %rdi
shrq $32, %rdi
cmpl %edi, %ecx
jge .LBB0_9
movl %ecx, %edx
addl %eax, %edx
jno .LBB0_7
andq %r8, %rcx
orq %rdi, %rcx
jmp .LBB0_8
movl %edx, %edx
andq %r8, %rcx
orq %rdx, %rcx
jmp .LBB0_8
popq %rax
We can see that the for loop was partially unrolled, whereas the while loop was not:
cmpq %fs:112, %rsp
ja .LBB1_2
movabsq $4, %r10
movabsq $0, %r11
callq __morestack
subq $4, %rsp
movl $1000000, %eax
leaq (%rsp), %rcx
movl $1, (%rsp)
decl %eax
jne .LBB1_3
addq $4, %rsp
Note also that this example could have been written more succinctly as (0..1_000_000).map(test::black_box).sum(), which incidentally has about the same performance as the for loop (as it should, since it generates the same assembly).
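For reference, here is a minimal sketch of the three variants side by side. It is not the original benchmark code: the function names, the constant N, and the choice to sum the black-boxed indices into a u64 are my own assumptions for illustration. I also use std::hint::black_box (available on stable Rust) in place of the nightly-only test::black_box.

```rust
use std::hint::black_box;

// Hypothetical iteration count, matching the 1_000_000 used above.
const N: u32 = 1_000_000;

/// Sums 0..N with a `for` loop over a range.
pub fn for_loop_sum() -> u64 {
    let mut total = 0u64;
    for i in 0..N {
        // black_box discourages the optimizer from folding the loop away.
        total += u64::from(black_box(i));
    }
    total
}

/// Sums 0..N with a hand-written `while` loop and a manual counter.
pub fn while_loop_sum() -> u64 {
    let mut total = 0u64;
    let mut i = 0u32;
    while i < N {
        total += u64::from(black_box(i));
        i += 1;
    }
    total
}

/// Sums 0..N with an iterator chain instead of an explicit loop.
pub fn iterator_sum() -> u64 {
    (0..N).map(black_box).map(u64::from).sum()
}
```

All three functions compute the same value, so any difference you measure comes from the code the optimizer generates for each loop shape. To compare them on your own machine, compile with optimizations and inspect the assembly (for example with rustc -O --emit asm), since the result can differ between compiler versions.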