Most of what I do with Rust at the moment happens on a slightly weird embedded platform (the Game Boy Advance). I noticed something about next_power_of_two
.
fn next_u32(input: u32) -> u32 {
input.next_power_of_two()
}
example::next_u32:
push {r4, r6, r7, lr}
add r7, sp, #8
mov r4, r0
sub r0, #1
bl __clzsi2
cmp r4, #1
bhi .LBB0_2
mov r0, #1
pop {r4, r6, r7}
pop {r1}
bx r1
.LBB0_2:
mov r1, #0
mvn r1, r1
lsr r1, r0
add r0, r1, #1
pop {r4, r6, r7}
pop {r1}
bx r1
A subroutine call to __clzsi2
and a branch. Probably on a machine with a hardware instruction for counting leading zeroes this would be nice and fast, but CLZ didn't show up until ARMv5. Compare that to this algorithm from Bit Twiddling Hacks.
pub fn next_u32(mut input: u32) -> u32 {
input -= 1;
input |= input >> 1;
input |= input >> 2;
input |= input >> 4;
input |= input >> 8;
input |= input >> 16;
input + 1
}
example::next_u32:
sub r0, #1
lsr r1, r0, #1
orr r1, r0
lsr r0, r1, #2
orr r0, r1
lsr r1, r0, #4
orr r1, r0
lsr r0, r1, #8
orr r0, r1
lsr r1, r0, #16
orr r1, r0
add r0, r1, #1
bx lr
Every one of these instructions takes one cycle to execute because it operates entirely on data stored in registers. My initial instinct was to take this to the Rust issue tracker, but I'd like confirmation that that's a good idea first in case there's considerations I'm not thinking/aware of.