Hi all,
What is the fastest way to compare:
two unsigned integers, e.g. u32,
or two byte arrays with the same memory size as a u32, e.g. [u8; 4]?
Is the compiler smart enough to optimize this to the fastest form?
Thanks for your help.
kpreid
March 25, 2025, 8:22pm
Doing this sort of thing right — producing the best sequence of machine instructions to perform a specific operation on some values — is essentially the optimizer's most basic responsibility.
There's one way to find out!
Looks like the optimizer is doing its job.
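For anyone who wants to repeat the check, here is a minimal sketch of the two comparisons from the question. The function names `eq_u32` and `eq_bytes` are made up for this example. Assuming an optimized build (for example `-C opt-level=3` in a tool such as Compiler Explorer), you can compare the assembly generated for each function yourself.

```rust
// Hypothetical function names, chosen only for this comparison.

/// Compare two unsigned 32-bit integers.
pub fn eq_u32(a: u32, b: u32) -> bool {
    a == b
}

/// Compare two 4-byte arrays, which occupy the same 4 bytes of memory as a u32.
pub fn eq_bytes(a: [u8; 4], b: [u8; 4]) -> bool {
    a == b
}
```

Both functions are `pub` so the compiler keeps them in the output even though nothing calls them.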
Great, thanks for the tip!
This specific case (a short array of primitives) is one of the places where Rust itself makes sure this happens, before the code even reaches LLVM's optimizer:
Pull request (master ← scottmcm:raw-eq), opened 06:45AM - 30 May 21 UTC:
Example:
```rust
pub fn demo(x: [u16; 6], y: [u16; 6]) -> bool { x == y }
```
Before:
```llvm
define zeroext i1 @_ZN10playground4demo17h48537f7eac23948fE(i96 %0, i96 %1) unnamed_addr #0 {
start:
  %y = alloca [6 x i16], align 8
  %x = alloca [6 x i16], align 8
  %.0..sroa_cast = bitcast [6 x i16]* %x to i96*
  store i96 %0, i96* %.0..sroa_cast, align 8
  %.0..sroa_cast3 = bitcast [6 x i16]* %y to i96*
  store i96 %1, i96* %.0..sroa_cast3, align 8
  %_11.i.i.i = bitcast [6 x i16]* %x to i8*
  %_14.i.i.i = bitcast [6 x i16]* %y to i8*
  %bcmp.i.i.i = call i32 @bcmp(i8* nonnull dereferenceable(12) %_11.i.i.i, i8* nonnull dereferenceable(12) %_14.i.i.i, i64 12) #2, !alias.scope !2
  %2 = icmp eq i32 %bcmp.i.i.i, 0
  ret i1 %2
}
```
```x86
playground::demo: # @playground::demo
    sub rsp, 32
    mov qword ptr [rsp], rdi
    mov dword ptr [rsp + 8], esi
    mov qword ptr [rsp + 16], rdx
    mov dword ptr [rsp + 24], ecx
    xor rdi, rdx
    xor esi, ecx
    or rsi, rdi
    sete al
    add rsp, 32
    ret
```
After:
```llvm
define zeroext i1 @_ZN4mini4demo17h7a8994aaa314c981E(i96 %0, i96 %1) unnamed_addr #0 {
start:
  %2 = icmp eq i96 %0, %1
  ret i1 %2
}
```
```x86
_ZN4mini4demo17h7a8994aaa314c981E:
    xor rcx, r8
    xor edx, r9d
    or rdx, rcx
    sete al
    ret
```
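To relate this back to the `[u8; 4]` case from the original question: the "After" IR compares the whole array as a single integer (`icmp eq i96`). A hand-written source-level analogue of that idea might look like the sketch below. The names `eq_array` and `eq_as_int` are hypothetical, and this only illustrates the equivalence. It is not a claim about what the compiler emits for any particular target.

```rust
/// Plain array equality, as in the `demo` function above but for [u8; 4].
pub fn eq_array(x: [u8; 4], y: [u8; 4]) -> bool {
    x == y
}

/// Hand-written equivalent: reinterpret each array as one u32 and
/// compare the two integers once. Native byte order is fine here
/// because both sides are converted the same way.
pub fn eq_as_int(x: [u8; 4], y: [u8; 4]) -> bool {
    u32::from_ne_bytes(x) == u32::from_ne_bytes(y)
}
```

Since the plain `==` on arrays is already handled as described in the pull request, there should normally be no need to write the `from_ne_bytes` version by hand.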