I'm working on a project where I need to find the minimum fractional period of a string. I have two naive algorithms that I expected to run at similar speed, but in my tests the second one, which uses slice comparison, is almost twice as fast. These are the two algorithms:
pub fn period_naive1(s: &[u8]) -> usize {
    let n = s.len();
    'outer: for i in 1..n {
        for j in 0..n - i {
            unsafe {
                if s.get_unchecked(j) != s.get_unchecked(j + i) {
                    continue 'outer;
                }
            }
        }
        return i;
    }
    n
}
pub fn period_naive2(s: &[u8]) -> usize {
    let n = s.len();
    for i in 1..n {
        if s[..n - i] == s[i..] {
            return i;
        }
    }
    n
}
The main difference is that the second algorithm compares the slices using iterators. I searched the net for possible causes, and the usual suspect seemed to be the bounds checks on slice indexing, but eliminating them with get_unchecked doesn't bring any improvement.
Why do the two algorithms differ so much in speed?
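For context, here is a small sanity harness (my own sketch, not part of the project) checking that the two implementations agree on inputs with known periods:

```rust
// Sanity check: both implementations should return the same minimum
// fractional period. The functions are copied from the question, with
// safe indexing in place of get_unchecked for brevity.

pub fn period_naive1(s: &[u8]) -> usize {
    let n = s.len();
    'outer: for i in 1..n {
        for j in 0..n - i {
            if s[j] != s[j + i] {
                continue 'outer;
            }
        }
        return i;
    }
    n
}

pub fn period_naive2(s: &[u8]) -> usize {
    let n = s.len();
    for i in 1..n {
        if s[..n - i] == s[i..] {
            return i;
        }
    }
    n
}

fn main() {
    // "abcabcab" repeats "abc" fractionally, so its period is 3;
    // "aaaa" has period 1; "abcd" never repeats, so n = 4 is returned.
    for (s, expected) in [(&b"abcabcab"[..], 3), (&b"aaaa"[..], 1), (&b"abcd"[..], 4)] {
        assert_eq!(period_naive1(s), expected);
        assert_eq!(period_naive2(s), expected);
    }
    println!("all cases agree");
}
```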
Rustc is very good at optimizing idiomatic code. As to why exactly the second is faster: to me it looks like the slice comparison in the second is far less expensive (there's no inner loop in the assembly at all, AFAICT), but I'm not good at reading assembly. Here's the godbolt of your snippet, if you care to explore it yourself.
As a meta-answer, LLVM is generally not as good at optimizing loops that have multiple exit conditions -- like your inner loop with the continue 'outer in it.
So the first algorithm ends up as a very basic byte-per-iteration inner loop, while the second dispatches to a heavily optimized library comparison routine (essentially memcmp).
For example, compare these two functions: https://rust.godbolt.org/z/6En5bY4Gh. The former ends up in a byte-at-a-time loop, whereas the latter is rather more complicated but does a 16-bytes-at-once loop.
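To make the contrast concrete, here is a pair along those lines (my own illustration; the exact functions behind the godbolt link may differ):

```rust
// Two ways to test slice equality that typically compile very
// differently (my own sketch, not necessarily the linked snippet).

// A hand-written loop with an early exit on every iteration. LLVM
// generally leaves this as a byte-at-a-time comparison loop.
pub fn eq_loop(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    for i in 0..a.len() {
        if a[i] != b[i] {
            return false;
        }
    }
    true
}

// The built-in slice comparison. For &[u8] this lowers to a single
// call to a memcmp-style routine that compares many bytes per step.
pub fn eq_slice(a: &[u8], b: &[u8]) -> bool {
    a == b
}

fn main() {
    let x = &b"hello world"[..];
    let y = &b"hello worle"[..];
    assert!(eq_loop(x, x) && eq_slice(x, x));
    assert!(!eq_loop(x, y) && !eq_slice(x, y));
    println!("both agree");
}
```

The two are semantically identical, which is what makes the codegen difference (and the resulting ~2x speed gap in the question) so easy to miss.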