I'm trying to optimize a function which searches a slice for certain bytes. I've found that processing the input in batches improves performance. As the inputs are very large, hinting to the compiler that running out of bytes is a very rare occurrence also gives a speed bump.
However, the ergonomics are pretty terrible. It currently looks something like this:
const BATCH_SIZE: usize = 32;
fn search(bytes: &[u8]) -> Option<usize> {
let mut index = 0;
loop {
if bytes.len() - index >= BATCH_SIZE {
// Process batch. Compiler unrolls loop.
for _i in 0..BATCH_SIZE {
let byte = unsafe { *bytes.get_unchecked(index) };
if matches(byte) {
// **Do some stuff**
return Some(index);
}
index += 1;
}
// Not found yet, go round again
} else {
// Not enough bytes for a batch. Go byte by byte.
#[cold]
fn unbatched(bytes: &[u8], mut index: usize) -> Option<usize> {
while index < bytes.len() {
let byte = bytes[index];
if matches(byte) {
// **Do some stuff**
return Some(index);
}
index += 1;
}
// EOF
None
}
return unbatched(bytes, index);
}
}
}
Please ignore the obvious solution to use a chunked iterator! That's not the subject of my question.
The problem here is that **Do some stuff** is a big chunk of complex logic, and it has to be duplicated twice. It'd be much nicer to replace #[cold] with something like this instead:
fn search(bytes: &[u8]) -> Option<usize> {
let mut index = 0;
'outer: loop {
// `likely` here performs same role as `#[cold]` in previous version
if likely(bytes.len() - index >= BATCH_SIZE) {
// Process batch. Compiler unrolls loop.
for _i in 0..BATCH_SIZE {
let byte = unsafe { *bytes.get_unchecked(index) };
if matches(byte) {
break 'outer;
}
index += 1;
}
// Not found yet, go round again
} else {
// Not enough bytes for a batch. Go byte by byte.
while index < bytes.len() {
let byte = bytes[index];
if matches(byte) {
break 'outer;
}
index += 1;
}
// EOF
return None;
}
}
// Found a match.
// **Do some stuff** <- This only appears in one place now
Some(index)
}
But is there anything resembling this likely() function which can replace #[cold]?
std::intrinsics::likely is unstable, and also apparently doesn't work anymore.
hashbrown's old method I've read also no longer works.
Ditto likely_stable crate - using it produced no difference in performance.
Is #[cold] on a function the only workable option for hinting about branch ordering/branch prediction?