Is `#[cold]` the only reliable way to hint to branch predictor?

I'm trying to optimize a function which searches a slice for certain bytes. I've found that processing the input in batches improves performance. As the inputs are very large, hinting to the compiler that running out of bytes is a very rare occurrence also gives a speed bump.

However, the ergonomics are pretty terrible. It currently looks something like this:

const BATCH_SIZE: usize = 32;

fn search(bytes: &[u8]) -> Option<usize> {
  let mut index = 0;
  loop {
    if bytes.len() - index >= BATCH_SIZE {
      // Process batch. Compiler unrolls loop.
      for _i in 0..BATCH_SIZE {
        let byte = unsafe { *bytes.get_unchecked(index) };
        if matches(byte) {
          // **Do some stuff**
          return Some(index);
        }
        index += 1;
      }
      // Not found yet, go round again
    } else {
      // Not enough bytes for a batch. Go byte by byte.
      #[cold]
      fn unbatched(bytes: &[u8], mut index: usize) -> Option<usize> {
        while index < bytes.len() {
          let byte = bytes[index];
          if matches(byte) {
            // **Do some stuff**
            return Some(index);
          }
          index += 1;
        }
        // EOF
        None
      }
      return unbatched(bytes, index);
    }
  }
}

Please ignore the obvious solution to use a chunked iterator! That's not the subject of my question.

The problem here is that **Do some stuff** is a big chunk of complex logic, and it has to be duplicated twice. It'd be much nicer to replace #[cold] with something like this instead:

fn search(bytes: &[u8]) -> Option<usize> {
  let mut index = 0;
  'outer: loop {
    // `likely` here performs same role as `#[cold]` in previous version
    if likely(bytes.len() - index >= BATCH_SIZE) {
      // Process batch. Compiler unrolls loop.
      for _i in 0..BATCH_SIZE {
        let byte = unsafe { *bytes.get_unchecked(index) };
        if matches(byte) {
          break 'outer;
        }
        index += 1;
      }
      // Not found yet, go round again
    } else {
      // Not enough bytes for a batch. Go byte by byte.
      while index < bytes.len() {
        let byte = bytes[index];
        if matches(byte) {
          break 'outer;
        }
        index += 1;
      }
      // EOF
      return None;
    }
  }

  // Found a match.
  // **Do some stuff** <- This only appears in one place now
  Some(index)
}

But is there anything resembling this likely() function which can replace #[cold]?

std::intrinsics::likely is unstable, and also apparently doesn't work anymore.

hashbrown's old method I've read also no longer works.

Ditto likely_stable crate - using it produced no difference in performance.

Is #[cold] on a function the only workable option for hinting about branch ordering/branch prediction?

1 Like

Pretty much, #[inline(never)] can have a similar effect but it has all the same ergonomics issues and isn't semantically quite the right thing anyway.
#[cold] can also be put on closures, but that requires nightly. But that can be worked around by having a cold trampoline function that executes a closure. So that might be a bit more ergonomic?

120370 aims to fix likely/unlikely.

1 Like

Obligatory mention of Profile-guided Optimization - The rustc book, which is the most accurate way to hint the branch predictor.

4 Likes

@the8472 Thanks loads! The cold trampoline function doesn't solve the problem completely (still can't use break to exit the loop inside a closure) but still it's much better having access to external vars, rather than having to pass them in. And it's less verbose.

@scottmcm I am interested in Profile-guided Optimization, but I believe that'd only work for applications, whereas this is a library.

Tangential: does anyone know the reason why Rust has #[cold], but doesn't have #[hot], while C has both of them?

Also note that PGO doesn't cover all cases. Besides being mostly unusable for libraries, there may be reasons for explicit #[hot] and #[cold] directives even if PGO would hint otherwise. For example, I may have a rarely used but latency-critical path, which I'd like to prioritize at the expense of possibly much more common, but also latency-irrelevant different branch. Unfortunately, my understanding is that #[hot] and #[cold] don't really cover that case either, since they are hints rather than mandatory requirements, and they are entirely ignored by the compiler when PGO is active.

Probably at least in part because if you go back to older LLVM https://releases.llvm.org/10.0.0/docs/LangRef.html#function-attributes, it has cold as a function attribute but not hot.

My reading of the current LLVM docs is that hot is only useful if you're running PGO and need to override it, so since Rust 1.0 didn't have PGO support (AFAIK), it had no need for it, and LLVM didn't support it back then anyway.

And #[cold] is really important for compiling safety checks, to get the hopefully-not-hit stuff out of the normal icache zone. I don't think #[hot] is nearly as pervasively important.

3 Likes