I'd like to understand how I can design my code so it can avoid having branch misprediction on a 50:50 branch.
The side value below is randomly Side::B or Side::S without any pattern.
pub enum Side {
B,
S,
}
....
match side {
Side::B => {}
Side::S => {}
}
You can do const generics. They (for now) only work with integers and bools but you can assign a number to each variant and then use that.
I think that should be the same as the c++ example but i don't know enough about templates to be sure.
Please explain, maybe in sudo code, for someone like me, old brain, 7th grader, how that could work on a 50:50 ratio. Does predict a sequence of left:right's and write multiple blocks of asm, then branch when it is wrong maybe two or three in?
Hmm, maybe I will watch the videos and that will help.
Maybe my question was a little bit vague. I mean there isn't any pattern on Side value in any given time, just random Side::B or Side::S . If I can bring this down to compile time, then it's possible to execute my code branchless as in the first video.
Maybe I asked my question vague too. I was thinking you understood a way to void having branch misprediction on a 50:50 branch using C++ and could explain.
I don't see how you could possibly optimize the testing of flipping a coin besides unwinding and compiling to a tree of branches to some depth that testing shows to be faster.
Am I right in understanding that you want some, potentially random, condition at compile time to select a particular function/operation which is then compiled permanently into the executable?
That is what I gather from reading the thumbnail of the video above: "If you know at compile time which function is to be executed...."
It looks to me like they’re looking for something like Clang’s __builtin_unpredictable to force branchless code in Rust, preferring cmove and so on, although it seems like that doesn’t even work reliably in Clang. Using branchless code would avoid the CPU ever deciding that Side::B should be predicted, making Side::S much slower, or vice versa.
As far as I know Rust doesn't support this at the moment; there’s #[cold] for functions that are unlikely to be called, or unreachable_unchecked for code that is never called, but nothing for LLVM’s unpredictable metadata, and no guarantees about what executable code will be generated by LLVM for any given Rust code.
Rather than branchless code, I see some suggestions to essentially use dynamic dispatch for this, which would use a Box<dyn Side> with SideB and SideS implementations. I guess the idea is that the CPU branch predictor won’t be clever enough to predict which implementation will be used, and will never get smart enough in the future. I wouldn’t be surprised if this ended up being slower than just mispredicting branches every now and then, though; the performance would need to be carefully measured to see if it was worth it.
The branchless_match() gets compiled to this arm64 code:
_branchless_match:
mov w8, #69 ; Load immediate value 69 into register w8
mov w9, #42 ; Load immediate value 42 into register w9
cmp w0, #0 ; Compare first parameter (bool) with 0
csel w8, w9, w8, ne ; Conditionally select between w9 and w8 based on the comparison
mul w0, w8, w1 ; Multiply the selected value with second parameter
ret ; Return the result
No control flow branches there. Just a conditional selection.
Of course I have no idea what the OP actually wants to do so this is just for fun.