Out of curiosity, rather than because of any actual performance issue, I'm looking at the code generated when matching against two functionally equivalent ways of using enums.
Specifically, this nested version:
    #[derive(Clone, Copy)]
    pub enum InputType {
        Mouse,
        Keyboard,
    }

    #[derive(Clone, Copy)]
    pub enum ButtonState {
        Usual,
        Hover(InputType),
        Pressed(InputType),
    }
... and this version that separates out the variants:
    #[derive(Clone, Copy)]
    pub enum ButtonStateSep {
        Usual,
        HoverM,
        HoverK,
        PressedM,
        PressedK,
    }
As a simplified version of what I want to happen in my actual program, I'm returning u8s instead of more complicated things.
    pub fn input(s: InputType) -> u8 {
        use InputType::*;
        match s {
            Mouse => 7,
            Keyboard => 3,
        }
    }
These three functionally equivalent functions each generate different machine code:
    pub fn sep(s: ButtonStateSep) -> (u8, u8) {
        use ButtonStateSep::*;
        match s {
            Usual => (0, 0),
            HoverM => (1, input(InputType::Mouse)),
            HoverK => (1, input(InputType::Keyboard)),
            PressedM => (5, input(InputType::Mouse)),
            PressedK => (5, input(InputType::Keyboard)),
        }
    }

    pub fn nested(s: ButtonState) -> (u8, u8) {
        use ButtonState::*;
        match s {
            Usual => (0, 0),
            Hover(i) => (1, input(i)),
            Pressed(i) => (5, input(i)),
        }
    }
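    macro_rules! input {
        ($input_type:expr) => {
            // Variants are fully qualified: a bare `Mouse` is looked up at the
            // call site, and where InputType::* is not in scope it becomes a
            // catch-all binding instead of matching the variant.
            match $input_type {
                InputType::Mouse => 7,
                InputType::Keyboard => 3,
            }
        };
    }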
    pub fn nested_inlined(s: ButtonState) -> (u8, u8) {
        use ButtonState::*;
        match s {
            Usual => (0, 0),
            Hover(i) => (1, input!(i)),
            Pressed(i) => (5, input!(i)),
        }
    }
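Since the comparison only makes sense if the three functions really return the same values, here is a minimal sketch of a sanity check over all five states. The helpers `to_sep` and `all_agree` are hypothetical names introduced just for this check, and the macro spells out `InputType::Mouse` and `InputType::Keyboard` in full so that the patterns cannot be mistaken for variable bindings at the call site.

```rust
#[derive(Clone, Copy)]
pub enum InputType { Mouse, Keyboard }

#[derive(Clone, Copy)]
pub enum ButtonState { Usual, Hover(InputType), Pressed(InputType) }

#[derive(Clone, Copy)]
pub enum ButtonStateSep { Usual, HoverM, HoverK, PressedM, PressedK }

pub fn input(s: InputType) -> u8 {
    match s {
        InputType::Mouse => 7,
        InputType::Keyboard => 3,
    }
}

pub fn sep(s: ButtonStateSep) -> (u8, u8) {
    use ButtonStateSep::*;
    match s {
        Usual => (0, 0),
        HoverM => (1, input(InputType::Mouse)),
        HoverK => (1, input(InputType::Keyboard)),
        PressedM => (5, input(InputType::Mouse)),
        PressedK => (5, input(InputType::Keyboard)),
    }
}

pub fn nested(s: ButtonState) -> (u8, u8) {
    match s {
        ButtonState::Usual => (0, 0),
        ButtonState::Hover(i) => (1, input(i)),
        ButtonState::Pressed(i) => (5, input(i)),
    }
}

// Hand-inlined body of `input`, with fully qualified variant paths.
macro_rules! input {
    ($input_type:expr) => {
        match $input_type {
            InputType::Mouse => 7,
            InputType::Keyboard => 3,
        }
    };
}

pub fn nested_inlined(s: ButtonState) -> (u8, u8) {
    match s {
        ButtonState::Usual => (0, 0),
        ButtonState::Hover(i) => (1, input!(i)),
        ButtonState::Pressed(i) => (5, input!(i)),
    }
}

/// Hypothetical helper: maps a nested state to its flattened counterpart.
pub fn to_sep(s: ButtonState) -> ButtonStateSep {
    match s {
        ButtonState::Usual => ButtonStateSep::Usual,
        ButtonState::Hover(InputType::Mouse) => ButtonStateSep::HoverM,
        ButtonState::Hover(InputType::Keyboard) => ButtonStateSep::HoverK,
        ButtonState::Pressed(InputType::Mouse) => ButtonStateSep::PressedM,
        ButtonState::Pressed(InputType::Keyboard) => ButtonStateSep::PressedK,
    }
}

/// Hypothetical helper: true if all three functions agree on every state.
pub fn all_agree() -> bool {
    use ButtonState::*;
    use InputType::*;
    [Usual, Hover(Mouse), Hover(Keyboard), Pressed(Mouse), Pressed(Keyboard)]
        .iter()
        .all(|&s| nested(s) == sep(to_sep(s)) && nested(s) == nested_inlined(s))
}
```

Calling `all_agree()` from a test or from `main` should return true if the three versions are equivalent; it says nothing about the machine code each one compiles to.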
sep generates a short sequence of instructions involving only copies and shifts, whereas nested generates longer code that contains two branches. Finally, nested_inlined generates a set of instructions that is similar but not identical to those generated by sep.
I don't really care about the difference between the machine instructions for sep and nested per se, but I'm curious why the compiler didn't just do the same inlining I did. Even without converting to the bit-shifting version, that would still be less code than what was actually produced, as far as I can tell.
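One experiment that might help separate the effect of inlining from the effect of the enum layout is to mark input with the standard `#[inline(always)]` attribute and compare the output for nested again. This is only a sketch of the experiment: it is an assumption that the attribute changes anything, since the optimizer may already be inlining input in optimized builds, in which case the branches would have to come from somewhere else.

```rust
#[derive(Clone, Copy)]
pub enum InputType { Mouse, Keyboard }

#[derive(Clone, Copy)]
pub enum ButtonState { Usual, Hover(InputType), Pressed(InputType) }

// Ask the compiler to inline this function at every call site.
// Whether this alters the code generated for `nested` is untested here.
#[inline(always)]
pub fn input(s: InputType) -> u8 {
    match s {
        InputType::Mouse => 7,
        InputType::Keyboard => 3,
    }
}

pub fn nested(s: ButtonState) -> (u8, u8) {
    match s {
        ButtonState::Usual => (0, 0),
        ButtonState::Hover(i) => (1, input(i)),
        ButtonState::Pressed(i) => (5, input(i)),
    }
}
```

If the assembly for nested is unchanged with the attribute in place, that would suggest the call to input was never the obstacle.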