I was hoping someone might have some thoughts on how to micro-optimize the following snippet. I'm working on a little programming language / VM, and in one benchmark cargo flamegraph shows that reading the next bytecode instruction can account for up to 40% of the runtime. Here is the stripped-down example:
struct Vm(*const u8);

pub fn main() {
    // stand in for bytecode
    let v: Vec<u8> = vec![1, 2, 3, 4, 5, 6, 7];
    // the instruction pointer; as_ptr() yields a pointer that is valid
    // for the whole buffer, whereas &v[0] only borrows the first element
    let ip: *const u8 = v.as_ptr();
    // vm typically holds a bunch of other stuff
    let mut vm = Vm(ip);
    // stand in for execution loop
    loop {
        let instruction = vm.read_byte();
        if instruction == 4 {
            break;
        }
    }
}

impl Vm {
    /// in essence just read the current byte and move the pointer up
    /// one element and return the byte value
    #[inline]
    fn read_byte(&mut self) -> u8 {
        // SAFETY: relies on the bytecode being well formed
        let byte = unsafe { *self.0 };
        self.update_ip(1);
        byte
    }

    /// just bump the pointer by `offset` bytes (1 in the common case)
    #[inline]
    pub fn update_ip(&mut self, offset: isize) {
        // SAFETY: relies on the bytecode being well formed
        unsafe { self.0 = self.0.offset(offset) };
    }
}
In this example the pointer could be replaced with an iterator, but in the real code we call update_ip with values other than 1, such as -15 or 23, when we encounter a jump instruction. I'm completely willing to accept that this is as fast as it is going to get, but I don't know x86 assembly well enough to tell whether the generated code could be improved.
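To make the jump case concrete, here is a minimal sketch of a loop that calls update_ip with a signed offset. The opcodes (OP_NOP, OP_JUMP, OP_HALT), the one-byte offset encoding, and the run function are all made up for this illustration and are not taken from the real VM.

```rust
// Hypothetical opcodes, chosen only for this sketch.
const OP_NOP: u8 = 0;
const OP_JUMP: u8 = 1; // followed by a one-byte signed offset
const OP_HALT: u8 = 2;

struct Vm(*const u8);

impl Vm {
    #[inline]
    fn read_byte(&mut self) -> u8 {
        // SAFETY: assumes well-formed bytecode, so the pointer stays in bounds.
        let byte = unsafe { *self.0 };
        self.update_ip(1);
        byte
    }

    #[inline]
    fn update_ip(&mut self, offset: isize) {
        // SAFETY: same assumption as above.
        unsafe { self.0 = self.0.offset(offset) };
    }
}

/// Runs the bytecode and returns how many instructions were dispatched.
fn run(code: &[u8]) -> usize {
    let mut vm = Vm(code.as_ptr());
    let mut executed = 0;
    loop {
        executed += 1;
        match vm.read_byte() {
            OP_JUMP => {
                // The operand is reinterpreted as signed, so jumps can go
                // backwards as well as forwards.
                let offset = vm.read_byte() as i8;
                vm.update_ip(offset as isize);
            }
            OP_HALT => break,
            _ => {} // OP_NOP and anything else: do nothing
        }
    }
    executed
}

fn main() {
    // NOP, JUMP +1 (skips the second NOP), NOP, HALT
    let code = [OP_NOP, OP_JUMP, 1, OP_NOP, OP_HALT];
    println!("dispatched {} instructions", run(&code));
}
```

The offset is applied relative to the position after the operand has been read, which is one common convention; the real VM may encode jumps differently.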
Some thoughts I had were to get rid of these functions and write macros instead, to ensure the code is inlined at the call site. I also thought about adding another update_ip that is specialized to be exactly self.0.offset(1). Any suggestions would be greatly appreciated.
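For reference, the specialized variant could look roughly like the sketch below. It swaps #[inline] for #[inline(always)], a stronger inlining request that avoids the need for macros, and bumps the pointer with add(1) instead of going through the general update_ip. The reads_until helper is hypothetical and exists only to exercise the method. Whether any of this changes the generated code is an open question; in an optimized build it may well compile to the same instructions as the original.

```rust
struct Vm(*const u8);

impl Vm {
    /// Reads the current byte and advances the pointer by exactly one.
    #[inline(always)]
    fn read_byte(&mut self) -> u8 {
        // SAFETY: assumes well-formed bytecode, as in the original snippet.
        unsafe {
            let byte = *self.0;
            self.0 = self.0.add(1);
            byte
        }
    }
}

/// Hypothetical helper: counts reads up to and including the first `stop`.
/// The caller must make sure `stop` actually occurs in `code`.
fn reads_until(code: &[u8], stop: u8) -> usize {
    let mut vm = Vm(code.as_ptr());
    let mut reads = 0;
    loop {
        reads += 1;
        if vm.read_byte() == stop {
            break;
        }
    }
    reads
}

fn main() {
    let code = [1u8, 2, 3, 4, 5, 6, 7];
    println!("{}", reads_until(&code, 4));
}
```

The general update_ip would stay around for jumps; only the hot single-byte path is specialized here.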
For further clarification: we can assume the bytecode is correct, as this functionality isn't exposed externally.
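One way to keep that assumption honest without paying for it in release builds is a debug_assert! on the instruction pointer. This is only a sketch: the start and end fields and the new constructor are assumptions added for the illustration and are not part of the real Vm.

```rust
struct Vm {
    ip: *const u8,
    // Hypothetical extra fields, used only by the debug check below.
    start: *const u8,
    end: *const u8,
}

impl Vm {
    fn new(code: &[u8]) -> Self {
        // as_ptr_range gives the start pointer and the one-past-the-end pointer.
        let range = code.as_ptr_range();
        Vm { ip: range.start, start: range.start, end: range.end }
    }

    #[inline]
    fn read_byte(&mut self) -> u8 {
        // Checked in debug builds only; compiled out when debug assertions
        // are disabled, which is the default for release builds.
        debug_assert!(
            self.ip >= self.start && self.ip < self.end,
            "instruction pointer out of bounds"
        );
        // SAFETY: assumes well-formed bytecode; verified above in debug builds.
        unsafe {
            let byte = *self.ip;
            self.ip = self.ip.add(1);
            byte
        }
    }
}

fn main() {
    let code = [1u8, 2, 3, 4];
    let mut vm = Vm::new(&code);
    while vm.read_byte() != 4 {}
    println!("reached the terminating instruction");
}
```

The two extra pointers do make the struct larger, so it would be worth measuring whether that matters for the hot loop.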
For the real implementations, see the following on GitHub:
update_ip: https://github.com/Laythe-lang/Laythe/blob/master/laythe_vm/src/vm.rs#L589-L591
read_byte: https://github.com/Laythe-lang/Laythe/blob/master/laythe_vm/src/vm.rs#L619-623
main execution loop: https://github.com/Laythe-lang/Laythe/blob/master/laythe_vm/src/vm.rs#L474-573