Coding a Gameboy emulator for fun and profit learning. I've gotten pretty far, but there is one thing bothering me: on opcode processing, I'm dispatching on the value of the opcode to a function that returns a boxed implementation of the opcode, which eventually gets run.
/// Represents an instruction that can be executed by the Gameboy.
/// The instruction is a function that takes a mutable reference to a Gameboy and returns
/// the number of t-states (system clock ticks) that the instruction took to execute.
pub type Instruction = dyn Fn(&mut gameboy::Gameboy) -> u8;
pub fn from_byte(byte: u8) -> Box<Instruction> {
    match byte {
        0x00 => Box::new(pai(nop::nop)),
        0x10 => Box::new(pai(misc::stop)),
        0x02 => Box::new(pai(ld::ld_mem_at_r16_r(
            Register16bTarget::BC,
            RegisterTarget::A,
        ))),
        // ... the remaining opcodes
    }
}
What's bothering me is that from_byte is dynamic and allocates every time it is called, which generates a lot of memory pressure. Well, TBH the hardware is powerful enough not to care, but for the sake of learning I would like to make this thing static. I tried a couple of approaches but had no luck. Please help. The whole thing is open source, of course: GitHub - pcasaretto/rust-game-boy-emulator
I'm not sure because I haven't looked at all the closures you're creating for the instructions, but it looks like they are not capturing any variables from their environment. In other words, they take a &mut Gameboy parameter and return a u8 without using any other variables. If that's true, you can just use fn instead of Fn, and then no boxing is needed.
Oh, nevermind, I see some of the closures reference other parameters such as a RegisterTarget. So you can't simply use fn. You probably knew that.
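For the opcodes that genuinely don't capture anything, the fn-pointer version would look roughly like this. This is only a sketch with stand-in types — `Gameboy`, `nop`, and `stop` here are simplified placeholders, not the repo's real definitions:

```rust
/// Minimal stand-in for the emulator state; the real type is
/// `gameboy::Gameboy` in the repo.
pub struct Gameboy {
    pub pc: u16,
}

// Non-capturing instructions can be plain functions.
fn nop(_gb: &mut Gameboy) -> u8 {
    4 // t-states
}

fn stop(_gb: &mut Gameboy) -> u8 {
    4
}

/// Dispatch returns a fn pointer: no Box, no heap allocation.
/// This only works for opcodes whose behavior needs no captured state,
/// which is exactly the limitation discussed above.
pub fn from_byte(byte: u8) -> fn(&mut Gameboy) -> u8 {
    match byte {
        0x00 => nop,
        0x10 => stop,
        _ => nop, // placeholder; the real table covers all opcodes
    }
}
```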
Assuming that from_byte has no side effects that you care about, you can cache the result instead of reallocating every time, which will avoid the memory pressure but still dynamically dispatch:
use std::sync::OnceLock;

pub type Instruction = dyn Send + Sync + Fn(&mut gameboy::Gameboy) -> u8;

pub fn from_byte(byte: u8) -> &'static Instruction {
    const EMPTY: OnceLock<Box<Instruction>> = OnceLock::new();
    static CACHE: [OnceLock<Box<Instruction>>; 256] = [EMPTY; 256];
    CACHE[byte as usize].get_or_init(|| _from_byte(byte))
}

fn _from_byte(byte: u8) -> Box<Instruction> {
    // Your current implementation here...
    unimplemented!()
}
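Filled in with stub instructions (the stubs are hypothetical, not the repo's real constructors), you can see that each opcode gets boxed exactly once, on first use:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

struct Gameboy {
    pc: u16,
}

type Instruction = dyn Send + Sync + Fn(&mut Gameboy) -> u8;

/// Counts how many times `build` actually runs, to demonstrate caching.
static BUILDS: AtomicUsize = AtomicUsize::new(0);

fn from_byte(byte: u8) -> &'static Instruction {
    // A `const` item (not a `static`) so the non-Copy OnceLock can be
    // used as the repeat value of the array initializer.
    const EMPTY: OnceLock<Box<Instruction>> = OnceLock::new();
    static CACHE: [OnceLock<Box<Instruction>>; 256] = [EMPTY; 256];
    CACHE[byte as usize].get_or_init(|| build(byte))
}

// Stub builder standing in for the real `_from_byte`.
fn build(byte: u8) -> Box<Instruction> {
    BUILDS.fetch_add(1, Ordering::SeqCst);
    match byte {
        0x00 => Box::new(|_gb: &mut Gameboy| 4), // nop stub
        _ => Box::new(move |_gb: &mut Gameboy| {
            let _ = byte; // parameterized closures capture their operand
            4
        }),
    }
}
```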
A lazy init! Great idea.
And I think that your example using Send+Sync was the missing piece for me to use the lazy_static crate. What do those mean btw?
Thank you! That's clearer now. We match on self and then just call the appropriate function.
From a memory perspective, it doesn't seem better as we would allocate an Instruction at every step. Right?
pai (horrible name) means "PC advancing instruction". Most instructions would advance the PC by one so I refactored that common code into a decorating function that calls the original and advances the PC.
So misc::stop would not return a closure, it would perform the instruction. And pai would just advance the PC.
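As a sketch, the enum version could look something like this — the names and t-state values are made up for illustration, and only `pai` mirrors the decorator described above:

```rust
struct Gameboy {
    pc: u16,
}

#[derive(Clone, Copy)]
enum Register16bTarget { BC }

#[derive(Clone, Copy)]
enum RegisterTarget { A }

/// Instruction as a small enum value instead of a boxed closure.
/// Parameterized instructions carry their operands in the variant.
enum Instruction {
    Nop,
    Stop,
    LdMemAtR16R(Register16bTarget, RegisterTarget),
}

impl Instruction {
    /// Dispatch by matching on self; `pai` becomes a plain helper that
    /// runs the instruction body and then advances the PC.
    fn run(&self, gb: &mut Gameboy) -> u8 {
        match self {
            Instruction::Nop => pai(gb, |_| 4),
            Instruction::Stop => pai(gb, |_| 4),
            Instruction::LdMemAtR16R(_dst, _src) => pai(gb, |_gb| {
                // the real load is elided here
                8
            }),
        }
    }
}

/// "PC advancing instruction": run the body, then bump the PC.
fn pai(gb: &mut Gameboy, body: impl FnOnce(&mut Gameboy) -> u8) -> u8 {
    let t = body(gb);
    gb.pc = gb.pc.wrapping_add(1);
    t
}
```

Since `Instruction` is a tiny value, this sidesteps boxing entirely: decoding produces a stack value and `run` does the dispatch.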
Or is there significant work done currently when creating the closure (other than the boxing) that would have to be repeated on every call with this approach?
I'm not sure if "allocate" is the right term here - You're putting a small (probably 1-4 bytes) enum value on the stack and immediately using it. I haven't seen the entire Instruction definition, but if there's no Box, there's no heap allocation or indirection.
I wouldn't be surprised if the optimiser inlined your Instruction::run() method and the value is entirely optimised out at runtime. Even if it can't be optimised away, the value is tiny so it'll almost certainly only ever live in a register, making it essentially free anyway.
They mean that the closure has to be thread-safe. This is required for values stored in static variables because there are no controls in place to force or verify that they’re only accessed from the main thread.
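A minimal illustration of that requirement (unrelated to the emulator, just the pattern): a boxed closure stored behind a `static` must be `Send + Sync`, because any thread could reach it.

```rust
use std::sync::OnceLock;

// A value in a `static` can be reached from any thread, so the trait
// object must promise thread-safety via `Send + Sync`. A closure
// capturing e.g. an `Rc` would be rejected by the compiler here.
static GREETER: OnceLock<Box<dyn Fn(&str) -> String + Send + Sync>> = OnceLock::new();

fn greeter() -> &'static (dyn Fn(&str) -> String + Send + Sync) {
    GREETER.get_or_init(|| Box::new(|name: &str| format!("hello, {name}")))
}
```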
I see! Thank you, Michael.
There is one small thing in the way of this approach: some Instructions are parameterized and return a closure that holds onto that parameter.
It occurred to me that I can probably use a macro to generate the functions at compile time.
Would that make sense?
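One possible shape for such a macro — purely a sketch, nothing here is from the repo — is to stamp out one monomorphic wrapper fn per opcode, baking the parameter in at compile time so everything becomes a plain fn pointer again:

```rust
struct Gameboy {
    pc: u16,
}

#[derive(Clone, Copy)]
enum RegisterTarget { A, B }

// A generic instruction body, parameterized at the source level...
fn ld_r_const(gb: &mut Gameboy, _target: RegisterTarget) -> u8 {
    gb.pc = gb.pc.wrapping_add(1);
    8
}

// ...and a macro that generates a zero-capture wrapper per opcode.
// Each wrapper is an ordinary `fn`, so it can live in a static
// fn-pointer table with no boxing at all.
macro_rules! instruction {
    ($name:ident, $body:path, $param:expr) => {
        fn $name(gb: &mut Gameboy) -> u8 {
            $body(gb, $param)
        }
    };
}

instruction!(ld_a_const, ld_r_const, RegisterTarget::A);
instruction!(ld_b_const, ld_r_const, RegisterTarget::B);

static TABLE: [fn(&mut Gameboy) -> u8; 2] = [ld_a_const, ld_b_const];
```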