Dispatching statically to dynamic functions

Coding a Gameboy emulator for fun and learning. I've gotten pretty far, but there is one thing bothering me. During opcode processing I dispatch on the value of the opcode to a function that returns a boxed implementation of the opcode, which eventually gets run.

/// Represents an instruction that can be executed by the Gameboy.
/// The instruction is a function that takes a mutable reference to a Gameboy and returns
/// the number of t-states (system clock ticks) that the instruction took to execute.
pub type Instruction = dyn Fn(&mut gameboy::Gameboy) -> u8;

pub fn from_byte(byte: u8) -> Box<Instruction> {
    match byte {
        0x00 => Box::new(pai(nop::nop)),
        0x10 => Box::new(pai(misc::stop)),

        0x02 => Box::new(pai(ld::ld_mem_at_r16_r(
            Register16bTarget::BC,
            RegisterTarget::A,
        ))),

What's bothering me is that from_byte is dynamic and allocates a new Box every time it is called, which generates a lot of memory pressure. Well, TBH the hardware is powerful enough not to care, but for the sake of learning I would like to make this thing static. I tried a couple of approaches but had no luck. Please help. The whole thing is open source, of course: GitHub - pcasaretto/rust-game-boy-emulator

I'm not sure because I haven't looked at all the closures you're creating for the instructions, but it looks like they are not capturing any variables from their environment. In other words, they take a &mut Gameboy parameter and return a u8, without using any other variables. If that's true, you can just use fn instead of Fn and then no boxing is needed.


Oh, nevermind, I see some of the closures reference other parameters such as a RegisterTarget. So you can't simply use fn. You probably knew that.
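
To illustrate the difference (with a stub Gameboy just so the snippet stands alone):

struct Gameboy; // stub, only so this compiles on its own

fn nop(_gb: &mut Gameboy) -> u8 {
    4
}

// A non-capturing function or closure coerces to a plain `fn` pointer:
// no Box, no allocation.
const NOP: fn(&mut Gameboy) -> u8 = nop;

// A closure that captures `ticks` cannot be a plain `fn` pointer; it needs
// `impl Fn`, a generic parameter, or a boxed `dyn Fn`.
fn make_instruction(ticks: u8) -> impl Fn(&mut Gameboy) -> u8 {
    move |_gb: &mut Gameboy| ticks
}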


You could use an enum instead of dyn Instruction.


Assuming that from_byte has no side effects that you care about, you can cache the result instead of reallocating every time, which will avoid the memory pressure but still dynamically dispatch:

use std::sync::OnceLock;

pub type Instruction = dyn Send+Sync+Fn(&mut gameboy::Gameboy) -> u8;

pub fn from_byte(byte: u8) -> &'static Instruction {
    // One lazily-initialized slot per opcode; each boxed closure is built at most once.
    const EMPTY: OnceLock<Box<Instruction>> = OnceLock::new();
    static CACHE: [OnceLock<Box<Instruction>>; 256] = [EMPTY; 256];

    CACHE[byte as usize].get_or_init(|| _from_byte(byte))
}
}

fn _from_byte(byte: u8) -> Box<Instruction> {
    // Your current implementation here...
    unimplemented!()
}

An enum? How? Could you please clarify?

A lazy init! Great idea.
And I think that your example using Send+Sync was the missing piece for me to use the lazy_static crate. What do those mean btw?

As expected, there is negligible effect on the performance but it works! I can sleep better now.
Now I need to understand all that.


Jumping in here, the basic idea would be to swap the from_byte that returns a boxed closure for a method defined on an enum of all instructions:

enum Instruction {
    Nop,
    Stop,
    // ...
}

impl Instruction {
    fn run(&self, gameboy: &mut Gameboy) -> u8 {
        match self {
            Instruction::Nop => pai(nop::nop)(gameboy),
            Instruction::Stop => pai(misc::stop)(gameboy),
            // ...
        }
    }

    fn from_byte(byte: u8) -> Self {
        match byte {
            0x00 => Self::Nop,
            0x10 => Self::Stop,
            // ...
        }
    }
}

Then anywhere you're currently doing something like...

from_byte(0)(&mut gameboy)

You would do

Instruction::from_byte(0).run(&mut gameboy)

I don't know what pai(...) is doing, so it might be bad to call it every time the instruction runs? Worth considering I suppose.


Thank you! That's clearer now. We match on self and then just call the appropriate function.
From a memory perspective, it doesn't seem better as we would allocate an Instruction at every step. Right?

pai (horrible name) means "PC advancing instruction". Most instructions would advance the PC by one so I refactored that common code into a decorating function that calls the original and advances the PC.
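
Roughly, it looks something like this (simplified; the stand-in types and field names here are just illustrative, not the exact code from the repo):

// Minimal stand-ins so the sketch compiles on its own.
struct Registers { pc: u16 }
struct Gameboy { registers: Registers }

// Wrap an instruction so that, after it runs, the PC is advanced by one.
fn pai<F>(f: F) -> impl Fn(&mut Gameboy) -> u8
where
    F: Fn(&mut Gameboy) -> u8,
{
    move |gb: &mut Gameboy| {
        let t_states = f(gb);                              // run the wrapped instruction
        gb.registers.pc = gb.registers.pc.wrapping_add(1); // then advance the PC
        t_states
    }
}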

Couldn't you do a match on the byte and then actually call the function, something like this?:

impl Instruction {
    fn run(byte: u8, gameboy: &mut Gameboy) -> u8 {
        match byte {
            0x00 => { nop::nop(gameboy); pai(gameboy) },
            0x10 => { misc::stop(gameboy); pai(gameboy) },
            // ...
        }
    }
}

So misc::stop would not return a closure, it would perform the instruction. And pai would just advance the PC.

Or is there significant work done currently when creating the closure (other than the boxing) that would have to be repeated on every call with this approach?

I'm not sure if "allocate" is the right term here: you're putting a small (probably 1-4 bytes) enum value on the stack and immediately using it. I haven't seen the entire Instruction definition, but if there's no Box, there's no heap allocation or indirection.

I wouldn't be surprised if the optimiser inlined your Instruction::run() method and the value is entirely optimised out at runtime. Even if it can't be optimised away, the value is tiny so it'll almost certainly only ever live in a register, making it essentially free anyway.
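
You can check for yourself; a fieldless enum like the one sketched above is a single byte:

enum Instruction {
    Nop,
    Stop,
    // ...
}

fn main() {
    // A fieldless enum with a handful of variants occupies one byte.
    println!("{}", std::mem::size_of::<Instruction>()); // prints 1
}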


They mean that the closure has to be thread-safe. This is required for values stored in static variables because there are no controls in place to force or verify that they’re only accessed from the main thread.
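
For example, the compiler enforces this on anything placed in a static; a non-thread-safe type is rejected:

use std::sync::atomic::AtomicU8;

// A `static` is reachable from every thread, so its type must be `Sync`.
// This would be rejected, because `Cell` is not thread-safe:
// static NOT_OK: std::cell::Cell<u8> = std::cell::Cell::new(0);

// An atomic is `Sync`, so it is accepted:
static TICKS: AtomicU8 = AtomicU8::new(0);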

Nope. Perfectly valid solution!
Thank you!


I see! Thank you, Michael.
There is one small thing in the way of this approach: some Instructions are parameterized and return a closure that holds onto this parameter.
It occurred to me that I can probably use a macro to generate the functions at compile time.
Would that make sense?
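
Or maybe the parameters could live on the enum variants themselves, so nothing has to be captured by a closure at all. Something like this sketch (the register enums are simplified stand-ins for the real ones):

#[derive(Clone, Copy)]
enum Register16bTarget { BC }
#[derive(Clone, Copy)]
enum RegisterTarget { A }

enum Instruction {
    Nop,
    Stop,
    // The parameter is stored in the variant instead of captured by a closure.
    LdMemAtR16R(Register16bTarget, RegisterTarget),
}

impl Instruction {
    fn from_byte(byte: u8) -> Self {
        match byte {
            0x00 => Self::Nop,
            0x10 => Self::Stop,
            0x02 => Self::LdMemAtR16R(Register16bTarget::BC, RegisterTarget::A),
            _ => unimplemented!("opcode {byte:#04x}"),
        }
    }
}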
