I am working on a Scheme implementation written in Rust. This is mostly for educational purposes, but I want the interpreter to be able to be embedded in Rust programs to provide scripting.
Interpreters written in C often use computed gotos for performance; this can be as much as a factor of 2 I believe. Can Rust achieve similar performance?
NO EXECUTE! September 3, 2008 has some interesting tricks for writing fast interpreter loops. That said, I'm not sure it matters for a scheme interpreter; allocating memory is a lot more expensive than a branch misprediction.
The LuaJIT source code is very interesting. The VM is built using a Lua based preprocessor to generate the hand tuned assembler code for the bytecode interpreter. See buildvm.
I've lost the source since it was just me playing around on playpen, but it is possible to build an interpreter like this, with a function for each bytecode instruction or whatever have you.
There are some caveats to keep in mind:
to get tail calls, you can't have any variables or temporaries destructors on the stack, that have to run after the call at the end of your function
you can't pass around types larger than a pointer (or twice a pointer, if you switch to C ABIs) because they might get a copy on the stack that LLVM cannot always eliminate, but you can pass around pointers to a common state structure or something similar
you need to keep all of your arguments in registers, on x64/linux, you get a maximum of 6 pointer/integer arguments, and the jmp becomes a call with 7 arguments; this might be impossible on x86, or you might need to use a different calling convention
Generally, if you keep an eye on the resulting assembly, you can get pretty good results, but it's a bit of an art.
Today at -O3 a match statement where the branches call methods that are inlined will be compiled into a computed jump. Inlining can be forced with #[inline(always)].