Hello,
I am somewhat a beginner and hoped to learn rust by implementing a Lua runtime in rust. It is working, but much slower than the official implementation (lua5.1). Despite the byte code is not optimised (2 times longer than luac byte code), it is still 10 times slower.
Some implementation details:
- I added a context type to emulate the behaviour of blocks and closures in lua:
As a result, the variables are referenced by (offset, index), where offset represents the number of times we need to look up from current context.pub struct Context { pub(crate) locals: Vec<Value>, pub(crate) ancestors: Vec<Rc<RefCell<Context>>> }
- I did not pre-load the variables onto stacks, only load them and discard immediately, so even if the vm is register-based, it still behaves like stack, also leading to the longer byte code.
The loop body looks like the following, where:
1. set_stack and get_stack simply sets and refers to the value on stack
2. set_local and get_local simply looks up in current context or its ancestors
while let Some(bc) = compiled.byte_codes.get(pc) {
pc += 1;
match bc.clone() {
NOP => continue,
Halt => break,
PushContext => self.push_context(),
PopContext => self.pop_context(),
LoadInt | LoadNil | LoadBool => ..,
// i used a macro here to generate the pattern and match expr
// because macro cannot expand to match arms
bc @ (Not | BitNot | Neg | Len) => match bc {
...
}
// same here
bc @ (And | Or | Xor | ... ) => match bc {
}
GetLocal(dst, i, j) => {
let value = self.get_local(i, j)?;
self.set_stack(dst, value);
},
SetLocal(i, j ,src) => {
let value = self.get_stack(src).clone();
self.set_local(i, j, value)?;
},
LoadConst(dst, src) => {
let value = compiled.get_literal(src)?.clone();
self.set_stack(dst, value.into());
},
// Call Ret function table ommited
JmpNot(cond, forward) => {
let cond: bool = self.get_stack(cond).into();
if !cond {
pc = (pc as isize + forward as isize) as usize;
}
}
JmpWhen(cond, forward) => {
let cond: bool = self.get_stack(cond).into();
if cond {
pc = (pc as isize + forward as isize) as usize;
}
}
Jmp(forward) => {
pc = (pc as isize + forward as isize) as usize;
}
}
}
The measurement of running time of the match expr is as follows:
Variant | Count | Total(s) | Average(s) |
---|---|---|---|
GetLocal | 14116415 | 0.58360328 | 0.0000000413421734 |
SetLocal | 6385074 | 0.24584601 | 0.0000000385032358 |
LoadConst | 1000000 | 0.03518509 | 0.0000000351850910 |
LoadInt | 6385074 | 0.22127918 | 0.0000000346556947 |
Call | 2692538 | 0.11427579 | 0.0000000424416629 |
Add | 2346267 | 0.09794354 | 0.0000000417444153 |
Sub | 2692536 | 0.11267531 | 0.0000000418472808 |
Less | 3692537 | 0.13302776 | 0.0000000360261135 |
JmpNot | 3692537 | 0.10446466 | 0.0000000282907557 |
Jmp | 999999 | 0.02705205 | 0.0000000270520801 |
PushContext | 2692538 | 0.17783500 | 0.0000000660473501 |
PopContext | 2692538 | 0.14251488 | 0.0000000529295720 |
LoadFunc | 1 | 0.00000187 | 0.0000018700000000 |
Ret | 2692537 | 0.07430846 | 0.0000000275979353 |
And the lua code i used to test:
function fib(n)
if n < 2 then
return n
end
local n1 = fib(n-1)
local n2 = fib(n-2)
return n1 + n2
end
fib(30)
local x = 1
while x < 1000000 do
x = x + 1
end
What could possibly be wrong with my implementation? Can it be the way I used macros in the match expr? I would be very grateful if someone could give some feedback.
Generated byte code, in case useful:
my own: (LoadFunc(dst, len|nargs) loads the function onto the stack, with reference to current context. len is 13-bit, representing the length of the function(in byte code) and nargs is 3-bit, representing the number of arguments to the function)
Literals:
1000000
-------------------------------
Byte Codes:
PushContext
LoadFunc(1, 233)
PushContext
SetLocal(0, 0, 1)
GetLocal(1, 0, 0)
LoadInt(2, 2)
Less(1, 1, 2)
JmpNot(1, 3)
GetLocal(0, 0, 0)
PopContext
Ret
GetLocal(2, 2, 3)
GetLocal(3, 0, 0)
LoadInt(4, 1)
Sub(3, 3, 4)
Call(2, 1)
SetLocal(0, 1, 2)
GetLocal(2, 2, 3)
GetLocal(3, 0, 0)
LoadInt(4, 2)
Sub(3, 3, 4)
Call(2, 1)
SetLocal(0, 2, 2)
GetLocal(0, 0, 1)
GetLocal(1, 0, 2)
Add(0, 0, 1)
PopContext
Ret
LoadNil(0)
PopContext
Ret
SetLocal(1, 3, 1)
GetLocal(0, 1, 3)
LoadInt(1, 30)
Call(0, 1)
LoadInt(1, 1)
SetLocal(0, 0, 1)
GetLocal(0, 0, 0)
LoadConst(1, 0)
Less(0, 0, 1)
JmpNot(0, 5)
GetLocal(1, 0, 0)
LoadInt(2, 1)
Add(1, 1, 2)
SetLocal(0, 0, 1)
Jmp(-9)
PopContext
luac:
main </home/olivercwy/workspace/lua/bin/test_lua/test.lua:0,0> (14 instructions at 0x55dfe59accf0)
0+ params, 2 slots, 1 upvalue, 1 local, 2 constants, 1 function
1 [1] VARARGPREP 0
2 [9] CLOSURE 0 0 ; 0x55dfe59acfb0
3 [1] SETTABUP 0 0 0 ; _ENV "fib"
4 [11] GETTABUP 0 0 0 ; _ENV "fib"
5 [11] LOADI 1 30
6 [11] CALL 0 2 1 ; 1 in 0 out
7 [13] LOADI 0 1
8 [14] LOADK 1 1 ; 1000000
9 [14] LT 0 1 0
10 [14] JMP 3 ; to 14
11 [15] ADDI 0 0 1
12 [15] MMBINI 0 1 6 0 ; __add
13 [15] JMP -6 ; to 8
14 [16] RETURN 1 1 1 ; 0 out
function </home/olivercwy/workspace/lua/bin/test_lua/test.lua:1,9> (15 instructions at 0x55dfe59acfb0)
1 param, 4 slots, 1 upvalue, 3 locals, 1 constant, 0 functions
1 [2] LTI 0 2 0
2 [2] JMP 1 ; to 4
3 [3] RETURN1 0
4 [6] GETTABUP 1 0 0 ; _ENV "fib"
5 [6] ADDI 2 0 -1
6 [6] MMBINI 0 1 7 0 ; __sub
7 [6] CALL 1 2 2 ; 1 in 1 out
8 [7] GETTABUP 2 0 0 ; _ENV "fib"
9 [7] ADDI 3 0 -2
10 [7] MMBINI 0 2 7 0 ; __sub
11 [7] CALL 2 2 2 ; 1 in 1 out
12 [8] ADD 3 1 2
13 [8] MMBIN 1 2 6 ; __add
14 [8] RETURN1 3
15 [9] RETURN0