Did a quick and dirty benchmark of factorial in gluon against Lua. The result is not that great atm, since VM optimization has been a lower priority than language features.
factorial/gluon time: [13.705 us 13.812 us 13.932 us]
change: [+23.875% +25.569% +27.223%] (p = 0.00 < 0.05)
Performance has regressed.
Found 7 outliers among 100 measurements (7.00%)
4 (4.00%) high mild
3 (3.00%) high severe
factorial/lua time: [1.4681 us 1.4778 us 1.4895 us]
change: [-4.4998% -1.9454% +1.1977%] (p = 0.17 > 0.05)
No change in performance detected.
Found 18 outliers among 100 measurements (18.00%)
3 (3.00%) high mild
15 (15.00%) high severe
Part of the overhead is certainly due to bounds checking. Some of that could be removed without major changes and without resorting to wildly unsafe code (a la C), but some would need larger changes such as a bytecode validator. Function call preparation is also a bit more expensive in gluon than it should be, and that should be improved.
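To make the bytecode validator idea concrete, here is a minimal sketch of what such a pass could look like. Everything in it is hypothetical: `Instr`, `validate`, `Validated` and `run` are made up for illustration and are not gluon's actual bytecode or VM API, and it only handles straight-line code (no jumps or calls). The idea is that a single pass before execution proves every stack access is in range, so the interpreter loop would not need to re-check each access.

```rust
/// Hypothetical instruction set for a tiny stack VM (NOT gluon's real bytecode).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Instr {
    /// Push a constant integer.
    PushInt(i64),
    /// Push a copy of the stack slot at this index (counted from the bottom).
    Push(usize),
    /// Pop two integers and push their product.
    MulInt,
    /// Pop two integers and push their difference.
    SubInt,
    /// Pop the top of the stack and return it.
    Return,
}

#[derive(Debug, PartialEq)]
pub enum ValidationError {
    SlotOutOfRange { pc: usize, slot: usize, depth: usize },
    StackUnderflow { pc: usize },
    MissingReturn,
}

/// Code that has passed `validate`, plus the facts the validator proved.
#[derive(Debug, PartialEq)]
pub struct Validated<'a> {
    pub code: &'a [Instr],
    pub args: usize,
    pub max_depth: usize,
}

/// Walk the code once, tracking the stack depth at every instruction.
pub fn validate(code: &[Instr], args: usize) -> Result<Validated<'_>, ValidationError> {
    let mut depth = args;
    let mut max_depth = depth;
    for (pc, instr) in code.iter().enumerate() {
        match *instr {
            Instr::PushInt(_) => depth += 1,
            Instr::Push(slot) => {
                if slot >= depth {
                    return Err(ValidationError::SlotOutOfRange { pc, slot, depth });
                }
                depth += 1;
            }
            Instr::MulInt | Instr::SubInt => {
                if depth < 2 {
                    return Err(ValidationError::StackUnderflow { pc });
                }
                depth -= 1; // two operands popped, one result pushed
            }
            Instr::Return => {
                if depth < 1 {
                    return Err(ValidationError::StackUnderflow { pc });
                }
                // Keep only the code up to and including the first Return.
                return Ok(Validated { code: &code[..=pc], args, max_depth });
            }
        }
        max_depth = max_depth.max(depth);
    }
    Err(ValidationError::MissingReturn)
}

/// Execute validated code. This sketch still uses safe, checked operations;
/// the comments mark the checks that validation has made redundant.
pub fn run(program: &Validated<'_>, args: &[i64]) -> i64 {
    assert_eq!(args.len(), program.args, "argument count must match validation");
    // The validator computed the maximum depth, so the stack never reallocates.
    let mut stack: Vec<i64> = Vec::with_capacity(program.max_depth);
    stack.extend_from_slice(args);
    for instr in program.code {
        match *instr {
            Instr::PushInt(n) => stack.push(n),
            Instr::Push(slot) => {
                let value = stack[slot]; // in range: proven by `validate`
                stack.push(value);
            }
            Instr::MulInt => {
                let r = stack.pop().unwrap(); // cannot underflow: proven by `validate`
                let l = stack.pop().unwrap();
                stack.push(l.wrapping_mul(r));
            }
            Instr::SubInt => {
                let r = stack.pop().unwrap();
                let l = stack.pop().unwrap();
                stack.push(l.wrapping_sub(r));
            }
            Instr::Return => return stack.pop().unwrap(),
        }
    }
    unreachable!("validated code always ends in Return")
}
```

For example, validating `[Push(0), PushInt(1), SubInt, Push(0), MulInt, Return]` with one argument succeeds, and running it with the argument 5 computes 5 * (5 - 1). A real VM would also have to validate jump targets, call frames and constant indices, which is why this counts as one of the larger changes.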
Most egregiously, I had to modify the gluon benchmark to directly use the primitive operators (#Int* etc). Using the normal, overloaded + etc implies two(!) extra function calls per operation. Without those changes gluon performs much worse still, but once function inlining lands (it is in the works, but blocked on salsa-rs/salsa#147) that change won't be necessary.
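As a rough model of where two extra calls per operation can come from, here is a sketch in Rust. It rests on an assumption, not on a description of gluon's compiler: that an overloaded operator is a generic function which looks up the implementation in a record of functions, and that the implementation is itself a small function wrapping the primitive. All names (`Num`, `NUM_INT`, `mul`, `int_mul`, and so on) are hypothetical, and the counter exists only to make the extra calls visible.

```rust
use std::cell::Cell;

thread_local! {
    /// Counts calls that exist only because of operator overloading.
    static EXTRA_CALLS: Cell<u32> = Cell::new(0);
}

fn count_call() {
    EXTRA_CALLS.with(|c| c.set(c.get() + 1));
}

pub fn extra_calls() -> u32 {
    EXTRA_CALLS.with(|c| c.get())
}

pub fn reset_calls() {
    EXTRA_CALLS.with(|c| c.set(0));
}

/// Hypothetical record of operator implementations for one type.
pub struct Num {
    pub mul: fn(i64, i64) -> i64,
    pub sub: fn(i64, i64) -> i64,
}

/// Extra call #2: the instance function that wraps the primitive operation.
fn int_mul(l: i64, r: i64) -> i64 {
    count_call();
    l.wrapping_mul(r)
}

fn int_sub(l: i64, r: i64) -> i64 {
    count_call();
    l.wrapping_sub(r)
}

pub const NUM_INT: Num = Num { mul: int_mul, sub: int_sub };

/// Extra call #1: the overloaded operator, which selects the implementation.
pub fn mul(num: &Num, l: i64, r: i64) -> i64 {
    count_call();
    (num.mul)(l, r)
}

pub fn sub(num: &Num, l: i64, r: i64) -> i64 {
    count_call();
    (num.sub)(l, r)
}

/// Factorial written against the overloaded operators.
pub fn factorial_overloaded(n: i64) -> i64 {
    if n <= 1 {
        1
    } else {
        mul(&NUM_INT, n, factorial_overloaded(sub(&NUM_INT, n, 1)))
    }
}

/// Factorial written against the primitives directly, as in the modified benchmark.
pub fn factorial_primitive(n: i64) -> i64 {
    if n <= 1 {
        1
    } else {
        n.wrapping_mul(factorial_primitive(n.wrapping_sub(1)))
    }
}
```

In this model `factorial_overloaded(10)` performs 18 arithmetic operations and therefore 36 extra calls, while `factorial_primitive(10)` performs none. Inlining both layers would collapse the overloaded version into the primitive one, which is why the workaround should become unnecessary once inlining is available.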