I am writing a VM in Rust. Suppose we allow mutually recursive functions, so at some point we have something like:
let
f x y = ... refers to g ...
g x y = ... refers to f ...
Then, in the bytecode of f, we must either point to g or point to some env that contains g.
Similarly,in the bytecode of g, we must either point to f or point to some env that contains f.
Now, suppose those language has a REPL.
Does this basically guarantee circular refs if we use Rc?
How do VMs written in Rust get around this issue?
Note: so far, we haven't even decided whether the VM language has a GC or not. All we have decided is that the VM language should (a) support mutually recursive functions and (b) have a REPL.
You can make each function definition an Rc, but each use an rc::Weak. Or you can store all definitions in an array and refer to them by index at use. There's many other ways as well.
This prevents the additional branch in Weak pointer dereferencing. That alone might be a worthless micro-optimization, but weak pointers and indices (which could be considered as a kind of weak pointer) have different semantics than Rc. In my system a function holds shared ownership of its module and the gtab (table of global variables) therein.
So then we have to deal with the question of "how does f refer to g'.
Our options seem to be:
We can do Instr::OtherFunc<Rc<FuncRepr>> -- but that would cause a cycle. Thus, we need to indirectly refer to g or refer to it weakly. So the options seem to be:
WeakRc to G.
StrongRC to Env, then use String to refer to g.
WeakRC to Env, then use String to refer to g.
=====
The problem with (2) is that we can not just pass f around. We need to always keep track of the Env that defined f, g .. and make sure that Env stays alive. Otherwise, the WeakRc to g may end up pointing at nothing.
Similarly with (4), we still have to manually track the enclosing Env, as the WeakRc by itself is not going to keep it alive.
So then this leaves us with (3), where
each function has a StrongRc pointing to the Env
each Env has a Map of (String, StrongRc<FuncRepr>) pairs
functions refer to other functions via (StrongRc<Env>, String)
When we need to break all cycles / allow for Rc, we just clear all the maps (thus allowing all functions to be freed, and thus allowing all non-pointed-at Envs to be freed).
Of course there is nothing wrong with replacing the map with an array. For example, it does not make much sense to use a map for local variables or closure bindings.
And we know that at some point a RefCell must be placed in, unless some other design is exploited.
More detailed information about the design space of cyclic data structures: