Writing a VM in Rust, mutually recursive functions, memory leak

zeroexcuses · March 20, 2019, 12:40am

I am writing a VM in Rust. Suppose we allow mutually recursive functions, so at some point we have something like:

let
  f x y = ... refers to g ...
  g x y = ... refers to f ...

Then, in the bytecode of f, we must either point to g or point to some env that contains g.

Similarly,in the bytecode of g, we must either point to f or point to some env that contains f.

Now, suppose those language has a REPL.

Does this basically guarantee circular refs if we use Rc?

How do VMs written in Rust get around this issue?

Note: so far, we haven't even decided whether the VM language has a GC or not. All we have decided is that the VM language should (a) support mutually recursive functions and (b) have a REPL.

jethrogb · March 20, 2019, 12:43am

You can make each function definition an Rc, but each use an rc::Weak. Or you can store all definitions in an array and refer to them by index at use. There's many other ways as well.

zeroexcuses · March 20, 2019, 12:56am

Is there a list or a blog post somewhere? I'm happy to invest a day or two and understand all the known techniques with using Rc instead of GC.

Finn · March 20, 2019, 8:17am

Well, that turned out to be rather easy, vm.rs#L3474:

pub fn clear_at_exit(&self, gtab: Rc<RefCell<Map>>){
    self.delay.borrow_mut().push(gtab);
}

This prevents the additional branch in Weak pointer dereferencing. That alone might be a worthless micro-optimization, but weak pointers and indices (which could be considered as a kind of weak pointer) have different semantics than Rc. In my system a function holds shared ownership of its module and the gtab (table of global variables) therein.

zeroexcuses · March 20, 2019, 8:30pm

@Finn : Whoa, you've written an actual VM in rust. I hope you don't mind if I try to pick you brain a bit:

Imagine again we want to support the following language feature

let
  f x y = ... refers to g ...
  g x y = ... refers to f ...

We likely go with the following setup

pub enum Instr { ... }

pub struct FuncRepr {
  name: String,
  instrs: Vec<Instr>
}

pub struct Env {
  funcs: Map<String, FuncRepr>
}

So then we have to deal with the question of "how does f refer to g'.
Our options seem to be:

We can do Instr::OtherFunc<Rc<FuncRepr>> -- but that would cause a cycle. Thus, we need to indirectly refer to g or refer to it weakly. So the options seem to be:
WeakRc to G.
StrongRC to Env, then use String to refer to g.
WeakRC to Env, then use String to refer to g.

=====

The problem with (2) is that we can not just pass f around. We need to always keep track of the Env that defined f, g .. and make sure that Env stays alive. Otherwise, the WeakRc to g may end up pointing at nothing.

Similarly with (4), we still have to manually track the enclosing Env, as the WeakRc by itself is not going to keep it alive.

So then this leaves us with (3), where

each function has a StrongRc pointing to the Env
each Env has a Map of (String, StrongRc<FuncRepr>) pairs
functions refer to other functions via (StrongRc<Env>, String)

When we need to break all cycles / allow for Rc, we just clear all the maps (thus allowing all functions to be freed, and thus allowing all non-pointed-at Envs to be freed).

Does this get the high level bits right?

Finn · March 21, 2019, 11:21am

Yup, that gets to the point.

Of course there is nothing wrong with replacing the map with an array. For example, it does not make much sense to use a map for local variables or closure bindings.

And we know that at some point a RefCell must be placed in, unless some other design is exploited.

More detailed information about the design space of cyclic data structures:

Topic		Replies	Views
Repl, bytecode design, mutually recursive functions	2	304	February 7, 2022
[Solved] GC in rust for scheme interpreter	15	2437	January 12, 2023
Rc + weakRc based scheme interpreter in Rust?	1	337	January 12, 2023
Handling reference cycles with Drop instead of Weak code review	4	673	January 7, 2022
Has a language level "weak reference" ever been explored for Rust?	9	1532	January 12, 2023

Writing a VM in Rust, mutually recursive functions, memory leak

Related Topics