[Solved] GC in rust for scheme interpreter

anon80458984 · February 20, 2019, 3:16am

I am writing a toy scheme impl in Rust.
I need a GC for the scheme VM.
I do NOT need a GC for Rust. I'm happy with Rust's references + RC + ARC.
I do need to either build my own GC for the scheme VM ... or use some library.
What are sources I should look into?

vorner · February 20, 2019, 8:20am

I don't know how finished this one is, but there's a series of blog posts about design of one: Shifgrethor I: Garbage collection as a Rust library

newbie101 · February 25, 2019, 11:46am

If you are writing a toy interpreter, it's worth considering no GC at all: just malloc without free.

Unless you enjoy writing garbage collectors, it's time-consuming to program and not very useful on modern computers. Most toy interpreters will only allocate a few KB, and even if they go wild and allocate 100MB, that's still less than most GUI applications.
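A minimal sketch of the "allocate and never free" idea, assuming values live in a Vec-backed arena that only grows and are referred to by index. The names `Value`, `Arena`, `alloc`, and `build_list` are hypothetical and only serve the illustration.

```rust
/// A hypothetical Scheme value; `Pair` refers to other values by arena index.
#[derive(Debug, PartialEq)]
enum Value {
    Nil,
    Int(i64),
    Pair(usize, usize),
}

/// An arena that only grows: nothing is freed until the arena itself is dropped.
#[derive(Default)]
struct Arena {
    values: Vec<Value>,
}

impl Arena {
    /// Stores a value and returns its index, which serves as a cheap handle.
    fn alloc(&mut self, v: Value) -> usize {
        self.values.push(v);
        self.values.len() - 1
    }

    /// Looks a value up by its handle.
    fn get(&self, id: usize) -> &Value {
        &self.values[id]
    }
}

/// Builds the list (1 2) and returns the arena plus the index of its head.
fn build_list() -> (Arena, usize) {
    let mut arena = Arena::default();
    let nil = arena.alloc(Value::Nil);
    let two = arena.alloc(Value::Int(2));
    let tail = arena.alloc(Value::Pair(two, nil));
    let one = arena.alloc(Value::Int(1));
    let head = arena.alloc(Value::Pair(one, tail));
    (arena, head)
}
```

Because handles are plain indices, this needs no unsafe and no reference counting; the cost is that memory use grows for the whole run of the interpreter.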

It's also worth looking at the source code of ketos, which has some nice patterns for implementing a Lisp in Rust.

eaglgenes101 · February 25, 2019, 1:48pm

You could consider a garbage collection scheme like Python's, where refcounting is accompanied by periodic sweeps of memory space to make sure that reference cycles don't last forever (this can be tracked by maintaining weakrefs to all memory allocated on behalf of the runtime, and cycles collected by changing one of the strong references in the cycle to Option::None). It'd be slow compared to other algorithms, but if you're just doing a toy project with no intention of releasing a competitive language to the outside world, I don't think performance is enough of a concern to warrant mucking around with unsafe.

If you want to challenge yourself with unsafe, a good candidate for that would be Cheney's algorithm.
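For orientation, here is a rough sketch of the copying scheme behind Cheney's algorithm. It deliberately swaps raw pointers for indices into a Vec so that it stays in safe Rust, so it is a simplified variant rather than the unsafe exercise suggested above. `Obj`, `forward`, and `collect` are hypothetical names.

```rust
/// Hypothetical heap object: an integer or a pair of indices into the heap.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Obj {
    Int(i64),
    Pair(usize, usize),
}

/// Copies `old` into `to` unless it was already copied, and returns its new index.
fn forward(old: usize, from: &[Obj], to: &mut Vec<Obj>, fwd: &mut [Option<usize>]) -> usize {
    if let Some(new) = fwd[old] {
        return new;
    }
    to.push(from[old]); // fields still hold from-space indices for now
    let new = to.len() - 1;
    fwd[old] = Some(new);
    new
}

/// Cheney-style collection: returns the compacted to-space and the relocated roots.
fn collect(from: &[Obj], roots: &[usize]) -> (Vec<Obj>, Vec<usize>) {
    let mut to: Vec<Obj> = Vec::new();
    let mut fwd: Vec<Option<usize>> = vec![None; from.len()];
    let new_roots: Vec<usize> = roots
        .iter()
        .map(|&r| forward(r, from, &mut to, &mut fwd))
        .collect();

    // `scan` chases the end of `to`. Objects between `scan` and the end are
    // copied but not yet fixed up, so to-space doubles as the BFS queue.
    let mut scan = 0;
    while scan < to.len() {
        if let Obj::Pair(a, b) = to[scan] {
            let a = forward(a, from, &mut to, &mut fwd);
            let b = forward(b, from, &mut to, &mut fwd);
            to[scan] = Obj::Pair(a, b);
        }
        scan += 1;
    }
    (to, new_roots)
}
```

Unreachable objects are never copied, so they disappear when the old space is dropped, and cycles are handled by the forwarding table.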

jschievink · February 25, 2019, 2:17pm

I actually implemented the refcount+cycle collection method in Rust: https://github.com/jonas-schievink/rcgc

What makes this interesting is that it uses completely safe Rust (even in the presence of bugs in the GC, the worst that can happen is a panic when accessing an already-collected object).

No guarantees for correctness, though.

anon80458984 · February 26, 2019, 7:59pm

@jschievink : This is a very interesting approach. Is the following correct:

There is a root "GC Manager" object.
GC_Manager has an Rc to all managed objects.
Weaks are handed out, which are promoted to Rc on access (guaranteed to succeed since GC_Manager holds an Rc).
To do a "collect", the GC_Manager runs through all Rcs and drops the ones whose weak count is 0.
Tracing is used to handle loops. I don't understand this part. Can you elaborate?
This breaks RAII for managed objects, since objects are NOT dropped when the last Weak is gone, but only when the GC_Manager does a collect.
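The points above, minus tracing, can be sketched with the standard library's Rc and Weak. This is only an illustration of the description in this post, not the rcgc crate's code; `GcManager`, `alloc`, and `collect` are hypothetical names.

```rust
use std::rc::{Rc, Weak};

/// Hypothetical manager: owns the only long-lived strong reference to every object.
struct GcManager<T> {
    objects: Vec<Rc<T>>,
}

impl<T> GcManager<T> {
    fn new() -> Self {
        GcManager { objects: Vec::new() }
    }

    /// Stores the object and hands out a weak handle to it.
    fn alloc(&mut self, value: T) -> Weak<T> {
        let rc = Rc::new(value);
        let handle = Rc::downgrade(&rc);
        self.objects.push(rc);
        handle
    }

    /// Drops every object that no weak handle points to any more.
    /// Returns how many objects were freed.
    fn collect(&mut self) -> usize {
        let before = self.objects.len();
        self.objects.retain(|rc| Rc::weak_count(rc) > 0);
        before - self.objects.len()
    }
}
```

Calling `upgrade()` on a handle succeeds as long as the manager still holds the object, and dropping the last handle frees nothing until `collect` runs, which is the RAII difference noted above.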

jschievink · February 26, 2019, 8:23pm

Yes, that seems correct. Although RAII is pretty much always at odds with GC, that's not specific to this implementation.

This is just a normal tracing GC that starts at the root objects (which are Rcs the GC can also hand out), and recursively follows references to other managed objects. All objects that are reached in this process are marked, and once it's done it deletes all objects that weren't marked.

This is needed because the objects can point to each other in a way that creates a reference cycle. Cycles mean that the weak reference count in all objects in the cycle never drops to 0, so they wouldn't be freed.
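A small sketch of why counting alone is not enough, assuming objects store weak handles to each other as described in this thread. `Node` and `weak_counts_of_orphaned_cycle` are hypothetical names.

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

/// Hypothetical managed object holding a weak handle to another object.
struct Node {
    next: RefCell<Weak<Node>>,
}

/// Builds a two-object cycle with no outside handles and returns the
/// weak counts a manager would see for both objects.
fn weak_counts_of_orphaned_cycle() -> (usize, usize) {
    // These two Rcs play the role of the manager's strong references.
    let a = Rc::new(Node { next: RefCell::new(Weak::new()) });
    let b = Rc::new(Node { next: RefCell::new(Weak::new()) });

    // a -> b and b -> a, both through weak handles stored inside the objects.
    *a.next.borrow_mut() = Rc::downgrade(&b);
    *b.next.borrow_mut() = Rc::downgrade(&a);

    // No handle exists outside the cycle, yet neither count is zero.
    (Rc::weak_count(&a), Rc::weak_count(&b))
}
```

A collector that only checks for a weak count of 0 would keep both objects forever, so reachability from the roots has to be traced instead.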

anon80458984 · February 26, 2019, 8:53pm

There is something very simple I am not understanding.
"Tracing GC" is basically just the 'mark' stage of a mark & sweep, right?
To do mark & sweep, we need every object to have a reachable_from_root bool. Then, for every "mark & sweep", we set this bool to false for all objects, run a DFS or BFS from the root objects, marking all reachable objects as true, and kill all unreachable objects.
It appears this bool is stored at https://github.com/jonas-schievink/rcgc/blob/master/src/lib.rs#L117
It appears we are doing a DFS at: https://github.com/jonas-schievink/rcgc/blob/master/src/lib.rs#L97-L105
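The mark & sweep steps described in this post can be sketched as follows. This is not the code behind the links above; it assumes an index-based heap, and `Cell`, `Heap`, and `collect` are hypothetical names.

```rust
/// Hypothetical heap cell: the mark bit plus the indices of its children.
struct Cell {
    reachable_from_root: bool,
    children: Vec<usize>,
}

struct Heap {
    cells: Vec<Option<Cell>>, // `None` marks a freed slot
    roots: Vec<usize>,
}

impl Heap {
    /// Runs one mark & sweep cycle and returns the number of freed cells.
    fn collect(&mut self) -> usize {
        // 1. Clear every mark bit.
        for cell in self.cells.iter_mut().flatten() {
            cell.reachable_from_root = false;
        }

        // 2. Mark: iterative DFS from the roots.
        let mut stack = self.roots.clone();
        while let Some(id) = stack.pop() {
            if let Some(cell) = self.cells[id].as_mut() {
                if !cell.reachable_from_root {
                    cell.reachable_from_root = true;
                    stack.extend(cell.children.iter().copied());
                }
            }
        }

        // 3. Sweep: free everything that is still unmarked.
        let mut freed = 0;
        for slot in self.cells.iter_mut() {
            if slot.as_ref().map_or(false, |c| !c.reachable_from_root) {
                *slot = None;
                freed += 1;
            }
        }
        freed
    }
}
```

The mark bit also stops the DFS from looping on cycles, so an unrooted cycle is simply never marked and gets swept.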

====

Now, here is the part I am confused about. The GC needs to track objects of DIFFERENT types.

We can have a Foo with a Rc<Bar>. We can have a Bar with a Vec<Rc<Cat>>.

So it seems that intuitively, we need every "GC-able" struct to have some function that says: "return a vec/iter of all my children that are Rc".

I can't find this part of the code. What am I misunderstanding? How does this work?

jschievink · February 26, 2019, 9:22pm

AFAIK Mark&Sweep is a specific way of implementing a Tracing GC. A copying GC would be another way to implement it. This is just a simple Mark&Sweep stop-the-world collector though.

Not in this implementation. The GC is generic over the type of object it manages (struct Gc<T: Trace>). T can be an enum though.

Here we only deal with Rooted<T>, which is essentially a wrapper around an Rc<T>.

Yes, exactly. This is what the Trace trait is for. Its trace method has to be implemented by every object managed by the GC, and it has to pass all Handle<T>s the object owns to a Tracer object. An example implementation of this can be found in the tests: https://github.com/jonas-schievink/rcgc/blob/master/src/lib.rs#L373-L377
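To make the shape of such a tracing interface concrete, here is a hedged sketch. The trait and type names echo the ones mentioned in this post, but the signatures are assumptions made for illustration and are not the rcgc crate's actual API; `Handle` is a plain index stand-in, and `visit` and `Bar` are hypothetical.

```rust
/// Stand-in for a GC handle; here just an index into some heap (assumption).
#[derive(Clone, Copy, Debug, PartialEq)]
struct Handle(usize);

/// Collects the handles an object reports during tracing.
#[derive(Default)]
struct Tracer {
    found: Vec<Handle>,
}

impl Tracer {
    /// Records one handle owned by the object being traced.
    fn visit(&mut self, handle: Handle) {
        self.found.push(handle);
    }
}

/// Every managed object must report each handle it owns.
trait Trace {
    fn trace(&self, tracer: &mut Tracer);
}

/// Hypothetical managed object that owns handles to several other objects.
struct Bar {
    cats: Vec<Handle>,
}

impl Trace for Bar {
    fn trace(&self, tracer: &mut Tracer) {
        for &cat in &self.cats {
            tracer.visit(cat);
        }
    }
}
```

The collector only ever calls `trace`, so it never needs to know the layout of the objects it manages.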

anon80458984 · February 26, 2019, 9:49pm

I'm sorry, I'm really confused now. Suppose I currently have:


use std::rc::Rc;

pub struct Person {}
pub struct Thing {}

pub struct Village {
  chief: Rc<Person>,
  families: Vec<Rc<Family>>,
}

pub struct Family {
  members: Vec<Rc<Person>>,
  stuff: Vec<Rc<Thing>>,
}

How would I rewrite this?

jschievink · February 26, 2019, 10:04pm

With my crate you'd have to do something like this:

enum Object {
    Person,
    Thing,
    Village {
        chief: Handle<Object>,
        families: Vec<Handle<Object>>,
    },
    Family {
        members: Vec<Handle<Object>>,
        stuff: Vec<Handle<Object>>,
    },
}

Of course you lose a lot of type-safety this way, so there might be better API designs. It's perfect for implementing dynamically-typed languages though.
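As a rough illustration of how a single enum makes tracing straightforward, the sketch below lists the handles each variant owns. `Handle` is a plain index stand-in rather than the crate's type, and `children` is a hypothetical helper, not part of rcgc.

```rust
/// Stand-in for the crate's handle type; here a plain index (an assumption).
#[derive(Clone, Copy, Debug, PartialEq)]
struct Handle(usize);

enum Object {
    Person,
    Thing,
    Village { chief: Handle, families: Vec<Handle> },
    Family { members: Vec<Handle>, stuff: Vec<Handle> },
}

impl Object {
    /// Lists every handle this object owns, which is what a tracer needs.
    fn children(&self) -> Vec<Handle> {
        match self {
            Object::Person | Object::Thing => Vec::new(),
            Object::Village { chief, families } => {
                let mut out = vec![*chief];
                out.extend(families.iter().copied());
                out
            }
            Object::Family { members, stuff } => {
                members.iter().chain(stuff.iter()).copied().collect()
            }
        }
    }
}
```

Nothing here stops `chief` from pointing at a `Thing`, which is the loss of type safety mentioned above; an interpreter would check the variant at run time.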

anon80458984 · February 26, 2019, 10:20pm

Thanks for taking your time to answer all my questions!
I think we're almost done -- so the key idea I was missing is: (1) we decide the types that need to be GCed (2) we wrap them all in a big enum. Now, a "single type GC" is no longer a problem, though we lose some type safety (as we can now label a "Family" under the stuff field). Is this correct?

jschievink · February 26, 2019, 10:41pm

Yeah, that's right. A different GC design might be able to work with multiple different types, as long as they provide some sort of tracing interface, but I chose this because it was simple to make it work.

OptimisticPeach · February 27, 2019, 12:32am

Well, if you don't mind a small performance cost, you could probably change it to be a GC over Box<dyn Trace>, so that it would allow any type that implements Trace. This would also be similar to how Java and C# (as far as I know; I can't speak for other languages) automatically box up their classes deriving from System.Object.
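A minimal sketch of that idea, assuming a tracing trait that reports children as heap indices. The trait shape, `Heap`, and `mark` are hypothetical and not taken from any crate.

```rust
/// Hypothetical tracing interface: report the heap indices of all children.
trait Trace {
    fn children(&self) -> Vec<usize>;
}

struct Person;

struct Family {
    members: Vec<usize>,
}

impl Trace for Person {
    fn children(&self) -> Vec<usize> {
        Vec::new()
    }
}

impl Trace for Family {
    fn children(&self) -> Vec<usize> {
        self.members.clone()
    }
}

/// One heap holding objects of different concrete types behind `dyn Trace`.
struct Heap {
    objects: Vec<Box<dyn Trace>>,
}

impl Heap {
    /// Returns, per object, whether it is reachable from `roots`.
    fn mark(&self, roots: &[usize]) -> Vec<bool> {
        let mut marked = vec![false; self.objects.len()];
        let mut stack = roots.to_vec();
        while let Some(id) = stack.pop() {
            if !marked[id] {
                marked[id] = true;
                // Dynamic dispatch picks the right `children` per object.
                stack.extend(self.objects[id].children());
            }
        }
        marked
    }
}
```

Each object pays for one extra allocation and a virtual call during tracing, which is the small performance cost referred to above.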

anon80458984 · February 27, 2019, 1:25am

@OptimisticPeach : I felt something like this was possible (i.e. generic via Traits), couldn't figure out how to do it, and TIL: Box<dyn > solves it
