-
I am writing a toy scheme impl in Rust.
-
I need a GC for the scheme VM.
-
I do NOT need a GC for Rust. I'm happy with Rust's references + RC + ARC.
-
I do need to either build my own GC for the scheme VM ... or use some library.
-
What are sources I should look into?
I don't know how finished this one is, but there's a series of blog posts about design of one: Shifgrethor I: Garbage collection as a Rust library
If you are writing a toy interpreter its worth considering no GC at all -- just malloc without free.
Unless you enjoy writing garbage collectors its time consuming to program and not very useful on modern computers. Most toy interpreters will only allocate a few KB and even if they go wild and allocate 100MB its still less than most GUI applications.
Its also worth looking at the source code of ketos which has some nice patterns for implementing a lisp in rust.
You could consider a garbage collection scheme like Python's, where refcounting is accompanied by periodic sweeps of memory space to make sure that reference cycles don't last forever (this can be tracked by maintaining weakrefs to all memory allocated on behalf of the runtime, and cycles collected by changing one of the strong references in the cycle to Option::None
). It'd be slow compared to other algorithms, but if you're just doing a toy project with no intention of releasing a competitive language to the outside world, I don't think performance is enough of a concern to warrant mucking around with unsafe.
If you want to challenge yourself with unsafe, a good candidate for that would be Cheney's algorithm.
I actually implemented the refcount+cycle collection method in Rust: https://github.com/jonas-schievink/rcgc
What makes this interesting is that it uses completely safe Rust (even in the presence of bugs in the GC, the worst that can happen is a panic when accessing an already-collected object).
No guarantees for correctness, though.
@jschievink : This is a very interesting approach. Is the following correct:
-
There is a root "GC Manager" object.
-
GC_Manager has a RC to all managed objects.
-
Weak's are handed out. Which are promoted to RC (guaranteed to succeed since GC_Manager holds a RC).
-
To do a "collect", we the GC_Manager runs through all RCs and drops the ones whose Weak account is 0.
-
Tracing is used to handle loops. I don't understand this aprt. Can you elaborate ?
-
This breaks RAII for Managed Objects, since objects are NOT dropped when the last Weak is gone, but only dropped when the GC_Manager does a collect.
Yes, that seems correct. Although RAII is pretty much always at odds with GC, that's not specific to this implementation.
This is just a normal tracing GC that starts at the root objects (which are Rc
s the GC can also hand out), and recursively follows references to other managed objects. All objects that are reached in this process are marked, and once it's done it deletes all objects that weren't marked.
This is needed because the objects can point to each other in a way that creates a reference cycle. Cycles mean that the weak reference count in all objects in the cycle never drops to 0, so they wouldn't be freed.
-
There is something very simple I am not understanding.
-
"Tracing GC" is basically just the 'mark' stage of am mark&sweep right?
-
To do mark & sweep, we need every object to have a reacahble_from_root? bool. Then, for every "mark & sweep", we set this bool to false for objects, run a DFS or BFS on root objects, marking all reachable objects as true, and kill all unreachable objects.
-
It app;ears this bool is stored at https://github.com/jonas-schievink/rcgc/blob/master/src/lib.rs#L117
-
It appears we are doing a DFS at: https://github.com/jonas-schievink/rcgc/blob/master/src/lib.rs#L97-L105
====
Now, here is the part what I am confused about. The GC needs to track objects of DIFFERENT types.
We can have a Foo with a Rc<Bar>
. We can have a Bar with a Vec<Rc<Cat>>
.
So it seems that intuitively, we need every "GC-able" struct to have some function that says: "return a vec/iter of all my children that are Rc".
I can't find this part of the code. What am I misunderstanding? How does this work?
AFAIK Mark&Sweep is a specific way of implementing a Tracing GC. A copying GC would be another way to implement it. This is just a simple Mark&Sweep stop-the-world collector though.
Not in this implementation. The GC is generic over the type of object it manages (struct Gc<T: Trace>
). T
can be an enum though.
Here we only deal with Rooted<T>
, which is essentially a wrapper around an Rc<T>
.
Yes, exactly. This is what the Trace
trait is for. Its trace
method has to be implemented by an object manged by the GC, and it has to pass all Handle<T>
s it owns to a Tracer
object. An example implementation of this can be found in the tests: https://github.com/jonas-schievink/rcgc/blob/master/src/lib.rs#L373-L377
I'm sorry, I'm really confused now. Suppose I currently have:
pub struct Person {};
pub struct Thing {};
pub struct Village {
chief: Rc<Person>,
familys: Vec<Rc<Family>>,
}
pub struct Family {
members: Vec<Rc<Person>>,
stuff: Vec<Rc<Thing>>,
}
How would I rewrite this?
With my crate you'd have to do something like this:
enum Object {
Person,
Thing,
Village {
chief: Handle<Object>,
families: Vec<Handle<Object>>,
},
Family {
members: Vec<Handle<Object>>,
stuff: Vec<Handle<Object>>,
},
}
Of course you lose a lot of type-safety this way, so there might be better API designs. It's perfect for implementing dynamically-typed languages though.
-
Thanks for taking your time to answer all my questions!
-
I think we're almost done -- so the key idea I was missing is: (1) we decide the types that need to be GCed (2) we wrap them all in a big enum. Now, a "single type GC" is no longer a problem, though we lose some type safety (as we can now label a "Family" under the stuff field). Is this correct?
Yeah, that's right. A different GC design might be able to work with multiple different types, as long as they provide some sort of tracing interface, but I chose this because it was simple to make it work.
Well, if you don't really mind the small drawback in performance, then you could probably change it to be a GC over a Box<dyn Trace>
so that it would allow for any type. This would also be similar to how Java and C# (That I know of, I don't know about other languages) automatically box their System.Object
deriving classes up.
@OptimisticPeach : I felt something like this was possible (i.e. generic via Traits), couldn't figure out how to do it, and TIL: Box<dyn >
solves it