Reference Counting, Garbage Collection, and Rust

I don't think reference counting is a garbage collector. That does not fit any definition of "garbage collector" I have heard. For example wikipedia:

The garbage collector attempts to reclaim memory which was allocated by the program, but is no longer referenced; such memory is called garbage.

The way I look at it is that Rust does not just leave garbage lying around willy nilly whist it moves on to do something else, expecting something else to find that garbage and recycle it.

Rather memory is sent for recycling as soon as the last reference to it disappears. There is no garbage.

This is an important distinction I think. A garbage collector does its work in the background, garbage may accumulate for a long while before it is picked up. it may never be picked up.

The same Wikipedia article (right?) mentions reference counting as a possible garbage collection strategy: Garbage collection (computer science) - Wikipedia

That said, it's perhaps not the one people usually have in mind, sort of like how OOP usually implies inheritance.

5 Likes

There is also this (relatively unknown) bit of research: A Unified Theory of Garbage Collection (umich.edu).

And the case of Python's garbage collector: Garbage collector design (python.org)

The main garbage collection algorithm used by CPython is reference counting.

2 Likes

Yes, yes. But as far as I can make out that reference counting goes on in Python's garbage collector, which is part of it's run time system

Rust has no such run time system and no garbage collector. Whatever reference counting is going on is compiled into my code because I asked for it by using Rc or whatever. If I don't do that there is no reference counting anywhere.

Does it really matter if the ref counting is implicit or explicit? It's still collecting garbage either way!

Some people mean tracing GC when they use the term GC. Others mean any form of reclaiming unused memory including ref counting, and in fact this is the more correct technical definition. It isn't worth arguing about it, since the term is used in both ways. It's just a term. To avoid confusion you can say tracing GC or ref counting.

4 Likes

Steve Klabnik, 2018: Rust has a static garbage collector.

This isn't particularly on-topic w.r.t. async and arguing terms/semantics helps no one[1]. Rust doesn't have automatic dynamic garbage collection, which is what most people mean when they refer to garbage collection.


  1. Admission: I am not helping that fact with this post. ↩︎

5 Likes

What do you mean? Steve does not use the term "static garbage collector" anywhere in that article.

I fail to see how Rust can be said to have a garbage collector when all memory is acquired and released when I say so in my program. It does not hang around, like garbage, waiting for a garbage collector to deal with it. There is no garbage in Rust.

Perhaps you are right. So often over the decades I have seen terms that had pretty specific meanings about specific things degenerate into a wishy-washy label for, well, almost anything. Ah well.

Similar to the dichotomy between static and dynamic typing, I think that Rust is proving out a new niche in the concept of garbage collection. That is, historically, we think of GC as something dynamic, that is, it involves a run-time component that does stuff at runtime. However, the idea of automatic memory management doesn’t inherently mean that it has to execute at runtime.

2 Likes

Except there is no garbage collector in Rust. I think the following code proves that:

pub fn make_garbage() -> &'static mut [u32] {
    let v: Vec<u32> = vec![0; 1024];
    let bv = v.into_boxed_slice();
    Box::leak(bv)
}

fn main() {
    loop {
        let x = make_garbage();
    }
}

Here memory is repeatedly allocated on the heap and then all means of accessing it is forgotten. It becomes garbage. Eventually the program will consume all available memory and the program crash (or be killed by the OS I guess). Why? Because there is no garbage collector cleaning up stuff my program has forgotten about.

1 Like

The point is that RAII is a form of garbage collection. It's a form of static garbage collection, and leaking memory is intentionally opting-out from this.

My view is that with RAII there is no garbage in the first place. When something on the heap is suddenly forgotten about by my program when control flow goes out of the scope of its owner, for example, then that something is immediately freed. So when it is freed is entirely determined by how I write my code. It never becomes garbage hanging around the place. Therefore there is no need for a garbage collector.

I view RAII as a form of memory management, as opposed to explicit writing malloc/free or new/delete and so on. Memory management does not imply a garbage collector.

Yikes, that code overrides the Rust GC!! Should leak be an unsafe method? :wink:

Trust me, it's perfectly safe :slight_smile:

Sure, but there are some caveats:

  1. With Arc you might not be able to control which thread drops an item. Weak references hold on to the memory indefinitely.
  2. The memory is not necessarily released to the OS when you free and might never be released if the heap is sufficiently fragmented.
  3. Libraries such as crossbeam implement GC-like abstractions.

I don't feel that using Arc changes the situation. Sure I can no longer deterministically tell where or when items get dropped but there are only a finite number of places in my code where they can be, and I wrote them. I did not hand over responsibility to any garbage collector. Similarly for leaking memory with weak references. Actually the fact that memory can be leaked by cycles of such references indicates there is no garbage collector, that "garbage" is stuck in limbo, inaccessible forever.

I think what the memory allocator does between itself and the OS is an implementation detail. There may not even be an OS.

Certainly libraries like crossbeam deal with epochs and such like. I could do that in my own code. That is stuff written into the source I'm compiling, not some garbage collector behind the scenes.

It's possible to implement Box::leak in Java and C# too (either by stashing a reference in a static, or more directly by using their FFI bridge).

You may even pinpoint when that happened with GC term. Thankfully Wikipedia keeps history of these pages around. If you look on what GC meant in year 2001 you'll see two things:

  1. Reference counting wasn't considered a form of garbage collection by some punters.
  2. And it was clearly highted that reference counting has serious problems anyway.

If you recall the history you'll see why: these were times when managed code and GC were supposed to solve all the issues one may imagine and the fact that reference counting gives you the ability to get 99% of benefits that “true GC” gives you without the need to live in chains of managed runtime was very inconvenient for pushers of such runtimes.

Few years down the road… when it becomes obvious that world doesn't want to accept managed runtime OOP cool-aid… wording have changes. Now tracing GC is still the most common type of garbage collector, but reference counting is acceptable, too: For data structures which do not involve cycles of references, such as binary trees, it is often preferable to other forms of garbage collection

Few years later Escape analysis was added to the list to muddy the water some more.

In the end GC term was stretched enough to ensure that when people like @ZiCog or me say why they think industry-wide adoption of GC is trillion-dollar mistake (certainly more costly than famous billion dollar mistake!) one may always turn around and say “but ARC is GC, too, you don't understand what you are talking about!”

Note that when Apple, after brief attempt to play with GC dropped that abomination (it adopted that crazyness last, would make sense that it would drop it first) it's press-release clearly used term “Garbage Collection” as something that doesn't include ARC (otherwise phrase “Mac Apps That Use Garbage Collection Must Move to ARC” wouldn't make any sense), but today we have to agree that, sadly, GC notion was stretched to include ARC, too.

That's just sad truth which makes it harder to explain why GC is something you have to avoid.

Note, that IMNSO the biggest trouble of GC is not the pauses that it introduces into your system but it's invasiveness: it's viral and affects literally everything. For very dubious benefit.

It's the best choice for small amount of tasks but is pushed as if it's something that one have to use always, everywhere. No, that's just simply “bad idea”™.

4 Likes

I would say the trouble is that you end up with "interpreted" rather than "compiled" memory management, and of course interpreted languages are inevitably slower and more cumbersome if you want to go fast.

1 Like

A lot of these terms are categorical descriptions, rather than formalisms. At one point, "reference counting" was certainly considered a garbage collection strategy. You can see examples of this in papers like this one:

They avoid marking through all of virtual memory by controlling references to objects with reference counts (4,5), conventions about which objects can refer to which other objects (6), or tables of locations containing references to the interesting objects (7). The third technique is also used by the present work.

In context, this poses reference counting as one of several solutions to the problem of tracking which memory may need consideration when performing garbage collection activities. Other parts of the paper more thoroughly describe Moon's perspective on what those activities are, and they're compatible with modern intuitions.

On the other hand, "mark and sweep" tracing collectors won. Pretty much every system you could find today, which purports "garbage collection" as a capability, is using that term to mean tracing garbage collection of one stripe or another. You can see this usage in practice in, for example, the documentation for Python's gc module:

This module provides an interface to the optional garbage collector.

Probably inadvertently, but meaningfully, this poses the mark-and-sweep technique implemented in that module against the default strategy, which is reference counting (at least on CPython), which the author considers to be not a garbage collector. (Though, note that the latter link does describe reference counting as the implementation's "main garbage collection algorithm.")

Practically speaking, the term "garbage collection" means what the speaker intends it to mean, and it is incumbent on the speaker to ensure that their audience understands where they are drawing that boundary. I think it's reasonable to say that Rust "does not have garbage collection," because the dominant memory management strategy is static (through ownership) rather than dynamic (through runtime discovery of unused memory to reclaim), and I think most programmers would understand you if you said that.

2 Likes