What is the biggest difference between Garbage Collection and Ownership?

I am a scientific computing programmer, I use c++ before. Most of my codes ware well tuned and only used by myself, so I have the ability to avoid bugs by following strictly my own rules. So, in the past, I didn't have a good understanding about garbage collection. Now when reading the advantages of Rust's ownership over GC, I didn't get it.
Can you guys tell me what is the biggest difference between garbage collection and ownership? Thanks.

1 Like

A GC is part of a runtime, and its job is bookkeeping all of your memory allocations (objects, functions, variables, etc.) in order to free the memory when these are no longer being used. In order to do that, the GC has to pause the execution of your program at regular intervals, which will always have a performance penalty when compared with languages with no GC.

Since Rust doesn't have a runtime a different solution was ideated, and that is Rust's ownership model. By keeping track of it, it's possible to know when an object is no longer being used and its memory can be freed.

3 Likes

Since you are familiar with C++: Rust's ownership is essentially the same basic idea as RAII in C++. When you create an object, the constructor acquires a resource (be it a memory allocation or anything else, such as a file handle, DB connection, etc.), and when the object goes out of scope, its destructor automatically relinquishes that resource. (Compared to C++, the better memory safety of Rust comes from: 1. tracking at compile time which pointers are valid for what region of the code, and 2. representing thread-safety in the type system explicitly.)

In contrast, garbage collection:

  • means that the compiler doesn't worry about when an object is valid, and objects don't have destructors. Instead, a runtime tracks reachability of objects dynamically, while the program is executing, and when some objects happen to be unused, they are freed;
  • consequently, it incurs an unconditional heap allocation on every object so that the runtime can keep stuff alive by-reference;
  • is non-deterministic (you can't tell when exactly a value will be freed);
  • is strictly less general, because it only concerns memory, not any other kind of resource.
9 Likes

Both garbage collection and Rust’s ownership model provide references: pointers with systematic guarantees that the pointee is valid, but they behave quite differently in the two systems. In a GC-based language, there are no restrictions on what you can do with references and the garbage collector will keep objects alive until some time after the last reference is dropped. Rust, on the other hand, lets you control what happens to “ordinary” objects freely¹ and instead restricts references to be short-lived so that they don’t cause problems.

Âą With a few caveats regarding things like thread safety and aliasing .

4 Likes

Performance. Garbage collection is inherently expensive.

Only situation where this might not be the case is a program that runs for a short time and garbage collector never needs to be called before the program exits.

But for long running programs ( typically servers ) garbage collection is a significant expense, and also can produce nasty pauses which the garbage collector runs.

Per wikipedia: Garbage collection (computer science) - Wikipedia

"A peer-reviewed paper from 2005 concluded that GC needs five times the memory to compensate for this overhead and to perform as fast as the same program using idealised explicit memory management."

3 Likes

Note also that generally "garbage collection" is understood to be a tracing garbage collector of some sort, that is, it starts from some set of known used references (such as those on the stack) and traces every reference in those recursively to find every still accessible allocation.

Historically, C++'s destructors freeing their own memory on dropping out if scope was also considered garbage collection, just not a tracing garbage collector, though that's pretty uncommon to see it called that now. The relevance here is they are both attempts to let the developer not worry about manually allocating and freeing memory.

The trade off is destructors are much simpler and have consistent (but not necessarily better!) performance than tracing garbage collection, but it also let's you invisibly free memory while still holding a reference into it, which allows Bad Things™ to happen.

Rust, instead of asking you to either manually free memory, risk getting caught by the invisible free done by a destructor, or have to pay the runtime cost of tracing garbage collection; provides the borrow checker, which is essentially "just" a type checker for taking references, and ensuring you don't hold onto them after it gets freed, so you get the simplicity and consistent performance of destructors (the Drop trait) with the safety of a tracing garbage collector.

The cost is that now you have to figure out what lifetime errors mean when you get it wrong, can't simply hold sibling memory references, and it can make generic types much more difficult. This is considered acceptable to Rust developers, as it means if the compiler isn't complaining (and there's no unsafe) your code isn't hiding some memory error (technically it cold be leaking memory, but that's still considered safe)

6 Likes

Thanks, moy2010, H2CO3, 2e71828, and geebee32!

From your description, I got the following feelings:

Can I use this metaphor to describe GC and ownership:

GC is like a garbage truck scans over the city at a regular frequency, and the time table of GC is not controlled by the programmer, it is a under-cover mechanism. For example, every morning, a garbage truck will come out and search for wastes and process them. One bad: No matter there is waste generated in the program or not, the truck will come out and search. Second bad: even if there are many garbages generated during the day, you cannot change the time table to twice a day, like morning and afternoon. You can never pick up the garbage immediately after it is generated, cause the time table of truck is not controlled by you.

Ownership is like a garbage truck designed for a certain object, and the time table of truck is defined by the programmer, and fixed at compile time. The compiler knows exactly when to send out truck to pick up a certain garbage, because any object got only one owner, when the owner go out of scope, it is the time to send out truck to process it. for example, programmer tells the compiler when will this object be created and destructed, compiler make the time table of going out & coming back of the garbage truck to process that ceratian memory.

The biggest difference is GC is running outof control of programmers, at it is runtime decision. While, ownership is compile time decided and controled by programmer.

Can I say that? Thanks.

If GC is like a Garbage Truck, Ownership wouldn't be a garbage truck. Think of it like one of those self-destruction letters used by spies in cartoons; once their content has been consumed, they will self destruct.

6 Likes

Thanks! Your description is a little bit different from others.

The Rust knowledge book mentioned 3 mechanisms, I would like to sum it up using my own language after reading your reply:

1: GC, is like a runtime recursively happening collector, which book keeping and scan the pointers and block on heap, and make sure it is destructed when no pointers pointing to it. GC determine when to free memory at runtime.
2: explicit create and free: you need book keeping pointers and block on heap in your mind, and make sure it is destructed in time, not destructed twice .etc.
3: ownership: the ownership rule set make sure the memory is destructed in time when it is out of scope, you just need to follow the rules to avoid the disadvantage of GC, and to avoid to keep those in your mind in explicit create and free method, which is quite impossible when program grows. And when you follow the ownerships, the compiler already know when to free the memory at compile time.

In this way, ownership is faster when running, and less-prone to error when coding.

Is that right? Thanks.

3 Likes

Thanks.

Generally, garbage collectors collaborate with the allocator. The “time table” is not fixed but depends on how much memory is being allocated — often there is a memory space specifically for newly-created objects, where the garbage collector runs only when that region is full, and copies all objects that are not unreferenced out of it. (This is part of what is called “generational” garbage collection.)

Second bad: even if there are many garbages generated during the day, you cannot change the time table to twice a day, like morning and afternoon. You can never pick up the garbage immediately after it is generated, cause the time table of truck is not controlled by you.

Often there is a function you can call to run the garbage collector explicitly. But this is usually not desirable because it reduces overall performance; its main value is in forcing destructors/finalizers to run and weak references to break, so that tests can confirm that they function as intended.

(Also, generational GCs are specifically designed to efficiently handle objects that are unreferenced right after they are created.)

The timing disadvantage of garbage collection is actually in that they need to do intermittent large amounts of work, so applications which care about timing (real-time control, video games, low-latency network servers) may experience undesirable GC pauses. Sometimes explicitly calling the GC (e.g. at the end of rendering each frame in a video game) can help, but it depends on the characteristics of the specific collector.

Also note that the ownership model (or reference counting as seen in many other languages) has its own timing disadvantages: for example, when you drop() a complex data structure made of many allocations, the structure must be traversed to deallocate each heap pointer within it. This can be a substantial hiccup when the structure was created incrementally but dropped all at once. (Neat trick I've heard of to avoid it when this is a problem: send the value to another thread which runs the drop.) Garbage collectors can bypass this by just forgetting the entire structure at once (the details depend on the GC algorithm chosen).

7 Likes

The ownership model is like a city with very responsible citizens. When the objects they own are worn out or used and need to be re-cycled, they take the object to the re-cycling centre.

The garbage collection model is like a city with bad citizens. When they have finished with something, they throw the rubbish out of the car window, they dump unwanted items all around the city, polluting the verges. This puts a heavy burden on the local authorities who want to keep the city clean, meaning a great deal of work has to be done each day to search for rubbish and garbage, even if there is none to be found.

:slight_smile:

18 Likes

I thought I'd add that in my opinion rusts model is the same as that of bug-free C++. Every object must know how to free anything that needs freeing. The only difference is that in rust, you aren't allowed to introduce the bugs that C++ allows, such as double free and use after free, and allocating pointers with new or malic that never get freed at all.

7 Likes

I'm a bit uncomfortable with the amount of value-judgement you're bringing in with this metaphor. Garbage collection is a fine solution to many problems, and can sometimes be more efficient than other options. Unlike physical trash, the disposition of memory allocations has no negative externalities, and all available strategies should not be scorned as long as they are efficient and don't leak.

12 Likes

I don't know how do you think.

My feeling about c++ is that: it provided many features, which are not advocated by the language itself, for example, new.

I don't know why, maybe it is a historical reason. Many features have been used by the existed codes, but they are found to be bad later. These features are kept for compatibility. So, when I started to learn c++, I can always see some sentence in c++ books like this: you can do XXX with ZZZ, but you should try not to in any circumstance..., the better solution is YYY...

I guess it is because c++ is the first-out language in its kind. So it did spend some time for people to find out which is good to use, and which is bad to avoid. So people come up with a bunch of books of talking about coding style, and patterns. I guess no other languages have that much those kinds of books. I am not sure if I am right or not. Thanks.

Thanks.

Actually, what I cannot understand is why GC need spend extra resources to deal with the unused data. Even there are multiple references or pointers on a certain variable like in C++, the c++ compiler can decide when to destruct it at compile time, right? It is just when the last reference or pointer is gone, right?

Especially, when you said about GC algorithm. I know nothing about those underground mechanisms, but I am happy to know some. I guess all compilers are evolving, right? Would the difference between C++ and Rust disappear somehow in the future?

This can be a substantial hiccup when the structure was created incrementally but dropped all at once.

You let me know a thing to avoid. Is that because the memory address used are not connected together?

GC means periodically checking the reachability of objects at runtime. That's actual, runtime work (following pointers, performing conditional jumps, etc.) in addition to the work that's actually performed by freeing allocated memory. When static analysis can already prove at compile time when stuff needs to be freed, then calls to destructors can just be inserted in the code upfront, without needing to trace pointers at runtime just to find out what needs to be freed.

It sounds to me like you think C++ works with garbage collection – it doesn't. It also doesn't perform reference counting automatically (although you can explicitly opt into ref-counting via std::shared_ptr). Apart from this: no, differences between C++ and Rust will certainly not disappear in the future. They are completely different languages, with somewhat different goals, and C++ will likely never be able to get rid of its unsafe legacy. Rust doesn't have to care about any of that.

4 Likes

No, both in C++ and in Rust, it's the user who decides when to destruct. This decision is often implicit because destructor calls are automatically inserted at the end of variable scopes. The difference between C++ and Rust is only that Rust let's the user decide less freely when/how to (access and) destruct their values, so that use-after-free bugs and double-free bugs can be prevented.


In garbage collected languages its also not the compiler at compile-time deciding when things are destroyed/de allocated (except perhaps as an optimization in easy cases), but it's the garbage collector at run-time searching through all data in memory to see which data is or isn't reachable anymore.

One of the simplest form of garbage collection would be "reference counting" and that's actually something that Rust offers too, as a library type (or two) Arc<T> (and Rc<T>. These have problems that reference cycles can lead to memory leaks; more sophisticated garbage collectors have a way of dealing with (or detecting) such cycles so they don't pose a problem anymore; typical garbage collectors are "tracing" garbage collectors that will traverse the entire heap to see what is and isn't reachavbel anymore (from some appropriately defined set of entry points) and either deletes all unreachable data ("mark and sweep") or copies all reachable data (and then allows the memory where all data was previously located to be used again).

Such a more sophisticated (i. e. a tracing) garbage collector has usually quite tight language integration:

  • in order to be able to traverse the heap, all objects must have some similar / self-describing layout, so the garbage collector can, at runtime, find and follow all the pointers
  • in order to be able to determine a good set of entry points, the garbage collector must be able to identify which pointers to the heap are currently held on the stack and in global variables (because those need to be kept alive)

These are restrictions to the way a language is implemented that Rust doesn't need to follow; Rust (similar to e. g. C and C++) is so low-level that it would not be possible in general for the runtime to identify pointers to the heap on the stack or to recursed through the heap and find all the sizes and contained pointers of all structs along the way.


Its easy to imagine that garbage collection has some significant overhead. Not only is the task of regularly traversing all the heap memory quite significant, there's also the question of when garbage collection happens (and whether and for how long it needs to stop execution of all other code in the program while it's running), and there can be overhead in ensuring the properties listed above, of a somewhat uniform object layout, and a way to find/track stack variables (and global variables).

(Traversing) garbage collection can have performance benefits, too, though:

  • Copying a pointer is very cheap, cheaper than if you're reference-counting. Lots of (temporary) pointer copies can lead to lots of reference counter updates and significant overhead in extreme cases, when using reference-counting.

  • A copying garbage collector (i. e. not mark and sweep) can have the effect the data that points to each other will end up closer in the heap. E.g. when you're working with some (long-lived) linked lists, a copying garbage collection cycle will typically move the list items to be all right next to each other, which helps improve cache locality of subsequent usage of the list.

  • It's easier to write lock-free data structures in a garbage collected language, and lock-free data structures can be more performant than locking ones.

10 Likes

Turns out that the compiler cannot figure it out at run time.

Let's say your program creates a bunch of pointers to some object that you have created and it saves those pointers in some other objects/data structures. Or perhaps it passes those pointers to some threads you spin up.

Now, the compiler could only ever automatically insert code to delete your object if it is totally sure none of those pointers is ever going to be used to reference the object again.

But how would it know that, it would have to totally understand the control flow of your program. It would have to know when your threads are going to terminate.

Not only is it very hard (impossible) to do such full program analysis at compile time often the control flow cannot even be known at compile time. Control flow depends on the data the program is using at run time.

This is why C and C++ programmers often makes mistakes by not identifying all the places in the code where data can be freed, a memory leak, or assuming it can be freed at some point only to find it is actually used again in some circumstances, a use after free. Even humans find this very difficult for large programs and especially when there are many programmers working on the code, all making their best guess at when it is safe to do something, or not.

2 Likes

I think some people are being a little unfair to GC. Good GCs aren't slow, I work on systems where we benchmark the GC at around 5% of runtime, so you couldn't speed things up that much.

Also, some coding styles are much easier with GC -- you can pass around large const objects (and trees of objects) without having to keep careful track of ownership. In Rust you would do that with Arc/Rc, but Rcing everything is slower than using a good GC.

One place where GCs are bad is memory usage -- they work best when you have twice as much physical memory as your program needs. Now for many uses that isn't the end of the world, physical memory is cheap compared to programmer time, so just shove some more RAM in there! But, I do think saving memory is good, and that's one place where Rust can outperform GCs.

2 Likes