What is the biggest difference between Garbage Collection and Ownership?

It turns out that the compiler cannot figure this out at compile time.

Let's say your program creates a bunch of pointers to some object that you have created and it saves those pointers in some other objects/data structures. Or perhaps it passes those pointers to some threads you spin up.

Now, the compiler could only ever automatically insert code to delete your object if it is totally sure none of those pointers is ever going to be used to reference the object again.

But how would it know that? It would have to totally understand the control flow of your program. It would have to know when your threads are going to terminate.

Not only is it very hard (effectively impossible) to do such whole-program analysis at compile time, often the control flow cannot even be known at compile time: control flow depends on the data the program is using at run time.

This is why C and C++ programmers often make mistakes by not identifying all the places in the code where data can be freed (a memory leak), or by assuming it can be freed at some point only to find it is actually used again in some circumstances (a use-after-free). Even humans find this very difficult for large programs, especially when many programmers are working on the code, each making their best guess at when it is safe to free something.
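To make the contrast concrete, here is a minimal sketch of the kind of use-after-free that Rust's borrow checker rejects at compile time (the variable names are mine, just for illustration):

```rust
fn main() {
    let s = String::from("owned");
    let r = &s;          // shared borrow, fine while `s` is alive
    println!("{r}");     // last use of `r`: the borrow ends here
    drop(s);             // ownership ends; the compiler frees the String
    // println!("{r}");  // uncommenting this line is a compile error:
    //                   // the borrow checker sees `r` would outlive `s`,
    //                   // i.e. it catches the use-after-free statically
}
```

In C the equivalent code compiles happily and misbehaves at run time; here the mistake never makes it past the compiler.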

2 Likes

I think some people are being a little unfair to GC. Good GCs aren't slow; I work on systems where we benchmark the GC at around 5% of runtime, so you couldn't speed things up that much.

Also, some coding styles are much easier with GC -- you can pass around large const objects (and trees of objects) without having to keep careful track of ownership. In Rust you would do that with Arc/Rc, but wrapping everything in Rc is slower than using a good GC.
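For readers less familiar with the Rc pattern mentioned above, a small sketch of how shared read-only data works without a GC (the names and sizes are made up for illustration):

```rust
use std::rc::Rc;

// Share a large read-only value by cloning the Rc *handle*, not the data.
// Each clone bumps a reference count; the value is freed when the last
// handle is dropped -- no collector needed, but the counting has a cost.
fn main() {
    let big = Rc::new(vec![0u32; 1_000_000]);
    let a = Rc::clone(&big); // cheap: copies a pointer and increments a count
    let b = Rc::clone(&big);
    assert_eq!(Rc::strong_count(&big), 3);
    drop(a);
    drop(b);
    assert_eq!(Rc::strong_count(&big), 1); // only `big` still owns the data
}
```

The per-clone count updates (atomic ones, for Arc) are the overhead being compared against a good GC here.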

One place where GCs are bad is memory usage -- they work best when you have twice as much physical memory as your program needs. Now for many uses that isn't the end of the world, physical memory is cheap compared to programmer time, so just shove some more RAM in there! But, I do think saving memory is good, and that's one place where Rust can outperform GCs.

2 Likes

It is not just the time spent doing GC; there is also the overhead needed to keep track of pointers, and modern CPUs work best when the amount of working memory is minimised. By recycling memory as early as possible, you tend to minimise cache misses.

Of course, GC languages do have advantages, but for system programming where you do not want to compromise on performance, non-GC languages such as C, C++, Rust tend to have the edge, and in terms of achieving this performance safely, Rust is almost unique ( there may now be other languages with a similar approach, Zig? ).

1 Like

The moving GC I work with doesn't "keep track of pointers" outside of the actual garbage collection, and allocation is very cheap. Free memory is one contiguous block, so allocation is literally incrementing the pointer to the start of free memory and checking whether we should do a GC. Also, as it's a moving memory manager, we really squish everything together!
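The "allocation is literally a pointer increment" fast path can be sketched in a few lines. This toy (all names are mine; a real moving GC also compacts live objects, which this doesn't show) only illustrates the allocation side:

```rust
// Toy bump allocator: the entire cost of an allocation is a bounds
// check plus one pointer (here: offset) increment.
struct Bump {
    buf: Vec<u8>,
    next: usize, // start of free memory
}

impl Bump {
    fn new(capacity: usize) -> Self {
        Bump { buf: vec![0; capacity], next: 0 }
    }

    // Reserve `size` bytes; None means a real runtime would collect here.
    fn alloc(&mut self, size: usize) -> Option<usize> {
        if self.next + size > self.buf.len() {
            return None; // out of space: time for a (hypothetical) GC
        }
        let offset = self.next;
        self.next += size; // the "pointer increment"
        Some(offset)
    }
}

fn main() {
    let mut heap = Bump::new(64);
    assert_eq!(heap.alloc(16), Some(0));
    assert_eq!(heap.alloc(16), Some(16));
    assert_eq!(heap.alloc(64), None); // would trigger a collection
}
```

After a collection, a moving GC copies the survivors to the front of the buffer and resets `next`, which is what keeps free memory contiguous.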

I'm happy to believe GCs are somewhat slower, but I see lots of people claim it frequently with little to no evidence.

For example, in the language shootout, Fortran (no GC) and Julia (GC) are almost identical in time taken -- Charts (Benchmarks Game). Now Rust beats both, but it can't just be the absence of GC that makes it fast, else why does it also beat Fortran?

1 Like

I love this question because some of the best parts of Rust are just under the surface.

My background is also in scientific computing, mostly in C++ but with some Python scripting. Ownership sits much closer to the manual memory management side than it does to the GC side. Like in C++, you have to explicitly tell the compiler when you want to put a value on the heap. Unlike in C++ though, you don't have to tell the compiler when to get rid of it. That's ownership.
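That "explicit allocation, implicit deallocation" split fits in a few lines (names here are just for illustration):

```rust
fn main() {
    let boxed = Box::new([0u8; 256]); // explicit: put 256 bytes on the heap
    let moved = boxed;                // ownership moves; `boxed` is unusable now
    println!("len = {}", moved.len());
}   // `moved`'s scope ends: the compiler inserts the deallocation here
    // prints: len = 256
```

There is no `free`/`delete` anywhere; the compiler derives the free point from which binding owns the value when its scope ends.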

Most people stop there, but this is where Rust starts to shine. In order for an ownership system to work, you need to know how long something will (and should) exist. This means you can ensure references are always valid (i.e. you can build a borrow checker). This prevents lots of things like reference aliasing and smuggling. Knowing that references are always valid does more than just eliminate bugs. It makes you less scared of your code and reduces the mental burden of changing your code. You can generate a memory management model, and the compiler will hold you to it.

There is only one part of your original post that I want to point out:

I have the ability to avoid bugs by following strictly my own rules.

This is true only if you are perfect. None of us is. We will make mistakes. Ownership and the things that come with it ensure that we are consistent. Moreover, an ownership model that's uniformly enforced by the compiler makes integrating with other code bases much easier. You don't have to worry about how their memory management model works. If it doesn't align with your expectations/understanding, the compiler will let you know. It's also fairly clear in most function signatures.

If you find this interesting/convincing, I recommend checking out No Boilerplate on YouTube. Tris does a fantastic job of condensing these powerful ideas into interesting, well-made, ~10-minute videos.

6 Likes

I also wanted to point out that this is simply not true – sorry.

Your experience with small, domain-specific, throw-away, single-developer programs might tell you otherwise, but that is not nearly representative of the whole of software development in general.

Basically half a century has passed since C and C++ were invented, generations of professional programmers were brought up using better or worse teaching materials and practices, and yet no high-quality, meticulously developed, non-trivial C or C++ programs exist without at least a few CVEs attributable to UAFs or other kinds of memory corruption.

Dreamweaver had a famous 8k bug. OpenSSL was found to contain dozens of critical errors (you might have heard of Heartbleed). Apple's Security framework presented everyone's favorite "goto fail" vulnerability. The majority of iOS versions have been jailbroken to some degree, which usually involved one or more memory corruptions in the kernel or some privileged system component. One of them was a rendering bug stemming from a buffer overflow in the widely deployed FreeType library. I don't think I need to go on – and these are only the issues I remember off the top of my head.

The myth of the sufficiently smart C or C++ programmer is a lie we need to stop telling ourselves.

10 Likes

Your experience with small, domain-specific, throw-away, single-developer programs might tell you otherwise, but that is not nearly representative of the whole of software development in general.

That is right, I don't use too many features of a language, so I lack a good understanding about the pros and cons between languages like a professional programmer. I am happy to know more from you guys though. Thanks.

1 Like

Interestingly, I just read this Jetbrains article, which shows the (generally considered to be excellent) .NET GC collecting for over a third of the runtime, due to very simple and recommended practices allocating dozens of gigabytes while generating what I'd guess would be an easily less-than-1 GB PDF.

This is not to say that anyone, either the project or runtime, is doing anything wrong (you could make about half of the same mistakes in Rust just as easily); it is just an example that GC can matter. So if you're going to profile, try to test with input large enough to force the GC to actually do work for the application to continue, especially, as the article mentions, in interactive applications!

2 Likes

See also this great blog post from Steve Klabnik, co-author of the Rust programming book:

3 Likes

I was thinking about chickens and eggs...

If you have no garbage collector you cannot write one in a language that requires a garbage collector.

Now you could write your garbage collector in assembler, but you can do anything in C that you can do in assembler (apart from using processor-specific instructions).

So we see that we can create a garbage collector in compiled languages, like C, but not in languages that already require a garbage collector to work, like Python.

You can create a Python interpreter/runtime in C but you cannot create Python in Python.

We can use Rust in place of C of course.

1 Like

That is kind of moot or at least a very weird way of putting it. If you are implementing a new language, then surely that can be implemented in whatever existing language there is? You can write a new GC'd language in an existing GC'd language, but that doesn't make it any easier, because you still have to invent or at least implement a GC algorithm. The only time I can imagine this situation being actually realistic to some degree was when GC hasn't been invented yet, but that was a long time ago.

3 Likes

I was talking of chickens and eggs. As in "which came first?". As in a long time ago....

We can have C compilers written in C. But then the question is: where did the first C compiler come from, and what language was it written in? Certainly it was not C [*]

I'm just adapting that chain of thought to languages that use garbage collection. If you have no garbage collector, then you cannot use a language that relies on garbage collection to write one.

I agree the question is kind of moot. Unless there is a tribe of hackers that survive a nuclear apocalypse and are trying to bootstrap a computing infrastructure from nothing, I guess.

Except that it does highlight a fundamental distinction between languages that rely on GC and those that do not. Which illuminates a fundamental difference between so called "systems programming languages" and other languages. Which perhaps helps answer the OP's question "Can you guys tell me what is the biggest difference between garbage collection and ownership?"

  • I guess the first C compiler could have been written in C. It would just require humans to do the compilation from the source they wrote themselves. Rather like back in the day when, with no access to a high-level language, we wrote programs in an ALGOL-like pseudo code and then translated that to assembler manually.
1 Like

"The language definition presumes a garbage collector" is not the same as "it is impossible to run a program otherwise". I am not certain, but I expect this was done at least once in the history of bootstrapping Lisp — the language definition is simple, and there are stories of running 100%-leaky implementations (no deallocator, let alone a garbage collector) and getting useful work done anyway. It would be plausible to bootstrap from a minimal evaluator to a compiler and GC before running out of memory.

6 Likes

Some examples comparing Rust and C++ ownership to Go GC, with thoughts on the different kinds of safety they can guarantee: Safety - by Jeff Schwab - Deeply Nested

4 Likes

For what it's worth, the first C compiler was written in B. Specifically, a self-hosting B compiler (i.e., one written in B) was tweaked to make "New B", which became C. And before that was a bunch of other stuff like BCPL and TMG and Algol and Fortran.

And for the general chicken-and-egg question you probably intended, the serious answer is that waaaaaay back when "compilers" for "higher-level" / non-assembly languages were a hip, shiny, new-fangled idea, those compilers were written in assembly language (or worse, raw machine code (yes, those are slightly different things)). I assume "compilers for assembly languages" / assemblers were written in machine code.

And of course a "compiler for machine code" is just a physical chip, so the paradox ends there. Or does it?

4 Likes

Assembling is straightforward enough that you can write your assembler in assembly, and hand-translate into machine code to bootstrap it.

AFAIK, it is possible to do something similar for slightly-higher-than-assembly languages, too. So, assuming that's still feasible for – say – C, you can write your C compiler in C, and then hand-translate it into assembly. Or maybe use some mode of semi-automatic translation; it's presumably a whole spectrum, all the way to fully automatic translation, a.k.a. a bootstrap compiler written in … well … something else. Maybe even on a different kind of machine (in which case it might be considered cross-compilation).

3 Likes

I also seem to remember at least one of Borland's "Turbo ..." compilers being designed to leak memory because it was more performant to just let DOS throw it all out at the end of program execution.

1 Like

Yes, that's a common strategy for compilers and other “batch” programs that process some input then exit, and can afford to occupy memory proportional to the input. Every string/symbol interner is also a memory leak.

An “arena allocator” (Rust example 1, 2) is the slightly fancier version of this where you deallocate an entire memory pool containing many objects earlier than program exit (at the price of needing to ensure that there aren't any references in to it at that point, which can be easy in Rust thanks to lifetimes).
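A minimal std-only sketch of the arena idea (the `Arena` type and its methods are my invention for illustration; the linked crates are more general): many allocations, one deallocation.

```rust
// A tiny string arena: individual allocations are never freed one by
// one; dropping the arena frees the whole pool at once.
struct Arena {
    chunks: Vec<String>,
}

impl Arena {
    fn new() -> Self {
        Arena { chunks: Vec::new() }
    }

    // Store a string and hand back an index instead of a reference,
    // sidestepping lifetime questions for this small sketch.
    fn alloc(&mut self, s: &str) -> usize {
        self.chunks.push(s.to_string());
        self.chunks.len() - 1
    }

    fn get(&self, id: usize) -> &str {
        &self.chunks[id]
    }
}

fn main() {
    let mut arena = Arena::new();
    let hello = arena.alloc("hello");
    let world = arena.alloc("world");
    assert_eq!(arena.get(hello), "hello");
    assert_eq!(arena.get(world), "world");
}   // one drop here frees every string in the pool together
```

Handing out indices rather than `&str` borrows is one common way to keep the borrow checker happy; crates like the ones linked above instead use lifetimes to hand out real references safely.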

4 Likes

Einstein is credited with saying, among other smart things, "In theory, theory and practice are the same; in practice, they are not."

I would add that, in my opinion, you are committing the classic premature-optimization error.

Consider the amount of Python code running in the world today. It's not only garbage-collected, it's interpreted. Heaven forbid! But in practice, most of this huge body of code is fast enough. And the development costs are favorable, because Python is easy to write and get working.

The same applies to compiled garbage-collected languages, like Go, which is also very, very popular. Is it generally as fast as Rust? Not according to the benchmarks I've seen. Does that matter in most instances? Apparently not, because there is a lot of useful Go code running today.

It all boils down to using the right tool for the job. I am a now-retired software developer and manager, having had a 50-year career that included a lot of performance analysis work in operating systems. If I were faced with a development task that required very predictable latency and/or a small footprint, I would not choose a garbage-collected language and Rust would be high on my list, if I could find skilled people to write the code.

But for garden variety applications on today's incredible hardware (remember the Cray-1 "supercomputer"? It was clocked at 80 MHz), I would not consider Rust. The cost-benefit calculation just doesn't work when you consider the alternatives: Go, D, Nim and a personal favorite, Haskell. Rust's memory safety without a garbage collector exacts a big price from programmers, who become part of the memory management system, frequently involuntarily. Just scan this forum and note the number of conversations started by utterly baffled people. Those people are spending time fighting with their tool instead of getting the job done. And for what? Additional performance that they probably don't need (you don't need a Ferrari to go to the grocery store)?

4 Likes

I don't have a strong opinion on that as I haven't tried it - I tend to agree, but I guess it would depend on the application and whether performance is a factor. I am using Rust for "system programming", with applications written in an interpreted language (pretty much SQL), implemented in Rust. Rust is working very well for me for this. Not just the memory management, also the safe concurrent programming, excellent performance, and many other features of Rust.

3 Likes