What is the biggest difference between Garbage Collection and Ownership?

It is not just the time spent doing GC; there is also the overhead needed to keep track of pointers, and modern CPUs work best when the amount of working memory is minimised. So by recycling memory as early as possible, you will tend to minimise cache misses.

Of course, GC languages do have advantages, but for systems programming where you do not want to compromise on performance, non-GC languages such as C, C++, and Rust tend to have the edge, and in terms of achieving this performance safely, Rust is almost unique (there may now be other languages with a similar approach, Zig?).

1 Like

The moving GC I work with doesn't "keep track of pointers" outside of the actual garbage collection, and allocation is very cheap. Free memory is one contiguous piece of memory, so allocation is literally incrementing the pointer to the start of free memory and checking whether we should do a GC. Also, as it's a moving memory manager, we really squish everything together!
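For a feel of how cheap that is, here is a minimal bump-allocation sketch in Rust. This is my own simplified model, not the actual collector's code; `BumpHeap` and its methods are made up purely for illustration:

```rust
// A toy bump allocator: free memory is one contiguous block,
// so allocating is just advancing an offset.
struct BumpHeap {
    memory: Vec<u8>, // the managed heap
    next: usize,     // start of free memory
}

impl BumpHeap {
    fn with_capacity(size: usize) -> Self {
        BumpHeap { memory: vec![0; size], next: 0 }
    }

    /// Returns the offset of the new allocation, or None where a real
    /// collector would trigger a GC (and, being moving, compact the
    /// survivors back into one contiguous block).
    fn alloc(&mut self, size: usize) -> Option<usize> {
        if self.next + size > self.memory.len() {
            return None;
        }
        let offset = self.next;
        self.next += size; // "incrementing the pointer to the start of free memory"
        Some(offset)
    }
}

fn main() {
    let mut heap = BumpHeap::with_capacity(1024);
    println!("{:?} {:?}", heap.alloc(64), heap.alloc(128)); // Some(0) Some(64)
}
```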

I'm happy to believe GCs are somewhat slower, but I see lots of people claim it frequently with little to no evidence.

For example, in the language shootout, Fortran (no GC) and Julia (GC) are almost identical in time taken -- Charts (Benchmarks Game). Now Rust beats both, but it can't just be the absence of GC that makes it fast, else why does it also beat Fortran?

1 Like

I love this question because some of the best parts of Rust are just under the surface.

My background is also in scientific computing, mostly in C++ but with some Python scripting. Ownership sits much closer to the manual memory management side than it does to the GC side. Like in C++, you have to explicitly tell the compiler when you want to put a value on the heap. Unlike in C++, though, you don't have to tell the compiler when to get rid of it. That's ownership.

Most people stop there, but this is where Rust starts to shine. In order for an ownership system to work, you need to know how long something will (and should) exist. This means you can ensure references are always valid (i.e. you can build a borrow checker). This prevents lots of problems, like reference aliasing and smuggling. Knowing that references are always valid does more than just eliminate bugs. It makes you less scared of your code and reduces the mental burden of changing it. You can define a memory management model, and the compiler will hold you to it.
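Here is a tiny sketch of both halves of that: explicit heap allocation with implicit deallocation, and the borrow checker refusing a reference that would outlive its data.

```rust
fn main() {
    // Heap allocation is explicit, as in C++...
    let values = Box::new(vec![1, 2, 3]);
    println!("sum = {}", values.iter().sum::<i32>());
    // ...but freeing is not: `values` is dropped automatically at the end of scope.

    // And because the compiler knows how long things live, it rejects
    // references that would outlive their data:
    //
    //     let dangling;
    //     {
    //         let short_lived = 42;
    //         dangling = &short_lived; // error[E0597]: `short_lived` does not live long enough
    //     }
    //     println!("{dangling}");
}
```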

There is only one part of your original post that I want to point out:

I have the ability to avoid bugs by following strictly my own rules.

This is true, but only if you are perfect. None of us are. We will make mistakes. Ownership and the things that come with it ensure that we are consistent. Moreover, an ownership model that's uniformly enforced by the compiler makes integrating with other code bases much easier. You don't have to worry about how their memory management model works. If it doesn't align with your expectations/understanding, the compiler will let you know. It's also fairly clear in most function signatures.
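To make that last point concrete, here is a sketch of how ownership intent shows up in signatures (the function names are hypothetical, just for illustration):

```rust
fn inspect(data: &[u8]) -> usize { data.len() }       // borrows immutably: caller keeps ownership
fn append_zero(data: &mut Vec<u8>) { data.push(0); }  // borrows mutably: exclusive, temporary access
fn consume(data: Vec<u8>) -> Vec<u8> { data }         // takes ownership: caller gives the value away

fn main() {
    let mut bytes = vec![1, 2, 3];
    println!("{}", inspect(&bytes));
    append_zero(&mut bytes);
    let bytes = consume(bytes); // `bytes` is moved in and handed back
    println!("{:?}", bytes);
}
```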

If you find this interesting/convincing, I recommend checking out No Boilerplate on YouTube. Tris does a fantastic job of condensing these powerful ideas into interesting, well-made, ~10-minute videos.

6 Likes

I also wanted to point out that this is simply not true – sorry.

Your experience with small, domain-specific, throw-away, single-developer programs might tell you otherwise, but that is not nearly representative of the whole of software development in general.

Basically half a century has passed since C and C++ were invented, generations of professional programmers have been brought up using better or worse teaching materials and practices, and yet no high-quality, meticulously developed, non-trivial C or C++ programs exist without at least a few CVEs attributable to UaFs or other kinds of memory corruption.

Dreamweaver had a famous 8k bug. OpenSSL was found to contain dozens of critical errors (you might have heard of Heartbleed). Apple's Security framework presented everyone's favorite "goto fail" vulnerability. The majority of iOS versions have been jailbroken to some degree, which usually involved one or more memory-corruption bugs in the kernel or some privileged system component. One of them was a rendering bug stemming from a buffer overflow in the extremely widespread FreeType library. I don't think I need to go on – and these are only the issues I remember off the top of my head.

The myth of the sufficiently smart C or C++ programmer is a lie we need to stop telling ourselves.

10 Likes

Your experience with small, domain-specific, throw-away, single-developer programs might tell you otherwise, but that is not nearly representative of the whole of software development in general.

That is right, I don't use too many features of a language, so I lack a professional programmer's understanding of the pros and cons of different languages. I am happy to learn more from you guys though. Thanks.

1 Like

Interestingly, I just read this JetBrains article, which shows the (generally considered to be excellent) .NET GC collecting for over a third of the runtime, because very simple and recommended practices allocated dozens of gigabytes while generating what I'd guess is a PDF well under 1 GB.

This is not to say that anyone, either the project or the runtime, is doing anything wrong (you could make about half of the same mistakes in Rust just as easily); it's just an example that GC can matter. So if you're going to profile, try to test with a large enough input to force the GC to actually do work for the application to continue - especially, as the article mentions, in interactive applications!

2 Likes

See also this great blog post from Steve Klabnik, co-author of the Rust programming book:

3 Likes

I was thinking about chickens and eggs...

If you have no garbage collector you cannot write one in a language that requires a garbage collector.

Now, you could write your garbage collector in assembler, but you can do anything in C that you can do in assembler (apart from using processor-specific instructions).

So we see that we can create a garbage collector in compiled languages, like C, but not in languages that already require a garbage collector to work, like Python.

You can create a Python interpreter/runtime in C but you cannot create Python in Python.

We can use Rust in place of C of course.

1 Like

That is kind of moot, or at least a very weird way of putting it. If you are implementing a new language, then surely it can be implemented in whatever existing language there is? You can write a new GC'd language in an existing GC'd language, but that doesn't make it any easier, because you still have to invent or at least implement a GC algorithm. The only time I can imagine this situation being actually realistic to some degree was when GC hadn't been invented yet, but that was a long time ago.

3 Likes

I was talking of chickens and eggs. As in "which came first?". As in a long time ago....

We can have C compilers written in C. But then the question is: where did the first C compiler come from, and what language was it written in? Certainly it was not C [*]

I'm just adapting that chain of thought to languages that use garbage collection. If you have no garbage collector, then you cannot use a language that relies on garbage collection to write a garbage collector, because you don't have a garbage collector.

I agree the question is kind of moot. Unless there is a tribe of hackers who survive a nuclear apocalypse and are trying to bootstrap a computing infrastructure from nothing, I guess.

Except that it does highlight a fundamental distinction between languages that rely on GC and those that do not. Which illuminates a fundamental difference between so called "systems programming languages" and other languages. Which perhaps helps answer the OP's question "Can you guys tell me what is the biggest difference between garbage collection and ownership?"

  • I guess the first C compiler could have been written in C. It would just require humans to do the compilation from the source they wrote themselves. Rather like back in the day when we had no access to a high-level language: we wrote programs in an ALGOL-like pseudo-code and then translated that to assembler manually.
1 Like

"The language definition presumes a garbage collector" is not the same as "it is impossible to run a program otherwise". I am not certain, but I expect this was done at least once in the history of bootstrapping Lisp — the language definition is simple, and there are stories of running 100%-leaky implementations (no deallocator, let alone a garbage collector) and getting useful work done anyway. It would be plausible to bootstrap from a minimal evaluator to a compiler and GC before running out of memory.

6 Likes

Some examples comparing Rust and C++ ownership to Go GC, with thoughts on the different kinds of safety they can guarantee: Safety - by Jeff Schwab - Deeply Nested

4 Likes

For what it's worth, the first C compiler was written in B. Specifically, a self-hosting B compiler (i.e., one written in B) was tweaked to make "New B", which became C. And before that was a bunch of other stuff like BCPL and TMG and Algol and Fortran.

And for the general chicken-and-egg question you probably intended, the serious answer is that waaaaaay back when "compilers" for "higher-level" / non-assembly languages were a hip, shiny, new-fangled idea, those compilers were written in assembly language (or worse, raw machine code (yes, those are slightly different things)). I assume "compilers for assembly languages" / assemblers were written in machine code.

And of course a "compiler for machine code" is just a physical chip, so the paradox ends there. Or does it?

4 Likes

Assembling is straightforward enough that you can write your assembler in assembly, and hand-translate into machine code to bootstrap it.

AFAIK, it is possible to do something similar for slightly-higher-level-than-assembly languages, too. So, assuming that’s still feasible for – say – C, you can write your C compiler in C, and then hand-translate it into assembly. Or maybe use some mode of semi-automatic translation; it’s presumably a whole spectrum, all the way to fully automatic translation, a.k.a. a bootstrap compiler written in … well … something else. Maybe even on a different kind of machine (in which case it might be considered cross-compilation).

3 Likes

I also seem to remember at least one of Borland's "Turbo ..." compilers being designed to leak memory because it was more performant to just let DOS throw it all out at the end of program execution.

1 Like

Yes, that's a common strategy for compilers and other “batch” programs that process some input then exit, and can afford to occupy memory proportional to the input. Every string/symbol interner is also a memory leak.

An “arena allocator” (Rust example 1, 2) is the slightly fancier version of this, where you deallocate an entire memory pool containing many objects earlier than program exit (at the price of needing to ensure that there aren't any references into it at that point, which can be easy in Rust thanks to lifetimes).
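As a rough illustration of the idea (not the API of the crates linked above; a hand-rolled, index-based sketch):

```rust
// A minimal arena: many objects live in one pool and are all freed together
// when the arena itself is dropped.
struct Arena<T> {
    items: Vec<T>,
}

impl<T> Arena<T> {
    fn new() -> Self {
        Arena { items: Vec::new() }
    }

    /// Stores a value and hands back a handle (an index) into the pool.
    fn alloc(&mut self, value: T) -> usize {
        self.items.push(value);
        self.items.len() - 1
    }

    fn get(&self, handle: usize) -> &T {
        &self.items[handle]
    }
}

fn main() {
    let mut arena = Arena::new();
    let a = arena.alloc(String::from("node A"));
    let b = arena.alloc(String::from("node B"));
    println!("{} / {}", arena.get(a), arena.get(b));
    drop(arena); // the whole pool goes away at once, well before program exit
}
```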

4 Likes

Einstein said, among other smart things, "In theory, practice is the same as theory".

I would add that, in my opinion, you are committing the classic premature-optimization error.

Consider the amount of Python code running in the world today. It's not only garbage-collected, it's interpreted. Heaven forbid! But in practice, most of this huge body of code is fast enough. And the development costs are favorable, because Python is easy to write and get working.

The same applies to compiled garbage-collected languages, like Go, which is also very, very popular. Is it generally as fast as Rust? Not according to the benchmarks I've seen. Does that matter in most instances? Apparently not, because there is a lot of useful Go code running today.

It all boils down to using the right tool for the job. I am a now-retired software developer and manager, having had a 50-year career that included a lot of performance analysis work in operating systems. If I were faced with a development task that required very predictable latency and/or a small footprint, I would not choose a garbage-collected language and Rust would be high on my list, if I could find skilled people to write the code.

But for garden-variety applications on today's incredible hardware (remember the Cray-1 "supercomputer"? It was clocked at 80 MHz), I would not consider Rust. The cost-benefit calculation just doesn't work when you consider the alternatives: Go, D, Nim and a personal favorite, Haskell. Rust's memory safety without a garbage collector exacts a big price from programmers, who become part of the memory management system, frequently involuntarily. Just scan this forum and note the number of conversations started by utterly baffled people. Those people are spending time fighting with their tool instead of getting the job done. And for what? Additional performance that they probably don't need (you don't need a Ferrari to go to the grocery store)?

4 Likes

I don't have a strong opinion on that, as I haven't tried it. I tend to agree, but I guess it would depend on the application and whether performance is a factor. I am using Rust for "systems programming", with applications written in an interpreted language (pretty much SQL) implemented in Rust. Rust is working very well for me here: not just the memory management, but also the safe concurrent programming, excellent performance, and many other features of Rust.

3 Likes

I very strongly disagree with this sentiment.

First of all, it is very unfortunate that you wouldn't consider using Rust for most software, by invoking the "hardware is fast enough" fallacy. It is exactly this attitude towards software engineering that results in slow, laggy, underperforming programs. Yes, not everything needs to be fast, but surely we want most of our software to be at least fast enough, so it's not annoying to use.

Since someone mentioned Python: at my day job as a data scientist, I use a lot of Python because that's what everyone else uses (and writes libraries for), and often, the performance just doesn't cut it. When working with biological data, for example, it wouldn't be a good use of my time to wait for a Python script to chew on gigabyte-sized genomic sequences for hours.

I know, in theory, I just have to install some library with an appropriate C extension, and it will be Fast Enough™. In practice, however, this is very often not the case. The SciPy ecosystem (NumPy, Pandas, etc.) has been around for decades, yet I recently encountered a situation where a simple in-memory join of two 1000-line indexed data frames took minutes for some weird reason (I didn't want to go down the rabbit hole of debugging it). That's simply not acceptable, because my script then literally spends more time doing Pandas data scrambling than it takes for the preceding step (sequence alignment) to go through a 1.5 GB sequence file.

Oh man. This is a very strong, unfounded, and seriously flawed assertion, because you are completely ignoring the sampling bias. Of course if you read a forum, you will find people who are confused about something – that's why they go to the forum and ask questions! However, the majority of these people are beginners – not just at Rust, but at programming and memory management in general. And beginners being confused by something is not a good measure of how hard the thing actually is, to put it mildly. Not to mention the rest of the people: those who don't ask questions here, because they already know how to write correct Rust code.

I have written C++ and Haskell for about a decade before switching to Rust, and the memory management part was nothing new. When I got borrow checker errors, there was almost always a 5-minute workaround. There are so many things you can do with borrowck errors if you have some experience with software design: reference counting, passing ownership, transforming to a callback-based interface, returning a Cow to allocate only when needed, the list goes on. Just because you are not aware of these techniques, you shouldn't assume that nobody else is, either.
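As one example of those techniques, here is a sketch of the "return a Cow to allocate only when needed" pattern (the `sanitize` function is made up for illustration; `std::borrow::Cow` is the standard library type):

```rust
use std::borrow::Cow;

/// Returns the input as-is (borrowed, no allocation) unless it actually
/// needs changing, in which case an owned String is allocated.
fn sanitize(input: &str) -> Cow<'_, str> {
    if input.contains(' ') {
        Cow::Owned(input.replace(' ', "_"))
    } else {
        Cow::Borrowed(input)
    }
}

fn main() {
    println!("{}", sanitize("already_fine")); // borrows, no allocation
    println!("{}", sanitize("needs fixing")); // allocates only in this case
}
```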

20 Likes

There's this aspect, but I'll argue that, while true, it somewhat misses the point. What matters much more is efficiency, i.e. how much energy is used to accomplish a task.
The reasons it matters are:

  1. Energy is getting more expensive due to geopolitical developments. This isn't equally true for everyone but something increasingly worth considering, especially for people in Europe.
  2. Moore's Law is more or less dead, and as long as we're stuck with silicon lithographic chips, hardware-level performance developments have largely stalled¹. Sure, chips are still getting faster, but only at the expense of rapidly increasing TDPs. See Intel Gen13 and Ryzen 7000 for a good idea of what I mean.

Putting those two things together, the case for decreasing energy usage by cutting wasteful expenditure becomes stronger. This is a paper giving some idea of what the PL energy-efficiency landscape looks like.
C, C++ and Rust often form the top 3.
Of course this is just one paper, but until literature shows up that nuances this picture, as someone living in Europe I know what I'm sticking with in terms of energy considerations, which are of course only one facet of a high-dimensional picture for any given project.

And then there's the performance aspect. @H2CO3 made an excellent case for avoiding a "good enough" mindset w.r.t. performance. I have my own example to share in that area. I'm a member of what appears to be an endangered species: Emacs users.
I got used to it in my Lisp days, and nothing has persuaded me to switch since (including VSCode). While Emacs has many flaws, most are easy enough to either fix properly or work around, courtesy of it largely being written in Elisp and thus modifiable by a user, mostly while the program is running.
However, one flaw is baked in so deeply, and there is so much history, back compatibility and Deep Magick going on there that it seems fundamentally unfixable: performance.
Emacs was written mostly with portability rather than pure performance in mind². And for a long time this philosophy has served it well; it can still run on OSes that have long since become a chalk outline, e.g. MS-DOS.
But with tooling becoming more advanced, e.g. the entire LSP ecosystem, Emacs often can't keep up. On top of that, I often use keyboard macros (one of Emacs' killer features), which allow a complex series of manipulations to be recorded and replayed at will. They're really useful for e.g. refactorings, but they also multiply the performance issue to the point where I can simply wait a minute or two for my machine (16 cores, 48 GB RAM, so no slouch) to keep up.
Had Emacs been written with performance as a crucial value, this might not have been an issue.

¹ I'm aware of Apple Silicon, and it's a good example of what can be done between now and the advent of a paradigm shift that would allow further exponential growth in this area.
² There are exceptions to this, e.g. the rendering code is written in C rather than Elisp exactly for performance reasons.

7 Likes