Would a good optional GC (on a per-reference basis) be a good thing for Rust?

Hello,

Today I was thinking about the Q&A of the live stream that Jon Gjengset did yesterday. At some point he came up with the really good idea of making a video series on his YouTube channel implementing, in Rust, all the algorithms from the third edition of the book:

The Algorithm Design Manual, 3rd Edition
by Steven S. Skiena

This appears at 1:25:43

Jon Gjengset - 2022-01-01 Q&A/AMA/Whatchamacallit

This would be a fantastic way to share the knowledge of how to implement high-performance data structures in Rust, by one of its most knowledgeable people. And we would all also gain implementations of many data structures in Rust. That all sounds really interesting and like a good thing!
For pedagogic purposes, those should of course never be implemented with a GC.

But that led me to ask myself a different question. Some of those implementations in Rust will get really wild, and that is good because it will teach the community a lot. But realistically, and from a practical standpoint, would Rust have anything to gain from a really good optional GC (a Go-style fast GC with low pauses, applied on a per-reference basis), just for the hairy parts of specific algorithms? The idea would be that you could write most of the implementation without GC, but in the most difficult cases you would have a GC to fall back on and continue with safe Rust code. Does this kind of thinking make any sense to you seasoned Rust developers? Could you share your opinions?

Thank you!

When I searched for GC on crates.io, I found the following GC crates (sorted by most downloads):

Crossbeam Epoch
Epoch-based garbage collection

gc
Tracing garbage collector plugin for Rust.

gc-arena
Safe garbage collected arenas

But I think that none of these is on par (in maturity) with a GC like the Go GC.

Best regards and have a nice 2022,
Joao

1 Like

No, having a fully general tracing GC built into the language (which I assume is what your question is about) is basically completely incompatible with the expectations of the Rust code that has been written in the ecosystem over the last several years. If you find it easier to implement a specific data structure by doing some form of GC, you can do that internally with some of the cited crates, but to the outside world, it still pretty much has to look like an RAII-obeying, exclusive-mutability, borrow-checked, regular type.

The reality is that in the overwhelming majority of cases, one simply doesn't need the performance associated with "hairy" algorithms/data structures outside the standard library, and so one should just implement one's data structures with safe code and safe abstractions. And in the remaining cases, a GC'd data structure wouldn't cut it either, so one would likely need to drop down to unsafe anyway to squeeze out the last few percent of performance; imposing a whole different paradigm on the language for that purpose is not nearly justified by it.

6 Likes

This appears to be related to lock-free concurrent algorithms, and is basically a reference counting scheme. It is not a tracing-type garbage collection scheme, where a garbage collector runs, marks everything that is reachable, and then frees everything else.
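For context, here's a minimal sketch of how crossbeam-epoch is typically used, as I understand its API (the toy "slot" scenario is mine, not from the crate docs): a thread pins the current epoch before reading shared pointers, and a retired node is only destroyed once no pinned thread can still observe it.

```rust
// Toy example of epoch-based reclamation with the crossbeam-epoch crate
// (assumed as a dependency). This is deferred reclamation, not tracing:
// nothing is ever scanned; freeing is merely postponed until it is safe.
use crossbeam_epoch::{self as epoch, Atomic, Owned};
use std::sync::atomic::Ordering;

fn main() {
    let slot = Atomic::new(42); // shared, atomically swappable pointer
    let guard = epoch::pin();   // announce "I'm reading shared memory"

    // Swap in a new value and retire the old node.
    let old = slot.swap(Owned::new(7), Ordering::AcqRel, &guard);
    unsafe {
        // `old` is now unlinked, so no new reader can reach it; the epoch
        // scheme delays the actual free until all current readers unpin.
        guard.defer_destroy(old);
    }
    drop(guard); // unpin; retired garbage becomes eligible for reclamation

    // Clean up the value still stored in `slot` (we are its sole owner here).
    unsafe { drop(slot.into_owned()) };
}
```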

I doubt there will ever be tracing garbage collection in Rust; it doesn't make much sense.

Oh yeah. Some immediate thoughts on the idea, in no particular order:

  1. Adding GC to Rust would make it a very different language. All of a sudden it becomes JavaScript (well, TypeScript, but with stronger type rules). I strongly suspect that would not play well with all the existing Rust code. Basically it would make Rust "just another new language" like many others. It would remove a large part of the reason for Rust to exist at all.

  2. Any and every algorithm is implementable in C, which does not have GC; the programmer has to take care of dynamic memory allocation with malloc() and free(). Rust can be used to do anything that can be done in C, albeit with the use of unsafe on occasion. Ergo, I conclude Rust does not need GC.

  3. Adding GC to Rust requires that it become dependent on a run-time system to support it. That has implications for performance and starts to rule it out for small micro-controller systems.

  4. I'm not convinced of the "low pauses" of Go. It's been a while since I tried out Go, but my observation was that it suffered from terrible jitter when I tried to get it to do what our application demanded. In fact, JavaScript under Node.js had more consistent execution timing, so we went with that. Perhaps Go's GC has improved in the years since then; I don't know.

  5. In my career I have almost never had to implement really complex algorithms and data structures. I'm quite happy to leave that to those smarter than me who know what they are doing. That is why we have high-level languages and libraries, right?

  6. Perhaps, maybe, some kind of GC can be provided for Rust programs implemented in libraries (crates) rather than being built into the language. I have no idea how well that would work or how useful it would be.

All in all, I'm here in large part because Rust does not have a GC and the accompanying runtime system. If I get lazy and want to write throwaway code quickly, there are many languages with GC to choose from. No need to impose that burden on Rust.

3 Likes

That's the biggest issue with tracing GC. It violates any and all encapsulation principles. To make tracing GC feasible, it needs to know about any and all pointers which connect different objects in a program. And that means, for example, that such pointers cannot exist on a different node or in a database; they all have to be in memory just for the GC to work.

Half a century ago, when there was a vision of one central GC in a system and things like the iAPX 432 and Lisp machines were developed, it made sense: the plan was to make GC "a part of the hardware architecture" (not something an application developer would think about), switch to fifth-generation languages, and stop caring about low-level details.

Today… we already know it doesn't work: nobody uses 5GL languages (because they don't really exist), and it's time to forget about tracing GCs, too.

Yes, I know, it's hard to abandon billions of dollars of investment spent on these things… but they don't work!

They are not needed for safety (Rust proves it) and they don't save you from memory leaks (the definition of memory leak used by tracing GC proponents, to prove that a tracing GC doesn't leave memory leaks while Arc does, is pretty much pointless: if my C++ program leaks memory because I forgot to call free, and my Java program leaks memory because there are live references left in some auxiliary data structure, the end result is indistinguishable from the end user's POV).

Sadly, Apple was the only one who admitted and accepted that defeat; everyone else still drags around the carcass of a dead horse.

Rust and Swift have shown that you don't need to spend countless billions fighting GC, but can live with the much less invasive ARC most of the time (and move GC into separate, non-UI processes where it really makes sense).

Can we please stop beating this dead horse? Paying $1 each day just to get a $100 saving once a year is not a win!

7 Likes

Wow, the iAPX 432; you don't see that getting mentioned much.

Round about the time the 432 was to hit the market, I spent hours studying the Intel 432 data book. Being a youngster who had been weaned on BASIC and Algol and had spent a couple of years programming in the 8-bit assembler of those newfangled microprocessors (8085, 6800, 6809...), I could not make heads or tails of it. It was so gigantically complicated I could not even see how to get it to boot up and blink an LED. Little did I know the whole thing was predicated on object-oriented programming, virtual and protected memory, and garbage collection, none of which I had ever heard of.

I concluded it was so complex that no one was ever going to be able to use it, and that it would be a big flop.

Anyway, my take on things nowadays is that there is only one reason for GC: the programmer or programmers on a large project have forgotten all the places in their code where objects may be used, or when they may be used. Or they never knew in the first place, as might be the case for a new team member confronted with a huge code base.

So they lean on GC to ensure things don't explode when those forgotten objects finally get used somewhere unknown at some time to be determined.

That is to say GC tries to save programmers from their lack of understanding of the code they are working on.

I have come to think of this as a bad thing, and that the Rust approach of making one think through where all data is used, and preventing one from "forgetting" about it, is preferable.

1 Like

Automatic Reference Counting (in its Swift form, where everything is reference-counted, or in its Rust form, where you can use Rc/Arc in place of exclusive ownership) is enough for that.

Except it doesn't work for that, either. If you don't understand the code well enough to break cycles and allow ARC to remove all these useless objects, then you are not even a full step, but a half-step, away from leaving live references in your complex graph of interdependent objects — and then your program would run out of memory even with a tracing GC.
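To make the cycle point concrete, here is a minimal std-only sketch (the Node type is purely illustrative): with owning Rc edges in both directions the pair would leak, while downgrading the back-edge to Weak breaks the cycle so both nodes are freed.

```rust
// Demonstrates breaking a reference cycle with Weak instead of Rc.
use std::cell::RefCell;
use std::rc::{Rc, Weak};

struct Node {
    parent: RefCell<Weak<Node>>,      // non-owning back-edge: breaks the cycle
    child: RefCell<Option<Rc<Node>>>, // owning forward edge
}

fn main() {
    let parent = Rc::new(Node {
        parent: RefCell::new(Weak::new()),
        child: RefCell::new(None),
    });
    let child = Rc::new(Node {
        parent: RefCell::new(Rc::downgrade(&parent)), // Weak: doesn't keep parent alive
        child: RefCell::new(None),
    });
    *parent.child.borrow_mut() = Some(Rc::clone(&child)); // Rc: keeps child alive

    assert_eq!(Rc::strong_count(&parent), 1); // only `parent` itself
    assert_eq!(Rc::strong_count(&child), 2);  // `child` + parent's forward edge

    // When `parent` and `child` go out of scope, both nodes are dropped.
    // Had the back-edge been an Rc<Node> too, the pair would have leaked.
}
```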

I could have accepted GC as a win if I hadn't seen so many Java programs which use more and more memory over time and then, eventually, hit their resource limits and crash.

It very rarely happens with C++ programs — those tend to crash on stale pointers instead. Rust is supposed to prevent that — and if it actually does so in large (MLOC) programs, then it will become obvious that it's time for tracing GCs to die.

The scarce benefits you get from them are just not worth the complexity they bring to the whole ecosystem.

And I'm not talking about "stutter". Have you ever tried to find a bug in a Go module for an Android Unity-based game? I have. We managed to make it not crash on startup, at least, which was already quite an achievement. And I'm far from sure of the whole thing's ability to survive a few hours of playing.

A world where GC is a system-provided service… and no one rolls out their own GC… that world might make some sense.

But we don't live in such a world. In our world, tracing GCs cause more pain than they provide benefit. Paying $1 each day just to get a $100 saving once a year is not a win!

I'm not quite sure I know what you mean by that.

If my language, say JavaScript, takes care of all memory allocation and garbage collection, then what difference does it make to me whether that memory management is done by the language runtime in my process or by some global, system-wide service? Surely I would not see the difference.

Such a system-provided service implies to me that the memory management is provided by some operating system kernel. That is to say, I could not use the language without adopting some huge and complex runtime/OS underneath it, which is the opposite of what I want. Also, wouldn't we need a systems programming language to implement that runtime? Rust, say :slight_smile:

Or it implies that the memory management/garbage collection is pushed down to hardware, which brings us back to the 432, I guess.

Meanwhile, I was intrigued by someone who has attempted to do the opposite. They want to use the safety of the Rust language to implement all memory allocation, protected memory, and process isolation in software — basically to have the memory usage checked at compile time rather than by the hardware at run time. It seems they have such a kernel working already. Sorry, I don't recall who it was and can't find a link to their work just now.

As long as it's a new program for a new platform, you wouldn't see a difference. But sooner or later you would need to integrate a module which uses its own GC, then a third, a fourth… and since every GC is designed with the assumption that there will only be a single GC in the process… you would have quite a lot of fun just trying to make sure the whole thing is capable of starting your app without crashing.

Welcome to the wonderful world of IBM i: the sole survivor of the era when various designers attempted to create systems where you cannot write and run programs in machine code and have to rely on a single runtime provided by the operating system.

Not necessarily in hardware. You may just design your OS without the ability to run native-code programs, like Java ME or Firefox OS.

It may even work, if you manage to capture a large enough market share before an alternative with easy access to machine-code programs is released. But I think only IBM i survived (apart from some rare niche players).

So, note that one reason that D did not become popular -- whether this was justified or not -- was that it came with an "optional" GC. They could say it was optional as many times as they wanted, but to a huge number of developers it was always just "well it has a GC, so it must not be appropriate for the targets I need to use". And, more foundationally, it's really not nice to have a gigantic ecosystem split between "needs GC" and "no GC".

That said, having some sort of GC<T> as an opt-in library thing (needing proc macro derives and such on relevant types) could be interesting for certain applications, especially where things are embedded in a system that runs its own GC. You might be interested in boats's experiments in Shifgrethor I: Garbage collection as a Rust library - Without boats, dreams dry up.

(Opt-in is important here, because being able to trace through all objects is a peanut-buttered cost on everything, which violates the "you don't pay for what you don't use" goal. See why a huge number of C++ projects disable RTTI, for example.)
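As a rough illustration of what such an opt-in Gc<T> looks like in practice, here is a sketch using the gc crate listed earlier in the thread, as I understand its API (the Node type and the forced collection call are just for demonstration): only types that derive Trace/Finalize can go behind Gc, so the tracing cost stays local to the code that opts in.

```rust
// Sketch of library-level, opt-in tracing GC with the `gc` crate
// (assumes the crate's `derive` feature for the Trace/Finalize derives).
use gc::{Finalize, Gc, GcCell, Trace};

#[derive(Trace, Finalize)]
struct Node {
    // Cycles between Gc pointers are fine: the tracing collector finds them.
    next: GcCell<Option<Gc<Node>>>,
    value: i32,
}

fn main() {
    let a = Gc::new(Node { next: GcCell::new(None), value: 1 });
    let b = Gc::new(Node { next: GcCell::new(Some(a.clone())), value: 2 });
    *a.next.borrow_mut() = Some(b.clone()); // deliberate cycle: a -> b -> a

    drop((a, b));        // the cycle is now unreachable from the roots
    gc::force_collect(); // and a collection reclaims it
}
```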

4 Likes

Optional GC just never works. Theoretically speaking, D had an optional GC, but practically speaking, a lot of libraries (including what was perceived as the standard library) required it. And to use even one single library which used the GC, your whole program had to pay the price.

This means the GC is not, practically speaking, optional.

Except this may never work, because when people moan about the lack of a GC, what they really want is not a GC which would help resolve complex memory-management issues; they want something which would allow them to forget about memory management completely (at least for some time, until they are forced to debug their app to find out why a form with two knobs requires 10 GiB of RAM).

From the very link you have shared:

In the long term, shifgrethor may lead to a usable, performant garbage collector in Rust. However, the API I am presenting here is not simpler or more convenient to use than Rust’s normal APIs, for a variety of reasons. Its unlikely, then, that this project will result in garbage collection becoming a commonly used feature by Rust projects. That is, to be emphatic, I do not expect we will ever “add GC to Rust” as it exists in other languages, or that we will recommend people use it as an “escape hatch” around the borrow checker.

1 Like

Since my first post, I have found this summary post by Manish Goregaokar on the current state of GC in Rust and the reasons for having one; very interesting.

A Tour of Safe Tracing GC Designs in Rust
https://manishearth.github.io/blog/2021/04/05/a-tour-of-safe-tracing-gc-designs-in-rust/

Thank you all for the very interesting discussion.

Best regards,
Joao

1 Like

That's a really nice blog post, but it, again, highlights the biggest problem of tracing GCs: they are infectious. To a much worse degree than async or anything else you may imagine.

Not only do GC demands permeate all the code generated by a JS or C# JIT compiler, but if you want to extend code written in these languages with something written in C++ (or Rust), then you need to deal with them by adding lots of manual helper code (although in some cases you may teach your compiler to spread the poison around).

If that choice was already made by your predecessors (you are writing a plugin for a Node.js app, or adding something to a C#-based Windows app, or something like that), then you really have no choice but to bite the bullet and teach Rust to live with the GC.

This is still easier than teaching the C# and Go garbage collectors to like each other, believe me.

Yet even that is not something you would want to add to your program if you have a choice (note how all these articles like to say that you may want to use Gc for "complex algorithms", but when you read to the end no such examples are ever presented… it's always about interaction with GC-based languages).

Basically: if you want to somehow reconcile Rust and another, GC-based language, then sure, it can be done (and there are many ways to do that)… but if you hope to add some magic pixie dust to your tiny module which would save you from the demands of the borrow checker… not gonna happen: it would require more work to add a GC than to convince the borrow checker to accept your code.

2 Likes

I think a language like Rust, but with a GC (and autoboxing, etc.) to ease some borrow-checking pains, would be great.

However, that's not for Rust. Rust has specifically tackled memory safety without a GC. It's the headline feature. Having a GC would make this message unclear, and users would be suspicious of how much of Rust and its ecosystem actually works without a GC.

The D language has an optional GC, and has this difficulty of convincing no-GC users that disabling GC is not a gimmick.

4 Likes

I'm not sure how that could work. The properties the borrow checker exists to ensure are important. And there are really only two choices:

  1. Either you give the software developer a way to turn off the borrow checker in some part of the code and then ask them to ensure that these properties are preserved manually (that's what Rust's unsafe does; see the sketch after this list).
  2. Or you add autoboxing and GC to the language and ensure that everything still works even if the properties the borrow checker was supposed to enforce are violated — and then you need neither an ownership system nor a borrow checker.
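To illustrate choice 1, here is a minimal sketch, essentially a hand-rolled variant of what the standard library's split_at_mut already provides: unsafe switches off the compiler's aliasing checks inside the block, and the programmer takes over the obligation to keep the two borrows disjoint.

```rust
// Choice 1 in practice: bypass the borrow checker locally with `unsafe`,
// while manually upholding the invariant it would have enforced
// (the two returned borrows must never overlap).
fn split_first_rest(slice: &mut [i32]) -> (&mut i32, &mut [i32]) {
    assert!(!slice.is_empty());
    let len = slice.len();
    let ptr = slice.as_mut_ptr();
    unsafe {
        // Sound because index 0 and the range 1..len are disjoint.
        (&mut *ptr, std::slice::from_raw_parts_mut(ptr.add(1), len - 1))
    }
}

fn main() {
    let mut data = [1, 2, 3, 4];
    let (first, rest) = split_first_rest(&mut data);
    *first = 10;
    rest[0] = 20;
    assert_eq!(data, [10, 20, 3, 4]);
}
```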

IOW: I strongly suspect that an attempt to create such a language would either end up as Rust (maybe with a different syntax) or as Java/C#/VisualFred (maybe with a different syntax).

It's possible that a sensible middle ground exists, but I, for one, can't even imagine what that would be. Resource types vs data types? And how would you handle references from data types to resource types? And believe me, developers would want such references (or rather, developers who are diligent enough to work with Rust directly have no need for a "smaller Rust", and developers who couldn't make the borrow checker their friend wouldn't be able to understand the difference between "resources" and "data").

I'm not saying that any such endeavor is guaranteed failure. Who knows, maybe there is a working “middle ground”… but… well… I'm really, really skeptical about this idea.

We already have the chaos of async code vs sync code in Rust: What Color is Your Function? – journal.stuffwithstuff.com

Which already takes some thinking about to glue together.

It's on the verge of using two different languages at the same time.

I can imagine that adding garbage-collected vs non-garbage-collected code would turn an already complex two-dimensional problem into an impossible-to-deal-with four-dimensional problem.

My vote is no. Let's not even think about it.

1 Like

I think this is a common overstatement of the capabilities of a GC. As I learned in C++, the magic of RAII is that it works for all those resources that aren't memory, too. And even complicated GCs with finalizers and such aren't as good as RAII when it comes to cleaning up all the other kinds of handles.

See, for example, how C# added in parameters despite having a GC, because borrow-checker-like constraints on what functions are allowed to do with things are useful in general. (But those are just a parameter mode, not a type, so they are substantially less powerful than &s in Rust.) And for ownership, see how things like GZipStream's constructor take a bool leaveOpen, because the language only has shared mutable ownership of stuff, and thus you need extra-lingual means to keep track of what needs to be closed.
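As a small illustration of the RAII point (the Connection type here is hypothetical, standing in for any non-memory resource): cleanup runs deterministically when the owner goes out of scope, with no finalizer and no leaveOpen-style flag to remember.

```rust
// RAII for a non-memory resource: Drop runs deterministically at end of
// scope, unlike a GC finalizer that may run late or not at all.
struct Connection {
    id: u32,
}

impl Connection {
    fn open(id: u32) -> Self {
        println!("open connection {id}");
        Connection { id }
    }
    fn send(&self, msg: &str) {
        println!("[{}] send: {msg}", self.id);
    }
}

impl Drop for Connection {
    fn drop(&mut self) {
        println!("close connection {}", self.id);
    }
}

fn main() {
    let conn = Connection::open(1);
    conn.send("hello");
} // `conn` is dropped here: the connection closes immediately, every time
```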

So I'd love to see a Rust-inspired language targeting that same Java/Go/C# area of the performance space (the "I don't need C++, but I do need better than python" part). That language would very much not be Rust, though.

1 Like

It very well might happen via a crate. People do things just because they think they should.

Have you read the discussion up to this point? Yes, there are already GC-related crates. No, they don't fulfill the whine of "can I turn off that @%#$^& borrow checker, just write code, and then start debugging it".

Because for a tracing GC to make any sense, it has to be part of the language and permeate everything in that language.

Otherwise it's mostly good for interaction with GC-based languages.

Not even that. It's not about the capabilities of the GC. Absolutely not. It's about the capabilities of humans.

Remember C.A.R. Hoare:

There are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies, and the other way is to make it so complicated that there are no obvious deficiencies.

If you follow the first approach, then you don't need a GC, because RAII is enough and even in Rust you rarely fight the borrow checker.

If you follow the second approach, then you have no need for ownership and a borrow checker, because you wouldn't use them anyway; you would just repeat "close issue, send to QA, receive a few more issues in Jira" until your creation passes QA, and you don't really want or need to fix all the bugs.

But how often are these used by real programmers in real code? My guess would be that maybe 10% of programmers use them, and then only because they couldn't just abandon C#.

As I've said: it may be an interesting experiment, but I don't think "speed" is even the right measure there. I would rather think in terms of a correctness scale:

  • First you have Perl/PHP/Python/Ruby/etc. — languages where you, basically, have no idea whether code makes any sense before you try to execute it. And the language even, helpfully, tries to make some sense out of an incorrect program! Which means you never even know whether your program has a bug, even if you covered it with tests!
  • Then you have C#/Java/Go — languages where some mistakes are caught before you run the program, but many survive until you run tests, and plenty are found by the end user. It's still hard to write correct code, but most of the time mistakes lead to a crash and not to a CVE.
  • At the next stage it's Haskell/Rust/etc. — languages which have the reputation of "if it compiles, it works". And unless you actively try to add bugs to your program with the use of certain unsound tools, it actually works… but still panics on bad input from time to time.
  • Stricter still are Agda/Idris/etc. — with formal proofs of correctness, and so on.

And before Rust, it was "common knowledge" that "for safety" you need a GC, preferably a tracing one. In hindsight it's obvious, now, that this was always a lie: yes, you need some help from the compiler, but reference counting is enough most of the time, and a tracing GC is more of a hindrance than a help if you want predictability. Yet for about two decades all efforts were centered around the need to improve GC, not eliminate it.

Rust made it clear that you don't need a tracing GC even for the third category of languages, but most GC lovers like languages from the first category!

And I wonder whether anyone will even attempt to make something stricter than Rust but without a GC. Not limited like Google's Wuffs, but more general-purpose. That could be interesting.

A GC-based language with ownership? It can be done, but who would use it? And what for?

1 Like

I seem to remember someone saying there should be a language that's like Rust, but with a GC, that only compiles to WASM — but I can't remember who said that or where.