Aliased Xor Mutable core for a high-level language

How do you come from "Reference Cycles Can Leak Memory" to

Good question :slight_smile:

Actually my feeling is that Rust currently has no support for automatically breaking cycles, so I strongly assume that cycles actually cause leaks. In other words, I assume that data structures directly referencing each other are not freed automatically even when they theoretically could be, because they are no longer actually used. I have read a few Rust books already, and such automatic cycle-breaking support was not mentioned.

uhm... it's mentioned in Reference Cycles Can Leak Memory - The Rust Programming Language ... using Weak<T>.

Or you handle the cycles properly

Remove the Drop implementation for List and you'll get leaks.

I'm rarely using Rc, but my preference would be using Weak<T>. I'm wondering whether a lint that checks for Rc::clone(...) and suggests Rc::downgrade(...) would make sense.


This.

GC just pretends these problems don't exist, which makes everything worse.

Take, for example, this StreamWriter constructor in C#:

leaveOpen (Boolean): true to leave the stream open after the StreamWriter object is disposed; otherwise, false.

That's just manual management again, because it doesn't have a type system capable of dealing with the distinction.

So what happens? You get things that are "use after free" bugs in spirit -- albeit at least not UB -- where the writer closed the underlying stream when you didn't expect it to (quick, do you remember which wrappers do and don't close it? do all the libraries you use exhaustively document that?), and thus you get an ObjectDisposedException if you're lucky, or weird behaviour where things sometimes just don't work if you're unlucky.

Similarly, GC languages are finally admitting that no, "the GC will just handle it" doesn't actually work, and are adding official support for things like array pooling. But guess what: now you have different use-after-free-style weirdness, because ArrayPool<T>.Return has zero language help to ensure you're not accidentally using that array after putting it back in the pool, leading to the same "wait, how did that get updated?" as when two different things use the same storage -- just like a use-after-free in C.


GC only helps with memory, and only in the most trivial way of keeping things alive potentially forever, and that's really not what I want. Even for memory, I'd much prefer that there was still a "free"-like call that I could make, so I can mark the point where I know I expect something to be no longer used, and have the language reliably keep me from using it after that point.


I think the equivalent of freeing the memory for value types would just be let _ = the_value, right?

How often do you use something like that? It would make many values harder to deal with for a high-level programmer, but it's useful to prevent copies of things like a User.

@CAD97, is this what you meant by using ownership for API design? Instead of allowing a User to be automatically cloned, it should be moved like other resource types?

I assume most types aren’t like that, in which case they should allow easy copies, and there would be a way for types to be declared as resource types, and those would move ownership.

resource type User {
    id: UserId,
    name: String,
    email: Email,
    ...
}

pub fun main() {
    let db = ...
    let id = ...

    let user = unsafe_read(db, id)

    let userrr = change_name(user, get_input())

    // Error: `user` was moved into `userrr`
    user.name = "BadName"
}

The big problem with tracing GC (and what makes it less than ideal or even completely unsuitable in many domains) is that it makes your latency unpredictable. If you are doing server stuff, that might just be an annoyance (making your p99 tail latency worse), but if you, like me, work in hard realtime, it is a complete no-go.

Embedded systems with some sort of real-time requirement vastly outnumber "normal" computers and applications. Most modern appliances/cars/gadgets have at least one microcontroller in them, and every single "normal" computer, phone, etc. has several hidden cores/controllers that run real-time software. By that metric, GC languages are unsuitable for a majority of use cases. Yet few languages target this domain: C, C++, Rust, and perhaps Ada (but it is rarely used); that is about it.

Even on "normal" computers, GC is unsuitable for a huge amount of software: OSes, music production, and games, to name just a few. And you can see what happens when people ignore this: Minecraft in Java would often stutter randomly back in the day when the GC decided to collect. Not hard realtime, but it made the experience unpleasant.

You can also see people going out of their way in GC languages to avoid building up too much garbage, which tends to result in unnatural-looking code. I have seen this in Java and OCaml. Meanwhile, the natural way of writing code in non-GC languages like Rust or C++ tends to at least be reasonably performant (though perhaps not optimal).

My conclusion after decades of software development is that tracing GC is only suitable for non-essential scripts that don't have any particular performance requirements. Anything substantial should be written in languages with deterministic memory management. I would love to see more scripting languages adopt deterministic memory management. The only example I know of that comes even close is actually Python (since it uses reference counting, though alternative implementations like PyPy throw this out of the window, to their detriment). Sadly, Python has so many other performance problems that this hardly matters.

It’s certainly a trade-off. I think Minecraft wouldn’t have come out if they couldn’t prototype it in a high-level language like Java. Many devs compare Rust and Go, for example, and they often pick Go and other high-level languages to make programs that actually fulfill their requirements.

The bigger issue, to me, is local reasoning. High-level imperative languages pass mutable references everywhere. This means function callers can never be sure if a function they call is going to mutate the variables passed in, so users must inspect the implementation or they must make defensive copies. Similarly, a function doesn’t know if it’s safe to mutate a parameter, so it must inspect all callers or make a defensive copy. Function callers will also need to check that updates to a function don’t affect them, which can be automated by tests at the cost of making the code harder to change.

Coincidentally, my proposal does lead to deterministic memory management, although the programmer has less control over it.