Lifetimes Across Threads

I was recently musing about lifetimes in relation to threads and was hit by a thought:

  • The very concept of lifetime implies a singular linear progression of time, but having more than one thread makes that assumption false

Then I came across this:

" A seasoned Rust developer will respond by saying that Rust gives us simple tools for dynamic lifetimes spanning multiple threads. We call them “atomic reference counts”, or Arc. While it’s true that they solve the immediate problem—borrows check and our code compiles—they’re far from a silver bullet. Used pervasively, Arc gives you the world’s worst garbage collector. Like a GC, the lifetime of objects and the resources they represent (memory, files, sockets) is unknowable. But you take this loss without the wins you’d get from an actual GC!"

Does he have a point? I don't think so. The biggest problem with GC is the massive uncertainty in latency it introduces. But Arc will never cause such problems.

You don't get the wins of a GC with Arc - not having to manage lifetimes and ownership - but you don't take the losses either.

Thoughts?

Clearly, the author has never used a conservative tracing GC. Or, as I like to remember it: leaking with extra steps.

6 Likes

It's exactly knowable, assuming a solid ownership model is part of the design, even in the face of threads. There is a problem, though: some devs are so used to spaghetti references that they struggle to develop models with clear ownership.

8 Likes

That's just what I thought. It's not known at compile time but it is at runtime, meaning deallocation can happen as soon as the reference count drops to zero, so there is no need for stop-the-world, all-thread GC pauses. That is THE weakness of GC, as it makes tail latency explode.

I suspect the comment refers to the fact that in a language like Java you typically have lots of short-lived objects, in which case it usually works out to be less overhead to just trace and copy the objects that do NOT need to be collected. For example, if you do a million heap allocations, and when you go to collect only a thousand are still reachable, then it's almost surely going to be faster to just move the thousand to a new space and reclaim the old space than it would be to individually deallocate the 999,000 bits of garbage. And because you start with a fresh heap space after each GC, allocation is generally just tail allocation from a thread-local buffer, which is wicked fast.

But in Rust, typically most data is on the stack or just in registers. And you can often avoid individually deallocating lots of little heap-based structs using arenas. And an Rc/Arc can hold a tree of structs rather than just one. So you are not typically allocating and freeing millions of Rc- and Arc-held objects per second. I imagine it's true that if you did put every little struct behind an Arc, the result would be quite slow, but it's unrealistic to think that anyone would do that.
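To make that concrete, here's a small sketch with a made-up Node type: a single Arc owns an entire tree, so sharing it with another thread touches one reference count, not one per node.

use std::sync::Arc;
use std::thread;

// A made-up tree type: plain owned children, no per-node Arc.
struct Node {
    value: u64,
    children: Vec<Node>,
}

fn sum(node: &Node) -> u64 {
    node.value + node.children.iter().map(sum).sum::<u64>()
}

fn main() {
    // One Arc owns the whole tree; sharing it bumps one count, not one per node.
    let tree = Arc::new(Node {
        value: 1,
        children: vec![Node { value: 2, children: vec![] }],
    });

    let worker = {
        let tree = Arc::clone(&tree);
        thread::spawn(move || sum(&tree))
    };

    println!("{} {}", sum(&tree), worker.join().unwrap());
}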

And yes, there are other disadvantages to mark and sweep garbage collection, depending on how you do it, like pauses that cause latency, destructors that run later than you would like, large memory requirements, and the struggle of getting GC tuned acceptably. And if you have lots of long-lived objects (as with a big cache), marking them all can be expensive.

3 Likes

Reference counting is a garbage collector. The term "GC" often implies tracing garbage collectors, ignoring the similarities. They are two sides of the same coin. More on this topic in A Unified Theory of Garbage Collection (umich.edu); the paper gives a list of pros and cons for each.

Sweeping at non-deterministic points can lead to non-deterministic latency, yes. Arc just makes your latency deterministic. Ref counting might be zero-cost in the sense that atomic reference counting probably can't be done any more efficiently than Arc does it, but it still costs more than not reference counting in the first place. [1]

It is not always possible to avoid ref counting. When you absolutely must have shared ownership, Arc<T> is the tool to use. My interpretation of the author's conclusion is that it is overused when there's no good reason for it.


  1. One way to avoid ref counting is to move ownership instead of sharing ownership. Move data into a thread, do something with data, move data out of the thread. That can be done with join handles or channels. Moving doesn't have to be expensive. Moving String and Box<T> won't physically copy the string bytes or T, for instance. Only the string's stack-allocated pointer-length-capacity tuple (or just the pointer in case of Box) is moved. "Share memory by communicating, don't communicate by sharing memory." ↩︎
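A minimal sketch of that move-in/move-out pattern, using a join handle to hand the value back:

use std::thread;

fn main() {
    let text = String::from("hello");

    // Move `text` into the thread: only the (pointer, length, capacity) triple
    // moves; the heap bytes stay where they are.
    let handle = thread::spawn(move || {
        let mut text = text;
        text.push_str(", world");
        text // move it back out through the JoinHandle
    });

    let text = handle.join().unwrap();
    println!("{text}"); // no Arc, no locks, just ownership transfer
}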

6 Likes

Thanks for the replies. A couple of things.

No one commented on this assertion (of mine):

  • The very concept of lifetime implies a singular linear progression of time, but having more than one thread makes that assumption false

I guess that's pretty much undeniable, right?

Secondly, what of the author's main assertion, that async/await is a bad paradigm in Rust, specifically because of the lifetime issue? Agree or disagree? I get what he's saying - the viral nature of async makes the issue of dealing with lifetimes across threads more pervasive than other concurrency solutions.

I'm starting a new private project in Rust (a real-time trading system) and trying to decide on the concurrency mechanism I want to use.

Here are some thoughts of mine:

  • Async/await is handy mainly for the use-case of needing many more active tasks than threads - it's great for things like high-connection web servers, where you want to multiplex tasks to threads. This system won't really be doing that.

  • Without such requirements, threads with channels may be a good, simple solution.

  • I also plan to investigate Rust actor libs. The actor model can work well for coarse-grained services that need to protect shared data.

Appreciate feedback.

If you take "linear" literally, then it's true since max(t1, t2) isn't a linear function. But then, neither is a single-thread n log n lookup.

If you take "linear" to mean "predictable", then I disagree.

1 Like

As long as you have a synchronization point between the ends of the borrows and the Drop of the data, it doesn't matter how non-linear the threading is in-between - this is how e.g. rayon works.
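A small sketch of that idea using std's scoped threads (std::thread::scope), which give the same join-before-drop guarantee; rayon's scope works along the same lines:

use std::thread;

fn main() {
    let data = vec![1, 2, 3];

    // Every borrow of `data` must end by the closing brace of `scope`,
    // because all spawned threads are joined there. That join is the
    // synchronization point sitting between the borrows and the drop.
    thread::scope(|s| {
        s.spawn(|| println!("first: {:?}", &data));
        s.spawn(|| println!("second: {:?}", &data));
    });

    drop(data); // provably happens-after every use inside the scope
}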

2 Likes

If you create/clone and delete a lot of Arcs at a high rate, then that's a huge overhead (Hello, Python!).

On the other hand, if you have just one piece of data, put that behind an Arc, give each thread a clone and leave it like that for a while, then the overhead is minimal.

Also, avoid cycles of Arcs, because that can become a memory leak. But just to share data between threads, you shouldn't need one Arc inside another.
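Roughly, the cheap pattern looks like this (toy example): clone the Arc once per thread up front, then leave the count alone while the threads run.

use std::sync::Arc;
use std::thread;

fn main() {
    // One shared, effectively read-only value behind a single Arc.
    let config = Arc::new(vec![1u32, 2, 3]);

    let handles: Vec<_> = (0..4u32)
        .map(|i| {
            // One clone per thread up front; after this there is no further
            // reference-count traffic while the threads run.
            let config = Arc::clone(&config);
            thread::spawn(move || config.iter().sum::<u32>() + i)
        })
        .collect();

    for h in handles {
        println!("{}", h.join().unwrap());
    }
}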

4 Likes

No, this is false. In multithreaded situations time is not a linear flow (the C11 memory model, borrowed wholesale into Rust for the atomics specification, is particularly causality-bending), but there is a partial order on events called "happens-before", with the property that on a single thread this is a linear order, and with multiple threads it is basically multiple lines with occasional crossings for synchronization events like reading an atomic variable that was written by another thread.

Lifetimes are a compile-time construct, so they don't have exact representation in the operational semantics, but they essentially correspond to the drop call when one exists. When drop occurs, there can only be one owner of the value and all references to it are gone, so the drop happens-after all uses of the value. In other words, even though events are partially ordered, all events involving the object are ordered before the end-of-lifetime event for the object, even though this is not a synchronization point (thanks to the magic of compilers and static analysis).
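For anyone curious how that ordering is enforced in practice, here is a simplified illustration (not the real std source) of the release/acquire dance a reference-counted handle does when it is dropped:

// A simplified illustration of why dropping the data inside an Arc
// happens-after every use on every thread.
use std::sync::atomic::{fence, AtomicUsize, Ordering};

struct ArcInner<T> {
    strong: AtomicUsize,
    data: T,
}

// Returns true if the caller was the last handle and may now destroy `data`.
fn release_handle<T>(inner: &ArcInner<T>) -> bool {
    // Release ordering publishes all of this thread's prior uses of `data`.
    if inner.strong.fetch_sub(1, Ordering::Release) != 1 {
        return false; // other handles are still alive somewhere
    }
    // The acquire fence makes the destroying thread observe those uses,
    // so the destruction happens-after all of them.
    fence(Ordering::Acquire);
    true
}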

Things are actually more interesting for objects that do not have a drop, because in this case there is no actual code being run and no specific event to point to as the end-of-lifetime event for the object. Instead, you have some uses, and then at some point you don't have any uses anymore. What you can say in this case is that any event in which the memory has been repurposed for something else must happen-after an event where the memory contains the object. So it's not so much a specific point where everything converges to, but rather a cut through the event graph separating things that are conceptually before the end-of-lifetime and those after it, such that there are no happens-before edges going back in time across the cut.

As a more conceptual picture, every thread has its own idea about when object X died, although for threads that didn't synchronize with the owning thread you can draw that point mostly arbitrarily (there are many valid ways to draw the line), subject to the constraints provided by synchronizations with the owning thread (which give you a reference point for whether X's death has happened yet) or transitive synchronizations with threads that talked to the owning thread and so on.

5 Likes

Great response, thanks.

I got taken in by the spirit of this article, which I now see as erroneous.

"... unlike other languages with userspace concurrency, Rust tries to offer this abstraction while also promising the programmer total low-level control.

There’s a fundamental tension between the two, and the poor async Rust programmer is perpetually caught in the middle, torn between the language’s design goals and the massively-concurrent world they’re trying to build."

There's no such tension, or not much, in light of the partial ordering of threads.

1 Like

I wonder what that passage is trying to say. It seems to be invoking notions of Einstein's relativity, where different observers can have different notions of when an event happens, and hence of the creation and destruction events that constitute a Rust variable's lifetime.

Well, aliens on another planet far across the universe likely have a lifetime, like us humans and all things. Potentially with a powerful enough telescope I could see them running around their planet. I might think to send one of them a message. It's hopeless of course, likely they died a few billion years before I ever knew they existed.

Surely that is not what happens in Rust. You are never going to get a reference to a far-away thing on another planet (thread). That thing might come and visit you in its spaceship (Arc or whatever).

And what is that about? I cannot see Arc as a garbage collector at all.

A garbage collector is the guy who runs around picking up all the crap they have dropped in the street or park. Or code in a language run-time that runs around picking up all that memory you have casually forgotten about. One does not know when this garbage collection happens or what disturbance it might cause.

Conversely with Arc there is no garbage. There is no garbage collector. There is a secure delivery service that couriers stuff from place to place in an orderly manner. The stuff is destroyed as soon as it has no place else to go. It does not get thrown in the trash for a garbage collector to find sometime later.

Of course the lifetime of things in an Arc is known. You write it yourself, there in your program.

I don't get it.

This is a detail that is usually lower-level than anything we have to deal with as software engineers, but it's absolutely true that relativistic effects are at play in a modern computer - the propagation of electrical signals happens at the speed of light. Take a CPU at 5 GHz - each cycle takes 0.2 ns. The speed of light limits the propagation of information in that time period to 6 cm. Especially in the case of a multi-socket server, it's easy to imagine two CPU cores being more than 6 cm away from each other - the CPU cores' perspective of each other (at the timescale they operate at) is not dissimilar to our perspective (at our timescale) looking through a telescope at another planet.
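The back-of-the-envelope arithmetic, for the curious:

fn main() {
    let c = 299_792_458.0_f64;   // speed of light, m/s
    let cycle = 1.0 / 5.0e9;     // one clock cycle at 5 GHz, in seconds
    println!("{:.1} cm per cycle", c * cycle * 100.0); // prints "6.0 cm per cycle"
}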

1 Like

I'm sure that is true.

The speed of light is one thing, but there are all those layers of cache, memory buses, etc. to deal with as well. These things have been causing timing issues since long before we hit such high speeds.

As long as the chip and system designers are syncing things up when required we don't have to think about it much.

I don't see that it has an impact on "lifetimes" in the Rust meaning of the term.

The title seems like clickbait, but most of the points made in the article seem straightforwardly true and informed. I do think there is probably some possible type system even better than Rust's at allowing asynchronous task structures to be created safely in a way that's extremely lean (CPU-efficient). I don't think these limitations are due to any severe systemic problem in Rust's development; it's simply that this is where we currently are in the technological progression of programming languages, and no further. As an analogous example, Rust's type system's inability to elegantly capture the fact that moving a box shouldn't actually invalidate references to the contents of the box is a known limitation, with work being done on it but no excellent solution found and implemented yet. I wouldn't be surprised if in 40 years the hip youth invent a spiritual "Rust 2" that takes Rust's greatness as a technological baseline and builds on it by figuring out a way to express some of these things fundamentally better, or at least something analogous.

That said, I don't really understand how this is as much of a practical problem as the article seems to express. Arc isn't as costly as they might be thinking, and parallel iterators like rayon are pretty good for building high-throughput systems.

The bit about async Rust providing an uncompromising low-level implementation is certainly true. I don't know where the quote originates (it's quoted from the article in the OP), but I think it just assumes [1] that async Rust should be used by everybody (therein called "poor async Rust programmers"), and I disagree with that basic assumption.

In my opinion, async-by-default is a mindset that will just get you into trouble. Not everybody is building massively-concurrent applications. It's ok to just use 10,000 threads if that's all you really need, especially if it avoids lifetime issues or other common pitfalls that might crop up with async - the kinds of issues that the article in the OP is critical of.

This is why projects like ureq exist for HTTP client requests, and the sans-io [2] pattern for protocol development.

In other words, if async is not a good fit for your project, you don't have to use it.

I don't think async was ever intended to be a more ergonomic form of threading. It's intended to be a more ergonomic cooperative concurrency primitive. It sounds like your project doesn't need cooperative concurrency at all, so I wouldn't even bother starting with async when approaching a similar project.

Threads and channels are more than adequate for most projects, but they don't by themselves solve lifetime issues. Those can be avoided by moving ownership (I mentioned this in a footnote on my last reply).

Actor frameworks take the basic concept of thread-with-channels (see communicating sequential processes) and extend it to include supervisors, dynamic actor pools, compositional APIs, and other niceties that also may not be necessary for most applications. I wouldn't say "don't even attempt it", but I would recommend KISS [3].
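For a sense of how small the KISS version can be, here's a toy hand-rolled "actor": one thread owns the state outright and everything else talks to it over a channel. The message type and the HashMap state are made up for illustration.

use std::collections::HashMap;
use std::sync::mpsc;
use std::thread;

// Hypothetical message type for the toy actor.
enum Msg {
    Put(String, u64),
    Get(String, mpsc::Sender<Option<u64>>),
}

fn main() {
    let (tx, rx) = mpsc::channel::<Msg>();

    // The actor thread owns `state` outright; no other thread touches it.
    let actor = thread::spawn(move || {
        let mut state = HashMap::new();
        for msg in rx {
            match msg {
                Msg::Put(k, v) => { state.insert(k, v); }
                Msg::Get(k, reply) => { reply.send(state.get(&k).copied()).ok(); }
            }
        }
    });

    tx.send(Msg::Put("price".into(), 42)).unwrap();
    let (reply_tx, reply_rx) = mpsc::channel();
    tx.send(Msg::Get("price".into(), reply_tx)).unwrap();
    println!("{:?}", reply_rx.recv().unwrap());

    drop(tx);              // closing the channel shuts the actor down
    actor.join().unwrap();
}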


  1. If not an outright assumption, it is certainly implied. ↩︎

  2. This site is Python-focused, but they have the same sync/async bifurcation that most languages have been acquiring in recent years. ↩︎

  3. No surprise, but async-by-default blatantly ignores this fundamental principle. ↩︎

3 Likes

Who are these people suggesting async by default?

Why do we feel the need for threads at all? As far as I can tell, there are a few reasons:

  1. We would like our application to be somewhat responsive whilst it would otherwise be "stuck" waiting on something that blocks: waiting for input from a network or whatever other input. Basically, we have this problem in unthreaded code that looks like this:
do forever 
    read some blocking input

    do something with read data

    write some output
od

But, oh, I would like to do something else as well:

do forever 
    read some_other blocking input

    do something_else with read data

    write some output
od

Threads give us a means to run both these loops "at the same time" (see the sketch after this list).

  2. Performance. When we have a lot of work to do and many cores available, threads allow us to split the job up, spread the work over the cores, and get a performance boost.

  3. I can't think of a 3).
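To make point 1 concrete, here's a toy sketch with two independent "blocking" input loops, each on its own thread, funnelling into one place. The input sources are stand-ins, not real blocking reads.

use std::sync::mpsc;
use std::thread;

fn main() {
    let (tx, rx) = mpsc::channel::<String>();

    let loop1 = {
        let tx = tx.clone();
        thread::spawn(move || {
            for line in ["a", "b"] {            // stand-in for "read some blocking input"
                tx.send(format!("loop 1 read {line}")).unwrap();
            }
        })
    };
    let loop2 = thread::spawn(move || {
        for n in [1, 2] {                       // a second, independent blocking source
            tx.send(format!("loop 2 read {n}")).unwrap();
        }
    });

    // "do something with read data / write some output"
    for msg in rx {
        println!("{msg}");
    }
    loop1.join().unwrap();
    loop2.join().unwrap();
}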

Async threads are great for 1): you can have thousands, even millions, of such loops all waiting on their respective inputs. Not so good for 2): if you have a long-running calculation to do, it will hang up the entire async runtime.

Synchronous threads are great for 2): a thread can be spun up on a core and spend as long as it likes crunching without hanging up other threads on other cores.

In short, async is great for doing a lot of waiting; sync is great for doing a lot of work.

Of course, we can mix and match async and sync threads with runtimes like tokio.
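For example (assuming the tokio crate with its multi-threaded runtime and macros enabled), long-running crunching can be pushed off the async worker threads with spawn_blocking:

// Sketch assuming the tokio crate: CPU-heavy work goes to tokio's blocking
// thread pool via spawn_blocking, so it doesn't stall the async worker threads.
#[tokio::main]
async fn main() {
    let crunch = tokio::task::spawn_blocking(|| {
        // long-running, CPU-bound work
        (0..10_000_000u64).sum::<u64>()
    });

    // The async runtime is still free to poll other tasks while that runs.
    let total = crunch.await.unwrap();
    println!("{total}");
}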

I don't really see anyone making the suggestion per se. It's more that it's an unspoken assumption or implication. When given a choice between sync or async, someone may default to async either because it's shiny and new or because they have grandiose visions of a massively concurrent GUI [1].


  1. On the one hand, I see the connection between tasks that can cooperatively wait and something like "well a button is just waiting to be pressed". But on the other hand, the number of caveats and the complexity cost just don't seem worthwhile. This is another example of violating the KISS principle with async. ↩︎

To be fair, it is unfortunate that there is an ecosystem split where crates are generally either async or not, and there's no easy way to develop for both simultaneously without duplicated work.