Controversial opinion: keeping `Rc` across `await` should not make future `!Send` by itself

In this reddit discussion I have observed that some people dogmatically believe that keeping Rc across await inevitably makes future !Send, thus making it impossible to migrate its execution across executor threads. It looks like they can not draw direct parallels between execution of threads and tasks, which become quite obvious for me.

My argument is that futures in theory could move between executor threads as freely as threads move between physical cores, even if they contain Rc or any other !Send data. To achieve that we need one simple thing: to enforce that Rc clones do not leave premise of task's state. And this thing is already enforced in APIs for task spawning, sending over channels, and others!

To demonstrate it, I present the following contrived example:

use std::rc::Rc;

pub struct Foo {
    a: Rc<u32>,
    b: Rc<u32>,
}

impl Foo {
    pub fn new(val: u32) -> Self {
        let t = Rc::new(val);
        Self {
            a: t.clone(),
            b: t,
        }
    }
    
    pub fn get_refs(&self) -> (&u32, &u32) {
        (&self.a, &self.b)
    }
}

unsafe impl Send for Foo {}

Note the last line. We implement Send for a type which contains Rc! Horror!

But think carefully about this code. How sending Foo into a different thread can cause UB (be it using spawn or channels)? We enforce that all copies of Rc are moved together and there is no way to move those outside of Foo and you can not clone Foo itself. Sending value to a different thread inevitable involves memory synchronization. Thus changes on Rc done in thread1 will be properly visible in thread2 to which Foo was sent.

I think it can work similarly with any !Send type, similarly to how we do not care that thread's stack can contain !Send values and move between physical cores. Well... with one caveat: TLS. Luckily the current thread_local! API is limited and should not cause any issues (after all, you can not yield inside closure passed to with). But potential stabilization of #[thread_local] and C libraries which do their own thing may cause issues. Another potential source of issues is reliance on invariants in private TLS, e.g. code put 42 into a private TLS and expects to find it there on next load, using assumption that no other code can access this TLS.

What do you think? Do you agree that example above is sound and that future could migrate to other executor threads even if it keeps Rc and similar types across await?

I do not say that all Futures should suddenly become Sendable, just pointing at the often unnecessarily strictness which only adds to the list of issues of the Rust async system.

What exact rule do you propose the compiler use to decide when to break !Send?

1 Like

I do not have a concrete proposal, just want to make sure that I haven't missed something and to attract attention to the issue. As I wrote in the linked discussion, to say it mildly, I am not a fan of the existing Rust async system and strongly hope that in future we will move to something else.

If you are fine with raw ideas, I think we could introduce a new unsafe trait, let's temporarily call it NoTlsShenenigans. It will relax properties of !Send types. If a future generated by compiler does not contain !Send + !NoTlsShenenigans types, then it can implement Send for the future even if it contains !Send types. For soundness we rely on the fact that we can access the future only through the Future trait, since it's an anonymous type created by compiler, so !Send types stay contained in the same execution context. Another assumptions is that executor does not "steal" future results (which may contain !Send value) and keep them in its thread.

See also:

https://blaz.is/blog/post/lets-pretend-that-task-equals-thread/

and

https://old.reddit.com/r/rust/comments/13p3rqi/what_if_we_pretended_that_a_task_thread/jl8xf2w/

I'd love to read a short, complete recap of status quo, potential changes here, and a set of litmus test programs --- I know that I don't know when async generated futures are Send/Sync, and it seems like there's something here.

1 Like

The problem with this is, as you correctly note about the Foo example, it's not about the types contained in the future, but what you do with them. To make Foo unsound it is sufficient to add fn get_a(&self) -> Rc<u32> (which would be a totally innocuous and normal thing to do if not for the unsafe impl Send). To make a Sendable Future containing Rc unsound it is sufficient for the future when polled to return or store an Rc "somewhere". (TLS isn't the only problem, either - what about the future's own Output?) That's not a property of Rc, it's a property of the future and its exact behavior when polled.

Answering the question "does this Rc get stored somewhere outside of this future" is a kind of escape analysis, and like most questions about the behavior of programs, it can only be answered for some programs, not all. (In jargon, it "reduces to the halting problem".) The Rust compiler doesn't do that kind of analysis... mostly. The main mechanism available to the compiler for proving that a function doesn't inappropriately return or store a thing in another thing is lifetimes. But Rc opts out of compiler tracking of lifetimes - from a certain perspective, that's the whole point of Rc. For the compiler to understand "don't let this reference escape this thing" you need to attach a lifetime to it - at which point, you might wonder, can I just do this with references anyway? And maybe that would be a still better answer.

Could the compiler do deep analysis on futures to see if Rcs escape? Sure, in principle. Should it though? Well, I don't know how generalizable it is, and it seems like a lot of work for dubious advantage.

7 Likes

I don't think it's that bad. For compiler generated futures we can safely assume that Future is the only way to work with this type (well, there is also stuff like Drop and Any, but it does not matter here). Thus we can not directly steal Rc stored inside this future.

Now, the problem is with return value of Future::poll, it's the only accessible escape place. As I wrote in the update to my previous comment, executor could steal !Send return result, send future to another thread, and an incorrectly implemented future may return Poll::Ready<T> where T: !Send on next polling. Thus we get UB.

The ideal way to prevent this would've been to deny at compile time future polling after it has returned Poll::Ready. But a practical solution would be to allow compiler to implement Send for compiler-generated futures only if the future does not contain !Send + !NoTlsShenenigans types AND returns Send type in Poll

You also have a problem if the Rc comes from the outside as an argument to the constructor: There's no way to know whether or not the caller kept their own copy of it.

9 Likes

But if you had:

pub fn new(val: Rc<u32>) -> Self {

then it could actually be unsafe, because there could be pre-existing Rc copies on this thread, and the Future holding Foo could be moved to another.

Or you could have any function in the same module or submodule that has access to Foo's private fields, and leaks the Rc that way.

So this creates a completely new notion of not leaking a value outside of a struct. It's may be a useful concept, but it's very easy to break it by accident.

Small code changes making Futures surprisingly !Send are annoying already. Adding this notion of sendability would make sendability of a future even fuzzier and possible to break at even greater distance.

It'd need some way to declare it can't be leaked outside of this instance:

pub struct Foo {
    priv(very very private) a: Rc<u32>,
    priv(very very private) b: Rc<u32>,
}
7 Likes

Some types are !Send because they require the drop handler to execute on a specific thread. E.g. Mutex's guard.

6 Likes

Ideally, the only way to get something "outside" would be through spawn or by sending it through primitives like channels (let's forget for now about special "spawn" variants which rely on childrens to be executed on the same thread). And those APIs would prevent it by using Send bound. We do it today for threads quite successfully.

But with the current (arguably flawed) async system it can work as well. Do not forget that I talk about implementing Send for compiler-generated futures, not for manually implemented ones. Compiler knows all arguments passed async fn or async block, if all those input arguments are Send, then your concern does not apply.

So to summarize, compiler may be able to implement Send for compiler-generated futures if:

  • All input arguments used for future creation are Send, i.e. the initial future state is Send.
  • Future returns a Send type in Poll::Ready
  • Types stored inside future do not rely on special TLS behaviour (tracked by NoTlsShenenigans)

I think the above text answers to you as well.

I believe my raw idea above makes compiler-generated futures less fragile, not more. With it chances of Send-related compilation breakage by seemingly irrelevant changes inside async fn become much smaller.

Such types would not implement the NoTlsShenenigans trait. Remember, the name is just a placeholder.

1 Like

I have had a similar thought previously, but I think getting this to work in practice will be quite difficult. For example, today you are always allowed to put values into thread-locals, since that doesn't let the value escape the current thread. However, in your case, that would be a problem since the task might afterwards get moved to a different worker thread.

A related thing --- another safe abstraction that's prevented by the current thread-local API are library-based stackful coroutines:

I wonder if this points towards some deficiency in thread-local API? And it does seem to me that thread locals are a principle issue here, intuitively, it seems like the following program:

#[derive(Clone, Default)]
struct Context {
    inner: std::rc::Rc<ContextInner>
}

impl Context {
    fn get(&self) -> u32 {
        self.inner.get()
    }
}

type ContextInner = std::cell::Cell<u32>;

async fn f() {
    let context = Context::default();
    g(&context).await;
}

async fn g(context: &Context) {
    let _ = context.get();
    h().await;
    let _ = context.get();
}

async fn h() {

}

#[tokio::main]
async fn main() {
    tokio::spawn(f()).await.unwrap();
}

should be able to execute successfully, the same way a blocking equivalent would just work.

1 Like

I touched this issue in the linked Reddit discussion:

I <...> believe that ideally we should distinguish between task/thread local variables and hart-local variables ("hart" is a RISC-V terminology meaning "hardware thread"). Things like rand::ThreadRng should be the latter, i.e. ideally you should not have more instances of ThreadRng in your program than number of CPU cores. But, unfortunately, we don't have critical user-space sections on most OSes (or using them correctly is a highly arcane, unportable thing), so two notions are usually combined into one.

In general, TLS is a significant hazard for async code (be it stackless or stackfull). To the point that I am tempted to proclaim that "TLS is considered harmful". Practically, with the current thread_local! API it should be sufficient to enforce that closure passed to with does not yield. With the current async system it's achieved automatically because with accepts non-async closure. But it's still possible to unintentionally get a logical bug, when you save something in private TLS, await, and expect to get the old value from TLS. Even if the future was not moved into a different executor thread, the TLS value may get overwritten by a different future on the same thread.

1 Like

My understanding is that its possible to get outright UB even without async closures:

thread_local! {
    static RC: std::rc::Rc<u32> = Default::default();
}

async fn sneaky() {
    let rc = RC.with(std::rc::Rc::clone); 
    async { }.await;
    std::rc::Rc::clone(&rc); // UB if we are on a different thread now
}
7 Likes

Hm, yes. This example indeed breaks my raw idea above.

UPD: I think it could've worked if return result of with was bounded by Send.

Note that going by the simple rule "an Rc in a parameter or output type makes the Future !Send" would make the g in your example !Send and thus f too with the current rules. This example shows why this requires non-local reasoning: you can no longer determine if a Future can be Send by looking at it alone, but it may be conditionally Send depending on how it's used by its caller and how it calls other Futures that are conditionally Send.

5 Likes

I was trying to implement a library based solution for Rc across await but has hit some weird
issue where rustc seem to overzealously require futures to be Send.

I introduced TaskSend auto-trait which is a weaker form of Send, added Trc which is Rc that
works across await, and a wrapper around tokio::spawn which requires futures to be TaskSend
instead of Send:

#![feature(auto_traits)]
use task_send::*;

mod task_send {
    use std::rc::Rc;
    use std::future::Future;
    use std::pin::Pin;
    use std::ops::Deref;
    use std::task::{Context, Poll};
    use tokio::task::JoinHandle;
    use pin_project::pin_project;

    pub unsafe auto trait TaskSend {}
    
    unsafe impl<T: Send> TaskSend for T {}
    
    // need to wrap Rc because we can't implement TaskSend for Rc otherwise
    #[derive(Clone)]
    pub struct Trc<T>(Rc<T>);
    
    impl<T> Trc<T> {
        pub fn new(value: T) -> Trc<T> {
            Trc(Rc::new(value))
        }
    }
    
    impl<T> Deref for Trc<T> {
        type Target = T;
    
        fn deref(&self) -> &T {
            &self.0
        }
    }
    
    unsafe impl<T> TaskSend for Trc<T> {}
    
    pub fn spawn<T>(future: T) -> JoinHandle<T::Output> where
        T: Future + TaskSend + 'static,
        T::Output: Send + 'static,
    {
        #[pin_project]
        struct F<T>(#[pin] T);
        
        unsafe impl<T> Send for F<T> {} 
        
        impl<T: Future> Future for F<T> {
            type Output = T::Output;
        
            fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
                self.project().0.poll(cx)
            }
        }
        
        tokio::spawn(F(future))
    }
}

#[tokio::main]
pub async fn main() {
    spawn(async {
        let a = Trc::new(5);
        let _: () = std::future::pending().await;
        let _b = a.clone();
    }).await.unwrap();
}

This fails with an error:

Compiling playground v0.0.1 (/playground)
error: future cannot be sent between threads safely
  --> src/lib.rs:61:5
   |
61 |     spawn(async {
   |     ^^^^^ future created by async block is not `Send`
   |
   = help: within `{async block@src/lib.rs:61:11: 65:6}`, the trait `Send` is not implemented for `task_send::Trc<i32>`
note: future is not `Send` as this value is used across an await
  --> src/lib.rs:63:44
   |
62 |         let a = Trc::new(5);
   |             - has type `task_send::Trc<i32>` which is not `Send`
63 |         let _: () = std::future::pending().await;
   |                                            ^^^^^ await occurs here, with `a` maybe used later
note: required by a bound in `task_send::spawn`
  --> src/lib.rs:39:21
   |
38 |     pub fn spawn<T>(future: T) -> JoinHandle<T::Output> where
   |            ----- required by a bound in this function
39 |         T: Future + TaskSend + 'static,
   |                     ^^^^^^^^ required by this bound in `spawn`

Rust compiler somehow thinks that my spawn requres Send instead of TaskSend. If I remove
unsafe impl<T: Send> TaskSend for T {} it compiles. How could adding an impl cause a compilation
error? impl<T: Send> TaskSend for T just adds TaskSend impl for more types, it should cause less
compilation errors, not more.

Another issue is that Rust doesn't allow overlapping impls of auto traits. This prevents me from
adding impl<T> TaskSend for Rc<T> as I get "conflicting implementations" error. Since auto traits are empty I think it should be possible to have conflicting impls of them.

No, having a blanket implementation for auto traits overrides the default behavior, it doesn't just add to it.

For generic types (counting the built-in types above as generic over T), if a generic implementation is available, then the compiler does not automatically implement it for types that could use the implementation except that they do not meet the requisite trait bounds. For instance, the standard library implements Send for all &T where T is Sync; this means that the compiler will not implement Send for &T if T is Send but not Sync.

That's marker_trait_attr, but it doesn't help the example because your TaskSend bound is on the future (and not Trc or Rc), so it's still falling back to the blanket implementation.

3 Likes

My attempted solution is wrong anyway as it doesn't prevent the thread_local issue. Adding additional bounds to LocalKey seems to be the only way.