Some questions about threadpools and scope boundaries

When using a thread pool to execute an asynchronous task (e.g., Tokio's), how does an inner async fn determine whether it is being executed by a different thread than another async task?

I want to know when I actually need to implement Send + Sync for various types out of necessity, rather than out of a cautious "just in case this type crosses thread boundaries" mindset.

Don't thread pools push tasks onto a parallel event loop once the primary async event loop becomes overloaded? If so, it would seem that all types used within a thread pool need to be Send + Sync.

Whether there is a Send bound on the futures you want to spawn depends on the runtime used to spawn them: a multi-threaded runtime's spawn function requires it, while a single-threaded runtime's does not.

So it all depends on whether an executor may run the task on a different thread. For that to be sound, a Send bound is required. Note that Sync is not required: at most one thread handles the task at any given time, so the task is (potentially) moved to another thread, never shared between threads.
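Here is a minimal std-only sketch of the "moved, not shared" point, using `std::thread::spawn` as a stand-in for an executor that may move work to another thread (the helper name `bump_on_other_thread` is made up for illustration). `Cell<i32>` is `Send` but not `Sync`, and it is accepted because the spawned thread takes exclusive ownership of it:

```rust
use std::cell::Cell;
use std::thread;

/// Hypothetical helper: moves a `Cell<i32>` into a new thread and bumps it there.
/// `Cell<i32>` is `Send` but not `Sync`: it may be *moved* to another thread,
/// but a `&Cell<i32>` may never be *shared* between threads.
fn bump_on_other_thread(start: i32) -> i32 {
    let counter = Cell::new(start);
    // `thread::spawn` requires the closure (and thus its captures) to be
    // `Send + 'static`; it does not require `Sync`.
    let handle = thread::spawn(move || {
        // The spawned thread now exclusively owns `counter`.
        counter.set(counter.get() + 1);
        counter.get()
    });
    handle.join().expect("spawned thread panicked")
}

fn main() {
    println!("{}", bump_on_other_thread(41));
}
```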

Only when dealing with a runtime guaranteed to be single-threaded does the API drop the Send bound.
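For reference, Tokio follows this pattern: `tokio::spawn` requires `F: Future + Send + 'static`, while `tokio::task::spawn_local` (used within a `LocalSet`) only requires `F: Future + 'static`. The sketch below does not use Tokio; it hand-rolls a hypothetical, minimal single-threaded `block_on` (a busy-polling toy, not production code) to show that a future holding an `Rc` across an `.await`, and therefore not `Send`, is fine as long as it never leaves the current thread:

```rust
use std::future::Future;
use std::rc::Rc;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Waker that does nothing: this toy executor simply polls again in a loop.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Hypothetical single-threaded executor: it drives the future to completion
/// on the current thread, so it needs neither `F: Send` nor `F: 'static`.
fn block_on<F: Future>(future: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    // Pin the future on the stack so that it can be polled.
    let mut future = std::pin::pin!(future);
    loop {
        if let Poll::Ready(output) = future.as_mut().poll(&mut cx) {
            return output;
        }
        // A real executor would park the thread until woken instead of spinning.
    }
}

/// Hypothetical async fn: it keeps an `Rc` (which is not `Send`) alive across
/// an `.await`, so the future it returns is not `Send` either.
async fn add_one_local(value: i32) -> i32 {
    let local = Rc::new(value);
    std::future::ready(()).await; // `local` is kept alive across this await point
    *local + 1
}

fn main() {
    // Handing this future to a spawn function with a `Send` bound would be
    // rejected at compile time; the single-threaded executor accepts it.
    let answer = block_on(add_one_local(41));
    println!("{answer}");
}
```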

That makes sense to me, Yandros. A side question: I've noticed that function definitions sometimes carry an F: 'static bound. Are all explicitly defined closures implicitly 'static, since a closure has an underlying function pointer that lives for the entire duration of the program? Or is some lifetime coercion happening on the definition of F?

I can imagine several scenarios wherein bounding the lifetime of closures can be very useful.

For some lifetime 'a and some type Struct, saying that Struct : 'a means that one can use any value x: Struct during the lifetime 'a and be sure there will be no dangling references (we say that the type Struct is valid for at least the lifetime 'a). That's what the bound is used for.

In practice, the bound is met by recursively checking that the type of each field is valid at least for the lifetime 'a (this is an obviously necessary condition, and it turns out that it is also sufficient).

Examples

  • Primitive / non-generic types are 'static

    • if they do not borrow anything, then they can never become dangling,

      • For instance, T = i32, T = String, T = ()
    • if they do borrow something, then for the borrow not to be generic it must use the only "literal lifetime", i.e., the 'static lifetime;

      • For instance, T = &'static str, T = &'static mut i32
  • A generic type F<'a1, 'a2, T1, T2> is valid at least for the lifetime 'a if and only if so are all its generic parameters: 'a1 : 'a, 'a2 : 'a, T1 : 'a and T2 : 'a.
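A quick way to check these rules in code is a function with a `T: 'static` bound. Here a hypothetical `type_id_of` helper relies on `std::any::TypeId::of`, which itself requires `T: 'static`, and `Pair` is a made-up generic type mirroring the last bullet:

```rust
use std::any::TypeId;

/// `TypeId::of::<T>()` requires `T: 'static`, so this helper only compiles
/// for types that are valid for the whole `'static` lifetime.
fn type_id_of<T: 'static>(_value: &T) -> TypeId {
    TypeId::of::<T>()
}

/// Hypothetical generic type: `Pair<'x, T> : 'a` holds iff `'x : 'a` and `T : 'a`.
#[allow(dead_code)]
struct Pair<'x, T> {
    borrowed: &'x str,
    owned: T,
}

fn main() {
    // Non-generic types that borrow nothing are 'static.
    assert_eq!(type_id_of(&42_i32), TypeId::of::<i32>());
    assert_eq!(type_id_of(&String::from("owned")), TypeId::of::<String>());
    assert_eq!(type_id_of(&()), TypeId::of::<()>());

    // Borrows with the literal 'static lifetime are 'static too.
    let literal: &'static str = "literal";
    assert_eq!(type_id_of(&literal), TypeId::of::<&'static str>());

    // Every generic parameter is 'static, so the whole type is.
    let pair: Pair<'static, Vec<u8>> = Pair { borrowed: "hi", owned: vec![1, 2] };
    assert_eq!(type_id_of(&pair), TypeId::of::<Pair<'static, Vec<u8>>>());

    // A borrow of a stack local is not 'static, so the borrow checker
    // rejects this if uncommented:
    // let local = 5;
    // type_id_of(&&local);
}
```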

The 'static lifetime

... is a lifetime that never* ends. Simple as that.

* As long as the program itself is running, obviously

Note that lifetimes have no "start" point requirement; they only care about the "end of life", because it is only after that point that dangling pointers may occur (you cannot have a pointer to something that has not started existing yet!).

So, for instance, you don't need to be borrowing (from) a static variable to get the 'static lifetime. As long as something will never be freed, it can be borrowed for the 'static lifetime. This is "obviously" not possible for stack-allocated memory, since that memory is freed when the function returns, which it may very well do (the theoretical counterexample being a never-returning / diverging function that does not panic!, i.e., one that either loops indefinitely or aborts the whole process). It is, however, possible for manually (heap-)allocated memory: Box::leak(Box::new(value)) gives a &'static mut T, provided value: T and T : 'static. (Obviously, if T = &'small i32 you cannot have a &'static &'small i32; more generally, you can only borrow a value of type T for a lifetime 'a if T : 'a.)
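A minimal sketch of the `Box::leak` trick (the helper names `leak_counter` and `leak_str` are hypothetical). The leaked memory is intentionally never reclaimed, so this only makes sense for data that really must live until the end of the program:

```rust
/// Hypothetical helper: moves `start` to the heap and leaks the allocation.
/// Since that memory is never freed, it can be borrowed for `'static`.
fn leak_counter(start: i32) -> &'static mut i32 {
    Box::leak(Box::new(start))
}

/// Same idea for owned, non-`Copy` data created at run time.
fn leak_str(text: String) -> &'static str {
    Box::leak(text.into_boxed_str())
}

fn main() {
    let counter: &'static mut i32 = leak_counter(41);
    *counter += 1; // exclusive (`mut`) access, valid for the rest of the program
    let name: &'static str = leak_str(format!("counter = {}", counter));
    println!("{name}");
}
```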

Closures, F, and F : 'static

Here is a short version.

Having

let x: i32 = 42;
let closure_1 = move |arg: bool| -> String {
    println!("{}", x);
    "Hello, World!".to_string()
};
closure_1(true);
let closure_2 = |arg: bool| -> String {
    let at_x: &i32 = &x;
    println!("{}", *at_x);
    "Hello, World!".to_string()
};
closure_2(true);

is exactly equivalent to doing:

/// "equivalent" to the `FnOnce(bool) -> String` trait
trait Call {
    fn call(self, arg: bool) -> String;
}

// Auto-generated by Rust:
#[derive(Clone, Copy)] // derived when possible
struct Closure1 {
    x: i32,
}
impl Call for Closure1 {
    fn call (self, arg: bool) -> String
    {
        // body of closure_1:
        println!("{}", self.x);
        "Hello, World!".to_string()
    }
}
#[derive(Clone, Copy)] // derived when possible
struct Closure2<'x> {
    x: &'x i32, // captured by reference
}
impl Call for Closure2<'_> {
    fn call (self, arg: bool) -> String
    {
        // body of closure_2:
        let at_x = self.x;
        println!("{}", *at_x);
        "Hello, World!".to_string()
    }
}

let x: i32 = 42;
let closure_1 = Closure1 { x }; // x moved (copied) into the closure environment
closure_1.call(true); // i.e. <Closure1 as Call>::call(closure_1, true) => static / compile-time dispatch
let closure_2 = Closure2 { x: &x }; // x borrowed by the closure environment
closure_2.call(true); // i.e. <Closure2 as Call>::call(closure_2, true) => static / compile-time dispatch

So, if we then have a function that takes a closure as input and calls it, the most idiomatic way to write it in Rust is with a generic type parameter bounded by our trait: <F : Call> (i.e., <F : FnOnce(bool) -> String>):

fn call_closure<F : Call> (f: F)
{
    let s: String = f.call(true);
    dbg!(&s);
}

call_closure(closure_1);
call_closure(closure_2);

Then each call to call_closure is a call to a version of the generic function monomorphized with the corresponding auto-generated struct as the type parameter F:

call_closure::</* F = */ Closure1>(closure_1);
call_closure::</* F = */ Closure2<'x>>(closure_2);

And so, if we added an F : 'static bound, Rust would check that:

  • Closure1 : 'static ✅ this holds, since struct Closure1 { x: i32 } only contains an i32

  • Closure2<'x> : 'static ❓ this is true if and only if 'x : 'static, i.e., if 'x = 'static, i.e., if our original x is borrowed for the 'static lifetime, which requires that x is never freed. Since x is a function-local (stack) variable, this is not the case, so the bound fails ❌

    • If the x variable had been a static x: i32 = 42, or had been pointed to by a heap-allocated-and-leaked pointer,
      let at_x: &'static i32 = Box::leak(Box::new(42));
      then we would have had F = Closure2<'static> : 'static and all would have been good.
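The same cases can be checked against real closures with a hypothetical `call_static` function carrying the `F : 'static` bound (using `FnOnce(bool) -> String` in place of the `Call` trait). The commented-out call at the end is the `Closure2<'x>` case with a stack-local `x`, which the borrow checker rejects:

```rust
/// Hypothetical function with the `F : 'static` bound discussed above.
fn call_static<F: FnOnce(bool) -> String + 'static>(f: F) -> String {
    f(true)
}

/// A `static` item is never freed, so borrowing it yields a `&'static i32`.
static X: i32 = 42;

fn main() {
    let x: i32 = 42;

    // Like `Closure1`: `move` copies `x` into the environment, which is 'static.
    let by_value = move |_arg: bool| format!("moved {x}");
    println!("{}", call_static(by_value));

    // Like `Closure2<'static>`: the only borrow is of a `static` item.
    let borrows_static = |_arg: bool| format!("static {}", &X);
    println!("{}", call_static(borrows_static));

    // Like `Closure2<'static>` again, via a leaked heap allocation; `move`
    // copies the `&'static i32` itself instead of borrowing the local `at_x`.
    let at_x: &'static i32 = Box::leak(Box::new(42));
    let borrows_leaked = move |_arg: bool| format!("leaked {}", *at_x);
    println!("{}", call_static(borrows_leaked));

    // Like `Closure2<'x>` with `x` on the stack: rejected if uncommented.
    // println!("{}", call_static(|_arg: bool| format!("borrowed {}", &x)));
}
```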

In practice, the way to avoid borrowing is to take ownership, since by owning something we prevent it from being freed. That's why F : 'static closures often require the move keyword (cf. the aforementioned blog post for more info).
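`std::thread::spawn` is a concrete API with exactly this kind of bound (`F: FnOnce() -> T + Send + 'static`). In this sketch (`count_words_in_thread` is a hypothetical helper), removing the `move` keyword makes the closure borrow `text` instead of owning it, and the call no longer compiles:

```rust
use std::thread;

/// Hypothetical helper: counts the words of `text` on a separate thread.
/// The spawned closure must be `'static`, so it has to own everything it uses.
fn count_words_in_thread(text: String) -> usize {
    // `move` transfers ownership of `text` into the closure; without it, the
    // closure would only borrow `text` and fail the `'static` bound.
    let handle = thread::spawn(move || text.split_whitespace().count());
    handle.join().expect("worker thread panicked")
}

fn main() {
    let n = count_words_in_thread(String::from("owned data crosses threads"));
    println!("{n}");
}
```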


Regarding "tasks" and F : Future, it is the same: with a runtime running multiple threads, the function (or thread) that spawns a task may return, or even die, before the task runs, so the future it gives to the runtime has to be 'static to guard against such a scenario (otherwise there could be a use-after-free). That's why .and_then() and other combinator-based futures require move closures, and thus owning pointers such as Arc rather than borrowing references.
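When several `'static` tasks need the same data, the owning pointer is typically an `Arc`: each task gets its own clone of the handle instead of a borrow. A sketch with plain threads standing in for a multi-threaded runtime (`parallel_sums` and its per-worker scaling are purely illustrative):

```rust
use std::sync::Arc;
use std::thread;

/// Hypothetical helper: each worker reads the same shared data and returns
/// its sum scaled by the worker's own factor (1, 2, 3, ...).
fn parallel_sums(data: Vec<u64>, workers: usize) -> Vec<u64> {
    let shared = Arc::new(data);
    let handles: Vec<_> = (0..workers)
        .map(|i| {
            // Owned handle moved into the task, instead of a `&Vec<u64>` borrow.
            let shared = Arc::clone(&shared);
            thread::spawn(move || shared.iter().map(|v| v * (i as u64 + 1)).sum::<u64>())
        })
        .collect();
    // Join in spawn order so the results line up with the worker indices.
    handles
        .into_iter()
        .map(|h| h.join().expect("worker thread panicked"))
        .collect()
}

fn main() {
    println!("{:?}", parallel_sums(vec![1, 2, 3], 3));
}
```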

One of the main benefits of async/await is that an async function also gets auto-generated machinery from the compiler: a generator, that is, a potentially self-referential struct holding the locals of the function. By containing, and thus owning, its whole "stack frame", the generator behind an async block / function avoids the use-after-free dangers (and, in exchange, has to pay the cost of the Pin<&mut Self> API).
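To illustrate, here is a sketch (with a hypothetical busy-polling `block_on` and a hypothetical `spawn_on_thread` carrying the usual `Send + 'static` bounds) where an `async fn` keeps a borrow of its own local `data` across an `.await`. Because the generated future owns both `data` and `view`, no borrow escapes it: the future is `'static` and can be moved to another thread, but it has to be pinned before being polled:

```rust
use std::future::Future;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread;

/// Waker that does nothing: this toy executor simply polls again in a loop.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Minimal hypothetical executor: polls the future on the current thread.
fn block_on<F: Future>(future: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    // Pinning is required because the future may be self-referential.
    let mut future = std::pin::pin!(future);
    loop {
        if let Poll::Ready(output) = future.as_mut().poll(&mut cx) {
            return output;
        }
    }
}

/// The compiler-generated future owns `data` *and* the borrow `view` into it,
/// i.e., it is self-referential.
async fn sum_owned(data: Vec<i32>) -> i32 {
    let view: &[i32] = &data;
    std::future::ready(()).await; // `view` is kept alive across this await point
    view.iter().sum()
}

/// Hypothetical multi-threaded "spawn", with the usual `Send + 'static` bounds.
fn spawn_on_thread<F>(future: F) -> thread::JoinHandle<F::Output>
where
    F: Future + Send + 'static,
    F::Output: Send + 'static,
{
    thread::spawn(move || block_on(future))
}

fn main() {
    // No borrow escapes the future, so it is 'static (and Send) despite `view`.
    let handle = spawn_on_thread(sum_owned(vec![1, 2, 3]));
    println!("{}", handle.join().expect("worker thread panicked"));
}
```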
