Any way to remote kill an entire rayon threadpool?

Hey all,

I have work that happens inside a rayon threadpool on a separate thread. Alongside this work I send a heartbeat. If the heartbeat I send happens to return an Err, I would like to abort the work that is happening in that thread.

I cannot check an atomic bool flag at the start of every iteration inside rayon because I'm not in control of the code that gets executed inside it.

Essentially the code looks like this:

std::thread::spawn(move || {
    rayon.install(|| {
        tokio.block_on(async {
            //...
        })
    });
});

Is there any way at all to tear down an entire rayon threadpool while work is happening?

I was hoping I could inject an atomic bool check at the start of every rayon iteration via a closure given to the threadpool, like so...

let rayon = rayon::ThreadPoolBuilder::new()
    .start_handler(move |_| {
        // We must panic, because I see no other way of killing rayon
        // remotely.
        assert!(!rayon_cancel.load(Ordering::Relaxed));
    })
    .panic_handler(|_| {
        // This function exists to capture the panic from the
        // start_handler's assert. (Otherwise the panic will get
        // propagated to the main thread, causing the binary to exit).
        // There is nothing to do in this closure, because we have
        // been remotely aborted, cleanup has happened externally.
    })
    .build()
    .unwrap();

But this doesn't appear to have the effect I want it to.

If you don't control the code that's running on the thread when you need to cancel it, there is no way to stop the thread unless the code gives you some other way to do that.

Based on the docs, I assume start_handler is only invoked once immediately after the thread is created, which is why it doesn't do what you wanted it to.

There is no safe way to kill threads in general (e.g. if the thread you're killing holds a lock, the lock will stay locked, and all other threads using it will deadlock. There are locks in memory allocators, so you can easily hose the whole process).

You have to make them cooperate somehow.
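
A minimal std-only sketch of what "cooperate" means here. The function names are made up for illustration, and the loop body is a stand-in for real work; the point is only that the worker itself has to look at the flag:

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical worker: performs up to `max_units` units of work, but
/// stops early once `cancel` is set. Returns the number of units completed.
fn cancellable_work(cancel: &AtomicBool, max_units: u64) -> u64 {
    let mut done = 0;
    while done < max_units {
        // The cooperation point: the worker checks the flag itself.
        if cancel.load(Ordering::Relaxed) {
            break;
        }
        done += 1; // stand-in for one unit of real work
    }
    done
}

/// Spawns the worker on its own thread, requests cancellation, and joins.
fn spawn_and_cancel() -> u64 {
    let cancel = Arc::new(AtomicBool::new(false));
    let worker_flag = Arc::clone(&cancel);
    let handle = thread::spawn(move || cancellable_work(&worker_flag, u64::MAX));

    // For example: the heartbeat just returned an Err.
    cancel.store(true, Ordering::Relaxed);

    handle.join().expect("worker panicked")
}
```

This only helps if the flag check can be placed inside the code that actually runs, which is exactly what is missing when that code is not under your control.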

I often see these exact words. What about the unsafe way, or the non-general case? I would like to do it! :cry:

There is a not very popular crate called stop-thread which, as far as I can tell, calls .into_pthread_t() from an extension trait defined here: std::os::unix::thread::JoinHandleExt. Apparently libc::pthread_cancel can then cancel the thread (the crate also provides a way to do this on Windows).

Perhaps I could use this to kill the outer thread?

If you look at the docs page for that crate, it links to the Windows docs on TerminateThread, which list a number of ways in which using it can go wrong. The pthreads version has similar considerations, though I imagine the details differ.[1]

Unless you're very sure the code running in the thread doesn't have any of those problems, you'd probably be better off spawning a process to do the work since processes generally have better[2] cancellation mechanisms.


  1. the pthreads version is cooperative by default, which avoids all the problems mentioned in the Windows docs

  2. though still not great
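
A rough sketch of the process-based approach, using only std::process. The helper name run_with_deadline is made up, and a time limit stands in for whatever condition should trigger the kill (a failed heartbeat, for instance):

```rust
use std::io;
use std::process::{Command, ExitStatus};
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical helper: runs `program` with `args` as a child process and
/// kills it if it has not exited within `limit`.
/// Returns `Ok(Some(status))` if the child exited on its own, and
/// `Ok(None)` if it had to be killed.
fn run_with_deadline(
    program: &str,
    args: &[&str],
    limit: Duration,
) -> io::Result<Option<ExitStatus>> {
    let mut child = Command::new(program).args(args).spawn()?;
    let start = Instant::now();
    loop {
        // try_wait never blocks; it returns Some(status) once the child exits.
        if let Some(status) = child.try_wait()? {
            return Ok(Some(status));
        }
        if start.elapsed() >= limit {
            child.kill()?; // forcibly terminates the child (SIGKILL on Unix)
            child.wait()?; // reap it so no zombie process is left behind
            return Ok(None);
        }
        thread::sleep(Duration::from_millis(10));
    }
}
```

On a Linux box, run_with_deadline("sleep", &["30"], Duration::from_millis(100)) should come back as Ok(None). The operating system reclaims the child's memory and locks when it dies, which is what makes this safer than killing a thread, but any state the child shares with the outside world (files, sockets, databases) can still be left half-written.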

The fundamental problem here is that it's considered unsound to "leak" a thread in Rust, that is, to stop it (and free its stack) without dropping its local variables. The prime example of why this is unsound is std::thread::scope, which internally contains a local that, when dropped, blocks the thread until all the child threads have stopped; if that local is leaked, the child threads can continue executing while borrowing data from the leaked thread. There are probably ways to solve this problem, but the solutions probably either still have the pitfalls of TerminateThread or require periodically checking some flag (which you wanted to avoid).
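
To make the std::thread::scope point concrete, here is a small std-only sketch (sum_in_scope is a made-up name). The child threads borrow data owned by the calling thread, which is only sound because scope does not return until they have finished:

```rust
use std::thread;

/// Sums the two halves of `data` on two scoped threads.
/// Both threads borrow `data` from the calling thread.
fn sum_in_scope(data: &[u64]) -> u64 {
    let (left, right) = data.split_at(data.len() / 2);
    thread::scope(|s| {
        let a = s.spawn(|| left.iter().sum::<u64>());
        let b = s.spawn(|| right.iter().sum::<u64>());
        // Even without these explicit joins, `scope` would wait for
        // both threads before returning.
        a.join().unwrap() + b.join().unwrap()
    })
}
```

If the thread calling sum_in_scope were torn down in the middle of the scope, its stack (and possibly the buffer behind data) would be freed while the two children were still reading from it.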

Oh geez this is all very interesting. Thank you!

So the reason I want to kill the thread remotely is in case the code I am executing happens to be stuck in an infinite loop or deadlock or something. I guess I could just panic the whole program. (I am starting to understand the value of the panic apparatus now.)

@semicoleon, could you explain what "cooperative" means here, and if/why using pthread is okay? I deploy only to Linux systems, so maybe I can get away with this solution. But after reading these comments I'm now thinking that maybe just panicking the main thread is preferable.
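
For reference, the "take the whole program down" fallback could be sketched roughly like this with std only. run_with_watchdog is a made-up name, and a fixed time limit stands in for the failed heartbeat:

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Hypothetical helper: runs `work` on its own thread and waits at most
/// `limit` for its result. Returns `None` if the work did not finish in
/// time or panicked. The worker thread is NOT stopped on timeout; it is
/// simply abandoned, so the caller is expected to exit the process.
fn run_with_watchdog<T, F>(work: F, limit: Duration) -> Option<T>
where
    F: FnOnce() -> T + Send + 'static,
    T: Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // If the receiver has already given up, the result is just dropped.
        let _ = tx.send(work());
    });
    // Err(Timeout) if `limit` elapsed, Err(Disconnected) if `work` panicked.
    rx.recv_timeout(limit).ok()
}
```

On None the caller would then panic in main or call std::process::abort(), since the stuck thread itself cannot be reclaimed.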

pthread_cancel, at least with deferred cancellation, is cooperative in that it requires that the thread being cancelled take action on the cancellation signal itself. It doesn't force the thread to exit; it instead arranges for the thread being cancelled to be told to exit the next time it reaches a cancellation point. The API docs go into some detail on where those points are - which is a long list of functions, since calling those functions is what trips deferred cancellation and cancels the thread.

One of the requirements is that a thread that is cancellable in this manner (which most aren't) must guarantee that it can be safely terminated at any time if it calls a function which acts as a cancellation point. This means that the programmer, writing the threaded routine, must ensure that any locks are released before calling open, and that shared data structures are coherent before calling waitpid, and so on, because a program that is not careful about this will almost certainly deadlock or do the wrong thing when the thread is cancelled out from under these resources. This also means that a thread that never calls a function containing a cancellation point will never actually be cancelled.

This means that you can't correctly tag a thread as cancellable if you don't know for a fact that the code running in that thread is cancel-safe, which is equivalent to the problem you started with, unfortunately.

pthread_cancel also supports an asynchronous cancellation mode; threads which support it must be prepared to be terminated at any time, without any warning. That effectively means they can never hold locks, or handle volatile state with anything less than extreme care.
