I'm using a library that blocks (potentially forever) in a single-threaded environment, but I need to perform other tasks concurrently. While this is generally impossible to achieve (to my knowledge), the library has the ability to register callback functions.
Is it possible to use these callback functions to interrupt the execution with an await, even though we're inside a non-async function? IIUC there needs to be a surrounding executor, for sure.
Some pseudo-code:
fn blocking_library_code(callback: impl Fn()) {
    loop {
        callback();
        do_more_blocking_library_stuff();
    }
}
// I need to find out how to make async calls work in WASM
// not sure how to gain access to the executor, if that is needed
async fn my_entry_point() {
    blocking_library_code(my_callback);
}
fn my_callback() {
    // is there a way to re-enter `async`-land as one of our callers
    // on the stack is already under the reign of an executor?
    do_concurrent_stuff().await;
}
async fn do_concurrent_stuff() {
    …
}
So the call chain is: async entry point → blocking library code → blocking custom callback → async custom function
(As my use case is very specific, I'd be OK with any other solution that makes it work, e.g. maybe it's possible to call into the surrounding JavaScript and make it yield somehow?)
This looks ill-formed, to me. You are not supposed to call any blocking functions within async fn. Any executor that polls the Future will be blocked by blocking_library_code().
Your async fn is running within a web browser's Web Worker. So, while something like tokio::task::spawn_blocking() is not available, it is probably not critical that the worker thread will be blocked by RustPython. However, you won't be able to do anything else in that worker at all, which is probably what led you down this path in the first place. (Recall that blocking in an async fn is "ill-formed".)
I'm going to tentatively say probably not. The best you can expect a callback to do is poll some shared state. For instance, the callback can check if any other task has sent it a message with Receiver::try_recv().
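As a rough illustration of "polling shared state" from a callback, here is a minimal sketch that uses std::sync::mpsc as a stand-in for whatever channel is actually available. The names blocking_library_code, drain_pending and run_demo are hypothetical, and the loop is bounded only so that the example terminates. It shows the non-blocking check itself, not how the messages get into the worker.

```rust
use std::sync::mpsc::{channel, Receiver, TryRecvError};

/// Hypothetical stand-in for the blocking library: it never awaits and only
/// hands control back by invoking the callback once per iteration.
fn blocking_library_code(iterations: usize, mut callback: impl FnMut()) {
    for _ in 0..iterations {
        callback();
        // ... more blocking library work would happen here ...
    }
}

/// Drains whatever is queued right now without ever blocking.
fn drain_pending(rx: &Receiver<String>, out: &mut Vec<String>) {
    loop {
        match rx.try_recv() {
            Ok(msg) => out.push(msg),
            // Nothing queued: return to the library immediately.
            Err(TryRecvError::Empty) => break,
            // All senders are gone: nothing more will ever arrive.
            Err(TryRecvError::Disconnected) => break,
        }
    }
}

/// Queues two messages, runs the "library" for three iterations and returns
/// everything the callback picked up along the way.
fn run_demo() -> Vec<String> {
    let (tx, rx) = channel();
    tx.send("hello".to_string()).unwrap();
    tx.send("world".to_string()).unwrap();

    let mut received = Vec::new();
    blocking_library_code(3, || drain_pending(&rx, &mut received));
    received
}
```

Calling run_demo() collects both queued messages during the first callback invocation. The later invocations find the queue empty and return at once, which is the whole point: the callback never waits.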
I'll spare the details, but Future and Promise are designed the way they are (as cooperative multitasking primitives) because tasks built from them are interruptible in the way you describe. Blocking functions are not. That's why there is a distinction.
I don't mind if the executor doesn't get a chance to schedule other work for most of the time, as long as there's a way to use the callbacks to give back control to the surrounding executor (which is my actual question here)
I'm going to tentatively say probably not. The best you can expect a callback to do is poll some shared state. For instance, the callback can check if any other task has sent it a message with Receiver::try_recv().
This is essentially what I'm trying to make work. But without JavaScript yielding to the runtime, the worker will be cut off from the outside (the channel that I need to poll will never be filled, as the required callbacks cannot be executed).
One workaround would be to use shared memory and atomics to inject data into a Worker without having to yield to the JavaScript Runtime. But that would be extremely unergonomic and I'd have to make sure to not block the sending main-thread (which is forbidden in Browsers)
What you are trying to do is completely impossible if the wasm is running directly in the browser. As for web workers where it's okay to block, there are some solutions, though even there they aren't amazing.
If you are using an executor that provides some kind of reentrant do_some_arbitrary_work_now() method, you might be able to come up with something, but I don't believe that any of the common ones provide such a method.
In fact, I suspect that the machinery necessary to provide such a method robustly would add a lot of inefficiency to the more common case of never blocking within an async block. You'll almost certainly have to write your own custom executor that's designed from the ground up to support this pattern.
At a minimum, it will need to keep track of which top-level tasks are in the middle of being polled higher up the stack and block them from being polled concurrently with themselves (which is necessarily UB due to poll taking an exclusive Pin<&mut ...> reference).
There is not. And this is what I tried to explain by highlighting the sync/async dichotomy. If you could just interrupt synchronous functions like this, we would not need async fn!
How so? futures_channel::mpsc::Receiver::try_next() is not async. There's nothing to await. You don't need anything to yield just for polling shared state.
This answers the question. To add a little context, another impact of this limitation is that there is no way to stop a regular OS thread, in general, from outside the thread's code. That's why thread code often periodically checks a shared atomic "stop" flag.
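The "stop flag" pattern mentioned above can be sketched like this with the standard library only. The function names and the 10 ms delay are arbitrary choices for illustration. The worker cannot be stopped from outside; it stops because its own code checks the flag.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

/// Spawns a worker that does its work in small steps and checks `stop` after
/// each one. Returns how many steps it completed before it noticed the flag.
fn spawn_worker(stop: Arc<AtomicBool>) -> thread::JoinHandle<u64> {
    thread::spawn(move || {
        let mut steps = 0u64;
        loop {
            // One unit of "work".
            steps += 1;
            // Cooperative check: nothing outside this thread can force it.
            if stop.load(Ordering::Acquire) {
                break;
            }
            thread::yield_now();
        }
        steps
    })
}

/// Starts the worker, asks it to stop shortly afterwards and waits for it.
fn run_and_stop() -> u64 {
    let stop = Arc::new(AtomicBool::new(false));
    let worker = spawn_worker(Arc::clone(&stop));
    thread::sleep(Duration::from_millis(10));
    stop.store(true, Ordering::Release);
    worker.join().expect("worker panicked")
}
```

If the worker never reached the check (for example, because it was stuck inside a blocking call), setting the flag would have no effect, which mirrors the situation with the blocking library.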
Well, this was actually my question: can I insert something into my callback to interrupt the surrounding executor? I believe the answer is no, but I don't understand exactly why this is impossible. What makes it impossible to write an executor that permits manual interruptions?
It's slightly different for preemptive threads where you can trivially yield to the OS scheduler in your callback with sleep.
The closest analog available in WASM is memory_atomic_wait32 in core::arch::wasm32. It is not valid to call from the browser's main thread (it will raise a JS exception), but it should be OK in a Web Worker. There are also some hacks you can do by calling out to JavaScript.
I thought you were running the sender in a separate worker (or even the main thread) using postMessage. Then the problem wasn't sending messages, but receiving them in the worker with its message event handler. Did I get this wrong?
The problem is in between: the channel happily accepts (and buffers) messages from the sender. But AFAIK (please prove me wrong!) there's no way to poll the channel. Instead, the JavaScript runtime calls into an event handler (onmessage), where I could forward the incoming messages to WASM or store them in a list for draining later. Because the Web Worker's thread is blocked forever, the event handler will never execute, so there's no point in time at which I could retrieve the messages buffered in the channel.
My only workaround so far is writing my own implementation of such a channel, based on shared memory with a proper try_receive method.
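To make the idea of a shared-memory channel with a try_receive method a bit more concrete, here is a minimal sketch of a single-slot mailbox built from two 32-bit atomics. Everything here is hypothetical (the Mailbox type, its method names, the one-message capacity), and it assumes exactly one sender and one receiver. How the memory actually gets shared between the main thread and the worker is left out entirely.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

const EMPTY: u32 = 0;
const FULL: u32 = 1;

/// Hypothetical single-slot mailbox for one sender and one receiver.
/// Both fields are plain 32-bit atomics, so the layout is simple enough to
/// place in memory that two threads can both see.
pub struct Mailbox {
    state: AtomicU32,
    value: AtomicU32,
}

impl Mailbox {
    pub const fn new() -> Self {
        Mailbox {
            state: AtomicU32::new(EMPTY),
            value: AtomicU32::new(0),
        }
    }

    /// Sender side. Never blocks: returns false if the previous message has
    /// not been picked up yet.
    pub fn try_send(&self, v: u32) -> bool {
        if self.state.load(Ordering::Acquire) != EMPTY {
            return false;
        }
        self.value.store(v, Ordering::Relaxed);
        // Publish the value: the receiver's Acquire load pairs with this store.
        self.state.store(FULL, Ordering::Release);
        true
    }

    /// Receiver side. Never blocks: returns None if nothing is waiting.
    pub fn try_receive(&self) -> Option<u32> {
        if self.state.load(Ordering::Acquire) != FULL {
            return None;
        }
        let v = self.value.load(Ordering::Relaxed);
        // Hand the slot back to the sender.
        self.state.store(EMPTY, Ordering::Release);
        Some(v)
    }
}
```

The library callback would call try_receive() on every invocation, and the sending side would call try_send() and retry later on false instead of waiting, so neither side ever has to block. A real channel would need a queue rather than a single slot.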
If there were a way to yield to the JavaScript runtime, that would help as well. But I think that's impossible for the same reason I cannot interrupt a non-async method in Rust, even though an executor is running further up the stack.
Edit: maybe we talked past each other: while mpsc::channel has the right interface, I was talking about the underlying implementation which would not work in this case.
Yes, I see the problem. Using something like futures-channel would be ideal, but you need a way to send the Sender half to the other thread. That doesn't seem possible because Sender is not serializable across the WASM boundary. You'll need some help from JavaScript for that. There might be a channel backed by SharedArrayBuffer already on crates.io...
Assuming for a moment you have a suitable channel that can be sent across the WASM boundary, replace postMessage with this channel. As you suggested, postMessage will not work for this [1].
edit: The first search result for SharedArrayBuffer turned up wasm-rs-shared-channel. There might be others, but this is a start.
[1] I don't think the wait32 intrinsic will help, either. It just suspends the thread without resuming any other tasks on that thread. It's an analog for sleep, not yield. Subtle difference, but important.
It might not even be necessary to send either end of the channel to another thread (even though that might make things more ergonomic). It should be enough to have shared memory and wrap it with a Sender on one end and a Receiver on the other. I think it's a matter of how much JavaScript I want to write.
Thanks for suggesting to look for a ready-made crate with a shared buffer. Somehow that didn't cross my mind (I'll have a look at the crate for sure!).
The perfect solution would be to make RustPython async, but that's above my head for now.
Just to be clear, an async executor is nothing special and cannot force anything to yield or be interrupted. Instead, it is the Futures the executor runs that are supposed to yield to it, by returning from their poll method. If a Future ends up running blocking code, this of course cannot happen.
Instead the Futures that the executor runs that are supposed to yield to it
Sorry if I'm repeating myself. It just hasn't clicked yet, and I still somehow have the feeling of being misunderstood (more likely I just don't understand those responses).
I'm trying to yield to the executor from within a Future (IIUC). Why is that impossible from a function that has not been tagged as async, even though this code runs in an executor? I'm still reading up on what implications the async keyword has, other than changing the return type.
My naive hope was to call: surrounding_executor.lock().yield();. I'm convinced that this is not possible, but I'd like to understand the background of this.
At their core, executors run the Futures you give them by repeatedly calling their poll methods. Note that these are normal synchronous functions. When someone talks about "yielding to the executor", they simply mean returning from that poll method.
The reason you can't yield from the middle of a synchronous function should be clear then, since in order to yield you have to return, but then you're not in the middle of the function anymore!
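To see that there is no magic involved, here is a minimal sketch of an executor for a single future, using only the standard library. The names NoopWake and block_on_counting are made up for this example, and it busy-polls instead of sleeping until woken, which a real executor would not do.

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// A waker that does nothing. That is enough here because the loop below
/// polls again unconditionally.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Drives one future to completion and reports how often it was polled.
fn block_on_counting<F: Future>(fut: F) -> (F::Output, u32) {
    let mut fut = pin!(fut);
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut polls = 0;
    loop {
        polls += 1;
        // `poll` is an ordinary synchronous call. The executor only regains
        // control when this call returns.
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return (out, polls);
        }
    }
}
```

For example, block_on_counting(async { 21 * 2 }) completes on the first poll. If the future's poll called a function that loops forever, the loop in block_on_counting would simply never get to run again.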
What do async functions do that's special then? They get compiled down to a struct that "remembers" the state and the code it was executing the last time it had to return to yield to the executor, so that the next time poll is called it can resume from that point.
Yeah, this is not possible because a function can never return from its caller, only to its caller (not to mention that it would have no way to preserve the caller's state).
To yield, a future has to:
1. save any state that's on the stack into the future's fields,
2. optionally, communicate with the executor that it's intending to yield, as opposed to waiting for some other event,[1] and
3. return Poll::Pending from the Future::poll() method.
Therefore, the only thing that can make a future yield is the implementation of that Future. If you're not writing async code or an explicit poll() implementation, you don't get to control the return value of poll() and you don't get to command the Future to save its state.
[1] Tokio's yield_now() does this in order to improve scheduling behavior. It's not at all fundamental.
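The three steps listed above (save state, optionally notify the executor, return Poll::Pending) can be sketched as a hand-written Future. CountUp, NoopWake and poll_until_ready are hypothetical names, and the driver loop is a toy that polls unconditionally; this is roughly the kind of state machine the compiler generates for an async fn, not its actual output.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// A waker that does nothing; the toy driver below polls in a loop anyway.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Hand-written state machine: counts up to `limit`, yielding after each step.
struct CountUp {
    current: u32,
    limit: u32,
}

impl Future for CountUp {
    type Output = u32;

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<u32> {
        if self.current >= self.limit {
            return Poll::Ready(self.current);
        }
        // Step 1: progress lives in a field, not on the stack, so it
        // survives the return below.
        self.current += 1;
        // Step 2 (optional): ask to be polled again instead of waiting for
        // some other event.
        cx.waker().wake_by_ref();
        // Step 3: yield, which just means returning from `poll`.
        Poll::Pending
    }
}

/// Polls the future until it is ready; returns (result, number of polls).
fn poll_until_ready(mut fut: CountUp) -> (u32, u32) {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut polls = 0;
    loop {
        polls += 1;
        // `CountUp` is Unpin, so pinning a mutable reference is fine.
        if let Poll::Ready(v) = Pin::new(&mut fut).poll(&mut cx) {
            return (v, polls);
        }
    }
}
```

With limit set to 3 the future yields three times and finishes on the fourth poll. A plain fn called from inside poll has no fields to save into and no Poll::Pending to return, which is why it cannot yield on the future's behalf.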