Waker::wake_by_ref can be lost if called before returning Poll::Pending?

Hi,

For the code below, there are 2 questions:

struct Futwn(Box<WakeNode>);
impl Future for Futwn {
    type Output = ();
    fn poll(self: Pin<&mut Self>, _: &mut Context) -> Poll<Self::Output> {
        let waked = self.0.waked.load(Ordering::Relaxed);
        if waked == 0 {
            return Poll::Pending;
        }
        Poll::Ready(())
    }
}

impl WakeNode {
    fn new(waker: Waker) -> Self {
        WakeNode {
            waked: AtomicI32::new(0),
            entry: ListHead::new(),
            waker,
        }
    }

    fn wake(&self) {
        self.waked.store(1, Ordering::Relaxed);
        self.waker.wake_by_ref();
    }
}
  • A possible execution timeline is:

    1. task2: self.0.waked.load(Ordering::Relaxed)
    2. task1: WakeNode::wake()
    3. task2: return Poll::Pending
      The question is: can the wake_by_ref call be lost in this case?
  • For any future.await, on the very first call to Future::poll, is code like this safe?

    struct context;
    impl Future for context {
        type Output = Waker;
        fn poll(self: Pin<&mut Self>, ctx: &mut Context<'_>) -> Poll<Waker> {
            Poll::Ready(ctx.waker().clone())
        }
    }
    
    async fn test() {
        let wn = Box::new(WakeNode::new(context.await));
        Futwn(wn).await;
    }
    

    It is a question because the waker may change on every call to poll and should therefore be cloned from ctx.waker() each time. But for the very first poll there should not have been any context switch yet, which would make it possible and safe to grab the waker up front via context.await. I am not sure whether this is true.

This execution is not possible because task2 owns a Box pointer to the WakeNode, which means that it must have exclusive access to the WakeNode. Any parallel access is undefined behavior.

Even if you replace the Box with an Arc or a raw pointer to get around this issue, then yes, you can lose the wakeup. The specific execution you posted here would not be a problem, because wake is called after the call to poll started, which guarantees that you will be polled again. However, you're using a relaxed atomic ordering, so your thread might not see the store(1) operation even if it saw the wake_by_ref operation. Hence, from the perspective of task2, the operations might happen in this order (even if steps 1 and 4 happen in the other order from the perspective of task1):

  1. task1: WakeNode::wake()
  2. task2: self.0.waked.load(Ordering::Relaxed)
  3. task2: return Poll::Pending
  4. task1: self.0.waked.store(1, Relaxed)

You should use the acquire/release atomic orderings instead.
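
Applied to the code from the first post, only the two atomic operations change; roughly:

impl WakeNode {
    fn wake(&self) {
        // Release pairs with the Acquire load in Futwn::poll below.
        self.waked.store(1, Ordering::Release);
        self.waker.wake_by_ref();
    }
}

impl Future for Futwn {
    type Output = ();
    fn poll(self: Pin<&mut Self>, _: &mut Context<'_>) -> Poll<Self::Output> {
        if self.0.waked.load(Ordering::Acquire) == 0 {
            return Poll::Pending;
        }
        Poll::Ready(())
    }
}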

Although it does seem like you end up with the right waker for the first call to Futwn::poll, it might not be the right waker if Futwn::poll is called a second time. You must be ready to be polled again later with a different waker, and you must replace the waker in that case.

To be clear, you must be able to replace the waker with a new one. You can't avoid this.
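
One possible shape for that (a sketch with hypothetical names, not from the original post) is to keep the waker behind a Mutex so every call to poll can install the current waker before checking the flag:

use std::sync::atomic::{AtomicI32, Ordering};
use std::sync::Mutex;
use std::task::{Context, Waker};

// Hypothetical variant of WakeNode whose waker can be replaced on each poll.
struct SharedWakeNode {
    waked: AtomicI32,
    waker: Mutex<Waker>,
}

impl SharedWakeNode {
    // Called at the top of poll, before checking `waked`, so a wake racing
    // with the check either is seen by the check or hits the new waker.
    fn register(&self, cx: &Context<'_>) {
        *self.waker.lock().unwrap() = cx.waker().clone();
    }

    // Called by the notifying side.
    fn wake(&self) {
        self.waked.store(1, Ordering::Release);
        self.waker.lock().unwrap().wake_by_ref();
    }
}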


Box does not imply exclusive access here. A Box can be seen as an owned object, but the parallel access is through a shared reference, with no write access to the owned object, so technically it is like this:

fn main() {
    let s = Box::new(String::from("123"));
    let sp = &s;
    println!("{}", s);
    println!("{}", sp);
    println!("{}", s);
}

I don't see why it is not a problem if the wakeup is lost.
It will not be polled again: it is only polled because it is woken up, but here the signal was lost. "which guarantees that you will be polled again" means the signal never gets lost, and that is exactly what I am not sure is true.

Yes, that can possibly happen. It is not a problem if the signal is never lost. Actually, I realized just now that it is a problem even if the signal never gets lost, for a poll quick enough to be called twice before the store(1).

I don't have to do that: in this particular code the semantics are that once it is woken up, the await finishes unconditionally, and it will not be polled twice without a wake, so I don't need the second waker.

I would prefer that the wakeup never gets lost.
Because otherwise it is a hard problem to fix: wake() is always called from another thread, while the thread calling future.await checks internal state to decide whether it should return Pending or Ready. Between checking the state and returning, there is always the possibility that the internal state changes to ready and wake() gets called, yet the earlier check has already decided to return Pending. If that signal is lost, a serious problem happens: the future may never be polled again.

One solution is to put a mutex around the checking and the returning, blocking the wakeup from being delivered in between, and to release the mutex at a point where the runtime can no longer miss an incoming wakeup.

That seems too complicated, or even impossible, because the mutex would have to be passed to the runtime so it could be released there, and there is no interface for doing that.

This is not categorically true - some executors will send you spurious wakeups that were not requested.

eh, it is a sad thing....

It's all executors, even.

More advanced join! implementations will sometimes wrap the waker to avoid this, but the simplest way to implement join is:

use std::future::{poll_fn, Future};
use std::pin::pin;
use std::task::Poll::{Pending, Ready};

async fn join<A, B>(a: A, b: B) -> (A::Output, B::Output)
where A: Future, B: Future,
{
    let mut a = pin!(a);
    let mut b = pin!(b);
    let mut a_out = None;
    let mut b_out = None;
    // poll_fn here is std::future::poll_fn; futures::future::poll_fn works the same way.
    poll_fn(|cx| {
        // Both sub-futures are polled with the same waker, so a wake from
        // either one re-polls both.
        if a_out.is_none() {
            if let Ready(v) = a.as_mut().poll(cx) {
                a_out = Some(v);
            }
        }
        if b_out.is_none() {
            if let Ready(v) = b.as_mut().poll(cx) {
                b_out = Some(v);
            }
        }
        if a_out.is_some() && b_out.is_some() { Ready(()) } else { Pending }
    })
    .await;
    (a_out.unwrap(), b_out.unwrap())
}

and this will result in the sibling futures getting spuriously polled when the other wakes, because we don't differentiate between the wakes. Just like condvars and thread parking, you must tolerate spurious wakeups. (But unlike those, if you want to ensure you get the correct wakeup, you need to change how you send a wakeup request on each spurious wakeup. AtomicWaker exists to assist with this.)
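
A rough sketch of that AtomicWaker pattern (the type is futures::task::AtomicWaker; the other names here are made up for illustration): register the waker on every poll, then check the flag, so a wakeup racing with the check still reaches the most recently registered waker.

use std::future::Future;
use std::pin::Pin;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::task::{Context, Poll};

use futures::task::AtomicWaker;

// State shared between the future and whoever signals it.
struct Shared {
    ready: AtomicBool,
    waker: AtomicWaker,
}

struct WaitReady(Arc<Shared>);

impl Future for WaitReady {
    type Output = ();
    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        // Register (or replace) the waker *before* checking the flag.
        self.0.waker.register(cx.waker());
        if self.0.ready.load(Ordering::Acquire) {
            Poll::Ready(())
        } else {
            Poll::Pending
        }
    }
}

// The signalling side, e.g. run on another thread or in a callback.
fn notify(shared: &Shared) {
    shared.ready.store(true, Ordering::Release);
    shared.waker.wake();
}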


Then what do you think of that: is it true, or is there still something I am missing?

I'm not sure what you mean by it being too complicated or impossible. There are lots of synchronization primitives out there that protect the waker with a mutex, and they work fine and were not impossible to implement.


But this is not a problem. It is the executor's responsibility to properly handle any wakeup that happens right before the poll returns, since it would still be after the poll started, which is what the contract is about.

For instance, a yield_once() future can be implemented as:

use std::{future::poll_fn, mem, task::Poll};

let mut should_yield = true;
poll_fn(move |cx| if mem::take(&mut should_yield) {
    // Wake ourselves before returning Pending so we get polled again.
    cx.waker().wake_by_ref();
    Poll::Pending
} else {
    Poll::Ready(())
})

Notice how here the wakeup unconditionally happens before returning Pending, but after poll started. This is a legitimate Future implementation, which means for an executor to be correct it needs to be able to handle this / it is not allowed to "lose" the signal just because it happened before poll returned. More generally, my cx.waker().wake_by_ref() could be happening "then", but from another thread, which then matches your atomic TOCTOU question.


Yes, that's what I mean: either the signal never gets lost, or else a mutex has to be involved.

A mutex, or any synchronization primitive, can't do that without being passed to the executor. It would need to be released by the executor at some point outside of poll if we want the future to be polled again while the wakeup itself can be lost.

I think I understand what you mean. It's true that with a mutex, the waker can be woken before you return from poll (but after releasing the mutex). However, that's ok. A wakeup can only be lost if the waker is woken before the start of the call to poll. If a wakeup happens during a call to poll, then that's the same as it happening after the call to poll.

What you said is somewhat imprecise: happening after one call to poll is just before the start of the next call to poll, and it cannot both be lost-able and not.

I guess you are saying that at least one wakeup is guaranteed not to be lost if there is more than one?
