`LocalWaker` optimization

Hi there.

I am trying to implement a join_all function, but I got stuck on the current LocalWaker implementation. Here is what I have right now:

use std::future::Future;
use std::task::Poll;
use std::pin::Pin;
use std::mem;
use ::futures::future::poll_fn;

pub fn join_all<T>(mut futures: Vec<T>) -> impl Future<Output = Vec<T::Output>> 
    where T: Future + 'static
{
    let mut results = Vec::new();
    let mut waiting_queue: Vec<usize> = (0..futures.len()).collect();

    poll_fn(move |lw| {
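        loop {
            // Futures that are still pending after this pass.
            let mut queue_after = Vec::with_capacity(waiting_queue.len());

            for idx in waiting_queue.iter() {
                // SAFETY: every index comes from `0..futures.len()`, and the Vec
                // is never resized, so the futures stay at a fixed heap address.
                let fut = unsafe { Pin::new_unchecked(futures.get_unchecked_mut(*idx)) };

                match fut.poll(lw) {  // this is the place I would like to optimize
                    // Note: results are collected in completion order.
                    Poll::Ready(val) => results.push(val),
                    _ => queue_after.push(*idx),
                }
            }

            // `waiting_queue` now holds the pending futures, `queue_after` the previous queue.
            std::mem::swap(&mut waiting_queue, &mut queue_after);

            if waiting_queue.is_empty() {
                break Poll::Ready(mem::replace(&mut results, Vec::new()))
            }

            // No future completed during this pass, so wait for a wake-up.
            if queue_after.len() == waiting_queue.len() {
                break Poll::Pending;
            }
        }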
    })
}

As the next optimization step, I would like to create a Vec of LocalWakers and poll each future with its own LocalWaker. That seems to require a separate UnsafeWake implementation for each one, which should not be necessary, because I could reuse the one behind the LocalWaker that was passed to poll_fn. But then how do I know which specific LocalWaker has been woken?

Solution 1

Add is_woken and reset methods to the LocalWaker implementation. The combinator still has to run through the list of LocalWakers and check whether each one has been woken, calling poll only if it has. But it should be faster, because it only checks booleans and polls just the futures that were actually woken.
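
To make Solution 1 concrete, here is a minimal sketch. It assumes the stable std::task::Wake trait as a stand-in for the UnsafeWake/LocalWaker pair discussed above, so it is not the API this post targets. FlagWaker, take_woken and woken_indices are hypothetical names, and the flag is atomic only because a Waker has to be Send + Sync.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::task::{Wake, Waker};

/// Hypothetical per-future waker: remembers that it was woken and forwards
/// the wake-up to the waker that was passed to the combinator.
pub struct FlagWaker {
    woken: AtomicBool,
    parent: Waker,
}

impl FlagWaker {
    pub fn new(parent: Waker) -> Arc<Self> {
        // Start in the "woken" state so the first poll is never skipped.
        Arc::new(FlagWaker { woken: AtomicBool::new(true), parent })
    }

    /// `is_woken` and `reset` in one step: returns the flag and clears it.
    pub fn take_woken(&self) -> bool {
        self.woken.swap(false, Ordering::AcqRel)
    }
}

impl Wake for FlagWaker {
    fn wake(self: Arc<Self>) {
        self.wake_by_ref();
    }

    fn wake_by_ref(self: &Arc<Self>) {
        // Set the flag before waking the parent so the next scan sees it.
        self.woken.store(true, Ordering::Release);
        self.parent.wake_by_ref();
    }
}

/// The scan from Solution 1: indices whose flag is set (clearing each flag).
pub fn woken_indices(wakers: &[Arc<FlagWaker>]) -> Vec<usize> {
    wakers
        .iter()
        .enumerate()
        .filter(|(_, w)| w.take_woken())
        .map(|(idx, _)| idx)
        .collect()
}
```

The combinator would call woken_indices at the start of each poll and poll only those futures, each with Waker::from(wakers[idx].clone()). The scan is still linear in the number of futures, but it touches one flag per future instead of polling it.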

Solution 2

Create an UnsafeWake implementation for each LocalWaker and keep them all. This custom implementation would put each woken future into a waiting queue, so that on the next poll only those futures are polled. But it requires creating and storing these implementations in addition to the LocalWakers.
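
A rough sketch of Solution 2, again assuming the stable std::task::Wake trait and Context instead of UnsafeWake and LocalWaker. Shared, IndexWaker and JoinAll are hypothetical names. The futures are boxed to avoid unsafe pinning, and a Mutex guards the queue only because a Waker must be Send + Sync; a single-threaded waker could use a RefCell instead.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};

/// State shared between the combinator and its per-future wakers.
struct Shared {
    ready: Mutex<Vec<usize>>,     // indices of futures that were woken
    parent: Mutex<Option<Waker>>, // waker of the task polling the combinator
}

/// Hypothetical per-future waker: enqueues its index, then wakes the parent.
struct IndexWaker {
    idx: usize,
    shared: Arc<Shared>,
}

impl Wake for IndexWaker {
    fn wake(self: Arc<Self>) {
        self.shared.ready.lock().unwrap().push(self.idx);
        // Clone the parent out so no lock is held while calling into it.
        let parent = self.shared.parent.lock().unwrap().clone();
        if let Some(waker) = parent {
            waker.wake();
        }
    }
}

pub struct JoinAll<F: Future> {
    futures: Vec<Option<Pin<Box<F>>>>, // `None` once a future has finished
    results: Vec<Option<F::Output>>,   // slot `i` belongs to future `i`
    wakers: Vec<Waker>,
    shared: Arc<Shared>,
    remaining: usize,
}

// The futures are boxed and nothing else is pinned in place.
impl<F: Future> Unpin for JoinAll<F> {}

pub fn join_all<F: Future>(futures: Vec<F>) -> JoinAll<F> {
    let n = futures.len();
    let shared = Arc::new(Shared {
        ready: Mutex::new((0..n).collect()), // poll every future once at first
        parent: Mutex::new(None),
    });
    let wakers: Vec<Waker> = (0..n)
        .map(|idx| Waker::from(Arc::new(IndexWaker { idx, shared: shared.clone() })))
        .collect();
    JoinAll {
        futures: futures.into_iter().map(|f| Some(Box::pin(f))).collect(),
        results: (0..n).map(|_| None).collect(),
        wakers,
        shared,
        remaining: n,
    }
}

impl<F: Future> Future for JoinAll<F> {
    type Output = Vec<F::Output>;

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        let this = self.get_mut();
        *this.shared.parent.lock().unwrap() = Some(cx.waker().clone());
        loop {
            // Take the woken indices; the lock is released before polling.
            let batch = std::mem::take(&mut *this.shared.ready.lock().unwrap());
            if batch.is_empty() {
                break;
            }
            for idx in batch {
                // Wake-ups for finished futures (or duplicates) are skipped.
                let Some(fut) = this.futures[idx].as_mut() else { continue };
                let mut child_cx = Context::from_waker(&this.wakers[idx]);
                let polled = fut.as_mut().poll(&mut child_cx);
                if let Poll::Ready(val) = polled {
                    this.results[idx] = Some(val);
                    this.futures[idx] = None;
                    this.remaining -= 1;
                }
            }
        }
        if this.remaining == 0 {
            // Polling again after completion panics on the `unwrap` below.
            Poll::Ready(this.results.iter_mut().map(|r| r.take().unwrap()).collect())
        } else {
            Poll::Pending
        }
    }
}
```

With this layout each poll of the combinator costs time proportional to the number of wake-ups rather than the number of futures, and the results come back in input order because every future writes into its own slot. The price is one allocation per waker plus the shared queue.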

Any ideas?