Using async/await for recursion without stack overflows

I've more than once needed to transform a simple recursive algorithm (such as visiting a DOM tree) into explicit iteration with a manually maintained stack of pending work (e.g. nodes to visit) to avoid stack overflows (e.g. with a deeply nested DOM tree).

I seems like async/await syntax could be used to make this transformation much easier, with a simple executor which pushes futures onto a stack.

I have a crude proof of concept (playground) which lets me write something like this:

async fn sum(node: Rc<Node<usize>>) -> usize
{
    let mut result = node.val;
    for child in &node.children {
        result += TreeExecutor::recurse(sum, Rc::clone(child)).await;
    }
    result
}

fn main() {
    let t = Tree {
        root: Some(Rc::new(Node {
            val: 77usize,
            children: vec![
                Rc::new(Node {
                    val: 123,
                    children: vec![],
                }),
                Rc::new(Node {
                    val: 312,
                    children: vec![],
                }),
            ],
        }))
    };

    let result = TreeExecutor::map(sum, Rc::clone(t.root.as_ref().unwrap()));
    println!("Got result {}", result);
}

That kind of works, but needs the data to all be 'static - hence the Rc<Node> and a lot of clones. I'd really like to be able to have a scoped version which would let you traverse a local data structure without needing anything 'static. I think that means threading a reference to the executor (or something related) through the recursive calls, and it seems like the lifetime handling would be quite similar to how arenas can be implemented but I think with extra lifetimes.

Here's my attempt below (playground link), which doesn't compile:

error[E0597]: `te` does not live long enough
   --> src/main.rs:160:36
    |
160 |     let result = TreeExecutor::map(&te, sum, Rc::clone(t.root.as_ref().unwrap()));
    |                                    ^^^ borrowed value does not live long enough
161 |     println!("Got result {}", result);
162 | }
    | -
    | |
    | `te` dropped here while still borrowed
    | borrow might be used here, when `te` is dropped and runs the destructor for type `TreeExecutor<'_, '_>`

I'm hoping that people here can provide the missing insight and/or point out the silly lifetime errors. :slight_smile:

I have just come across the async-recursion from this thread which I'll also look at for inspiration.

use std::fmt::Debug;
use std::future::Future;
use std::pin::Pin;
use futures::future::FutureExt;
use futures::task::noop_waker_ref;
use std::cell::RefCell;
use std::task::{Poll,Context};
use std::rc::Rc;
use std::marker::PhantomData;

#[derive(Debug)]
struct Node<T: Debug> {
    val: T,
    children: Vec<Rc<Node<T>>>,
}
#[derive(Debug)]
struct Tree<T: Debug> {
    root: Option<Rc<Node<T>>>,
}

// Lifetimes:
// 'data: lifetime of the data we want to traverse.
// 'te: lifetime of the TreeExecutor
type TETask<'t> = Pin<Box<dyn Future<Output=()>+'t>>;
struct TreeExecutor<'te, 'data> {
    tasks: RefCell<Vec<TETask<'te>>>,
    phantom: PhantomData<RefCell<&'data str>>,
}

impl<'te, 'data: 'te> Default for TreeExecutor<'te, 'data> {
    fn default() -> TreeExecutor<'te, 'data> {
        TreeExecutor {
            tasks: RefCell::new(vec![]),
            phantom: PhantomData,
        }
    }
}

struct PendingFuture<R> {
    result: Rc<RefCell<Option<R>>>,
}

impl<R> PendingFuture<R> {
    pub fn new() -> Self {
        PendingFuture {
            result: Rc::new(RefCell::new(None)),
        }
    }

    pub fn get_result_ptr(&self) -> Rc<RefCell<Option<R>>> {
        self.result.clone()
    }
}

impl<R> Future for PendingFuture<R> {
    type Output = R;

    fn poll(self: Pin<&mut Self>,
                _cx: &mut Context<'_>) -> Poll<Self::Output> {
        match self.result.borrow_mut().take() {
            None => Poll::Pending,
            Some(r) => Poll::Ready(r),
        }
    }
}

impl<'te, 'data> TreeExecutor<'te, 'data> {
    pub fn map<T, R, F, Fu>(te: &'te TreeExecutor<'te, 'data>, f: F, arg: T) -> R
    where Fu: Future<Output=R>,
          F: Fn(&'te TreeExecutor<'te, 'data>, T) -> Fu
    {
        let mut fut = f(te, arg).boxed_local();

        loop {
            match fut.as_mut().poll(&mut Context::from_waker(noop_waker_ref())) {
                Poll::Pending => {
                    println!("Future not ready yet");
                }
                Poll::Ready(r) => {
                    return r;
                }
            }
            let (num_tasks, mut task) = {
                let mut tasks = te.tasks.borrow_mut();
                let num_tasks = tasks.len();
                (num_tasks, tasks.pop().unwrap())
            };
            // Try to poll the task
            match task.as_mut().poll(&mut Context::from_waker(noop_waker_ref())) {
                Poll::Pending => {
                    println!("New task not ready yet");
                    // Put it back, before any new ones which may have been added.
                    {
                        te.tasks
                          .borrow_mut()
                          .insert(num_tasks-1, task);
                    };
                }
                Poll::Ready(()) => {
                    println!("New task ready!");
                }
            }
        }
    }

    pub fn recurse<T, R, F, Fu>(te: &'te TreeExecutor<'te, 'data>, f: F, arg: T) -> Pin<Box<dyn Future<Output=R>>>
    where Fu: Future<Output=R>,
          F: Fn(&'te TreeExecutor<'te, 'data>, T) -> Fu,
          F: 'static,
          R: 'static,
          T: 'static
    {
        // Instead of actually recursing, we create a future
        // and push it onto a queue, but return a different
        // future which just stalls.
        let result = PendingFuture::<R>::new();

        let result_ptr = result.get_result_ptr();

        let task = async move {
            let res = f(te, arg).await;
            *result_ptr.borrow_mut() = Some(res);
        }.boxed_local::<'te>();

        {
            te.tasks.borrow_mut().push(task);
        }

        result.boxed_local()
    }
}

async fn sum<'te, 'data>(te: &'te TreeExecutor<'te, 'data>, node: Rc<Node<usize>>) -> usize
{
    let mut result = node.val;
    for child in &node.children {
        result += TreeExecutor::recurse(te, sum, Rc::clone(child)).await;
    }
    result
}

fn main() {
    let t = Tree {
        root: Some(Rc::new(Node {
            val: 77usize,
            children: vec![
                Rc::new(Node {
                    val: 123,
                    children: vec![],
                }),
                Rc::new(Node {
                    val: 312,
                    children: vec![],
                }),
            ],
        }))
    };

    let te: TreeExecutor = Default::default();
    let result = TreeExecutor::map(&te, sum, Rc::clone(t.root.as_ref().unwrap()));
    println!("Got result {}", result);
}

What you are trying to do here is harder than just using a manually maintained stack, to be honest.

While this isn't what you asked for, I have started writing programs where I immediately spawn a new thread with a large stack size 1GB in my case) where all the programs actual work is done. This deals with the fairly small default stack size.