Efficiently executing large graphs of Futures?


Hi all,

I’m very commonly computing algorithms over large DAGs. Each node in the graph requires some IO to fetch, and the computation applied to the node is also async, and generally requires data from its immediate neighbours.

The result is an async computation where the graph of futures is more or less the same shape as the data structure itself, executed in a topological ordering of the graph. This works well for small examples, but falls down when the examples get larger.

There are two bad cases:

  1. if the graph is very deep, then there’s very deep recursion on Future::poll calls, and ultimately it will run out of stack
  2. if the structure is very wide with sharing, then there are many redundant calls to the shared nodes (one for each distinct path through the graph to the shared node).

For example, a single call to Root::poll() will result in 6 calls to Shared::poll(), one per distinct path from the root down to the shared node (assuming the intermediate Futures are something like Future::join(), which polls both sides):

        x
       / \
      x   x
     / \ / \
    x   x   x
     \ / \ /
      x   x
       \ /
        x

The first case is the easier one to reason about: rather than recursing depth-first into the structure, maintain an explicit queue to keep track of which Futures still need to be polled. Or you could structure things with a loop or a fold to effectively get tail recursion.
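For illustration, here's a minimal synchronous sketch of that queue idea (Kahn's algorithm over a made-up edge list; in the real async version, "process this node" would be "poll this node's Future"):

```rust
use std::collections::{HashMap, VecDeque};

// Process a DAG in topological order using an explicit queue instead of
// depth-first recursion, so stack depth stays constant however deep the
// graph is. Node ids and the edge list are hypothetical stand-ins.
fn topological_order(num_nodes: usize, edges: &[(usize, usize)]) -> Vec<usize> {
    let mut indegree = vec![0usize; num_nodes];
    let mut dependents: HashMap<usize, Vec<usize>> = HashMap::new();
    for &(from, to) in edges {
        indegree[to] += 1;
        dependents.entry(from).or_default().push(to);
    }

    // Seed the queue with nodes that have no pending dependencies; this
    // replaces the implicit stack of nested poll() calls with heap storage.
    let mut queue: VecDeque<usize> =
        (0..num_nodes).filter(|&n| indegree[n] == 0).collect();
    let mut order = Vec::with_capacity(num_nodes);

    while let Some(node) = queue.pop_front() {
        order.push(node); // here you would drive this node's Future
        for &next in dependents.get(&node).into_iter().flatten() {
            indegree[next] -= 1;
            if indegree[next] == 0 {
                queue.push_back(next); // all inputs ready: schedule it
            }
        }
    }
    order
}
```

Note this also fixes the sharing problem as a side effect: each node is visited exactly once, regardless of how many paths lead to it.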

And the second can be handled by memoizing which Futures have already been called within one call to the Task’s top-level Future.
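As a rough sketch of the memoization (synchronous, and still recursive, so it only addresses the sharing problem, not stack depth; the node ids and the combine step are made up). For a single future, I believe the futures crate's Shared combinator (FutureExt::shared) gives you something similar: many handles, but the underlying work runs once.

```rust
use std::collections::HashMap;

// Cache each node's result the first time it is computed, so shared
// nodes are evaluated once no matter how many paths reach them.
// `deps` maps a node to its inputs; `evals` counts real evaluations.
fn eval(
    node: usize,
    deps: &HashMap<usize, Vec<usize>>,
    cache: &mut HashMap<usize, u64>,
    evals: &mut usize,
) -> u64 {
    if let Some(&v) = cache.get(&node) {
        return v; // already computed via another path: reuse it
    }
    *evals += 1; // only uncached nodes count as real work

    let mut inputs = 0u64;
    for &d in deps.get(&node).into_iter().flatten() {
        inputs += eval(d, deps, cache, evals);
    }
    let value = inputs + 1; // stand-in for the node's own async work
    cache.insert(node, value);
    value
}
```

On the diamond graph (root depending on two nodes that both depend on one shared node), this does 4 evaluations instead of 5.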

But as far as I know, there’s no general implementation of this. One could implement both in an ad-hoc way, but it’s hard to manage this over a large Future structure formed by composing a lot of combinators.

So how do people deal with this? I’m assuming this isn’t a problem unique to us.

I’m toying with the idea of implementing a container of Futures which explicitly knows the topology of the graph, which can manage the computation efficiently - a little like FuturesUnordered manages the execution of a number of Futures. Does something like this exist already?

Or can this be solved by spawning off a lot more Tasks?



You could split the graph up by creating a task per node, and wiring them together via oneshots. That allows the executor to only poll the tasks that are actually active.
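To make the shape concrete, here's a thread-and-channel analogue of that wiring, using std::sync::mpsc in place of async oneshot channels (in real async code you'd spawn a task per node and use e.g. futures' or tokio's oneshot channels); the node bodies are made up:

```rust
use std::sync::mpsc;
use std::thread;

// One independent unit of execution per node, wired together by channels:
// each node blocks until its inputs arrive, computes, and sends its one
// result downstream. The shared node runs exactly once and fans its
// result out, so nothing re-polls it per path.
fn run_diamond() -> u64 {
    // shared -> left, shared -> right, {left, right} -> root
    let (shared_tx_l, shared_rx_l) = mpsc::channel();
    let (shared_tx_r, shared_rx_r) = mpsc::channel();
    let (left_tx, left_rx) = mpsc::channel();
    let (right_tx, right_rx) = mpsc::channel();

    thread::spawn(move || {
        let v = 1u64; // stand-in for the shared node's async work
        shared_tx_l.send(v).unwrap(); // fan the single result out
        shared_tx_r.send(v).unwrap();
    });
    thread::spawn(move || {
        let v = shared_rx_l.recv().unwrap() + 1; // left node's work
        left_tx.send(v).unwrap();
    });
    thread::spawn(move || {
        let v = shared_rx_r.recv().unwrap() + 1; // right node's work
        right_tx.send(v).unwrap();
    });

    // The "root" joins on both inputs.
    left_rx.recv().unwrap() + right_rx.recv().unwrap()
}
```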


Yeah, I guess that’s one way to handle it. What are the tradeoffs? Presumably putting every distinct Future into its own Task would be too costly, but are there recommendations for reasonable subgraph sizes to chunk into Tasks? I’m concerned about cases where a large computation is built up through lots of composition: no single piece of code knows how large the graph it’s dealing with is, so nothing is in a good position to apply a Task-chunking policy.

Does this apply as easily to Streams, or do channels make it more awkward?