I have some futures of IO work that I could perform in parallel, but I'm wondering whether concurrency is enough. I don't know enough about how IO operations work in Rust to reason about this.
I am using the AWS SDK, which I assume uses a standard mechanism to issue all HTTP network calls. To be specific, let's say I've got a Vec of futures that are calling put_item.send().
If I want to be absolutely sure of parallelism (or at least as much as tokio can give me on my runtime machine), I could use tokio::spawn or a JoinSet to wait for them all to finish, but that feels reasonably heavyweight. What if I just run them concurrently with the join! macro or futures::join_all... is the network IO done in a blocking way (one call at a time), or will this actually end up as parallel network calls because it's all handled in a non-blocking way at the lower levels?
To make the bit I'm confused about clearer: I'm unsure whether futures::join_all yields the CPU after each network request, is picked up again when the response is received, and only then submits the next network request (serial network calls), or whether it submits all the work to a shared network layer and THEN waits for the responses to come in (parallel network calls).
The only part that would benefit from running in parallel is waiting for the server responses; the benefit of doing the serialisation and deserialisation in parallel is extremely marginal, perhaps even counterproductive.
And, as a follow-up question, is there a cleaner way to create a JoinSet and throw some futures onto it? It feels like this is crying out for a helper function...
let work: Vec< ... Future... > = ... ;
let mut join_set = tokio::task::JoinSet::new();
for job in work {
    join_set.spawn(async move { job.await });
}
for result in join_set.join_all().await {
    ...
}
All of these solutions will yield properly to the runtime, and the IO will be non-blocking and executed concurrently. However, using JoinSet or tokio::spawn will be marginally cheaper than futures::join_all. It is very rare for join_all to be preferable over spawning.
That's interesting that JoinSet would be more efficient; I would have expected futures::join_all to be cheaper, so that's useful to know.
My concern would be that the underlying Vec of futures gets unrolled into
let response0 = work[0].send_request_and_wait_for_response().await;
let response1 = work[1].send_request_and_wait_for_response().await;
...
let responseN = work[N].send_request_and_wait_for_response().await;
vs what I really hope happens, which is more like this (with the yieldable blocks of code of each future reordered):
let pending_work0 = work[0].submit_request_to_pool().await;
let pending_work1 = work[1].submit_request_to_pool().await;
...
let pending_workN = work[N].submit_request_to_pool().await;
let result0 = pending_work0.wait_for_response().await;
let result1 = pending_work1.wait_for_response().await;
...
let resultN = pending_workN.wait_for_response().await;
All of which is still just "concurrent" rather than "parallel", since the runtime could still oblige with a single thread, weaving the results together and yielding the CPU when progress cannot be made; the "parallel" work would actually be happening in the network abstraction.
I would consider both to be sensible interpretations of "yielding properly", but they have very different performance characteristics. It's unclear from the tokio docs what happens to the ordering, but the futures docs do explain (in FuturesOrdered, which I only noticed after asking the question):
futures in the set will race to completion in parallel, results will only be returned in the order their originating futures were added to the queue
which does seem to be what I'm looking for. But you've already noted that JoinSet is probably more efficient, so I'd be looking to use that now anyway... although the aesthetics of using it leave a lot to be desired.
OK, thanks. It sounds like the docs for futures::join_all are a bit muddled in that case, as they claim the futures run "in parallel"... I was wondering how they could have achieved that, because all they can do (without access to a runtime) is reorder things; they can't actually spawn anything.
When join_all has a small number of futures, it does not fall back to FuturesUnordered. In that case, whenever one future is polled, they are all polled, which is inefficient.
When join_all has a large number of futures, it falls back to FuturesUnordered. The internals of FuturesUnordered use a data structure that is very similar to how Tokio tracks its tasks internally, so the cost is going to be similar except for the concurrent/parallel distinction.
Small correction: the docs say it falls back to FuturesOrdered, but I take your point that for small numbers of futures it's doing a lot of needless work. Maybe they feel the cost of the data structure isn't worth it for small numbers of tasks.