Relationship between std::future, futures, and tokio

Can someone please break down the relationship between these 3 crates?

Additionally, I have a specific use case: I want to create a Stream from an Iterator, create a future from each item, and then map over those futures, while ultimately limiting that stream of futures to N running concurrently.

So it seems the crates/traits I need are in:

And then the basic Future/async/await stuff is all via std?

std::future provides the core futures functionality (the Future trait) needed to support async/await syntax.
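A std-only sketch of that split (no external crates; Rust 1.51+ for `std::task::Wake`): the `Future` trait and `async`/`await` come from the standard library, but std ships no executor, so the toy `block_on` below plays the role a runtime would. `CountDown`, `ThreadWaker`, `block_on`, `add`, and `run` are names made up for illustration; a real runtime such as tokio also handles I/O readiness, timers, and many tasks at once.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

/// Hand-written future: returns Pending a few times, waking itself each time.
struct CountDown {
    remaining: u32,
}

impl Future for CountDown {
    type Output = &'static str;

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        if self.remaining == 0 {
            Poll::Ready("done")
        } else {
            self.remaining -= 1;
            cx.waker().wake_by_ref(); // ask the executor to poll us again
            Poll::Pending
        }
    }
}

/// Waker that unparks the thread running `block_on`.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// Toy single-future executor: std defines the Future trait, but not this loop.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = Box::pin(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(out) => return out,
            Poll::Pending => thread::park(), // sleep until a wake() unparks us
        }
    }
}

async fn add(a: i32, b: i32) -> i32 {
    a + b
}

async fn run() -> (i32, &'static str) {
    let sum = add(2, 3).await; // async/await is plain std syntax
    let msg = CountDown { remaining: 3 }.await;
    (sum, msg)
}

fn main() {
    println!("{:?}", block_on(run()));
}
```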

futures-rs is a crate which adds utilities and abstractions over futures: FutureExt/TryFutureExt/Stream/StreamExt/TryStreamExt/Sink/SinkExt. You don't need it for async programming, but it is useful (streams and their extension traits probably the most). It also provides some synchronization primitives.

Tokio brings an async runtime (some runtime is needed to execute futures), plus additional utilities for interacting with the environment asynchronously: IO, time, Unix signals, and also synchronization primitives (some of them provided by futures-rs, some not). Tokio is built on futures-rs, but it re-exports the things it uses, so there is no need to depend on futures-rs unless you use additional items that tokio doesn't re-export.

There is also async-std, which aims to provide an API very similar to the standard library, but for async programming: somewhat like futures-rs and tokio merged together. It uses a different executor than tokio. I don't know which one is better, but I do know for sure that there are things in futures-rs which are not in async-std.

About your use case: I don't see a question. You found the correct tools for this; you will just need a runtime for your async code (probably tokio or async-std). The real question is whether async is a good choice here. If all your futures are just CPU-bound calculations and you want to spread them across threads this way, it probably is not: rayon and traditional threads are the way to go. Keep in mind that "concurrent" is not the same as "parallel". Async programming is a tool for avoiding tying up a thread while waiting for things that are not ready yet (for example network connections, some kinds of input, or reading huge files). It does not help with typical parallel computation, because the executor and the futures are just extra overhead.
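For the CPU-bound case, a std-only sketch that uses scoped OS threads (`std::thread::scope`, Rust 1.63+) in place of rayon, just to stay dependency-free: the range is split into chunks that run in parallel with no executor involved. `is_prime` and `count_primes_parallel` are illustrative names, and the trial-division prime test is deliberately naive.

```rust
use std::thread;

/// CPU-bound work: nothing to wait on, so async would only add overhead.
fn is_prime(n: u64) -> bool {
    if n < 2 {
        return false;
    }
    let mut d = 2;
    while d * d <= n {
        if n % d == 0 {
            return false;
        }
        d += 1;
    }
    true
}

/// Counts primes below `limit` by splitting the range across `workers` OS threads.
/// `workers` must be at least 1.
fn count_primes_parallel(limit: u64, workers: u64) -> usize {
    let chunk = (limit + workers - 1) / workers; // ceiling division
    thread::scope(|s| {
        let handles: Vec<_> = (0..workers)
            .map(|w| {
                let start = w * chunk;
                let end = ((w + 1) * chunk).min(limit); // empty range if start >= end
                s.spawn(move || (start..end).filter(|&n| is_prime(n)).count())
            })
            .collect();
        // Join every thread and add up the per-chunk counts.
        handles.into_iter().map(|h| h.join().unwrap()).sum()
    })
}

fn main() {
    println!("{}", count_primes_parallel(1000, 4));
}
```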


Thanks, such a great breakdown! Really appreciate it :slight_smile:

Sorry, I was in a bit of a rush and wasn't clear about my use case. I have a long list of URLs and want to download them to the local filesystem. So something like:

  1. Turn Iterator<Item = Url> into Stream<Item = Url>
  2. Map over that and use reqwest to get Stream<Item = Response>
  3. Map over that and asynchronously write those responses to disk
  4. Limit all the stream processing so it doesn't go nuts, e.g. 10 requests/writes at a time (can be the same value, I guess: e.g. 10 "items in the stream")
  5. Finally, await on that stream until it's done

I've commented on this specific problem here: https://github.com/seanmonstar/reqwest/issues/482#issuecomment-584245674

I would say your use case would look like:

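```rust
use futures::stream::{self, StreamExt};
use tokio::io::AsyncWriteExt; // for write_all

stream::iter(urls)
  .for_each_concurrent(Some(10), |url| async move {
    // bytes() rather than text(), so binary files are saved unchanged
    let body = reqwest::get(url.as_str()).await.unwrap().bytes().await.unwrap();
    let path = path_from_url(url); // your own helper, not shown
    tokio::fs::File::create(path).await.unwrap().write_all(&body).await.unwrap();
  })
  .await; // nothing runs until the combined future is awaited
```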

One thing I didn't cover here at all is error handling. Possibly you want to first map your URLs to results with Result::Ok and then iterate over them with try_for_each_concurrent, so that in case of an error you get it back; or maybe you actually want to map to Result<(), E> instead of using for_each, so you can collect the errors.

I'm okay with panicking in this particular situation (internal tool).

What is the difference between .for_each_concurrent() and what I have there (.buffer_unordered() + while await)?

I don't think there is any important difference. TBH I didn't analyze your GitHub issue; I just made a suggestion based on your points. I also have no idea whether our solutions have any performance difference, and I will not even try to guess. I am not even 100% sure my solution is correct (in particular whether File::write_all flushes). I just gave it as a quick recap of which way I would go; I didn't find any particular question, so I didn't spend too much time on it. The only question I see in your GitHub issue is "Is this helpful?", and the answer is: probably yes. It would allow downloading multiple bodies at once.

