Asynchronous queue for use with tokio (or: how to scrape tile servers with tokio?)

Hello rust community!

I am currently just playing a bit with Rust and trying to write little helpers with it. Today I needed a slippy map tile downloader. Basically this is just an HTTP downloader for a lot (think 100k) of generated URLs. Most tile servers have several backends, so tiles can be fetched from a list of different hosts. What I did in async Python (no offense, Python is just still my natural language for small helper scripts) was to create one async task per backend host and make all those tasks watch a shared queue. My main task would just put tile paths on that queue, and whichever task is free would pick one up and download it from its assigned tile server host.

Unfortunately, Python showed some "interesting" behaviour, and this made me try to rewrite the small program in Rust for fun. Getting along with futures is pretty easy; they seem quite similar to those in Java, JS and Python, but with added ownership safety. The point where, to my surprise, I had to stop is the queue. After an hour of searching I did not find anything equivalent to Python's asyncio.Queue or Go's channels that works with tokio. The best-supported options I came across were basically

  1. tokio::sync::mpsc, but that supports multiple senders and only one consumer. That is the exact opposite of what I need for my approach.
  2. crossbeam_channel. The functionality looks great, but it seems to offer thread primitives only and would probably not play nicely with tokio (e.g. it would block a thread instead of giving control back to the worker/event loop).

I found more crates, but those two sounded the most promising.

There is an open crossbeam issue with planned Future support. I guess this would be exactly what I want, but that seems not to be available yet.

Am I missing some queue implementation that fits my needs?
What other approaches do people use to distribute work to multiple tasks?

PS: I think I really want some work distribution mechanism for my code since I would not want to start a new task for every tile that I want to download.
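For readers who want to see the layout described above, here is a minimal sketch of "one worker per host, all watching one shared queue". It is only an illustration under loud assumptions: it uses plain std threads and std::sync::mpsc as stand-ins for tokio tasks and an async queue (so the workers block, which is exactly what the post wants to avoid in the real program), the host names are made up, and fetch() is a hypothetical placeholder that builds a URL instead of doing an HTTP request.

```rust
use std::sync::mpsc;
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical stand-in for a real HTTP download:
/// it only builds the URL that would be fetched.
fn fetch(host: &str, path: &str) -> String {
    format!("https://{}/{}", host, path)
}

/// Spawns one worker per host. All workers pull tile paths from the
/// same queue, so whichever worker is idle takes the next path.
fn download_all(hosts: &[&str], paths: Vec<String>) -> Vec<String> {
    let (path_tx, path_rx) = mpsc::channel::<String>();
    // std's Receiver cannot be cloned, so the workers share it behind a mutex.
    let path_rx = Arc::new(Mutex::new(path_rx));
    let (done_tx, done_rx) = mpsc::channel::<String>();

    let mut workers = Vec::new();
    for host in hosts {
        let host = host.to_string();
        let path_rx = Arc::clone(&path_rx);
        let done_tx = done_tx.clone();
        workers.push(thread::spawn(move || loop {
            // The lock is held only while taking one item off the queue.
            let next = path_rx.lock().unwrap().recv();
            match next {
                Ok(path) => done_tx.send(fetch(&host, &path)).unwrap(),
                Err(_) => break, // queue closed and drained
            }
        }));
    }
    drop(done_tx); // only the workers keep result senders now

    for path in paths {
        path_tx.send(path).unwrap();
    }
    drop(path_tx); // closing the queue lets the workers exit

    for worker in workers {
        worker.join().unwrap();
    }
    done_rx.iter().collect()
}

fn main() {
    // Made-up host names, for illustration only.
    let hosts = ["a.tile.example.org", "b.tile.example.org", "c.tile.example.org"];
    let paths: Vec<String> = (0..8).map(|x| format!("3/{}/2.png", x)).collect();
    for url in download_all(&hosts, paths) {
        println!("{}", url);
    }
}
```

Which host ends up serving which path, and the order of the results, depends on scheduling. The number of requests in flight per host is fixed at one because each host has exactly one worker.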


If you just want to start multiple asynchronous tasks, one naive solution would be to spawn a task for each tile download on the current executor (e.g. with tokio::executor::spawn()) as the tiles come in. That way the executor can schedule things so the tasks are automatically distributed across multiple threads and polled to completion. This might pose a problem if tasks are spawned a lot faster than they are completed, though...

EDIT: Sorry, I just re-read your PS and noticed that what I propose above is exactly what you do not want to do.

A completely different approach would be to have a look at actix. It's an actor framework (see actor model) that is built on top of tokio and works equally well with both synchronous and asynchronous tasks. I'd definitely recommend checking it out because the idea of sending messages between actors (e.g. the Scraper might be sending "please download http://example.com/" messages to a TileDownloader) to do work sounds like a great architecture for your use case.


I think you already highlighted in your original answer why this might not be a good idea. I see two problems:

  1. In the worst case it would create all (>100k) tasks before the first one is actually scheduled, potentially wasting a lot of resources on the already allocated tasks.
  2. It might generate a lot of in-flight HTTP requests to the same HTTP server. This again might consume a lot of resources, get my little program blocked by the tile server or, worse, even impact the tile server's operation.
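As an illustration of how the first problem can be avoided, here is a small sketch of a bounded queue: once it is full, the producer has to stop, so not all 100k items exist at once. This is an assumption-laden example, not something from the thread: it uses std::sync::mpsc::sync_channel rather than anything tokio-specific, and fill_until_full() is a hypothetical helper that exists only to show the "queue is full" signal.

```rust
use std::sync::mpsc::{sync_channel, TrySendError};

/// Hypothetical helper: tries to enqueue `paths` into a queue that holds
/// at most `capacity` items while nobody is consuming, and returns how
/// many items were accepted before the queue pushed back.
fn fill_until_full(capacity: usize, paths: &[&str]) -> usize {
    // `_rx` is kept alive so the channel counts as connected.
    let (tx, _rx) = sync_channel::<String>(capacity);
    let mut accepted = 0;
    for p in paths {
        match tx.try_send(p.to_string()) {
            Ok(()) => accepted += 1,
            // Backpressure: the queue is full, so the producer must wait.
            Err(TrySendError::Full(_)) => break,
            // No receiver left; nothing more can be queued.
            Err(TrySendError::Disconnected(_)) => break,
        }
    }
    accepted
}

fn main() {
    let paths = ["3/0/0.png", "3/1/0.png", "3/2/0.png", "3/3/0.png"];
    println!("accepted {} of {}", fill_until_full(2, &paths), paths.len());
}
```

In a real producer one would call the blocking send() instead of try_send(), so the producer simply pauses until a worker has taken an item. The second problem (too many requests to one host) is then handled by keeping the number of workers per host fixed.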

Thank you for bringing up actix; I did not know it existed. I looked into it a bit and think I got the general concept, though not the details. I think the 1:1 mapping (as in 1 Scraper : 1 TileDownloader) you are describing would definitely work with actix, but it would already work with an mpsc channel as tokio provides. The tricky part is having multiple receivers (i.e. 1 Scraper : n TileDownloaders). I think in actix terminology this would mean multiple actors listening on the same address, with an idle one being picked when a message is sent there.

SyncArbiter seems to be close to what I want, but the use case is slightly different: it is meant for CPU-bound operations and as such schedules each actor on a separate OS thread. My scenario is IO-bound, so I don't need this overhead. Also, using SyncArbiter seems to be a lot of boilerplate for what I want, TBH.

The task description is not entirely clear, but for each server you are going to download from, you could create N future-aware mutexes (see the futures_locks crate), and then write a future that finds an unlocked mutex, or goes to the NotReady state if no mutex is currently unlocked.
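A rough sketch of the "N mutexes per server" idea, under clear assumptions: it uses std::sync::Mutex and try_lock() as a stand-in for the future-aware mutexes of futures_locks (whose API is not shown here), and the Slots type and its methods are hypothetical names made up for this example. Returning None marks the point where a real future would report that it is not ready.

```rust
use std::sync::{Mutex, MutexGuard};

/// Hypothetical type: N download slots for one tile server host.
struct Slots {
    slots: Vec<Mutex<()>>,
}

impl Slots {
    fn new(n: usize) -> Self {
        Slots {
            slots: (0..n).map(|_| Mutex::new(())).collect(),
        }
    }

    /// Returns a guard for the first free slot, or None if every slot is
    /// taken. The slot is released again when the guard is dropped.
    fn try_acquire(&self) -> Option<MutexGuard<'_, ()>> {
        self.slots.iter().find_map(|slot| slot.try_lock().ok())
    }
}

fn main() {
    // Allow at most two concurrent downloads for this (imaginary) host.
    let slots = Slots::new(2);
    let first = slots.try_acquire();
    let second = slots.try_acquire();
    println!("first: {}, second: {}", first.is_some(), second.is_some());
    println!("third: {}", slots.try_acquire().is_some());
    drop(first); // a download finished, so its slot becomes free
    println!("after release: {}", slots.try_acquire().is_some());
}
```

The design choice here is that N bounds the number of simultaneous downloads per server, which addresses the concern about too many in-flight requests to one host.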
