Make concurrent requests to an API until condition is met

I found a general answer on how to make concurrent requests with the reqwest crate on Stack Overflow here.

I also searched this forum, but did not find the answer I am looking for, so here goes:

To get the information I want, I have to make GET requests to an (in theory) unknown number of pages, which means in each iteration of the loop something like ?page=1 or ?page=2 gets appended to the URL.

When the API has no more information for me, it returns a String with [] (empty brackets) as content, which currently happens for me if the page_num is 11, but of course for other users this will be different.

Since each "page request" takes 1-2 seconds, I want to query the individual "page URLs" concurrently, and when one of the "page requests" returns [], break out of the loop.

for page_num in 1..=20 {
    // request_a_page builds the page-specific URL and a reqwest client,
    // then makes a GET request and returns Result<String, Box<dyn Error>>
    let body = request_a_page(page_num, token_info)
        .await
        .unwrap_or_else(|_| panic!("failed to get page {page_num}"));

    // currently true if page_num == 11
    if body == "[]" {
        log::info!("empty response --> break");
        break;
    }

    // ... process body ...
}

I have previously written a Go version of this, where I simply started multiple goroutines with a sync.WaitGroup. At some point one of the goroutines would set a flag (in a struct guarded by a sync.Mutex) so that no new goroutines would be started. As a workaround/temporary fix, I added a time.Sleep of 250 ms once page_num is higher than the number of pages I know I currently need to query, so that Go does not start hundreds of goroutines in the meantime.

Since I am new to Rust and mostly code as a hobby, I am currently not able to translate this into Rust code (even with help from the Stack Overflow example above), so any help would be appreciated.

You can use a tokio::sync::Semaphore for this. You wrap it in an Arc and set the number of permits to the number of concurrent requests you want to allow. Your request_a_page method should do this:

  1. Take as argument an OwnedSemaphorePermit.
  2. Request the page.
  3. Check whether the page contains what you're looking for, and close the semaphore if so.
  4. Drop the OwnedSemaphorePermit.

Then, to use this, you write a loop that does this:

  1. Acquire an owned permit from the semaphore.
  2. Use tokio::spawn to start a background task that runs request_a_page with the owned permit.

If acquiring the permit fails, that means the semaphore has been closed. In that case, you just break out of the loop, because you already have the answer.


This seems to be exactly what I was looking for, thanks!
Will try to implement it in the next few days.

Edit: Made it work by following the steps you provided. Thanks again!

Another point: It can be a good idea to store the spawned tasks in a JoinSet. Then, once you break out from the loop, you can call join_next in a loop until you get the answer (assuming the spawned task returns an Option<TheAnswer>), and then you can abort any remaining running tasks by calling shutdown on the JoinSet.

I'll check it out once I have a better understanding of how tokio works. Planning on working through the tutorial first.
