How to design concurrency of massive tasks in Rust

Hi there, I'm new to Rust. I recently wanted to build a small tool, but I have no idea how it should be implemented.

My scenario is like this:

I have a very large number of tasks, on the order of 2^32, each of which returns a value.

I need to send the current optimal value to a server via HTTP at any time, without waiting for all tasks to complete.

After all tasks are completed, continue to the next cycle

I looked at tokio's concurrency framework, but I couldn't see how to make it meet my requirements:

submitting all the tasks at once makes my machine run out of memory,

and I don't know how to limit the number of tasks in the queue so that a task is only removed from the queue once it is done.



What framework should I use?

And how should this be done?

Any pointers would be greatly appreciated

I suspect that trying to run 4 billion pieces of work as threads or async tasks is always going to be massively inefficient and perform poorly, unless you happen to have a supercomputer with millions of cores to hand. Do you? It will certainly consume huge piles of memory. On a machine with only a few cores, I don't see this ever working well.

Are you sure you can't arrange for a far smaller number of threads/async tasks and feed them the 4 billion pieces of work from a queue?
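For example, with plain threads and a bounded std channel (a sketch only; `run_task` is a hypothetical stand-in for your real unit of work, and the counts are scaled down for illustration):

```rust
use std::sync::{mpsc::sync_channel, Arc, Mutex};
use std::thread;

// Hypothetical stand-in for one unit of work.
fn run_task(i: u32) -> u32 {
    i % 97
}

// Fan `n` work items out to a fixed pool of worker threads through a
// bounded queue, returning the smallest result seen.
fn parallel_min(n: u32, n_workers: usize) -> u32 {
    // `send` blocks once 64 items are queued, so memory use stays flat
    // no matter how many work items there are in total.
    let (tx, rx) = sync_channel::<u32>(64);
    let rx = Arc::new(Mutex::new(rx));

    let handles: Vec<_> = (0..n_workers)
        .map(|_| {
            let rx = Arc::clone(&rx);
            thread::spawn(move || {
                let mut local_min = u32::MAX;
                loop {
                    // Take the next work item; the lock guard is dropped
                    // at the end of this statement, before we do the work.
                    let msg = rx.lock().unwrap().recv();
                    match msg {
                        Ok(i) => local_min = local_min.min(run_task(i)),
                        Err(_) => break, // channel closed: no more work
                    }
                }
                local_min
            })
        })
        .collect();

    // The producer: feeds work items and blocks whenever the queue is full.
    for i in 0..n {
        tx.send(i).unwrap();
    }
    drop(tx); // close the channel so the workers exit

    handles
        .into_iter()
        .map(|h| h.join().unwrap())
        .fold(u32::MAX, u32::min)
}

fn main() {
    println!("min = {}", parallel_min(10_000, 8));
}
```

The key point is that the queue is bounded: the producer can never race ahead of the workers, so only a small, fixed number of work items is in memory at once.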

Do these work units share any data? If so, I suspect Amdahl's law will kill any idea of gaining massive performance by using millions of threads/tasks.

Perhaps we need to know more about what you are actually computing to make any sensible suggestions.


Thanks for your reply

The task details can be revealed:

I work for a cloud computing vendor. We have a large number of intranet IP address ranges, allocated to virtual machines with a wide geographical distribution. I need to find the highest and lowest latencies among them via ping, so as to spot network problems in time.

This task has strict real-time requirements: it should not wait for all scans to finish before displaying results; instead, at each moment the current highest and lowest latencies should be displayed.

In other words, instead of requiring all 2^32 tasks to run concurrently, I only need to run a part of them at a time, and once a batch finishes, continue with the next batch.
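A minimal sketch of that batching idea (sequential here, just to show the batch structure; `run_task` is a hypothetical stand-in for one ping, and each batch could itself be run concurrently):

```rust
// Hypothetical stand-in for one unit of work (one ping in the real tool).
fn run_task(i: u32) -> u32 {
    i % 97
}

// Walk the ID space in fixed-size batches so that only one batch is
// ever "in flight"; `total` would be 1 << 32 in the real tool.
fn batched_min(total: u64, batch: u64) -> u32 {
    let mut best = u32::MAX;
    let mut start = 0u64;
    while start < total {
        let end = (start + batch).min(total);
        for i in start..end {
            best = best.min(run_task(i as u32));
        }
        // The current optimum could be reported to the server here,
        // between batches, without waiting for the whole scan.
        start = end;
    }
    best
}

fn main() {
    println!("best = {}", batched_min(10_000, 1024));
}
```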

The two main problems I'm facing right now are:

① With a message queue in Rust, how do I wait for a message to be processed before removing it from the queue (so that the producer blocks when pushing data into a full queue)?

② How do I effectively control the size of the waiting queue of tokio::spawn, to limit the use of hardware resources (mainly memory)?