Rayon but for multiple computers!


Say that I have 1,000 tasks that I would like computed in parallel, and I would like to distribute them across a network, because the network has more cores than my individual computer. This is easy enough (famous last words) to write... However, the tasks I send may spawn child tasks that also need to be distributed, and now I have the problem of distributed deadlocks :cry:

I'm sure that I am reinventing the wheel here, but at the same time I'm a little bewildered by the options, the jargon, and the generally unfamiliar territory when I search for distributed computing, etc.

Has anyone had a similar need and can write me a little about how they solved it?

Can you tell us a bit more about the nature of your program? What is a task doing? What does the data you are dealing with look like? Are tasks all the same or heterogeneous (i.e. is Task A doing the same thing as Task B but with different data)? Do you already have a cluster manager installed? If not, how do you spawn processes on each computer in your network?

Umm, okay, well in my first iteration of the problem a dispatcher is given one of these:

pub trait DispatchRequest: Serialize + Send + Sync + Clone {
    // Route on the worker that handles this kind of request.
    const ROUTE: DispatchRoute;
    // Type the worker sends back in its HTTP response.
    type Response: DeserializeOwned + 'static + Clone + Send + Sync + Debug;
    fn cache(&self, v: Self::Response);
    fn cached(&self) -> Option<Self::Response>;
}

and its config.yaml lists the addresses of the various workers.

Each worker is just an HTTP server living at one of those addresses. It knows what to do when a connection is made at a certain route, and it replies over that HTTP connection with the response.

It also has a /status route where it returns its CPU load, and the dispatcher picks the server with the lowest CPU load and/or the highest number of cores.
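For anyone following along, that selection step could look roughly like this. It is only a sketch: `WorkerStatus`, its fields, and `pick_worker` are made-up names, and "lowest load first, more cores as the tie-breaker" is just one way to read the "and/or" above.

```rust
/// Hypothetical shape of what a worker's /status route reports.
#[derive(Debug, Clone, PartialEq)]
pub struct WorkerStatus {
    pub address: String,
    /// Fraction of CPU in use, 0.0..=1.0 (assumed unit).
    pub cpu_load: f64,
    pub cores: usize,
}

/// Pick the worker with the lowest CPU load; on a tie, prefer more cores.
/// Returns `None` when no worker reported a status.
pub fn pick_worker(statuses: &[WorkerStatus]) -> Option<&WorkerStatus> {
    statuses.iter().min_by(|a, b| {
        a.cpu_load
            .total_cmp(&b.cpu_load)
            // Reversed operands: a larger core count sorts first.
            .then_with(|| b.cores.cmp(&a.cores))
    })
}
```

Note that a selection like this only spreads load. It does nothing about the deadlock, since a worker that is blocked waiting on a child task may be mostly idle and so report a low CPU load.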

But say that I design a task that itself wants to call into the (or a) Dispatcher: now I'm screwed, because if all servers are busy, I will deadlock myself.

Would switching to async be an option for you? It seems to me that you deadlock because you call the dispatcher from a worker and then wait for the dispatcher to respond synchronously. If you don't wait for the response, your worker is free to handle other tasks.
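To make the "don't wait" idea concrete, here is a rough sketch using only the standard library. It is not async/await and not your HTTP setup: an in-process thread pool stands in for the workers, and `Pool`, `start_pool`, and `sum_of_squares` are all hypothetical names. The parent task queues its children and returns right away, and whichever child finishes last sends the combined result.

```rust
use std::sync::atomic::{AtomicU64, AtomicUsize, Ordering};
use std::sync::mpsc::{channel, Sender};
use std::sync::{Arc, Mutex};
use std::thread;

type Job = Box<dyn FnOnce(&Pool) + Send>;

/// Handle for queueing jobs; every running job gets one so it can queue children.
#[derive(Clone)]
pub struct Pool {
    queue: Sender<Job>,
}

impl Pool {
    pub fn submit(&self, job: impl FnOnce(&Pool) + Send + 'static) {
        self.queue.send(Box::new(job)).expect("pool shut down");
    }
}

/// Start `workers` threads that pull jobs from one shared queue.
/// In this sketch the threads simply live until the process exits.
pub fn start_pool(workers: usize) -> Pool {
    let (tx, rx) = channel::<Job>();
    let rx = Arc::new(Mutex::new(rx));
    let pool = Pool { queue: tx };
    for _ in 0..workers {
        let (rx, pool) = (Arc::clone(&rx), pool.clone());
        thread::spawn(move || loop {
            // The lock is held only while dequeuing, never while a job runs.
            let job = rx.lock().unwrap().recv();
            match job {
                Ok(job) => job(&pool),
                Err(_) => break,
            }
        });
    }
    pool
}

/// Parent task: queue one child per input, then return without waiting.
/// The last child to finish sends the total through `done`.
pub fn sum_of_squares(pool: &Pool, inputs: Vec<u64>, done: Sender<u64>) {
    pool.submit(move |pool| {
        if inputs.is_empty() {
            let _ = done.send(0);
            return;
        }
        let remaining = Arc::new(AtomicUsize::new(inputs.len()));
        let total = Arc::new(AtomicU64::new(0));
        for x in inputs {
            let (remaining, total, done) = (remaining.clone(), total.clone(), done.clone());
            pool.submit(move |_| {
                total.fetch_add(x * x, Ordering::SeqCst);
                // fetch_sub returns the previous value: 1 means this was the last child.
                if remaining.fetch_sub(1, Ordering::SeqCst) == 1 {
                    let _ = done.send(total.load(Ordering::SeqCst));
                }
            });
        }
        // No blocking wait here, so this worker thread is free for the next job.
    });
}
```

Even with `start_pool(1)` this should finish, because the parent never occupies the only worker while its children sit in the queue. If the parent blocked on a channel for each child result instead, that single worker would be stuck forever, which is the deadlock described above. An async runtime gives you the same effect with nicer syntax: an `.await` hands the thread back instead of blocking it.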
