Job scheduling with dependencies

Given Rust's very good concurrency support, I am hoping for an off the shelf library would help me with this problem without having to reimplement everything from scratch. Here it goes:

I have an iterator that yields a collection of named objects that need to be processed. Order is not important, however there are dependencies: Given object x, I can look at it and see that it depends on some other jobs y, z (identified by name). Furthermore some objects are "serial", i.e. they depend on all objects previously yielded by the iterator. The individual jobs are small and many (~ 1 ms, although some are much larger).

My current implementation is as follows: On the main thread, loop over the objects, inside a rayon::scope, and spawn each job. If you hit a serial element, end the scope, process the element, and restart the scope. In the job processor, it checks if there are dependencies, and if any dependencies have not been finished yet, spawn another job on the same element and return failure; otherwise it does the actual work.

This is taking advantage of rayon::scope's documented claim of LIFO processing order so that enqueuing a job puts it at the end of the line, which maximizes the chances that next time around we will have gotten to all the dependencies. In reality, this seems to have a lot of issues with fair scheduling, where there are some elements in the queue that are not being serviced, and the same few elements get hit in a loop, which are depending on the unscheduled element and so just keep getting re-enqueued.

It would be much preferable to just register the dependencies and go through them in breadth first order or something, but rayon seems tailored to the no-dependency case with its parallel iterators. What are my options for this kind of task?

It seems the async executors handle such complexity (namely execution dependencies) better than the rayon, though it's not their intended main use case. Here's some example code using tokio for its core logic.

for job in jobs {
    tokio::spawn(async move {
        for dep in &job.deps {
            env.wait(dep).await;
        }
        job.run(); // or .await it
        env.done(&job.name);
    });
}

You still need to write supplement code around the job and the env, but you can make it with existing sync primitive provided by the stdlib and the runtime like tokio. If those jobs performs some IO, you may consider to make it async as blocking IO still occupies one of the fixed amount of runtime threads.

1 Like

What is the wait function doing in that example? Is there some tokio future that should be called there, or a condvar-based implementation?

The easiest way to implement that wait is probably Tokio's watch channel with a boolean item type.

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.