Given Rust's very good concurrency support, I am hoping for an off the shelf library would help me with this problem without having to reimplement everything from scratch. Here it goes:
I have an iterator that yields a collection of named objects that need to be processed. Order is not important, however there are dependencies: Given object x
, I can look at it and see that it depends on some other jobs y
, z
(identified by name). Furthermore some objects are "serial", i.e. they depend on all objects previously yielded by the iterator. The individual jobs are small and many (~ 1 ms, although some are much larger).
My current implementation is as follows: On the main thread, loop over the objects, inside a rayon::scope
, and spawn each job. If you hit a serial element, end the scope, process the element, and restart the scope. In the job processor, it checks if there are dependencies, and if any dependencies have not been finished yet, spawn another job on the same element and return failure; otherwise it does the actual work.
This is taking advantage of rayon::scope
's documented claim of LIFO processing order so that enqueuing a job puts it at the end of the line, which maximizes the chances that next time around we will have gotten to all the dependencies. In reality, this seems to have a lot of issues with fair scheduling, where there are some elements in the queue that are not being serviced, and the same few elements get hit in a loop, which are depending on the unscheduled element and so just keep getting re-enqueued.
It would be much preferable to just register the dependencies and go through them in breadth first order or something, but rayon
seems tailored to the no-dependency case with its parallel iterators. What are my options for this kind of task?