The “pool” (maybe wrong term) would have:
- A dynamic list of work items.
- A dynamic list of workers to process those items.
- Work items are not consumed by the workers; every worker must see every item.
- The invariant is that the pool must schedule each worker to process each item, and to process every change (insert/update/delete) to the work item list, for as long as that worker is active in the pool.
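To make the invariant concrete, here is a minimal sketch of the bookkeeping I have in mind. All names (`Pool`, `ItemId`, `WorkerId`, `pending`) are illustrative, not from any existing crate: the pool tracks which (worker, item) pairs still need processing, so adding an item schedules it for every worker and adding a worker schedules every existing item for it.

```rust
use std::collections::HashSet;

// Hypothetical identifiers, purely for illustration.
type ItemId = u64;
type WorkerId = u64;

/// Tracks which (worker, item) pairs still need processing.
struct Pool {
    items: HashSet<ItemId>,
    workers: HashSet<WorkerId>,
    pending: HashSet<(WorkerId, ItemId)>,
}

impl Pool {
    fn new() -> Self {
        Pool {
            items: HashSet::new(),
            workers: HashSet::new(),
            pending: HashSet::new(),
        }
    }

    /// A new item must be processed by every active worker.
    fn add_item(&mut self, item: ItemId) {
        if self.items.insert(item) {
            for &w in &self.workers {
                self.pending.insert((w, item));
            }
        }
    }

    /// A new worker must process every existing item.
    fn add_worker(&mut self, worker: WorkerId) {
        if self.workers.insert(worker) {
            for &i in &self.items {
                self.pending.insert((worker, i));
            }
        }
    }

    /// An updated item must be reprocessed by every worker.
    fn update_item(&mut self, item: ItemId) {
        if self.items.contains(&item) {
            for &w in &self.workers {
                self.pending.insert((w, item));
            }
        }
    }

    /// A deleted item cancels any work still queued for it.
    fn remove_item(&mut self, item: ItemId) {
        self.items.remove(&item);
        self.pending.retain(|&(_, i)| i != item);
    }

    /// Take one pending unit of work, if any.
    fn next(&mut self) -> Option<(WorkerId, ItemId)> {
        let pair = self.pending.iter().next().copied()?;
        self.pending.remove(&pair);
        Some(pair)
    }
}
```

The `pending` set is effectively the cross product of workers and items, maintained incrementally instead of recomputed, which is the property I want to preserve in a real design.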
The concrete case I’m thinking about is searching a bunch of text files with a bunch of searches in parallel. In that case the work items would be the text files and the workers would be the different searches.
Searches and files might be added or removed over time, and the system should stay efficient in two ways: it should limit the amount of work scheduled at once, and it should not redo search work for existing searches when only a few files change.
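For the “limit the amount of work scheduled at once” half, the simplest shape I can think of is a fixed set of threads draining (search, file) pairs from a shared channel. This is only a sketch with illustrative names (`search_bounded` is not an existing API), and the “files” are just in-memory strings here:

```rust
use std::sync::mpsc;
use std::sync::{Arc, Mutex};
use std::thread;

/// Run every (needle, file) pair with at most `limit` threads active at
/// once. Returns how many pairs matched. Illustrative only: real files
/// and real search engines would replace the string `contains` check.
fn search_bounded(needles: &[&str], files: &[&str], limit: usize) -> usize {
    let (tx, rx) = mpsc::channel::<(String, String)>();

    // Queue every (search, file) pair up front.
    for n in needles {
        for f in files {
            tx.send((n.to_string(), f.to_string())).unwrap();
        }
    }
    drop(tx); // close the channel so workers stop when it drains

    let rx = Arc::new(Mutex::new(rx));
    let hits = Arc::new(Mutex::new(0usize));

    // Only `limit` OS threads run, no matter how many pairs are queued.
    let handles: Vec<_> = (0..limit)
        .map(|_| {
            let rx = Arc::clone(&rx);
            let hits = Arc::clone(&hits);
            thread::spawn(move || loop {
                // Take one pair under the lock, then process it.
                let pair = rx.lock().unwrap().recv();
                match pair {
                    Ok((needle, contents)) => {
                        if contents.contains(&needle) {
                            *hits.lock().unwrap() += 1;
                        }
                    }
                    Err(_) => break, // channel drained and closed
                }
            })
        })
        .collect();

    for h in handles {
        h.join().unwrap();
    }
    let n = *hits.lock().unwrap();
    n
}
```

What this sketch deliberately does not solve is the second half of the problem: avoiding re-running existing searches when only a few files change. That needs the pairs to be generated incrementally from file/search change events rather than enumerated up front.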
I’m interested in any thoughts or suggestions. I’m looking for existing examples (in any language) that solve a similar problem, or suggestions on how (or how not) to build my own solution.
I’ve been working on a solution using @BurntSushi 's ignore crate for the initial processing, then watching the filesystem for changes and incrementally updating the results based on those changes… but I think I need a better high-level design.
I know little about such things, but it seems like the database pattern of an event log plus materialized views might be a pretty clean solution. In this case the events would model inserted/updated/deleted work items, and the materialized views would represent the workers. But getting that overall design into Rust, and running it in parallel, is still non-trivial for me.
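To show roughly what I mean by “events plus materialized views” in plain Rust (again, all names here are made up for illustration): file changes become events, and each search keeps a view of its results that it updates incrementally by applying only the events it hasn’t seen yet, instead of rescanning everything.

```rust
use std::collections::HashMap;

/// Events modeling changes to the work item list (here, text files).
enum Event {
    Inserted { path: String, contents: String },
    Updated { path: String, contents: String },
    Deleted { path: String },
}

/// One worker: a substring search whose result set is a materialized
/// view kept current by applying events, never by a full rescan.
struct SearchView {
    needle: String,
    /// path -> does this file currently match the search?
    matches: HashMap<String, bool>,
}

impl SearchView {
    fn new(needle: &str) -> Self {
        SearchView {
            needle: needle.to_string(),
            matches: HashMap::new(),
        }
    }

    /// Incrementally update the view from one event. Only the changed
    /// file is (re)searched; all other results are untouched.
    fn apply(&mut self, event: &Event) {
        match event {
            Event::Inserted { path, contents } | Event::Updated { path, contents } => {
                self.matches
                    .insert(path.clone(), contents.contains(&self.needle));
            }
            Event::Deleted { path } => {
                self.matches.remove(path);
            }
        }
    }

    /// Current query result: the paths that match.
    fn matching_paths(&self) -> Vec<&str> {
        self.matches
            .iter()
            .filter(|(_, &m)| m)
            .map(|(p, _)| p.as_str())
            .collect()
    }
}
```

A new search is just a new view that replays the log from the start (or from a snapshot), which maps nicely onto the “new worker must process all existing items” requirement; the part I haven’t worked out is feeding the same event stream to many views in parallel with bounded work in flight.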