rayon::ThreadPool overhead

We often find ourselves having to process batches of data received through a channel, and the batches can vary a lot in size. For small batches, rayon::ThreadPool adds more overhead than speed-up, and IndexedParallelIterator::with_min_len usually is not effective enough either.

So the code tends to look like this:

if batch.len() < SWITCH_THRESHOLD {
    // process the batch single-threaded
} else {
    // use the thread pool
}
This approach has several issues, including the fact that the switching threshold can be very machine-dependent.

Are there any better paradigms or rayon knobs useful here that we are missing?

Thank you,

What behavior are you getting from with_min_len? One option you can consider is to use rayon::join to split tasks manually, never splitting jobs smaller than your threshold.

You could probably come up with some method of updating the threshold by timing previous batches. For example, every time you process a batch whose size is between 0.5*threshold and threshold, measure the time and double the threshold if the duration is smaller than one quarter of the desired single-threaded duration. Similarly, if a single-threaded run takes more than twice the desired duration, halve the threshold. (You can make these updates with a compare-and-swap on an atomic integer storing the threshold. This method probably requires using rayon::join rather than parallel iterators.)

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.