Using rayon for parallel tasks

I'm trying to figure out how many resources (threads and memory) the code below consumes. I start parallel tasks with par_split, then start another par_split inside every task. Please advise me if something is wrong with the code, and what happens when the input is huge.

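```rust
// --snip--
let input = "0,1;2,7;4,13;6,19;8,25";
let dataset = input
    // outer parallel split: one task per ';'-separated record
    .par_split(';')
    .map(
      // inner parallel split: allocates a new String ("1," + record),
      // then collects the parsed fields into a Vec<f64>
      |i| ("1,".to_owned()+i).par_split(',')
        .map(|j| j.parse::<f64>().unwrap())
        .collect::<Vec<_>>()
    )
    // merge the per-record Vecs into one flat Vec<f64>
    .flatten()
    .collect::<Vec<_>>();
// --snip--
```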

Rayon will set up a global thread pool and a queue for tasks, and the parallel iterators will create linked lists or trees of partially completed work, which will later be appended into a Vec that may be reallocated and copied several times.

In this example rayon's bookkeeping is probably doing 100 times more work than parsing this short list. Unless the list is megabytes long, there's probably no point trying to use threads to parse it, whether via rayon or anything else.

That said, how should I set up a custom thread pool so that its capacity grows as the input gets larger?

For a start, parallel or not, stop allocating within your iterators: no `"1,".to_owned() + i` String per record and no inner `Vec`. (Don't collect the final Vec either if iterating over the values is sufficient.)
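As a sequential, std-only sketch of that advice: instead of building `"1," + record` and splitting it, chain a constant `1.0` in front of each record's fields and flatten everything in a single pass. The function name `parse_flat` is hypothetical, and returning a `Result` instead of calling `unwrap()` is a design choice of this sketch, so malformed input produces an error rather than a panic.

```rust
use std::num::ParseFloatError;

/// Parses "a,b;c,d;..." into one flat Vec<f64>, emitting 1.0 before each
/// record's fields, without a temporary String or an inner Vec per record.
fn parse_flat(input: &str) -> Result<Vec<f64>, ParseFloatError> {
    input
        .split(';')
        .flat_map(|record| {
            // The leading 1.0 replaces the original "1," string prefix.
            std::iter::once(Ok(1.0)).chain(record.split(',').map(str::parse::<f64>))
        })
        // Collecting into Result stops at the first parse error.
        .collect()
}

fn main() {
    let input = "0,1;2,7;4,13;6,19;8,25";
    let dataset = parse_flat(input).expect("malformed number");
    println!("{:?}", dataset);
}
```

The same closure shape carries over to rayon by replacing `split(';')` with `par_split(';')` and `flat_map` with `flat_map_iter`, so each record is still handled by a plain sequential iterator.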


Great! The first solution looks better. No more calls to `flatten`.
