Multi-threading for loop


I am trying to use multi-threading on the code below.
I have tried to comment my code so you can understand what I am doing.

1/ The loop to do in parallel is for j in 0..n. I have tried thread::spawn but I get a lot of error that I don't understand.
2/ Is there a better (more efficient) way to automatically create chks which depend on n and len.

Sorry in advance if my questions seem dumb. I recently just started with Rust.

Thank you

use rand::Rng;
use rand::distributions::Uniform;

fn main() {
    let n = 4; // number of threads
    let len = 8; // length of vector
    // a random vector of positive integer
    let mut rng = rand::thread_rng();
    let range = Uniform::new(0, 5);
    let x: Vec<u64> = (0..len).map(|_| rng.sample(range)).collect();
    let mut counts: Vec<Vec<usize>> = vec![vec![0_usize; len]; n];// a matrix of dim len x n
    let chks = vec![[0,1],[2,3],[4,5],[6,7]];// 4 chunks for 4 threads
    for j in 0..n { // need to do in parallel
        for i in chks[j] { // each thread process its own chunk
            counts[j][x[i] as usize] += 1; // each thread writes in its own sub vector
    println!("x = {:?}", x);
    println!("counts = {:?}", counts);

I suspect rayon will help.


The errors you'll get from thread::spawn are avoidable using a thread::scope. Rust Playground I'll have to admit even then it's a bit tricky. One needs to eliminate problematic indexing, too, and also the nuances around lifetimes due to closure captures leave some (if nothing else, diagnostic) compiler improvements to be desired.

(Hard) quiz question for advanced readers: Why does &*chk make a difference in that playground as opposed to writing chk? Hint[1].

For an easier approach, and letting a library handle the chunking and a thread pool, rayon would indeed be the way to go to parallelize iteration, though to give concrete code examples, I would need to know more concretely what your end goal in terms of output of the computation is, as right not it does depend on the number of threads, and is not combined / tallied up into one single result. It is not clear to me whether the chunking is just supposed to be an implementation detail for multi-threading, or whether the n chunks are also desired for the problem at hand and only happened to have offered themselves as a good place to do multi-threading, too.

  1. also take into consideration how writing for &i in <_>::into_iter(chk) won't compile but for &i in <&_>::into_iter(chk) will ↩ī¸Ž


12 posts were merged into an existing topic: A little compilation error puzzle/challenge (can you understand what is going on?) around iterators, closure captures, borrow checking, references, and the like

Thank you @steffahn .

1/ I want to be able to control how many threads are used to run the for loop, so counts has to have his dimensions depending on the number of threads.

2/ the results in counts does not need to be combined at the end.

3/ the n chunks are for the n threads so that each thread can do its work and run without any race condition with another thread. If there is no multi-threading only one chunk is needed.

4/ the end goal is to build a multi-threading counting sort algorithm for positive integers ( I am trying to build something new) in the case where the largest number in the vector to sort is smaller than the length of the vector to sort.

5/ what you might observe is that counts is going to record the position where x[i] should be.

6/ chk depends on the length len of x and the number of threadsn. If the len=12 and n=4 the chk will have made of 4 vectors of 3 elements:
vec![[0,1,2],[3,4,5],[6,7,8],[9, 10, 11]]
Now if len=13:
vec![[0,1,2],[3,4,5],[6,7,8],[9, 10, 11,12]]
The last vector will have an additional element.
In Julia for example, this would be written like that:
[[1:3], [4:6],[7:9],[10:13]]
What will matters is to know where starting index and the ending index in each sub vector.
Each sub vector represents the index of x that each thread will process.

I hope it is clear.

Please let me know if you have other questions. Note that I am not sure where this experiment will lead me.

Hi @steffahn, I am trying to wrap my head around how thread work.
Let's assume I change the vector counts to a simple vector instead of a vector of vectors. Like the below.

use rand::Rng;

fn make_chunk(len: usize, n: usize) -> Vec<(usize, usize)> {
    let m: usize = len / n;
    let mut chks = ((0..len).step_by(m).zip((0..len).skip(m-1).step_by(m))).collect::<Vec<_>>();
    if chks[n-1].1 != (len-1) {
        chks[n-1].1 = len -1;

fn main() {
    let n: usize = 4;
    let len: usize = 50;
    let mut rng = rand::thread_rng();
    let range = rand::distributions::Uniform::new(0, 2);
    let x: Vec<u64> = (0..len).map(|_| rng.sample(range)).collect();
    let chks: Vec<(usize, usize)> = make_chunk(len, n);

    let mut counts: Vec<usize> = vec![0_usize; len * n];
    for j in 0..n {
        for i in (chks[j].0)..=(chks[j].1) {
            counts[x[i] as usize + j * len] += 1;

Then how do I deal with counts in that case?
Thank you

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.