[Solved] Trying to understand multithreading correctly


#1

Hello,
I’m trying to develop a small application that reads a directory and then does some processing on each file. I thought of having a number of threads do that job: I’d create a Vec with all the files in the directory, and each thread would pop one file and analyze it, then move on to the next one when finished. All the results should go into a shared struct. I’ve managed to write this code (simplified here for debugging purposes):
https://is.gd/TRMC5Q

And I get a lifetime error. The thing is, I would like the per-unit analysis to receive only a Mutex, not an Arc, but that gives type errors: it seems I cannot get a Mutex<Result> from a &mut Result easily.

How can I fix it?


#2

A big part of the problem is that a spawned thread is not guaranteed to finish before the function that spawned it returns, so any data that’s transferred into it has to be 'static (i.e. no non-'static references allowed). There are, however, implementations of scoped threads (such as crossbeam’s) that work almost like a normal scope and block until all of the contained threads are done. In addition to this, there are some implementations of scoped thread pools that can take and return iterators. Something in those areas may be worth looking at in your case.
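(Since this thread was written, scoped threads have also landed in the standard library as std::thread::scope, stabilized in Rust 1.63, with the same guarantee crossbeam’s scope provides. A minimal sketch, using made-up file names and string length as a stand-in for the real analysis:)

```rust
use std::thread;

// Scoped threads may borrow `files` without 'static, because the scope
// guarantees every spawned thread is joined before the scope ends.
fn lengths_in_parallel(files: &[&str]) -> Vec<usize> {
    thread::scope(|s| {
        let handles: Vec<_> = files
            .iter()
            .map(|f| s.spawn(move || f.len())) // placeholder "analysis"
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    })
}

fn main() {
    let files = ["alpha.rs", "beta.rs"];
    println!("{:?}", lengths_in_parallel(&files)); // [8, 7]
}
```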


#3

And I get a lifetime issue. The thing is that I would like the unit analysis to only receive a Mutex, not an Arc, but that gives type errors: seems I cannot get a Mutex from a &mut Result easily.

That is correct. In Rust, unlike many other languages, the mutex is the owner of the data it protects. You can’t take some reference and then say “well, now this reference is protected by this mutex”; you have to put the data inside the mutex from the start. So in your example you probably want

fn threaded_analysis(config: &Config) -> Results {
    let results = Arc::new(Mutex::new(Results::new()));

    // ... spawn the worker threads, giving each one a clone of the Arc ...

    for t in handles {
        t.join().unwrap();
    }
    results.lock().unwrap().clone()
}

However, I would try to avoid the mutex here. It looks like each file can be processed completely in parallel, so I would use something like rayon or simple_parallel to map each file to a sub-result in parallel. Once I have the result for each file, I can combine them in the main thread. And with rayon, you can perform even the reduction step in parallel.
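(With rayon this is essentially par_iter().map(...).reduce(...). A std-only sketch of the same map-then-combine pattern, where each thread returns its sub-result and the main thread does the combining, so no Mutex is needed; analyze is a placeholder:)

```rust
use std::thread;

// Placeholder for the real per-file analysis.
fn analyze(file: &str) -> usize {
    file.len()
}

fn parallel_map_combine(files: Vec<String>) -> usize {
    let handles: Vec<_> = files
        .into_iter()
        .map(|f| thread::spawn(move || analyze(&f))) // map in parallel
        .collect(); // collect forces all threads to actually spawn
    handles
        .into_iter()
        .map(|h| h.join().unwrap()) // each thread *returns* its sub-result
        .sum() // combine in the main thread; no shared state at all
}

fn main() {
    let total = parallel_map_combine(vec!["a".into(), "bb".into()]);
    println!("{total}"); // 3
}
```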


#4

I finally decided to use a local vector for the results in each thread and extend the complete results at the end of the analysis. It seems to work now 🙂 Thanks to both of you!
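(For reference, that “local vector per worker, extend at the end” approach can be sketched like this; the chunking, worker count, and analyze function are illustrative placeholders, not the poster’s actual code:)

```rust
use std::thread;

// Placeholder for the real per-file analysis.
fn analyze(file: &str) -> usize {
    file.len()
}

fn analyze_all(files: Vec<String>, workers: usize) -> Vec<usize> {
    // Split the file list into one chunk per worker.
    let chunk = ((files.len() + workers - 1) / workers).max(1);
    let handles: Vec<_> = files
        .chunks(chunk)
        .map(|c| {
            let c: Vec<String> = c.to_vec();
            thread::spawn(move || {
                // Each worker fills its own local vector...
                c.iter().map(|f| analyze(f)).collect::<Vec<_>>()
            })
        })
        .collect();

    let mut all = Vec::new();
    for h in handles {
        all.extend(h.join().unwrap()); // ...extended once at the end
    }
    all
}

fn main() {
    let r = analyze_all(vec!["a".into(), "bb".into(), "ccc".into()], 2);
    println!("{:?}", r); // [1, 2, 3]
}
```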