How to best assign a value to a variable in concurrent mode

What is the best practice for assigning a value to a variable in concurrent mode?
Currently I use Arc/Mutex, but I found that it causes a lot of locking, which leads to performance loss (or maybe I just didn't use it right).
I mainly want to achieve something like this:

 use std::{
     sync::{Arc, Mutex},
     thread,
     time::Duration,
 };

 fn main() {
     let (string01, string02, string03) = ("".to_owned(), "".to_owned(), "".to_owned());

     let v = (0..3)
         .map(|i| {
             thread::sleep(Duration::from_nanos(1));
             thread::spawn(move || match i {
                 0 => {
                     string01 = process01("hello01");
                 }
                 1 => {
                     string02 = process02("hello02");
                 }
                 2 => {
                     string03 = process03("hello03");
                 }
                 _ => {}
             })
         })
         .collect::<Vec<_>>();
     for i in v {
         i.join().unwrap();
     }
 }

 fn process01(str: &str) -> String {
     println!("hello from 01");
     str.to_owned()
 }

 fn process02(str: &str) -> String {
     println!("hello from 02");
     str.to_owned()
 }

 fn process03(str: &str) -> String {
     println!("hello from 03");
     str.to_owned()
 }

Generally, if you wish to use a mutable reference to a variable that lives outside the thread you're spawning, you have to prove two things to the compiler:

  1. The variable behind the mutable reference will outlive the thread.
  2. No other thread has access to the same variable through another reference.

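To prove the first thing, you have to use a scoped thread. Simply calling join on the thread handle is not enough, because the compiler cannot see that the join always happens before the variable goes away. Scoped threads are available from crossbeam and rayon, and since Rust 1.63 also from the standard library as std::thread::scope.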

In your example code, you actually violate the second constraint. All three threads use a closure that captures all three variables. To fix this, you need to ensure that each thread closure captures only the one variable it writes to.

It could look something like this.
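Here is a minimal sketch of that idea. It assumes Rust 1.63 or newer and uses std::thread::scope instead of the crossbeam or rayon scopes mentioned above. The `run_scoped` helper is a hypothetical name introduced only for this example, and the `process0x` functions are simplified copies of the ones in the question.

```rust
use std::thread;

fn process01(s: &str) -> String {
    s.to_owned()
}

fn process02(s: &str) -> String {
    s.to_owned()
}

fn process03(s: &str) -> String {
    s.to_owned()
}

// Hypothetical helper: fills three independent variables from three threads.
fn run_scoped() -> (String, String, String) {
    let mut string01 = String::new();
    let mut string02 = String::new();
    let mut string03 = String::new();

    // The scope joins every spawned thread before it returns, so the
    // borrows taken by the closures cannot outlive the variables.
    thread::scope(|s| {
        // Each closure captures exactly one variable by mutable reference.
        s.spawn(|| string01 = process01("hello01"));
        s.spawn(|| string02 = process02("hello02"));
        s.spawn(|| string03 = process03("hello03"));
    });

    // All threads are done here, so the variables can be used freely again.
    (string01, string02, string03)
}

fn main() {
    let (a, b, c) = run_scoped();
    println!("{} {} {}", a, b, c);
}
```

No Arc or Mutex is needed, because every variable has exactly one writer and the scope guarantees the writers finish before the variables are read.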

Alice, thank you for your help. One more thing:
The time taken with a single thread is, let's say, 79 ms. If I spawn 10 threads, the total time goes up to 170 ms or more. Is this normal? I expected the total time to stay around 79 ms.

Spawning and using threads has a certain cost. To speed up a program with threads, they have to save you more time than this cost. And that only holds if your task doesn't have another bottleneck.

After a little testing, it seems the cause might be:
01- The more threads are used, the more overhead there is.
02- The more threads are used, the higher the chance that one of them hits an API delay. Since the scope has to wait for all threads to finish, the slowest thread determines the total time.
03- Both 01 and 02.

Yes. Creating a thread is a rather expensive operation and when one thread is waiting for other threads, that also costs time. The advantage of threads is that you sometimes can split up large tasks into several smaller tasks, but this only works when the task is large enough compared to the cost of spawning the threads.
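To see this trade-off on your own machine, something like the following rough sketch could help. It splits one task (summing a slice) across a configurable number of scoped threads and prints the elapsed time for each setting. The function names `sum_sequential` and `sum_parallel` and the workload size are assumptions made for illustration, and the timings will vary a lot between machines and build profiles.

```rust
use std::thread;
use std::time::Instant;

/// Sums the slice on the calling thread.
fn sum_sequential(data: &[u64]) -> u64 {
    data.iter().sum()
}

/// Splits the slice into at most `n_threads` chunks and sums each chunk
/// on its own scoped thread, then adds up the partial results.
fn sum_parallel(data: &[u64], n_threads: usize) -> u64 {
    let n = n_threads.max(1);
    // Ceiling division; at least 1 because `chunks(0)` would panic.
    let chunk_len = ((data.len() + n - 1) / n).max(1);

    thread::scope(|s| {
        // Spawn all workers first, then join them.
        let handles: Vec<_> = data
            .chunks(chunk_len)
            .map(|chunk| s.spawn(move || chunk.iter().sum::<u64>()))
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).sum::<u64>()
    })
}

fn main() {
    let data: Vec<u64> = (1..=1_000_000).collect();

    let start = Instant::now();
    let total = sum_sequential(&data);
    println!("sequential: sum = {}, took {:?}", total, start.elapsed());

    for n_threads in [1, 2, 4, 8] {
        let start = Instant::now();
        let total = sum_parallel(&data, n_threads);
        println!(
            "{} thread(s): sum = {}, took {:?}",
            n_threads,
            total,
            start.elapsed()
        );
    }
}
```

With a workload this small, the threaded versions may well come out slower than the sequential one, which is the effect described above. Increasing the amount of work per thread is what tips the balance.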

Alternatively, there are other synchronization primitives that could be applicable to your case here, though they are pretty niche (and you should really think about whether you can improve the design so that it needs no synchronization at all before reaching for these, but sometimes they come in handy).

https://docs.rs/once_cell/1.2.0/once_cell/sync/struct.OnceCell.html
https://docs.rs/atomic_refcell/0.1.4/atomic_refcell/struct.AtomicRefCell.html
https://docs.rs/arc-swap/0.4.3/arc_swap/struct.ArcSwapAny.html
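As a rough illustration of the write-once idea behind the first link, here is a sketch that uses std::sync::OnceLock, the standard library counterpart of once_cell's sync OnceCell (stable since Rust 1.70). This is a stand-in, not the once_cell crate itself, and the `fill_slots` helper is a hypothetical name for this example.

```rust
use std::sync::OnceLock;
use std::thread;

// Hypothetical helper: each thread writes its result into its own
// write-once slot through a shared (non-mutable) reference.
fn fill_slots() -> Vec<String> {
    let slots: [OnceLock<String>; 3] =
        [OnceLock::new(), OnceLock::new(), OnceLock::new()];

    thread::scope(|s| {
        for (i, slot) in slots.iter().enumerate() {
            s.spawn(move || {
                // `set` returns Err if the slot was already initialized;
                // here every slot has exactly one writer.
                slot.set(format!("hello{:02}", i + 1))
                    .expect("slot written twice");
            });
        }
    });

    // All threads have finished, so every slot has been set.
    slots
        .iter()
        .map(|slot| slot.get().cloned().unwrap_or_default())
        .collect()
}

fn main() {
    for value in fill_slots() {
        println!("{}", value);
    }
}
```

The slots are only ever borrowed immutably, so several threads could hold a reference to the same slot. The first `set` wins and later ones get an error back instead of blocking.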