Confounded by simple thread example

In the code below I start 10 threads. Each one prints that it started, sleeps, and prints that it finished. The sleep time is 10 seconds for the first thread, 9 for the second, and so on. I expected all the "started" messages to be output together, followed by all the "finished" messages in reverse order. But instead I don't see the "started" message for the second thread until after the "finished" message for the first thread. Why is that? I'm assuming that when the main thread calls join on the first thread, that only blocks the main thread, not the other threads.

use std::thread;
use std::time::Duration;

const N: i32 = 10;

fn main() {
    let join_handles = (0..N).map(|i| {
        thread::spawn(move || {
            println!("started {}", i);
            // Initial threads sleep longer.
            let ms = ((N - i) * 1000) as u64;
            thread::sleep(Duration::from_millis(ms));
            println!("finished {}", i);
            i * 2
        })
    });

    for handle in join_handles {
        match handle.join() {
            Ok(result) => println!("result = {}", result),
            Err(e) => eprintln!("error: {:?}", e),
        }
    }

    println!("done");
}

.map(…) is a "lazy" adapter: it just wraps an already existing iterator with the closure, only performing the map on demand / when asked. An unsugared version of your code would be:

    let spawn_thread = |i| {
        thread::spawn(move || {
            …
        })
    };
    // nothing has happened yet, we have just defined that function.

    for i in 0 .. N { let handle = spawn_thread(i);
        match handle.join() {
            …
        }
    }

where you can see you are .join()ing on each spawned thread before spawning any other thread, thus having no parallelism whatsoever :upside_down_face:

The solution is to avoid the lazyness here; to eagerly force the application of this .map(…) function on the elems. This is achieved by, for instance, "pre-caching" the elements of the iterator / forcing a collection of all these items:

  use std::thread;
  use std::time::Duration;
  
  const N: i32 = 10;
  
  fn main() {
      let join_handles = (0..N).map(|i| {
          thread::spawn(move || {
              println!("started {}", i);
              // Initial threads sleep longer.
              let ms = ((N - i) * 1000) as u64;
              thread::sleep(Duration::from_millis(ms));
              println!("finished {}", i);
              i * 2
          })
-     });
+     }).collect::<Vec<_>>();
  
      for handle in join_handles {
          match handle.join() {
              Ok(result) => println!("result = {}", result),
              Err(e) => eprintln!("error: {:?}", e),
          }
      }
  
      println!("done");
  }
7 Likes

I wonder if this could be checked for in a clippy lint? (it feels too "fancy" to be something in rustc proper, to me.)

There is a lint for straight up not consuming the iterator -- for example:

const N: i32 = 10;

fn main() {
    (0..N).map(|i| {
        println!("processing item {}", i);
        i * 2
    });

    println!("done");
}

Results in:

warning: unused `Map` that must be used
 --> src/main.rs:4:5
  |
4 | /     (0..N).map(|i| {
5 | |         println!("processing item {}", i);
6 | |         i * 2
7 | |     });
  | |_______^
  |
  = note: `#[warn(unused_must_use)]` on by default
  = note: iterators are lazy and do nothing unless consumed

It seems like a potentially challenging thing to lint for, though. Perhaps obvious cases, like the OP, could be caught.

1 Like

Yeah, this would basically only be for Iterator<Item = Thread>, I think. The fact that iterator's don't make everything first then map them is one of the things that makes them great, most of the time.

You’d have to restrict this further to Iterator<Item = Thread> that are processed in for loops that eagerly call join. Otherwise, code like this

const N: i32 = 10;

fn main() {
    let join_handles_iterator = (0..N).map(|i| {
        thread::spawn(move || {
            /* ... */
        })
    });
    let join_handles = join_handles_iterator.collect();
    for handle in join_handles {
        match handle.join() {
            /* ... */
        }
    }
}

or even

const N: i32 = 10;

fn main() {
    let mut join_handles = vec![];
    // perhaps push some more special threads to join_handles already
    let join_handles_iterator = (0..N).map(|i| {
        thread::spawn(move || {
            /* ... */
        })
    });
    for handle in join_handles_iterator {
        join_handles.push(handle);
    }
    for handle in join_handles {
        match handle.join() {
            /* ... */
        }
    }
}

would be warned against without any reason.


Basically, I don’t thing there’s any good way to add warnings here. Also it should be hard to miss if you, by accident, spawn your threads serially instead of in parallel; at least if it matters. And otherwise (when you’re not noticing it) it’s probably only a performance problem.

3 Likes