How to check for child thread state in a non-blocking way?

I'm trying to create a program with the following flow:

The main thread should scan a directory in an infinite loop and spawn a new thread for each file it finds. That thread tails the file (like Unix tail -f) in its own infinite loop and sends each line back to the main thread over an mpsc channel.
I'm tracking the child threads in a HashMap<PathBuf, JoinHandle<()>> to prevent multiple threads for the same file.

I have a problem with this case:

  1. A file is created, and the main thread spawns a child for it.
  2. The file is deleted, and the child thread terminates itself.
  3. The main thread still thinks the child exists because it has an entry in its HashMap, so if a file with the same name is created again, a new child thread will not be spawned.

At first I thought I could wrap replies from the child threads in an enum like this, so that when a child sends a Stopped value, the main thread can safely remove its reference to that child.

enum Reply {
    Message(String),
    Stopped,
}

But then I realized that if a child thread is terminated abnormally (kill -9, for example), it will not send anything back, and the map will be left in an inconsistent state.
So I thought, maybe I can check the state of the child threads on each iteration of the main loop? Something like this pseudo-code:

loop {
   for path in self.list_files_in_dir() {
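      if !self.children.contains_key(&path) {
          let watched = path.clone();
          let handle = thread::spawn(move || {
              let watcher = Watcher::new(watched);
              watcher.start();
          });
          // Track the handle so this file does not get a second thread.
          self.children.insert(path, handle);
      }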
   }

   for handle in self.children.values() {
      // `non_blocking_join` is hypothetical; it is the method I wish existed.
      if let Ok(done) = handle.non_blocking_join() {
          // Okay, the thread is done here.
          // I need to remove its handle from `self.children`.
      }
   }

   thread::sleep(time::Duration::new(1, 0));
}

I can't use JoinHandle::join() because it blocks the main thread, and the child threads never stop on their own (under ideal conditions, at least). So, am I missing something? Maybe there is another, better way to do all of this?

I think your idea of creating a Reply enum is a good one. If the child thread panics, its Sender is dropped; once no senders remain, rx.recv() returns an error, which tells you that 'something has gone wrong' (the thread has died) and that the main thread should tidy up.
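To illustrate the disconnect behavior, here is a minimal sketch. It assumes one channel per child, so that the child owns the only Sender, and the default unwinding panic strategy. The function name collect_until_disconnected is made up for the example.

```rust
use std::sync::mpsc;
use std::thread;

/// Spawns a child that sends one line and then panics.
/// Returns every message received before the channel disconnected.
fn collect_until_disconnected() -> Vec<String> {
    let (tx, rx) = mpsc::channel::<String>();

    // The child owns the only Sender, so its death closes the channel.
    let child = thread::spawn(move || {
        tx.send("first line".to_string()).unwrap();
        panic!("simulated failure in the child");
    });

    let mut received = Vec::new();
    // recv() yields Ok(..) for each message, then Err(RecvError) once the
    // child's Sender has been dropped while the panic unwinds.
    while let Ok(line) = rx.recv() {
        received.push(line);
    }

    // The child is finishing or already finished; Err from join() means it panicked.
    assert!(child.join().is_err());
    received
}
```

With a single channel shared by many children, recv() keeps succeeding as long as any other child still holds a Sender clone, so this signal alone does not say which child died.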

Incidentally, I don't think it is possible to kill -9 a single thread inside a process without terminating the whole process.

Yeah, but as far as I can see, I can't determine the thread handle from the mpsc::RecvError, so if a child thread dies in a panic (oh, that sounds awful :) ), there is no way to properly clean everything up.
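One way around that, sketched under assumptions: let Stopped carry the path, and wrap the child's work in std::panic::catch_unwind so the final message is sent even after a panic. The names spawn_child and run_demo, the should_panic switch, and the extra PathBuf fields on Reply are all made up for the example. This does not help if the program is built with panic=abort.

```rust
use std::collections::HashMap;
use std::panic::{self, AssertUnwindSafe};
use std::path::PathBuf;
use std::sync::mpsc::{self, Sender};
use std::thread::{self, JoinHandle};

enum Reply {
    Message(PathBuf, String),
    // Carries the path so the main thread knows which entry to remove.
    Stopped(PathBuf),
}

/// Spawns a child whose last action is always to send `Stopped(path)`.
/// `should_panic` exists only to simulate a failing watcher.
fn spawn_child(path: PathBuf, tx: Sender<Reply>, should_panic: bool) -> JoinHandle<()> {
    thread::spawn(move || {
        // catch_unwind turns a panic inside the closure into an Err value.
        let _ = panic::catch_unwind(AssertUnwindSafe(|| {
            let _ = tx.send(Reply::Message(path.clone(), "a line".to_string()));
            if should_panic {
                panic!("simulated watcher failure");
            }
        }));
        // Runs after a normal return and after a caught panic alike.
        let _ = tx.send(Reply::Stopped(path));
    })
}

/// Runs two children and returns the paths still tracked once both have stopped.
fn run_demo() -> Vec<PathBuf> {
    let (tx, rx) = mpsc::channel();
    let mut children: HashMap<PathBuf, JoinHandle<()>> = HashMap::new();
    for (name, should_panic) in [("a.log", false), ("b.log", true)] {
        let path = PathBuf::from(name);
        children.insert(path.clone(), spawn_child(path, tx.clone(), should_panic));
    }
    drop(tx); // only the children hold senders now

    // The loop ends when every child has dropped its Sender.
    for reply in rx {
        match reply {
            Reply::Message(path, line) => println!("{}: {}", path.display(), line),
            Reply::Stopped(path) => {
                // The child has finished its work, so join() returns promptly.
                if let Some(handle) = children.remove(&path) {
                    let _ = handle.join();
                }
            }
        }
    }
    children.into_keys().collect()
}
```

Calling run_demo() should leave the map empty, because both the well-behaved child and the panicking one report Stopped with their own path.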

A workaround may be to keep a flag for each child, which the child sets upon exiting.
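A rough sketch of the flag idea, assuming an Arc<AtomicBool> per child and a small guard type whose Drop sets the flag, so it is set on panic as well as on a normal return. ExitFlag, spawn_flagged and is_done are names invented here, not anything from std.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread::{self, JoinHandle};

/// Sets the shared flag when dropped, which also happens during a panic unwind.
struct ExitFlag(Arc<AtomicBool>);

impl Drop for ExitFlag {
    fn drop(&mut self) {
        self.0.store(true, Ordering::Release);
    }
}

/// Spawns `work` and returns its handle plus a flag the parent can poll.
fn spawn_flagged<F>(work: F) -> (JoinHandle<()>, Arc<AtomicBool>)
where
    F: FnOnce() + Send + 'static,
{
    let done = Arc::new(AtomicBool::new(false));
    let guard = ExitFlag(Arc::clone(&done));
    let handle = thread::spawn(move || {
        // Dropped when the closure exits, whether normally or by panic.
        let _guard = guard;
        work();
    });
    (handle, done)
}

/// Non-blocking check: has the child set its flag yet?
fn is_done(flag: &AtomicBool) -> bool {
    flag.load(Ordering::Acquire)
}
```

The main loop could store (JoinHandle<()>, Arc<AtomicBool>) pairs in the map and, on each pass, remove and join the entries for which is_done returns true.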

Glibc and musl both have pthread_tryjoin_np, but the "np" suffix indicates that this is a non-portable extension. If that's OK for you, then you could get the underlying handle with as_pthread_t (from std::os::unix::thread::JoinHandleExt) and try the join yourself.
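For a portable option on newer toolchains, std has JoinHandle::is_finished (stable since Rust 1.61), which reports whether the thread has finished without blocking. Below is a sketch of a cleanup pass built on it; the helper name reap_finished is made up, and the map type is the one from the question.

```rust
use std::collections::HashMap;
use std::path::PathBuf;
use std::thread::JoinHandle;

/// Removes every finished child from the map and joins it.
/// Returns the paths whose threads ended by panicking.
fn reap_finished(children: &mut HashMap<PathBuf, JoinHandle<()>>) -> Vec<PathBuf> {
    // Collect the keys first: entries cannot be removed while iterating.
    let finished: Vec<PathBuf> = children
        .iter()
        .filter(|(_, handle)| handle.is_finished())
        .map(|(path, _)| path.clone())
        .collect();

    let mut panicked = Vec::new();
    for path in finished {
        if let Some(handle) = children.remove(&path) {
            // The thread has finished, so join() returns without waiting.
            if handle.join().is_err() {
                panicked.push(path);
            }
        }
    }
    panicked
}
```

Calling reap_finished(&mut self.children) once per iteration of the main loop would let a file that reappears get a fresh thread on the next scan.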


Thank you for the note. Yet, as you said, it is not a portable function, so I'll probably just stick with the Reply enum and hope that the child threads never panic :)

I'm not sure what else could cause a thread to terminate abnormally. I guess an out-of-memory kill and similar host OS surprises are rare enough to ignore for now.