I want to kill a stuck thread

I have a couple of threads in my program. One of them waits on messages from a NATS server, then passes the received data on to a serial port. I'm using the rust-nats crate (jedisct1/rust-nats on GitHub, a simple NATS client library for Rust).

Under some fault situations, in that thread or elsewhere, I want to be able to shut down all my threads and start over.

My problem is that the NATS receiving thread is stuck forever in a client.wait() call, so there is no way I can signal it, via a channel message or a run flag, to quit its loop.

It's a simple piece of code:

fn do_tx(port: &mut BoxSerial, nats_client: &mut nats::Client) -> BoxResult<()> {
    // Subscribe to some NATS subject
    let _s1 = nats_client.subscribe("serialMsg", None)?;

    // Send data from NATS to serial port forever
    loop {
        let nats_event = nats_client.wait()?;
        port.write_all(&nats_event.msg)?;
    }
}

Run from a thread started in main like so:

        tx_thread = thread::spawn(move || {
            // Arguments match the two-parameter do_tx shown above
            match do_tx(&mut rx_port, &mut nats_client2) {
                Ok(_) => {....},
                Err(_e) => {.....},
            }
        });

What to do?

I'd really like to be able to kill the thread with extreme violence.

The surest way is to re-exec the process entirely. You could probably fill Command with your same env::args(), and then use CommandExt::exec().
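A minimal sketch of that suggestion, assuming a Unix target. The function names reexec_command and restart_self are made up for illustration; Command, CommandExt::exec() and env::args_os() are from std.

```rust
use std::env;
use std::io;
use std::os::unix::process::CommandExt;
use std::process::Command;

/// Build a Command that would run this program again with the
/// same arguments it was started with.
fn reexec_command() -> Command {
    let mut args = env::args_os();
    // The first argument is the program name as it was invoked.
    let program = args.next().expect("no program name in argv");
    let mut cmd = Command::new(program);
    cmd.args(args);
    cmd
}

/// Replace the current process image with a fresh copy of itself.
/// exec() only returns if it failed, so the return value is the error.
fn restart_self() -> io::Error {
    reexec_command().exec()
}
```

Calling restart_self() from main on the error path replaces the whole process, stuck threads included. Note that exec() does not run destructors, so nothing gets cleaned up on the Rust side before the new image starts.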

Thanks for that.

Unfortunately after putting exec() in place I find it is not enough extreme violence.

If I force an error into my system by yanking out the USB/serial adapter it is using, sure enough it exec()s itself again. And again after some delay, and again...

But when I plug the serial adapter back in, it does not recover. The adapter has been allocated a different name in /dev/: it changed from /dev/ttyUSB0 to /dev/ttyUSB1.

It seems exec() does not clean up open ports, and I see no way to close the serial port from the serialport crate. The port is not closed when it goes out of scope; isn't that a bug?

Time for more violence... I'll just exit on error and let systemd start it again.

Sounds like they are missing the CLOEXEC flag, which tells the kernel to close it for you.

It's not a bug if you mean the port isn't closed by exec(), as that doesn't run any destructors the way leaving a scope would.

No, I mean I have "let mut port = serialport::open_with_settings(....)" in a totally different scope, in a function that is called from main.

I have the exec() in main() when it detects an error happening.

That serial port is not closed when leaving that function's scope. Which is probably just as well, as the function spins up a couple of threads that use it, an rx thread and a tx thread, and they run forever.

That would be OK, but then the port is not released during the exec(). So when I replug the serial adapter, it gets a different name.

OK, looking at serialport's posix/tty.rs, it calls a raw open with a few flags, but not OFlag::O_CLOEXEC. It could be added there, although note that kernels before 2.6.23 will ignore that. You can also set the flag after opening using ioctl(fd, FIOCLEX) or fcntl(fd, F_SETFD, FD_CLOEXEC), as std does.
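Here is a rough sketch of the fcntl(fd, F_SETFD, FD_CLOEXEC) route, kept std-only by declaring fcntl by hand. The constant values are the Linux ones and are an assumption for other platforms; real code would normally take them, and fcntl itself, from the libc crate. The helper names set_cloexec and is_cloexec are invented here, and how you get at the raw fd of the serial port depends on the serialport crate version, so that part is not shown.

```rust
use std::io;
use std::os::raw::c_int;
use std::os::unix::io::RawFd;

// Assumed Linux values; prefer the constants from the `libc` crate.
const F_GETFD: c_int = 1;
const F_SETFD: c_int = 2;
const FD_CLOEXEC: c_int = 1;

extern "C" {
    // Hand-written declaration of the C library's fcntl.
    fn fcntl(fd: c_int, cmd: c_int, ...) -> c_int;
}

/// Report whether the close-on-exec flag is set on `fd`.
fn is_cloexec(fd: RawFd) -> io::Result<bool> {
    let flags = unsafe { fcntl(fd, F_GETFD) };
    if flags == -1 {
        return Err(io::Error::last_os_error());
    }
    Ok(flags & FD_CLOEXEC != 0)
}

/// Set close-on-exec on `fd`, keeping any other descriptor flags.
fn set_cloexec(fd: RawFd) -> io::Result<()> {
    let flags = unsafe { fcntl(fd, F_GETFD) };
    if flags == -1 {
        return Err(io::Error::last_os_error());
    }
    if unsafe { fcntl(fd, F_SETFD, flags | FD_CLOEXEC) } == -1 {
        return Err(io::Error::last_os_error());
    }
    Ok(())
}
```

With the flag set, the kernel closes the descriptor as part of a successful exec(), so the re-exec'd program should not inherit the old port.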

Thanks.

I have decided to wimp out and just exit on any non-recoverable error. systemd can take care of getting it going again.
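For anyone landing here later, one way the exit-on-error approach could be wired up is sketched below. It is only an illustration: the Worker type and first_failure function are hypothetical names, and errors are plain Strings to keep it short. Each worker runs in its own thread and reports its result over a channel; the caller blocks until the first failure arrives.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical worker type: a job that runs until it fails or finishes.
type Worker = Box<dyn FnOnce() -> Result<(), String> + Send + 'static>;

/// Run every worker in its own thread and return the first error reported,
/// or None if all workers finished without error.
fn first_failure(workers: Vec<Worker>) -> Option<String> {
    let (tx, rx) = mpsc::channel();
    for work in workers {
        let tx = tx.clone();
        thread::spawn(move || {
            // Ignore send errors: the receiver may already be gone.
            let _ = tx.send(work());
        });
    }
    // Drop the original sender so the loop below ends once every
    // worker thread has finished and dropped its clone.
    drop(tx);
    for result in rx {
        if let Err(e) = result {
            return Some(e);
        }
    }
    None
}
```

In main, something like "if let Some(e) = first_failure(workers) { eprintln!("{}", e); std::process::exit(1); }" then takes down the whole process, including a thread that is still blocked in wait(). Restarting is left to the service manager, assuming the unit is configured with something like Restart=on-failure.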
