I have a small Rust program that spawns a few scoped threads that should normally run forever. To shut the threads down cleanly I catch SIGTERM with the ctrlc crate, which then sends a message over a crossbeam channel. The threads test for that message as they loop, using ctrl_c_events.is_empty().
The problem is that on rare occasions the signal causes the threads to shut down and return cleanly, but then the program either hangs forever at 100% CPU usage or fails with a segfault.
I do not have any unsafe in the code I have written. There is no normal Rust panic.
I finally figured out how to get a core dump when this fails and ran rust-gdb on it to get a stack trace. The trace does not show much, except that execution ends up somewhere in libc or libpthread. I'd like to at least get a stack trace that shows the last line of Rust code executed before diving into libc or whatever.
Here is my stack trace:
(gdb) bt
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1 0x0000007f7bd3b974 in __GI_abort () at abort.c:79
#2 0x0000007f7bd7472c in __libc_message (action=action@entry=do_abort, fmt=fmt@entry=0x7f7be357e0 "%s\n") at ../sysdeps/posix/libc_fatal.c:181
#3 0x0000007f7bd7aaa4 in malloc_printerr (str=str@entry=0x7f7be30ee0 "malloc_consolidate(): invalid chunk size") at malloc.c:5342
#4 0x0000007f7bd7ada8 in malloc_consolidate (av=av@entry=0x7f64000020) at malloc.c:4471
#5 0x0000007f7bd7c73c in _int_free (av=0x7f64000020, p=0x7f64002c20, have_lock=<optimized out>) at malloc.c:4392
#6 0x0000007f7be19738 in tcache_thread_shutdown () at malloc.c:2979
#7 arena_thread_freeres () at arena.c:950
#8 0x0000007f7be19874 in __libc_thread_freeres () at thread-freeres.c:29
#9 0x0000007f7bf36098 in start_thread (arg=0x7fe665d0cf) at pthread_create.c:476
#10 0x0000007f7bdd80cc in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone.S:78
My whole code is a bit big but basically I run threads like so:
let (t1, t2, t3) = thread::scope(|scope| {
    let t1 = scope.spawn(|| {
        sig_status_task(&session, ctrl_c_events.clone(), nats_connection.clone())
    });
    let t2 = scope.spawn(|| {
        det_status_task(
            &session,
            ctrl_c_events.clone(),
            nats_connection.clone(),
        )
    });
    let t3 = scope.spawn(|| {
        control_task(&session, &args.pin1, &args.pin2, &ctrl_c_events)
    });
    // Get results of all threads and return
    (t1.join().unwrap(), t2.join().unwrap(), t3.join().unwrap())
});
info!(
    "Tasks ended with:\n t1: {:?}\n t2: {:?}\n t3: {:?}\n",
    t1, t2, t3
);
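For anyone who wants to reproduce the structure without my full code, here is a minimal sketch of the same scoped-thread layout using only the standard library. The names task and run_tasks are hypothetical stand-ins, not my real functions, and the session is reduced to a plain string slice.

```rust
use std::thread;

// Hypothetical stand-in for the real task functions. It borrows `session`
// from the caller's stack frame, which scoped threads allow.
fn task(name: &str, session: &str) -> Result<String, String> {
    Ok(format!("{name} finished using {session}"))
}

// Spawns two scoped threads, joins both, and returns their results.
fn run_tasks(session: &str) -> (Result<String, String>, Result<String, String>) {
    thread::scope(|scope| {
        let t1 = scope.spawn(|| task("t1", session));
        let t2 = scope.spawn(|| task("t2", session));
        // join() returns Err only if the thread panicked.
        (t1.join().unwrap(), t2.join().unwrap())
    })
}
```

Calling run_tasks("s") from main returns once both threads have been joined, which is the point where my real program logs "Tasks ended with".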
Signals are caught and messaged to threads like so:
fn ctrl_channel() -> Receiver<()> {
    const SIGNALS: &[c_int] = &[SIGTERM, SIGQUIT, SIGINT, SIGTSTP, SIGHUP, SIGCHLD, SIGCONT];
    let mut sigs = match Signals::new(SIGNALS) {
        Ok(sigs) => sigs,
        Err(e) => {
            error!("creating Signals: {e}");
            std::process::exit(0);
        }
    };
    let (sender, receiver) = bounded(100);
    spawn(move || loop {
        if let Some(signal) = sigs.into_iter().next() {
            info!("Got signal {}", signal);
            if let Err(e) = sender.send(()) {
                error!("Signal handler thread failed {}", e);
                std::process::exit(0);
            }
        }
    });
    receiver
}
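The forwarding idea on its own can be sketched with the standard library only. This is an assumption-laden reduction, not my actual code: std::sync::mpsc::sync_channel replaces the crossbeam bounded channel, the hypothetical signals iterator replaces the real blocking signal source, and the thread simply stops instead of calling std::process::exit when the receiver is gone.

```rust
use std::sync::mpsc::{sync_channel, Receiver};
use std::thread;

// `signals` is a hypothetical stand-in for the blocking signal iterator;
// any iterator of signal numbers works for this sketch.
fn forward_signals<I>(signals: I) -> Receiver<()>
where
    I: Iterator<Item = i32> + Send + 'static,
{
    // Bounded channel, same capacity as in the original snippet.
    let (sender, receiver) = sync_channel(100);
    thread::spawn(move || {
        for signal in signals {
            eprintln!("Got signal {signal}");
            // send() fails only when the receiver has been dropped.
            if sender.send(()).is_err() {
                break;
            }
        }
    });
    receiver
}
```

Each signal number that the iterator yields becomes one () message on the returned receiver.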
The thread functions get those messages and shut down nicely like so:
loop {
    // Lots of stuff...
    if !ctrl_c_events.is_empty() {
        warn!("Got Signal.");
        // Clean up nicely here...
        std::thread::sleep(Duration::from_secs(1));
        info!("waking and returning ...");
        return Err(e.into());
    }
}
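A reduced version of this polling loop, runnable with only the standard library, might look like the sketch below. Note the swap: it uses an AtomicBool as the shutdown flag instead of my crossbeam channel, and worker and run_until_shutdown are hypothetical names. The delay parameter stands in for the moment the signal arrives.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::thread;
use std::time::Duration;

// Hypothetical worker: does one unit of work per pass and checks the flag
// once per pass, mirroring the `is_empty()` check in the loop above.
fn worker(shutdown: &AtomicBool) -> u64 {
    let mut iterations = 0;
    loop {
        iterations += 1; // stand-in for "Lots of stuff..."
        if shutdown.load(Ordering::Relaxed) {
            // Clean up here, then return so the scope can join this thread.
            return iterations;
        }
        thread::sleep(Duration::from_millis(1));
    }
}

// Runs two workers, raises the flag after `delay`, and joins both.
fn run_until_shutdown(delay: Duration) -> (u64, u64) {
    let shutdown = AtomicBool::new(false);
    thread::scope(|scope| {
        let t1 = scope.spawn(|| worker(&shutdown));
        let t2 = scope.spawn(|| worker(&shutdown));
        // Stand-in for the signal arriving.
        thread::sleep(delay);
        shutdown.store(true, Ordering::Relaxed);
        (t1.join().unwrap(), t2.join().unwrap())
    })
}
```

Calling run_until_shutdown(Duration::from_millis(20)) returns the number of passes each worker completed before it saw the flag.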
My log messages show that all threads reach the info!("waking and returning ...") output but never make it through the following return.
Any advice would be appreciated.