Epoll wait sometimes misses signalfd signals

I'm coding a simple app that has two threads; both use signalfd and epoll to wait for either available input or SIGINT. Occasionally (irritatingly, only about 1 time in 10) one of the threads will fail to return from epoll::wait when a SIGINT is sent to the app. On the times when it does work properly, the secondary thread appears to be returning before the main thread, so perhaps it depends on the order in which the signal is delivered to the threads? I've tried adding yield_now or sleep calls at strategic places, all to no effect. Has anyone else seen this behaviour? Any guesses as to what I might be doing wrong, or hints for how to debug further, will be gratefully received.

[Later edit]
Hmmm, after searching for info on signals, I find some people saying that all the threads get to handle the signal, but elsewhere that "an arbitrary (single) thread is chosen by the kernel to receive the signal". That being the case, the code should fail always, not just sometimes. So I'm still puzzled.

I've been working with Linux select via the nix crate, i.e., I'm not directly using epoll. I see the same behavior. My understanding from reading the man pages is that there is no guarantee that an event on an fd will be reported on any particular poll.

From the man pages for epoll:
Q: Can two epoll instances wait for the same file descriptor? If so, are events reported to both epoll file descriptors?
A: Yes, and events would be reported to both. However, careful programming may be needed to do this correctly.
They don't specify what is meant by "careful programming".

To investigate further, I wrote a test script, which runs the app, sleeps for a random time (10 to 990 msec), kills the app with a SIGINT, waits another 5 msec, and checks whether it is still running. Under this script, the app behaves perfectly every time. However, running it directly from the keyboard still gives random failures! So, seems it may be something to do with the way Bash delivers SIGINT on Ctrl-C being typed by the user? Very strange.

Anyway, (probably) not a Rust epoll issue, so I should find a better forum to ask on.

Sounds like a large deficiency to me, to the point that it is kind of too unreliable to be useful.

How did you work around this behavior?

bash does not deliver SIGINT. Rather, SIGINT is sent by the kernel's terminal device driver to the processes in the terminal's foreground process group. bash merely places the process group in the foreground (via tcsetpgrp()) as per the user's wishes, but it is not involved in the actual signal delivery.
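If you want to check from inside the app whether it is in the group that would receive a Ctrl-C, a small C sketch along these lines should do it. in_foreground_group is a made-up helper name; tcgetpgrp() and getpgrp() are the standard calls.

```c
#include <assert.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns 1 if the calling process belongs to the
 * foreground process group of the terminal open on tty_fd, 0 if it belongs
 * to some other (background) group, and -1 if tcgetpgrp() fails, for
 * example because tty_fd does not refer to a terminal.
 */
int in_foreground_group(int tty_fd) {
    assert(tty_fd >= 0);
    pid_t fg = tcgetpgrp(tty_fd); /* foreground process group of the terminal */
    if (fg == (pid_t)-1) return -1;
    return fg == getpgrp();       /* compare with our own process group */
}
```

Calling in_foreground_group(STDIN_FILENO) at startup and logging the result is one cheap way to compare a run from the keyboard with a run under a test script.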

The fd may not be reported as ready on the next epoll after the signal, but it will be reported as ready on a future epoll. I think that's the best you can do in an asynchronous system.

I think I misunderstood you, as my understanding was that a signal wasn't guaranteed to be polled at all.
If the signal shows up at a future poll then there's no real problem.

It's my bad. I realized that my original explanation was missing the key piece as soon as I saw your reply.
