If you have direct access to a Mac that cargo test fails on, I would run it under LLDB and get a backtrace when the SIGABRT signal is raised.
The "failed to initiate panic" message means some code triggered a panic and, for whatever reason, the __rust_start_panic() function returned instead of unwinding the stack.
If you haven't already, try setting the RUST_BACKTRACE environment variable to 1. The code might run long enough to call the default panic hook (which prints a backtrace to stderr) before starting the unwinding process.
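If the default hook's output gets lost in the CI logs, one option is to install your own hook so you can see whether it is reached at all before unwinding starts. This is only a minimal sketch: `install_diagnostic_hook`, `hook_ran`, and `HOOK_RAN` are names I made up for illustration, not anything from your project.

```rust
use std::backtrace::Backtrace;
use std::panic;
use std::sync::atomic::{AtomicBool, Ordering};

/// Set by the hook so callers can tell whether it ran before unwinding began.
static HOOK_RAN: AtomicBool = AtomicBool::new(false);

/// Installs a panic hook that logs the panic message and a backtrace to stderr.
/// (Hypothetical helper; call it once at the start of the test binary.)
pub fn install_diagnostic_hook() {
    panic::set_hook(Box::new(|info| {
        HOOK_RAN.store(true, Ordering::SeqCst);
        // `force_capture` ignores RUST_BACKTRACE, so a backtrace is captured
        // even if the variable is not set in the CI environment.
        let backtrace = Backtrace::force_capture();
        eprintln!("panic hook reached: {info}\n{backtrace}");
    }));
}

/// Reports whether the hook has been invoked at least once.
pub fn hook_ran() -> bool {
    HOOK_RAN.load(Ordering::SeqCst)
}
```

If "panic hook reached" shows up in the log right before the abort, the panic itself is ordinary and the problem is in the unwinding step that follows it.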
Is this all pure Rust code, or could there be something non-standard going on (modifying the panic machinery, Rust calling C which calls back into Rust, weird hardware/libc, etc.)? I don't think I've ever seen the "failed to initiate panic" message before.
The way I find things I can not find is to only look at half of the problem.
Maybe you can bisect? Run half the tests and see if the failure still happens. Keep halving until you have narrowed it down to one test.
It's worse than that; while the test fails consistently on GitHub's macos-latest runner, it runs fine on my borrowed MacBook. I'm not relishing the prospect of running a test-bisection on CI, but what can you do?
This happens when running the full CI suite, or all the unit tests (either with cargo test --lib or cargo test -- ... with all the unit tests named). Running the unit tests individually, or in groups of half the set, passes in both cases.
The bisected run is especially confusing; the suite failing while individual tests pass leads me to the hypothesis that there's a pair of tests that are somehow interacting. But the bisection runs all 6 pairs of two quarters of the suite (cf. nix/ci/default.nix), so every pair of tests should be running together in at least one of those runs.
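To check the reasoning about the quarters, here is a rough sketch of the pairing logic. It is not what nix/ci/default.nix actually does (I have not reproduced that file); the function names and the placeholder test names are assumptions for illustration only.

```rust
/// Splits `tests` into up to four contiguous groups (the last may be shorter).
pub fn quarters<'a>(tests: &[&'a str]) -> Vec<Vec<&'a str>> {
    // Ceiling division, with a minimum chunk size of 1 so `chunks` never panics.
    let size = ((tests.len() + 3) / 4).max(1);
    tests.chunks(size).map(|chunk| chunk.to_vec()).collect()
}

/// Returns one combined run for every unordered pair of groups.
pub fn pairwise_runs<'a>(groups: &[Vec<&'a str>]) -> Vec<Vec<&'a str>> {
    let mut runs = Vec::new();
    for i in 0..groups.len() {
        for j in (i + 1)..groups.len() {
            let mut run = groups[i].clone();
            run.extend_from_slice(&groups[j]);
            runs.push(run);
        }
    }
    runs
}

/// True if every pair of tests appears together in at least one run.
pub fn covers_all_pairs<'a>(tests: &[&'a str], runs: &[Vec<&'a str>]) -> bool {
    tests.iter().all(|a| {
        tests
            .iter()
            .all(|b| runs.iter().any(|run| run.contains(a) && run.contains(b)))
    })
}
```

With four quarters this gives 4 choose 2 = 6 runs, and any two tests do share a run: if they sit in different quarters, the run for that pair of quarters contains both, and if they sit in the same quarter, any run that includes it does. So if all six runs pass, a simple two-test interaction looks unlikely, and something that depends on three or more tests, or on the total load of the full suite, would still fit the symptoms.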
I'm starting to be inclined to configure a Mac test run that runs everything but the unit tests, and then runs the unit tests individually, shameful as that solution would be.
I don't believe there's any FFI anywhere. I think I've narrowed down the cause to here:
That seems like a fairly straightforward bit of code, so it is unclear to me why it would trigger this weird panic handler issue, and only on GHA's macos-latest.
I think I see the source of your question though: if a panic is crossing FFI boundaries, that might cause this? I've seen reference elsewhere to using catch_unwind to protect against this issue.
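For reference, the catch_unwind pattern usually looks something like the sketch below: the panic is caught on the Rust side and turned into an error code before control returns to the C caller. `checked_halve` and `halve_or_error` are hypothetical names, and a real exported function would also carry the appropriate no_mangle attribute, which I have left out here.

```rust
use std::panic;

/// Hypothetical Rust logic that may panic; stands in for the real work.
fn checked_halve(n: i32) -> i32 {
    if n % 2 != 0 {
        panic!("{n} is odd");
    }
    n / 2
}

/// Hypothetical C-callable entry point.
/// Returns 0 on success and writes the result through `out`.
/// Returns -1 if `out` is null or if the Rust code panicked.
pub extern "C" fn halve_or_error(n: i32, out: *mut i32) -> i32 {
    if out.is_null() {
        return -1;
    }
    // Stop the panic here so it never unwinds into the foreign caller.
    match panic::catch_unwind(|| checked_halve(n)) {
        Ok(value) => {
            // SAFETY: `out` was checked for null above; the caller must pass
            // a pointer that is valid for writing an i32.
            unsafe { *out = value };
            0
        }
        Err(_) => -1,
    }
}
```

Note that catch_unwind only helps when the crate is built with unwinding panics; with panic = "abort" there is nothing to catch. The panic hook still runs first, so the message is printed to stderr either way.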
There is a chance that the Mac version of bash behaves badly with those flags. It depends on the specific version of bash installed on the CI runner.
I remember an issue like this on a different project.
If this is the case, then it's possible that the "borrowed" MacBook had a custom bash installed on it that fixed the bug. The default /bin/bash that is installed on MacBooks is old (mine says 3.2.57(1)-release); it's common to use e.g. Homebrew to install a more recent version and put that first in PATH (I have 5.1.16(1)-release installed on mine through Homebrew).
I wonder if the GHA runner only has the default bash; that could explain why it only failed there, and not on the one you borrowed.
The closest I came to a solution here was running with --test-threads 1 so that I could identify a failing test case, and then using a cfg_attr to exclude that test on macOS; hardly ideal.
Doubly so because running the tests on a different branch produces a panic in a different test. Iterating the process would mean slowly removing every test from macOS - basically we'd have to stop supporting that platform.
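For anyone who lands here with the same problem, the cfg_attr workaround looks roughly like this. The function and test names are placeholders, not the real tests from my project.

```rust
/// Hypothetical function under test.
pub fn parse_port(s: &str) -> Option<u16> {
    s.trim().parse().ok()
}

#[cfg(test)]
mod port_tests {
    use super::*;

    // Runs on every platform.
    #[test]
    fn parses_valid_port() {
        assert_eq!(parse_port(" 8080 "), Some(8080));
    }

    // Marked `ignore` only when compiling for macOS; on other targets the
    // attribute is not applied and the test runs normally.
    #[test]
    #[cfg_attr(target_os = "macos", ignore = "aborts on GHA macos-latest")]
    fn rejects_out_of_range_port() {
        assert_eq!(parse_port("70000"), None);
    }
}
```

The ignored test still compiles on macOS and can be run on demand with cargo test -- --ignored, which at least keeps it from rotting while it is excluded.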