Not quite a semaphore, but the atomic-wait crate provides a futex-style API where you can wait/wake on just an AtomicU32 rather than a mutex/condvar pair.
semka is probably the best bet for using the platform semaphore. I found it via a search on lib.rs, whose ranking algorithm makes me feel relatively okay picking this one over the various other options that haven't been touched in years.
If you're using async, there are a bunch of async semaphores available, e.g. from async_lock (smol/async-std) or from Tokio.
It would be nice if Tokio also offered non-async versions of features like this,
because its code really seems trustworthy enough to rely on in the long run.
We have structures that don't need async, because everything fits in memory and we need maximum throughput.
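For what it's worth, if all you need is a blocking (non-async) counting semaphore, a minimal sketch using only std's Mutex and Condvar could look like the following. This is not Tokio's or semka's implementation; the names `Semaphore`, `acquire`, `try_acquire`, `release` and the helper `peak_concurrency` are made up for illustration, and it hasn't been tuned for throughput.

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

/// Hypothetical blocking counting semaphore built only on std.
pub struct Semaphore {
    permits: Mutex<usize>,
    available: Condvar,
}

impl Semaphore {
    pub fn new(permits: usize) -> Self {
        Self { permits: Mutex::new(permits), available: Condvar::new() }
    }

    /// Block the calling thread until a permit is free, then take it.
    pub fn acquire(&self) {
        let mut permits = self.permits.lock().unwrap();
        // Loop to guard against spurious wakeups.
        while *permits == 0 {
            permits = self.available.wait(permits).unwrap();
        }
        *permits -= 1;
    }

    /// Take a permit without blocking; returns false if none is free.
    pub fn try_acquire(&self) -> bool {
        let mut permits = self.permits.lock().unwrap();
        if *permits == 0 {
            return false;
        }
        *permits -= 1;
        true
    }

    /// Return a permit and wake one waiting thread, if any.
    pub fn release(&self) {
        let mut permits = self.permits.lock().unwrap();
        *permits += 1;
        self.available.notify_one();
    }
}

/// Illustrative helper: runs `workers` threads that each hold a permit
/// briefly, and returns the highest number of simultaneous holders seen.
pub fn peak_concurrency(workers: usize, limit: usize) -> usize {
    let sem = Arc::new(Semaphore::new(limit));
    let state = Arc::new(Mutex::new((0usize, 0usize))); // (current, peak)
    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let sem = Arc::clone(&sem);
            let state = Arc::clone(&state);
            thread::spawn(move || {
                sem.acquire();
                {
                    let mut s = state.lock().unwrap();
                    s.0 += 1;
                    s.1 = s.1.max(s.0);
                }
                thread::yield_now();
                state.lock().unwrap().0 -= 1;
                sem.release();
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    let peak = state.lock().unwrap().1;
    peak
}
```

Calling `peak_concurrency(8, 2)` should never report more than 2 threads inside the guarded section at once. A futex-based version (e.g. on top of atomic-wait) would avoid taking the mutex on the uncontended path, which probably matters if throughput is the main concern.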