That exact expression doesn't work for the same reason (and is in fact mentioned in the OP): the shift overflows for k == N, so you either have to handle that case specially or accept that it panics or produces an incorrect result.
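To make the k == N problem concrete, here is a rough sketch for N = 64. The function names are made up for illustration. `low_bits` special-cases the overflow via `checked_shl`, while `low_bits_wrapping` uses `wrapping_shl` to mimic what a plain `1 << k` does in a build without overflow checks (the shift amount gets masked to 0..63).

```rust
/// Keep the low `k` bits of `x`, treating k >= 64 as "keep everything".
/// (Hypothetical helper, not taken from the thread.)
pub fn low_bits(x: u64, k: u32) -> u64 {
    // checked_shl returns None when the shift amount is >= 64,
    // which is exactly the k == N case that overflows.
    match 1u64.checked_shl(k) {
        Some(bit) => x & (bit - 1),
        None => x,
    }
}

/// Same expression without the special case, with the wrapping made explicit.
/// For k == 64 the shift amount wraps to 0, so the mask becomes
/// (1 << 0) - 1 == 0 and the result is 0 instead of x.
pub fn low_bits_wrapping(x: u64, k: u32) -> u64 {
    x & 1u64.wrapping_shl(k).wrapping_sub(1)
}
```

With the plain `<<` operator the same input panics in a debug build instead of silently returning 0.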
The approved answer also panics in debug for k == 0 or produces an incorrect result in release for k > N, so I assumed that was fine. The check is also likely to be well-predicted, assuming k is normally in 1..N -- and that's if it isn't optimized out entirely because k is a constant.
Looks like there's an interesting LLVM behaviour here, actually. It recognizes x & ((1 << k)-1) as bzhi on its own, but not if it's guarded with an if k < u64::BITS check.
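For anyone who wants to reproduce this, a minimal pair to paste into a compiler explorer might look like the following. The function names are mine, and the codegen difference is only what's reported above; I'd expect it to depend on the rustc/LLVM version and on BMI2 being enabled (e.g. `-C opt-level=3 -C target-feature=+bmi2`), so treat it as something to check rather than a guarantee.

```rust
/// Unguarded form: the one reported above as being recognized as bzhi.
/// Panics in builds with overflow checks when k >= 64.
pub fn mask_unguarded(x: u64, k: u32) -> u64 {
    x & ((1u64 << k) - 1)
}

/// Guarded form: same mask, with the k >= 64 case handled explicitly.
/// This is the variant reported above as not being turned into bzhi.
pub fn mask_guarded(x: u64, k: u32) -> u64 {
    if k < u64::BITS {
        x & ((1u64 << k) - 1)
    } else {
        x
    }
}
```

Both return the same value for every k in 0..64; they only differ for k >= 64.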