It doesn't seem to be supported by LLVM if this open issue is still up to date.
Besides that:
`-C target-feature=native` is not a valid flag (the compiler even warns you about it!). The correct one would be `-C target-cpu=native`. But even then, don't use `-C target-cpu=native` on Godbolt: it produces results that depend on exactly which machine Godbolt uses to compile your code. This matters particularly for you, since AVX-512 is not supported by all CPUs, so you might land on a machine that doesn't support it and get different results semi-randomly. Instead, enable exactly the features you care about, e.g. with `-C target-feature=+avx512f,+avx512vl`.
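To see the difference between the two kinds of flags, here is a minimal sketch (the function names are hypothetical). `cfg!(target_feature = ...)` reports what the binary was *compiled* with, which is what shapes the assembly on Godbolt, while `is_x86_feature_detected!` reports what the CPU running the binary supports. With `-C target-cpu=native` the first set silently follows the host; with `+avx512f,+avx512vl` it is pinned.

```rust
/// Features this binary was compiled with (from `-C target-feature=...`
/// or `-C target-cpu=...`). Fixed at compile time.
pub fn compiled_avx512() -> (bool, bool) {
    (cfg!(target_feature = "avx512f"), cfg!(target_feature = "avx512vl"))
}

/// Features the CPU executing this binary supports. Checked at run time,
/// so the answer depends on whichever machine runs the code.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
pub fn runtime_avx512() -> (bool, bool) {
    (
        std::is_x86_feature_detected!("avx512f"),
        std::is_x86_feature_detected!("avx512vl"),
    )
}

/// Non-x86 targets never have AVX-512.
#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
pub fn runtime_avx512() -> (bool, bool) {
    (false, false)
}
```

A feature that was compiled in is always reported as available at run time, because the detection macro short-circuits on the compile-time `cfg`.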
Your code also has a typo:
```rust
let x4 = k1.get_unchecked(4);
if k1[1] {
```
The second line should be `if k1[4] {`.
I'm also not sure why you're using unchecked indexing into `k1` just to use checked indexing for the same value right after.
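Since we don't have the full original code, here is a hypothetical helper (`pack_mask`, assuming `k1` is an 8-lane `[bool; 8]` mask) that shows the `unsafe` is not needed. Iterating over a fixed-size array visits each lane once with no bounds checks at all, and a constant index such as `k1[4]` on a `[bool; 8]` is in bounds by construction, so its check is trivially removable too.

```rust
/// Hypothetical helper (not from the original code): packs an 8-lane
/// bool mask into a `u8`, with bit `i` set when `k1[i]` is true.
pub fn pack_mask(k1: &[bool; 8]) -> u8 {
    let mut bits = 0u8;
    // Safe iteration: no `get_unchecked`, no bounds checks, and no
    // branch per lane (the bool is converted to 0 or 1 and shifted).
    for (i, &lane) in k1.iter().enumerate() {
        bits |= (lane as u8) << i;
    }
    bits
}
```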
I fear this is the correct answer, even if it's not an actual solution. I'll keep the topic open for a while to see if anyone else has a non-obvious "hack".
Hopefully this has been addressed in the new Godbolt link.
I want to write a SIMD-friendly parser on stable Rust, so no portable SIMD (I'm also keeping dependencies as light as possible).
To achieve this, I have implemented the parser as a trait (e.g. `Stage1Scanner`) that is implemented per architecture/feature (e.g. `Avx512Scanner`, `NeonScanner`, etc.). I do this because different architectures have different optimal lane sizes and registers.
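A minimal sketch of what such a trait could look like. Only the name `Stage1Scanner` comes from the post; the associated const `LANES`, the methods `eq_mask` and `find_all`, and `FallbackScanner` are assumptions for illustration. Each implementation supplies the lane width and the per-chunk primitive, while the shared loop lives in a default method.

```rust
/// Hypothetical shape of a per-architecture scanner trait.
pub trait Stage1Scanner {
    /// Bytes processed per step (e.g. 64 for AVX-512, 16 for NEON).
    /// Must be between 1 and 64 so a chunk's mask fits in a `u64`.
    const LANES: usize;

    /// Bitmask of positions in `chunk` equal to `needle`
    /// (bit `i` set when `chunk[i] == needle`).
    fn eq_mask(chunk: &[u8], needle: u8) -> u64;

    /// Shared default: offsets of every `needle` in `input`,
    /// processed one `LANES`-sized chunk at a time.
    fn find_all(input: &[u8], needle: u8) -> Vec<usize> {
        let mut out = Vec::new();
        for (chunk_idx, chunk) in input.chunks(Self::LANES).enumerate() {
            let mut mask = Self::eq_mask(chunk, needle);
            while mask != 0 {
                let bit = mask.trailing_zeros() as usize;
                out.push(chunk_idx * Self::LANES + bit);
                mask &= mask - 1; // clear the lowest set bit
            }
        }
        out
    }
}

/// Portable fallback: plain scalar code, no intrinsics.
pub struct FallbackScanner;

impl Stage1Scanner for FallbackScanner {
    const LANES: usize = 64;

    fn eq_mask(chunk: &[u8], needle: u8) -> u64 {
        let mut mask = 0u64;
        for (i, &b) in chunk.iter().enumerate() {
            mask |= ((b == needle) as u64) << i;
        }
        mask
    }
}
```

For example, `FallbackScanner::find_all(b"a,b,c", b',')` returns `[1, 3]`. An `Avx512Scanner` would override only `eq_mask` (and `LANES`) with intrinsics.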
However, I need a fallback that's generally as good as it gets: ideally something that auto-vectorizes to optimal code, so I can keep more in the default trait implementation. The less platform-specific code there is, the less I need to write.
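One common way to give LLVM a fair chance at auto-vectorizing the fallback is to work on fixed-size blocks: `chunks_exact(64)` plus a `[u8; 64]` gives the inner loop a constant trip count and no bounds checks, with the short tail handled separately. This is a sketch under that assumption; `eq_mask_64` and `count_eq` are hypothetical names, and whether the loop actually becomes vector compares plus a mask extraction depends on the target features and LLVM version, so it is worth checking on Godbolt with explicit `-C target-feature` flags.

```rust
use std::convert::TryInto;

/// Branchless per-byte compare over a fixed 64-byte block; bit `i`
/// of the result is set when `block[i] == needle`.
pub fn eq_mask_64(block: &[u8; 64], needle: u8) -> u64 {
    let mut mask = 0u64;
    for (i, &b) in block.iter().enumerate() {
        mask |= ((b == needle) as u64) << i;
    }
    mask
}

/// Counts occurrences of `needle`, using the block routine for full
/// 64-byte chunks and a scalar loop for the remainder.
pub fn count_eq(input: &[u8], needle: u8) -> usize {
    let mut chunks = input.chunks_exact(64);
    let mut total = 0usize;
    for chunk in &mut chunks {
        // `chunks_exact` guarantees exactly 64 bytes, so this cannot fail.
        let block: &[u8; 64] = chunk.try_into().unwrap();
        total += eq_mask_64(block, needle).count_ones() as usize;
    }
    // Scalar tail for the final (< 64) bytes.
    total += chunks.remainder().iter().filter(|&&b| b == needle).count();
    total
}
```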