An overview of my work on SIMD in Rust.
Good fun! I’d love to see the cross-platform layer made truly portable by means of a scalar implementation that is selected if there is no SIMD option. That would enable “worry-free” use of simd types in data structures, which sounds highly appealing to me for cases where they make a particularly good fit.
Having a cross-platform library that calls the appropriate instruction on ARM/x86/… or falls back to a scalar version would be great. I don’t think I want that to be part of simd (at least, not yet), but it should be possible to do it in a zero-overhead way in a wrapper lib.
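A minimal sketch of how such a wrapper lib could pick its backend at compile time, assuming a hypothetical F32x4 type and a hypothetical add_lanes helper (neither is part of the simd crate or any existing library). It uses SSE intrinsics from std::arch::x86_64 when SSE2 is statically enabled and a plain scalar path otherwise. Because the choice is made with cfg rather than runtime detection, no dispatch branch is added. Only an x86_64 backend and the scalar fallback are shown; an ARM backend would be one more cfg branch.

```rust
use std::ops::Add;

/// Hypothetical portable 4-lane f32 vector (not an API of the simd crate).
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct F32x4(pub [f32; 4]);

impl Add for F32x4 {
    type Output = F32x4;
    fn add(self, rhs: F32x4) -> F32x4 {
        // Exactly one add_lanes definition below is compiled in.
        F32x4(add_lanes(self.0, rhs.0))
    }
}

// SSE backend: compiled only when SSE2 is statically enabled (the x86_64 default).
#[cfg(all(target_arch = "x86_64", target_feature = "sse2"))]
fn add_lanes(a: [f32; 4], b: [f32; 4]) -> [f32; 4] {
    use std::arch::x86_64::{_mm_add_ps, _mm_loadu_ps, _mm_storeu_ps};
    let mut out = [0.0f32; 4];
    // SAFETY: the cfg guarantees SSE2 (and therefore SSE) is available, and
    // each unaligned load/store touches exactly four f32 values.
    unsafe {
        let va = _mm_loadu_ps(a.as_ptr());
        let vb = _mm_loadu_ps(b.as_ptr());
        _mm_storeu_ps(out.as_mut_ptr(), _mm_add_ps(va, vb));
    }
    out
}

// Scalar fallback: used on every target without the SSE backend.
#[cfg(not(all(target_arch = "x86_64", target_feature = "sse2")))]
fn add_lanes(a: [f32; 4], b: [f32; 4]) -> [f32; 4] {
    std::array::from_fn(|i| a[i] + b[i])
}

fn main() {
    let sum = F32x4([1.0, 2.0, 3.0, 4.0]) + F32x4([10.0, 20.0, 30.0, 40.0]);
    println!("{:?}", sum);
}
```

Since the type and its operators look the same on every target, data structures can hold F32x4 values without caring which backend was compiled in.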
Which instruction sets are supported? Does it start at SSE2, or does it go all the way back to MMX?
At the moment there are no MMX intrinsics, but I’m fairly sure it is just a matter of adding them.
MMX in particular is a bit complicated because we would need to expose the LLVM type x86_mmx to Rust code somehow. Also, emms is a complete mess: LLVM can’t automatically insert emms calls, so you have to insert them yourself. But if you insert them yourself, you have to be very careful because the LLVM optimizer doesn’t understand the relationship between MMX operations, x87 operations, and emms, and therefore can reorder instructions incorrectly. Overall, it’s probably not worth the trouble.
Whoever you are @eefriedman, you were right. MMX support has been a road full of pain for all the reasons you mention.
the LLVM optimizer doesn’t understand the relationship between MMX operations, x87 operations, and emms, and therefore can reorder instructions incorrectly.
As a hammer, one can use asm!("emms" : : : "volatile") to work around that. I just added MMX support to core::arch, and there I use llvm.x86.mmx.emms. We’ll see how that works out. In the meantime I’ve filed https://github.com/rust-lang/rust/issues/57831 to replicate ICC’s warning about missing emms, and https://github.com/rust-lang/rust/issues/57832 to clean up the current implementation of x86_mmx in rustc.
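The asm!("emms" : : : "volatile") form above uses the old LLVM-style inline assembly syntax, which is no longer accepted; stable Rust now has a different asm! syntax. A minimal sketch of the same hammer in the current syntax, assuming a hypothetical helper named clear_mmx_state. It declares the x87 and MMX registers clobbered through clobber_abi("C"), so the compiler does not keep live values in them across the block, and it compiles to a no-op on non-x86 targets.

```rust
/// Hypothetical helper: runs EMMS after MMX code so that later x87
/// floating-point code sees an empty register stack.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
pub fn clear_mmx_state() {
    // SAFETY: EMMS only marks all x87/MMX registers as empty. clobber_abi("C")
    // declares the st(0)-st(7) and mm0-mm7 registers (among others) clobbered,
    // so the compiler keeps no live values in them across this block.
    unsafe {
        std::arch::asm!("emms", clobber_abi("C"));
    }
}

/// No-op on targets without MMX.
#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
pub fn clear_mmx_state() {}

fn main() {
    clear_mmx_state();
    // Ordinary floating-point code afterwards is unaffected.
    let x: f64 = 1.5 * 2.0;
    println!("{x}");
}
```

This is deliberately blunt: clobber_abi("C") also marks the SSE registers as clobbered, trading a few spills for not having to reason about which registers EMMS interacts with.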
I’ve been wondering for a year why our tests kept failing randomly, fixing themselves, and then failing again, and why these errors were pretty much impossible to reproduce on different machines. Missing emms calls are probably a big source of these issues.