Why the turbofish for _mm_slli_epi64?

Hey everyone!

I am currently working on a project for which I need some architecture-specific instructions. Among them are instructions like _mm_slli_epi64 and _mm_srli_epi64, i.e. bit shift left and bit shift right.
Both instructions are called in C with two parameters: the target __m128i we want to shift and the number of bits by which we want to shift it. A call to e.g. _mm_slli_epi64 therefore looks like this in C: _mm_slli_epi64(x, 1). Rust, however, defines this instruction as _mm_slli_epi64<const IMM8: i32>(a: __m128i) -> __m128i, where a is the target we want to shift and the generic parameter is the number of bits by which we want to shift; at least that's what I think. So if I wanted to use this instruction in Rust, I would have to write _mm_slli_epi64::<1>(x).

I have two questions about this, maybe only one of which can be answered:

  1. Why was it done like this? To me this is completely counterintuitive compared to the official Intel Intrinsics Guide.
  2. What exactly is the point of <const IMM8: i32>? I have obviously seen type parameters before, but never like this. Where would something like this be used? I don't see the advantage over just specifying another function argument.

Disclaimer: I do not know the original reasoning for using const generics here.

  1. If you look into the source of _mm_slli_epi64, it does a static check of IMM8, then calls the "actual" instruction with 2 parameters like you would expect. So I guess the const generic is there to check parameter validity at compile time.

  2. Const generics can be used to be generic over the length of a fixed-length array. For example:

fn foo<const LEN: usize>(arr: [u8; LEN]) {
   // ...
}
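To illustrate point 1, here is a minimal sketch of how a const generic parameter can be validated at compile time. It is not the actual stdarch source: CheckImm8 and shift_left_imm are hypothetical names, a plain u64 stands in for __m128i, and the accepted range of 0..=255 is an assumption based on the parameter being called IMM8.

```rust
/// Hypothetical helper: carries the compile-time check for an 8-bit immediate.
struct CheckImm8<const IMM8: i32>;

impl<const IMM8: i32> CheckImm8<IMM8> {
    // Evaluated at compile time for every IMM8 the program actually uses.
    const VALID: () = assert!(IMM8 >= 0 && IMM8 < 256, "IMM8 must fit in 8 bits");
}

/// Hypothetical stand-in for an intrinsic wrapper: shifts `a` left by IMM8 bits.
fn shift_left_imm<const IMM8: i32>(a: u64) -> u64 {
    // Referencing the constant forces the check; an out-of-range IMM8
    // (e.g. shift_left_imm::<300>) becomes a build error, not a runtime panic.
    let () = CheckImm8::<IMM8>::VALID;
    // Shift counts above 63 clear the value, mirroring what the hardware
    // shift does for a 64-bit lane.
    if IMM8 > 63 { 0 } else { a << IMM8 }
}
```

With a regular function argument, the same range check could only happen at run time, or would have to rely on the optimizer proving that the value is constant.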


Have you tried using it as _mm_slli_epi64(x, 1)? If so, what happens?

These functions are marked with the internal #[rustc_legacy_const_generics(1)] attribute, which makes them callable as both _mm_slli_epi64(x, 1) and _mm_slli_epi64::<1>(x). The former form was necessary because const generics didn't exist yet when these intrinsics were stabilized, and it matches how C would write them. The latter form was introduced, I believe, to remove some hacks: the internal intrinsics require these values to be constant, and depending on const propagation is somewhat fragile and didn't work with codegen backends other than LLVM.
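As a small sketch of the two call forms, assuming an x86_64 target: shift_both_ways is a hypothetical helper name, and the unsafe block is there so the snippet also builds on toolchains where these intrinsics are still unsafe functions.

```rust
use std::arch::x86_64::{_mm_cvtsi128_si64, _mm_set_epi64x, _mm_slli_epi64};

/// Hypothetical helper: shifts `v` left by one bit using both call forms
/// and returns the low 64-bit lane of each result.
fn shift_both_ways(v: i64) -> (i64, i64) {
    unsafe {
        let x = _mm_set_epi64x(0, v);
        // C-style form, accepted thanks to #[rustc_legacy_const_generics(1)].
        // The second argument still has to be a constant expression.
        let legacy = _mm_slli_epi64(x, 1);
        // Const generic (turbofish) form.
        let turbofish = _mm_slli_epi64::<1>(x);
        (_mm_cvtsi128_si64(legacy), _mm_cvtsi128_si64(turbofish))
    }
}
```

Both forms should produce the same result, since the C-style call is only a different spelling of the const generic one.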

Compiler error, because the function only takes one argument.

But it is not callable as _mm_slli_epi64(x, 1), because the compiler will yell at you that the function only takes 1 argument.

Ahh you are right, looking at the source makes much more sense.
And the part about const generic is actually pretty interesting and useful, thank you for the explanation!

Does it work in the 2015 edition? It may have been removed in the 2018 or 2021 edition.

That might be possible. One could assume that the older version has been removed once const generics were stable.

I think you need to show your code. Because calling it like _mm_slli_epi64(x, c) works on Rust 2015, 2018 and 2021: Rust Playground

By the way, I'm curious: what's the difference between the function in question and _mm_sll_epi64 in core::arch::x86_64, aside from the fact that the latter doesn't use const generics? I can't check the linked Intel documentation right now (it's blocked even through a VPN, probably based on browser settings or something like that), and the Rust documentation shows no semantic difference.

Update: I looked from another machine. It seems the only difference is the input type for the shift count: i32 for *slli* and __m128i for *sll*. It makes sense that the latter isn't const generic, then.
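A short sketch of that difference, assuming an x86_64 target. The helper names shift_by_runtime_count and shift_by_three are made up for illustration; the count for _mm_sll_epi64 is read from the low 64 bits of its second vector argument.

```rust
use std::arch::x86_64::{
    _mm_cvtsi128_si64, _mm_set_epi64x, _mm_sll_epi64, _mm_slli_epi64,
};

/// Hypothetical helper: the shift count is only known at run time, so it is
/// passed in a vector register and _mm_sll_epi64 is used.
fn shift_by_runtime_count(v: i64, count: i64) -> i64 {
    unsafe {
        let x = _mm_set_epi64x(0, v);
        let c = _mm_set_epi64x(0, count); // count lives in the low 64 bits
        _mm_cvtsi128_si64(_mm_sll_epi64(x, c))
    }
}

/// Hypothetical helper: the shift count is a compile-time immediate, so
/// _mm_slli_epi64 can encode it directly in the instruction.
fn shift_by_three(v: i64) -> i64 {
    unsafe { _mm_cvtsi128_si64(_mm_slli_epi64::<3>(_mm_set_epi64x(0, v))) }
}
```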

I have this code for constant-time doubling in the Galois field GF(2^128):

use std::arch::x86_64::*; // __m128i and the _mm_* intrinsics (x86_64 only)

/// Constant-time doubling in GF(2^128)
unsafe fn gf128_mul2(x: __m128i) -> __m128i {
    let redpoly = _mm_set_epi64x(0, 0x87); // Irreducible polynomial used to reduce the product over GF(2)
    let zero = _mm_setzero_si128(); // Vector of all zeros
    let mut mask = _mm_cmpgt_epi32(zero, x); // Per 32-bit lane: all ones if the lane's top bit is set
    mask = _mm_shuffle_epi32::<0xff>(mask); // Broadcast the highest lane, so mask is all ones iff bit 127 of x is set

    let x2 = _mm_or_si128(
        // Bitwise OR between
        _mm_slli_epi64::<1>(x), // each 64-bit lane of x shifted left by 1 bit (multiplication by 2)
        _mm_srli_epi64::<63>(_mm_slli_si128::<8>(x)), // and x shifted left by 8 bytes, then right by 63 bits: carries bit 63 into bit 64
    );

    _mm_xor_si128(x2, _mm_and_si128(redpoly, mask)) // XOR x2 with the polynomial, but only where the mask is set (bit 127 was shifted out)
}

And if I replace _mm_slli_epi64::<1>(x) with _mm_slli_epi64(x, 1), rust-analyzer tells me that this function only takes 1 argument.
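For cross-checking the SIMD routine above, a scalar version of the same doubling can be written on u128. This is only a sketch under the same conventions as the posted code (reduction constant 0x87, bit 127 as the highest coefficient); gf128_mul2_scalar is a hypothetical name, and it is branch-free at the source level without any claim about the timing of the generated code.

```rust
/// Hypothetical scalar reference for doubling in GF(2^128).
fn gf128_mul2_scalar(x: u128) -> u128 {
    // All ones if bit 127 is set (it is about to be shifted out), else zero.
    let mask = 0u128.wrapping_sub(x >> 127);
    // Shift left by one bit, then fold the lost bit back in via 0x87.
    (x << 1) ^ (0x87 & mask)
}
```

Comparing its output with gf128_mul2 on a few inputs, for example 1, 1 << 63 and 1 << 127, exercises the plain shift, the carry between the 64-bit lanes and the reduction step.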

OK, but that's not what you said. :slight_smile: rust-analyzer isn't the compiler and your code without turbofish compiles just fine: Rust Playground

If rust-analyzer is warning you about it, then maybe it has a bug. Although, rust-analyzer doesn't yell at me about it. It accepts your code (without turbofish) just fine. I have a relatively up-to-date rust-analyzer:

$ rust-analyzer --version
rust-analyzer 2022-05-02

Ugh yeah you are right :man_facepalming:
Guess I relied too hard on rust-analyzer there, but it actually does compile and run.
