Getting rid of a call to mem::transmute

Hello everyone,

I am currently working on an optimized implementation of the COLM AEAD cipher in Rust. I have a correctly working version here, and it has already proven to be quite fast (~2.8 to 2.6 cycles per byte (cpb) for 1 KB to 16 KB buffers), though the goal is to get below 2 cpb.
The code works but definitely needs refactoring before it can be released to the public, which is what I am working on right now: creating a safe interface, removing unnecessary operations, rewriting bottlenecks, etc. However, one thing I just can't get rid of, and which bugs me the most, is the following:

COLM depends on its underlying block cipher (in my case AES128) to encrypt data blocks, and for every data block it makes two calls to the block cipher's encryption function. Currently I am using the AES128 implementation from the RustCrypto block-ciphers repository, which is well optimized and gives me the speed I need for an optimized COLM implementation.
The problem is that this AES implementation's single-block encrypt call expects a &mut GenericArray<u8, U16> as input, but since I am implementing COLM at a low level, my blocks are represented as __m128i. This leads to a very ugly conversion hack using transmute:

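use std::arch::x86_64::__m128i;
use std::mem;

use aes::cipher::BlockEncrypt;
use aes::Aes128Enc;

#[inline]
unsafe fn aes_encrypt(_in: __m128i, cipher: &Aes128Enc) -> __m128i {
    // byte_swap is a helper defined elsewhere (not shown)
    let mut tmp = byte_swap(_in);
    // encrypt_block writes through its &mut argument, so tmp must be mutable
    cipher.encrypt_block(mem::transmute(&mut tmp as *mut __m128i));
    byte_swap(tmp)
}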

First, it prevents me from using #![no_std]. Second, it uses transmute, which is, well... not really nice to see in a crypto crate.

So, to get to the point: can anybody think of a solution to this that doesn't require me to write a custom AES implementation?

Thank you all in advance, I am really struggling with this at the moment.

Edit:
Also, if anybody has experience implementing a safe API over an unsafe foundation, or has some resources to point me to, I would really appreciate help with that as well.

Here you go :blush:

Note that I've only just started writing unsafe code, so the code could be unsound. But if it were, the transmute version would likely be unsound as well. However, I think the cast is correct and the function is safe to call: there should be no inputs that could lead to memory unsafety.

This earlier thread of mine about casting to GenericArray might be of interest to you.

Edit:
Here is an argument for why the cast is correct:

use std::arch::x86_64::{__m128i, _mm_setzero_si128};
use bytemuck; // 1.9.1

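// Both __m128i and [u8; 16] are Pod, so bytemuck can verify the cast
// (equal size, compatible alignment) and no unsafe is needed here.
fn cast_block(block: &mut __m128i) -> &mut [u8; 16] {
    bytemuck::cast_mut(block)
}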

fn main() {
    let mut simd = unsafe { _mm_setzero_si128() };
    let arr = cast_block(&mut simd);
    dbg!(arr);
}

Using bytemuck, we can cast a &mut __m128i to a &mut [u8; 16] because both types are Pod and [u8; 16] (alignment 1) is no more strictly aligned than __m128i (alignment 16). Since a GenericArray<u8, U16> has the same layout as a [u8; 16], casting to it should be safe. It would be nice if GenericArray implemented Pod, as then no unsafe would be needed for code like this. I think I might create an issue for that.
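To make the size and alignment argument concrete without pulling in bytemuck, here is a std-only sketch (x86_64 only) of roughly the conditions bytemuck checks for this cast: the sizes must match, and the target type must not need stricter alignment than the source. The function name as_bytes_mut is hypothetical, and it still uses unsafe internally, which is exactly the part bytemuck encapsulates for you.

```rust
use std::arch::x86_64::{__m128i, _mm_setzero_si128};
use std::mem::{align_of, size_of};

// Compile-time checks: same size, and [u8; 16] needs no stricter alignment
// than __m128i. If either fails, the crate does not compile.
const _: () = assert!(size_of::<__m128i>() == size_of::<[u8; 16]>());
const _: () = assert!(align_of::<[u8; 16]>() <= align_of::<__m128i>());

/// Hypothetical helper: views a SIMD block as its 16 raw bytes.
fn as_bytes_mut(block: &mut __m128i) -> &mut [u8; 16] {
    // SAFETY: both types are 16 bytes with no padding, every bit pattern is
    // valid for both, and [u8; 16] (align 1) is less strictly aligned than
    // __m128i (align 16), so the pointer is valid for the new type.
    unsafe { &mut *(block as *mut __m128i as *mut [u8; 16]) }
}

fn main() {
    let mut simd = unsafe { _mm_setzero_si128() };
    let bytes = as_bytes_mut(&mut simd);
    bytes[0] = 0xAB; // writes go straight into the SIMD value
    println!("{:?}", bytes);
}
```

Note that the reverse direction (&mut [u8; 16] to &mut __m128i) does not pass the second check, since a byte array is not guaranteed to be 16-byte aligned.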


Oh wow, this really looks like magic at first glance, but it solves the problem. :joy:
Thank you very much for the code and the explanation of why it's sound!
Working on such a low-level program has already taught me more about Rust than all the projects I ever attempted before. :sweat_smile:

If you are okay with using nightly features (portable_simd, for access to core::simd::*), you can do this entirely without unsafe.

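#![feature(portable_simd)]

use core::arch::x86_64::__m128i;
use core::simd::u8x16;

use aes::cipher::BlockEncrypt;
use aes::Aes128Enc;

#[inline]
fn aes_encrypt(_in: __m128i, cipher: &Aes128Enc) -> __m128i {
    // By-value conversion into a portable SIMD vector (no unsafe needed)
    let mut bytes = u8x16::from(_in);
    // &mut [u8; 16] -> &mut GenericArray<u8, U16> via From
    cipher.encrypt_block(bytes.as_mut_array().into());
    bytes.into()
}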


This works because core::simd::u8x16 implements From<__m128i>, and it has an as_mut_array() method that returns &mut [u8; 16]. From there, into() converts the &mut [u8; 16] into a &mut GenericArray, and finally the encrypted vector is converted back into a __m128i so it can be returned.
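If nightly is not an option, the same by-value idea (copy the register into a byte array, encrypt it, copy it back) can be sketched on stable Rust with the unaligned SSE2 load/store intrinsics, at the cost of a couple of small unsafe blocks. This assumes x86_64; toy_encrypt_block is a hypothetical XOR stand-in for cipher.encrypt_block (it is not AES) so the example runs without the aes crate. With the real cipher you would pass (&mut bytes).into() to encrypt_block, assuming the same &mut [u8; 16] to GenericArray conversion used in the reply above.

```rust
use std::arch::x86_64::{__m128i, _mm_loadu_si128, _mm_storeu_si128};

/// Hypothetical stand-in for cipher.encrypt_block: a toy XOR, NOT AES.
fn toy_encrypt_block(block: &mut [u8; 16]) {
    for b in block.iter_mut() {
        *b ^= 0x5A;
    }
}

/// Copies the register into a byte array, "encrypts" it in place and
/// loads the result back into a register.
fn encrypt_m128i(block: __m128i) -> __m128i {
    let mut bytes = [0u8; 16];
    // SAFETY: `bytes` is valid for a 16-byte write, the `u` (unaligned)
    // variants need no 16-byte alignment, and SSE2 is baseline on x86_64.
    unsafe { _mm_storeu_si128(bytes.as_mut_ptr() as *mut __m128i, block) };
    // With the real cipher: cipher.encrypt_block((&mut bytes).into());
    toy_encrypt_block(&mut bytes);
    // SAFETY: `bytes` is valid for a 16-byte read; alignment is not required.
    unsafe { _mm_loadu_si128(bytes.as_ptr() as *const __m128i) }
}

fn main() {
    let input: [u8; 16] = core::array::from_fn(|i| i as u8);
    // SAFETY: same reasoning as in encrypt_m128i.
    let reg = unsafe { _mm_loadu_si128(input.as_ptr() as *const __m128i) };
    let out_reg = encrypt_m128i(reg);
    let mut out = [0u8; 16];
    unsafe { _mm_storeu_si128(out.as_mut_ptr() as *mut __m128i, out_reg) };
    println!("{:02x?}", out);
}
```

Because the data is copied by value rather than reinterpreted through a reference, no alignment question arises at all.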

Otherwise, I would strongly suggest not transmuting references (or casting pointers as in the original solution, which is effectively the same thing), because it's easy to get the size or alignment of the pointee wrong. As their edit suggests, the bytemuck crate is the best way to do these sorts of transmutes safely.
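The alignment pitfall is easy to miss because only one direction is always fine: a &mut __m128i can be viewed as &mut [u8; 16], but an arbitrary &mut [u8; 16] may not be 16-byte aligned. The sketch below (x86_64, std only; try_as_m128i_mut and Aligned are hypothetical names) performs the kind of runtime check that bytemuck::try_cast_mut does for you and refuses misaligned input instead of producing undefined behavior.

```rust
use std::arch::x86_64::__m128i;
use std::convert::TryInto;
use std::mem::align_of;

/// Hypothetical checked cast in the spirit of bytemuck::try_cast_mut:
/// returns None unless the bytes happen to be 16-byte aligned.
fn try_as_m128i_mut(bytes: &mut [u8; 16]) -> Option<&mut __m128i> {
    let ptr = bytes.as_mut_ptr();
    if (ptr as usize) % align_of::<__m128i>() != 0 {
        // Creating a reference from a misaligned *mut __m128i would be UB.
        return None;
    }
    // SAFETY: both types are 16 bytes, alignment was checked above, and
    // every bit pattern is a valid __m128i.
    Some(unsafe { &mut *(ptr as *mut __m128i) })
}

/// A 32-byte buffer guaranteed to start on a 16-byte boundary.
#[repr(C, align(16))]
struct Aligned([u8; 32]);

fn main() {
    let mut buf = Aligned([0u8; 32]);
    // Bytes 0..16 start on a 16-byte boundary...
    let at0: &mut [u8; 16] = (&mut buf.0[0..16]).try_into().unwrap();
    println!("offset 0 accepted: {}", try_as_m128i_mut(at0).is_some());
    // ...but bytes 1..17 are shifted by one, so the cast must be refused.
    let at1: &mut [u8; 16] = (&mut buf.0[1..17]).try_into().unwrap();
    println!("offset 1 accepted: {}", try_as_m128i_mut(at1).is_some());
}
```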


I'll take a look at it! Since I'm on a nightly compiler anyway, it's not a big deal. Thanks for pointing it out.