I'm writing some FFT code using core::arch SIMD intrinsics.
I'm writing both an in-place API, where the input array is overwritten with the results, and an out-of-place API, where the results are stored in a separate buffer.
This is leading to a lot of code duplication, though (both in the source files and in the final binary), and I'm hoping I can avoid some of it by doing the following:
#[target_feature(enable = "avx", enable = "fma")]
unsafe fn perform_fft_f32(input: *const Complex<f32>, output: *mut Complex<f32>) {
    // Each 256-bit register holds 8 f32s, i.e. 4 Complex<f32> values,
    // so consecutive loads/stores advance the Complex<f32> pointer by 4.
    let input0 = _mm256_loadu_ps(input as *const f32);
    let input1 = _mm256_loadu_ps(input.add(4) as *const f32);
    // more loads
    // FFT computation on AVX registers, producing output0, output1, ...
    _mm256_storeu_ps(output as *mut f32, output0);
    _mm256_storeu_ps(output.add(4) as *mut f32, output1);
    // more stores
}
#[inline(always)]
unsafe fn perform_fft_inplace_f32(buffer: &mut [Complex<f32>]) {
    perform_fft_f32(buffer.as_ptr(), buffer.as_mut_ptr());
}
#[inline(always)]
unsafe fn perform_fft_out_of_place_f32(input: &mut [Complex<f32>], output: &mut [Complex<f32>]) {
    perform_fft_f32(input.as_ptr(), output.as_mut_ptr());
}
In perform_fft_inplace_f32, I'm creating two raw pointers to the same buffer (one from as_ptr, one from as_mut_ptr) and passing them in as separate arguments. Is this UB? If so, is there a better way to do what I'm trying to do (i.e., avoid source and binary duplication for code that is functionally identical apart from the write destination)?
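For reference, here is a minimal, SIMD-free sketch of one variant I'm considering. It is only an illustration under my own assumptions: `Complex32`, `rotate_raw`, `rotate_inplace`, and `rotate_out_of_place` are hypothetical names, and the "multiply by i" kernel is just a stand-in for the real FFT. The idea is to derive both pointers in the in-place wrapper from a single `as_mut_ptr()` call, so that no second borrow of the slice is taken after the read pointer is created. As far as I understand the aliasing rules, that should avoid the problem, but I'd appreciate confirmation.

```rust
/// Stand-in for Complex<f32> so the sketch has no dependencies (hypothetical type).
#[derive(Clone, Copy, Debug, PartialEq)]
#[repr(C)]
pub struct Complex32 {
    pub re: f32,
    pub im: f32,
}

/// Shared kernel: multiplies every element by i (placeholder for the FFT).
///
/// Safety: `input` must be valid for `len` reads and `output` for `len` writes.
/// The two may point at the same buffer, because each element is fully read
/// before the corresponding element is written.
unsafe fn rotate_raw(input: *const Complex32, output: *mut Complex32, len: usize) {
    for i in 0..len {
        let x = unsafe { input.add(i).read() };
        unsafe { output.add(i).write(Complex32 { re: -x.im, im: x.re }) };
    }
}

pub fn rotate_inplace(buffer: &mut [Complex32]) {
    let len = buffer.len();
    // Borrow the slice once; the read pointer is a const cast of the same
    // raw pointer rather than the result of a separate as_ptr() call.
    let ptr = buffer.as_mut_ptr();
    unsafe { rotate_raw(ptr as *const Complex32, ptr, len) }
}

pub fn rotate_out_of_place(input: &[Complex32], output: &mut [Complex32]) {
    assert_eq!(input.len(), output.len());
    // Distinct slices, so the borrow checker already guarantees no overlap.
    unsafe { rotate_raw(input.as_ptr(), output.as_mut_ptr(), input.len()) }
}
```

Both wrappers share the single `rotate_raw` body, which is the deduplication I'm after; the only difference from my original code is where the two pointers come from in the in-place case.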