Hi,
I'm trying to use the portable_simd feature for a FIR filter. My test sums the 4096 resulting f32 values, and the total no longer exactly matches the serial version. The values being summed are very small, so this may be expected: floating-point addition is not associative, and the SIMD version changes the order of summation.
The filter essentially computes the dot product between a slice of coefficients and a series of sample values. The two slices always have the same length; the sample values are zero-padded at initialization.
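To check the assumption about summation order, here is a minimal sketch showing that regrouping an f32 sum can change the result. The function names and the input values are made up for illustration and are not taken from the filter.

```rust
/// Adds the three values left to right, like a serial fold.
fn left_to_right(a: f32, b: f32, c: f32) -> f32 {
    (a + b) + c
}

/// Adds the same three values with a different grouping,
/// as happens when partial sums are kept in separate lanes.
fn regrouped(a: f32, b: f32, c: f32) -> f32 {
    a + (b + c)
}

fn main() {
    // Illustrative values: near 1e8 the spacing between f32 values is 8,
    // so adding 1.0 to -1e8 rounds straight back to -1e8.
    let (a, b, c) = (1.0e8_f32, -1.0e8_f32, 1.0_f32);
    println!("left to right: {}", left_to_right(a, b, c));
    println!("regrouped:     {}", regrouped(a, b, c));
}
```

The two groupings differ here by a whole unit. With the small, similar-magnitude values in my filter the difference should be far smaller, but it is the same effect.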
I'm unfamiliar with SIMD and was wondering about some things:
- Is the partitioning into prefix, middle and suffix always the same for two slices of equal length? If so, that may be missing from the docs.
- Is the partitioning determined at compile time or at run time?
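To probe both questions on stable Rust, I put together the sketch below. It relies on the as_simd docs describing the method as a safe wrapper around slice::align_to, so it calls align_to directly. Vec4 and partition_lens are hypothetical names of mine, and Vec4 is only an assumed stand-in for a 16-byte-aligned f32x4; the real alignment of f32x4 depends on the target.

```rust
/// Hypothetical stand-in for a SIMD vector of four f32s with 16-byte alignment.
#[allow(dead_code)]
#[repr(C, align(16))]
#[derive(Clone, Copy)]
struct Vec4([f32; 4]);

/// Returns the (prefix, middle, suffix) lengths that `align_to` picks for `xs`.
fn partition_lens(xs: &[f32]) -> (usize, usize, usize) {
    // SAFETY: every bit pattern of four consecutive f32s is a valid Vec4.
    let (prefix, middle, suffix) = unsafe { xs.align_to::<Vec4>() };
    (prefix.len(), middle.len(), suffix.len())
}

fn main() {
    let buf = [0.0_f32; 32];
    // Two slices of equal length whose start addresses are 4 bytes apart.
    let a = &buf[0..16];
    let b = &buf[1..17];
    println!("a: {:?}", partition_lens(a));
    println!("b: {:?}", partition_lens(b));
}
```

If the two printed triples differ, the split follows the address of the slice rather than its length. That would mean it is decided at run time and that two equal-length slices can be partitioned differently.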
Below is the code, with the original serial version commented out. Please also let me know if you spot anything else odd that I might have misunderstood. It's slightly complicated by the use of a heapless::Deque, since this runs on an embedded device that accumulates samples:
fn value(&self) -> f32 {
    // Convolve filter with samples.
    // self.samples
    //     .iter()
    //     .zip(&COEFFS)
    //     .fold(0.0, |a, (s, c)| a + (s * c))

    // let (_, coeffs, _) = COEFFS.as_simd();
    // debug_assert_eq!(coeffs.len() * 4, COEFFS.len());

    let (f, b) = self.samples.as_slices();
    let (cf, cb) = COEFFS.split_at(f.len());
    assert_eq!(f.len(), cf.len());
    assert_eq!(b.len(), cb.len());

    // First half of deque
    let (p, m, s) = f.as_simd::<4>();
    let (cp, cm, cs) = cf.as_simd::<4>();
    let sp = p.iter().zip(cp).fold(0.0, |a, (s, c)| a + (s * c));
    let ss = s.iter().zip(cs).fold(0.0, |a, (s, c)| a + (s * c));
    let fsums = f32x4::from_array([sp, 0.0, 0.0, ss]);
    let fsums = m.iter().zip(cm).fold(fsums, |a, (s, c)| a + (s * c));

    // Second half of deque
    let (p, m, s) = b.as_simd::<4>();
    let (cp, cm, cs) = cb.as_simd::<4>();
    let sp = p.iter().zip(cp).fold(0.0, |a, (s, c)| a + (s * c));
    let ss = s.iter().zip(cs).fold(0.0, |a, (s, c)| a + (s * c));
    let bsums = f32x4::from_array([sp, 0.0, 0.0, ss]);
    let bsums = m.iter().zip(cm).fold(bsums, |a, (s, c)| a + (s * c));

    (fsums + bsums).reduce_sum()
}
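For comparison, here is a sketch of an alternative I'm considering that does not depend on alignment: walk both slices in fixed chunks of four with chunks_exact, so element i of the samples is always paired with element i of the coefficients. The names dot_serial and dot_lanes are hypothetical. Plain [f32; 4] arrays stand in for f32x4 so the sketch builds on stable; with portable_simd each chunk could presumably be loaded with f32x4::from_slice instead.

```rust
/// Serial reference: multiply and accumulate left to right.
fn dot_serial(samples: &[f32], coeffs: &[f32]) -> f32 {
    samples.iter().zip(coeffs).fold(0.0, |a, (s, c)| a + (s * c))
}

/// Dot product with four independent lane accumulators.
/// Chunk boundaries depend only on the index, never on the address.
fn dot_lanes(samples: &[f32], coeffs: &[f32]) -> f32 {
    assert_eq!(samples.len(), coeffs.len());
    let s_chunks = samples.chunks_exact(4);
    let c_chunks = coeffs.chunks_exact(4);
    // The leftover elements (fewer than four) are handled serially.
    let (s_tail, c_tail) = (s_chunks.remainder(), c_chunks.remainder());

    let mut acc = [0.0_f32; 4];
    for (s, c) in s_chunks.zip(c_chunks) {
        for lane in 0..4 {
            acc[lane] += s[lane] * c[lane];
        }
    }

    let tail: f32 = s_tail.iter().zip(c_tail).map(|(s, c)| s * c).sum();
    // Horizontal sum of the lanes, then the serial tail.
    acc.iter().sum::<f32>() + tail
}

fn main() {
    let samples = [1.0_f32, 2.0, 3.0, 4.0, 5.0, 6.0];
    let coeffs = [2.0_f32, 0.0, 1.0, 0.0, 1.0, 3.0];
    println!("serial: {}", dot_serial(&samples, &coeffs));
    println!("lanes:  {}", dot_lanes(&samples, &coeffs));
}
```

The lane version still adds in a different order than the serial fold, so I would expect small rounding differences on real filter data. It should never pair a sample with the wrong coefficient, though.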