I have set up the following benchmark:
use criterion::Criterion;

pub fn criterion_benchmark(c: &mut Criterion) {
    const NUM_INPUTS: usize = 64;
    const NUM_HIDDEN_0: usize = 1;
    const NUM_OUTPUTS: usize = 128;
    let mut vertices = vec![
        vec![0.0; NUM_INPUTS].into_boxed_slice(),
        vec![0.0; NUM_INPUTS + NUM_HIDDEN_0].into_boxed_slice(),
        vec![0.0; NUM_INPUTS + NUM_HIDDEN_0 + NUM_OUTPUTS].into_boxed_slice(),
    ]
    .into_boxed_slice();
    let edges = vec![
        vec![0.0; NUM_INPUTS * NUM_HIDDEN_0].into_boxed_slice(),
        vec![0.0; (NUM_INPUTS + NUM_HIDDEN_0) * NUM_OUTPUTS].into_boxed_slice(),
    ]
    .into_boxed_slice();
    c.bench_function("feed_forward", |b| {
        b.iter(|| feed_forward(&mut vertices, &edges))
    });
}
This is the relevant function definition:
pub fn feed_forward(vertices: &mut [Box<[f32]>], edges: &[Box<[f32]>]) { [...] }
The issue I have is with this line, executed by the feed_forward function:
lgc_f32x8(f32x8::from_slice_unaligned(scalar)).write_to_slice_unaligned(scalar);
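A Box<[f32]> built from vec![0.0; n] is only guaranteed the alignment of f32 itself (4 bytes), which is why the unaligned variants are the safe choice in the line above. The sketch below, assuming the 32-byte alignment of an 8-lane f32 vector, checks whether a given slice happens to start on such a boundary; SIMD_ALIGN and is_simd_aligned are hypothetical names.

```rust
/// Alignment assumed for an aligned 8-lane f32 load/store: 8 lanes * 4 bytes.
const SIMD_ALIGN: usize = 32;

/// Returns true if the first element of `s` sits on a 32-byte boundary.
/// For a plain Box<[f32]> this depends on the allocator and is not guaranteed.
fn is_simd_aligned(s: &[f32]) -> bool {
    s.as_ptr() as usize % SIMD_ALIGN == 0
}
```

Skipping ahead by exactly 8 elements (32 bytes) never changes the answer, so the result for a buffer's start tells you about every 8-lane chunk after it.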
I'd much rather call f32x8::from_slice_aligned and f32x8::write_to_slice_aligned instead. For that, my slice needs to be aligned to at least 32 bytes (the alignment of f32x8).
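One standard-library option that avoids changing the allocation is <[T]>::align_to_mut, which splits a slice into an unaligned head, a correctly aligned middle, and an unaligned tail. The sketch below assumes a hypothetical Lanes type standing in for an 32-byte-aligned f32x8 and a hypothetical for_each_split helper; the scalar loop marks where an aligned SIMD load/store could go. The standard library permits align_to_mut to leave the middle empty, so the head and tail must always be handled too.

```rust
/// Hypothetical stand-in for an 8-lane vector with 32-byte alignment.
#[derive(Clone, Copy)]
#[repr(C, align(32))]
struct Lanes([f32; 8]);

/// Applies `f` to every element of `s`, processing the 32-byte-aligned middle
/// in 8-lane chunks. Returns (head len, number of chunks, tail len).
fn for_each_split(s: &mut [f32], f: impl Fn(f32) -> f32) -> (usize, usize, usize) {
    // SAFETY: every bit pattern is a valid f32, so reinterpreting a run of f32s
    // as [f32; 8] chunks is sound; align_to_mut only yields aligned chunks.
    let (head, body, tail) = unsafe { s.align_to_mut::<Lanes>() };
    for x in head.iter_mut() {
        *x = f(*x);
    }
    for chunk in body.iter_mut() {
        // Each chunk starts on a 32-byte boundary: an aligned load/store fits here.
        for x in chunk.0.iter_mut() {
            *x = f(*x);
        }
    }
    for x in tail.iter_mut() {
        *x = f(*x);
    }
    (head.len(), body.len(), tail.len())
}
```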
How would I go about that? Do I need to mess around with std::alloc, or is there another solution, either in the standard library or in a third-party crate?
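The std::alloc route works (std::alloc::Layout::from_size_align(size, 32) with alloc_zeroed), but it requires freeing the memory manually with the same layout. A sketch of an alternative that stays in safe allocation code: let Vec allocate elements of a hypothetical 32-byte-aligned chunk type, then view the storage as a flat f32 slice. Lanes, aligned_zeroed and as_f32_mut are illustrative names, not part of any library.

```rust
use std::slice;

/// Hypothetical chunk of eight f32 lanes forced to 32-byte alignment.
/// Size is exactly 32 bytes, so consecutive chunks have no padding between them.
#[derive(Clone, Copy, Default)]
#[repr(C, align(32))]
struct Lanes([f32; 8]);

/// Allocates zeroed, 32-byte-aligned storage for at least `len` f32 values.
/// The length is rounded up to a whole number of 8-lane chunks.
fn aligned_zeroed(len: usize) -> Box<[Lanes]> {
    let chunks = (len + 7) / 8;
    vec![Lanes::default(); chunks].into_boxed_slice()
}

/// Views the chunk storage as a flat f32 slice of exactly `len` elements.
fn as_f32_mut(buf: &mut [Lanes], len: usize) -> &mut [f32] {
    assert!(len <= buf.len() * 8, "len exceeds the allocated storage");
    // SAFETY: Lanes is repr(C) over [f32; 8] with no padding, so `buf` is a
    // contiguous run of buf.len() * 8 initialized f32 values, aligned to 32 bytes.
    unsafe { slice::from_raw_parts_mut(buf.as_mut_ptr() as *mut f32, len) }
}
```

With this layout, feed_forward would need to receive the chunk storage (or the flat view from as_f32_mut) rather than Box<[f32]>: turning a Box<[Lanes]> into a Box<[f32]> would later deallocate it with the wrong layout, which is undefined behaviour.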