I'm working on some digital signal processing code for an SDR, and to speed things up I implemented my dot-product function using AVX instructions. Everything works, but the AVX version is actually slower than my generic one that uses iterators: ~70 ms for the generic version vs. ~170 ms with AVX instructions.
I believe the problem is that I have to allocate my buffers on a 32-byte boundary to use the AVX instructions, which requires me to allocate new memory and copy the buffers over on every call. My question is: could I make an allocator that always allocates memory on a 32-byte boundary, and set it as the system allocator? What I'm currently doing requires me to call both `alloc` and `dealloc` (at which point, why am I using Rust?), so my thought is to swap out the system allocator for one that always allocates on a 32-byte boundary.
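For concreteness, here is a minimal sketch of what swapping the allocator could look like. It assumes a thin wrapper around `std::alloc::System` that raises every request's alignment to at least 32 bytes; the names `Align32`, `MIN_ALIGN`, and `is_aligned_32` are made up for illustration and are not from any existing crate.

```rust
use std::alloc::{GlobalAlloc, Layout, System};

/// Hypothetical wrapper: forwards to the system allocator, but raises the
/// alignment of every request to at least `MIN_ALIGN` bytes.
struct Align32;

/// Assumed minimum alignment (32 bytes, matching 256-bit AVX registers).
const MIN_ALIGN: usize = 32;

unsafe impl GlobalAlloc for Align32 {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        // `align_to` keeps the size and returns a layout whose alignment is
        // the larger of the requested one and MIN_ALIGN.
        match layout.align_to(MIN_ALIGN) {
            Ok(adjusted) => unsafe { System.alloc(adjusted) },
            // GlobalAlloc signals failure with a null pointer.
            Err(_) => std::ptr::null_mut(),
        }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        // Must free with the same adjusted layout that `alloc` used.
        if let Ok(adjusted) = layout.align_to(MIN_ALIGN) {
            unsafe { System.dealloc(ptr, adjusted) }
        }
    }
    // `realloc` and `alloc_zeroed` are left at their default implementations,
    // which call the `alloc`/`dealloc` above, so they stay consistent.
}

// Every heap allocation in the program (Vec, Box, String, ...) now goes
// through `Align32`.
#[global_allocator]
static GLOBAL: Align32 = Align32;

/// Helper for checking a pointer's alignment.
fn is_aligned_32<T>(ptr: *const T) -> bool {
    (ptr as usize) % MIN_ALIGN == 0
}
```

With this in place an ordinary `Vec<f32>` comes back 32-byte aligned with no other code changes. The trade-off is that every allocation in the program, however small, is over-aligned as well.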
If making all allocations on a 32-byte boundary is a bad idea (I'm thinking it might be for small allocations?), then is there an easy way to allocate just these buffers on a 32-byte boundary, but have Rust automatically de-allocate them at the proper time, like a normal `Vec`? Is it as simple as making my own `Vec` that implements the `Drop` trait, but allocates on a 32-byte boundary?
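To make the second idea concrete, a minimal sketch of such a type might look like the following. It assumes the samples are `f32` and that a fixed-length, zero-initialized buffer is enough; `AlignedBuf` and its methods are hypothetical names for illustration, not an existing API.

```rust
use std::alloc::{alloc_zeroed, dealloc, handle_alloc_error, Layout};
use std::ops::{Deref, DerefMut};
use std::ptr::NonNull;
use std::slice;

/// Hypothetical fixed-length `f32` buffer whose storage is 32-byte aligned.
pub struct AlignedBuf {
    ptr: NonNull<f32>,
    len: usize,
}

impl AlignedBuf {
    /// Assumed alignment (32 bytes, matching 256-bit AVX registers).
    const ALIGN: usize = 32;

    /// Layout for `len` f32 values with the alignment raised to `ALIGN`.
    fn layout(len: usize) -> Layout {
        Layout::array::<f32>(len)
            .and_then(|l| l.align_to(Self::ALIGN))
            .expect("buffer size overflows a Layout")
    }

    /// Allocates `len` zeroed samples. Panics if `len` is 0, because the
    /// allocator functions must not be called with a zero-sized layout.
    pub fn zeroed(len: usize) -> Self {
        assert!(len > 0, "this sketch does not support empty buffers");
        let layout = Self::layout(len);
        // SAFETY: `layout` has non-zero size because `len > 0`.
        let raw = unsafe { alloc_zeroed(layout) } as *mut f32;
        let ptr = NonNull::new(raw).unwrap_or_else(|| handle_alloc_error(layout));
        AlignedBuf { ptr, len }
    }
}

impl Deref for AlignedBuf {
    type Target = [f32];
    fn deref(&self) -> &[f32] {
        // SAFETY: `ptr` points to `len` initialized f32 values owned by self
        // (all-zero bytes are a valid f32, namely 0.0).
        unsafe { slice::from_raw_parts(self.ptr.as_ptr(), self.len) }
    }
}

impl DerefMut for AlignedBuf {
    fn deref_mut(&mut self) -> &mut [f32] {
        // SAFETY: as above, and `&mut self` guarantees exclusive access.
        unsafe { slice::from_raw_parts_mut(self.ptr.as_ptr(), self.len) }
    }
}

impl Drop for AlignedBuf {
    fn drop(&mut self) {
        // SAFETY: `ptr` came from `alloc_zeroed` with this exact layout.
        unsafe { dealloc(self.ptr.as_ptr() as *mut u8, Self::layout(self.len)) }
    }
}
```

Because the type dereferences to `[f32]`, it can be indexed, iterated, and passed as `&buf[..]` like a slice, and the memory is released when the value goes out of scope. Creating one such buffer up front and reusing it across calls would also avoid the allocate-and-copy on every call.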