I’m working on some digital signal processing code for an SDR, and to speed things up I implemented my dot-product function using AVX instructions. Everything works, but the AVX version is actually slower than my generic one that uses iterators: ~70 ms for the generic version vs. ~170 ms with AVX instructions.
I believe the problem is that I have to allocate my buffers on a 32-byte boundary to use the AVX instructions, which requires me to allocate new memory and copy the buffers over on every call. My question is: could I make an allocator that always allocates memory on a 32-byte boundary, and set it as the system allocator? What I’m currently doing requires me to call both alloc and dealloc myself (at which point, why am I using Rust?), so my thought is to swap out the system allocator for one that always allocates on a 32-byte boundary.
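For concreteness, a rough sketch of the global-allocator idea could look like the following. It assumes a thin wrapper around `std::alloc::System`; the names `Align32` and `MIN_ALIGN` are made up for illustration, and this is only a sketch under those assumptions, not a tuned allocator.

```rust
use std::alloc::{GlobalAlloc, Layout, System};

/// Hypothetical minimum alignment (in bytes) forced on every allocation.
const MIN_ALIGN: usize = 32;

/// Hypothetical wrapper that forwards to the system allocator but raises
/// the alignment of every request to at least `MIN_ALIGN`.
struct Align32;

unsafe impl GlobalAlloc for Align32 {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        // `align_to` keeps the size and uses max(layout.align(), MIN_ALIGN).
        match layout.align_to(MIN_ALIGN) {
            Ok(wide) => unsafe { System.alloc(wide) },
            // Signal allocation failure instead of panicking inside the allocator.
            Err(_) => std::ptr::null_mut(),
        }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        // Must free with the same widened layout that `alloc` used.
        if let Ok(wide) = layout.align_to(MIN_ALIGN) {
            unsafe { System.dealloc(ptr, wide) }
        }
    }
}

// Every heap allocation in the program (Vec, Box, String, ...) now goes
// through `Align32`.
#[global_allocator]
static GLOBAL: Align32 = Align32;
```

With this in place, an ordinary `Vec<f32>` should come back 32-byte aligned without any manual allocation calls. The likely cost is the one suspected above: small allocations get over-aligned too, which can waste memory and may be slower on some platforms.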
If making all allocations on a 32-byte boundary is a bad idea (I’m thinking it might be for small allocations?), then is there an easy way to allocate just these buffers on a 32-byte boundary, but have Rust automatically de-allocate them at the proper time, like a normal Vec? Is it as simple as making my own Vec that implements the Drop trait, but allocates on a 32-byte boundary?
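To illustrate the second idea, here is a minimal sketch of a buffer type that allocates on a 32-byte boundary and frees itself through `Drop`. The `AlignedBuf` type and its `zeroed` constructor are hypothetical names, it is fixed-length and `f32`-only, and it does not try to be a full `Vec` replacement (no growth, no `Send`/`Sync`).

```rust
use std::alloc::{alloc_zeroed, dealloc, handle_alloc_error, Layout};
use std::ops::{Deref, DerefMut};
use std::ptr::NonNull;

/// Hypothetical alignment required by the aligned AVX loads.
const ALIGN: usize = 32;

/// Hypothetical fixed-length `f32` buffer whose storage is 32-byte aligned.
pub struct AlignedBuf {
    ptr: NonNull<f32>,
    len: usize,
}

impl AlignedBuf {
    /// Layout for `len` f32 values with the alignment raised to `ALIGN`.
    fn layout(len: usize) -> Layout {
        Layout::array::<f32>(len)
            .and_then(|l| l.align_to(ALIGN))
            .expect("buffer too large")
    }

    /// Allocates `len` zero-initialized values. Panics if `len == 0`,
    /// because zero-sized allocations are not handled in this sketch.
    pub fn zeroed(len: usize) -> Self {
        assert!(len > 0, "zero-length buffers are not supported here");
        let layout = Self::layout(len);
        // SAFETY: the layout has a non-zero size because len > 0.
        let raw = unsafe { alloc_zeroed(layout) } as *mut f32;
        let ptr = NonNull::new(raw).unwrap_or_else(|| handle_alloc_error(layout));
        AlignedBuf { ptr, len }
    }
}

impl Deref for AlignedBuf {
    type Target = [f32];
    fn deref(&self) -> &[f32] {
        // SAFETY: ptr points to `len` initialized (zeroed) f32 values.
        unsafe { std::slice::from_raw_parts(self.ptr.as_ptr(), self.len) }
    }
}

impl DerefMut for AlignedBuf {
    fn deref_mut(&mut self) -> &mut [f32] {
        // SAFETY: as above, and `&mut self` guarantees exclusive access.
        unsafe { std::slice::from_raw_parts_mut(self.ptr.as_ptr(), self.len) }
    }
}

impl Drop for AlignedBuf {
    fn drop(&mut self) {
        // SAFETY: the pointer came from `alloc_zeroed` with this same layout.
        unsafe { dealloc(self.ptr.as_ptr() as *mut u8, Self::layout(self.len)) }
    }
}
```

Because the type dereferences to `[f32]`, it can be passed anywhere a slice is expected, and the memory is released when the value goes out of scope. Filling such a buffer once and reusing it across calls would avoid the per-call allocate-and-copy described above.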