Just for fun.
Will Rust or LLVM optimize the code with AVX as much as possible, automatically?
I think you could get pretty decent performance by rewriting BLAS in Rust and letting LLVM do its thing, but keep in mind that optimisers are bound by the as-if rule, while humans writing unsafe code or inline assembly aren't. So it'll try its hardest to use AVX and other vector instructions automatically, but only where you wouldn't be able to tell the difference in the program's behaviour.
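To make the as-if point concrete, here is a rough sketch using a dot product. Floating-point addition isn't associative, so with a single accumulator the compiler generally has to keep the additions in order, because reordering them could change the rounded result. Splitting the sum across independent accumulators changes the order yourself, which gives LLVM something it is allowed to map onto vector lanes. The names `dot_naive` and `dot_lanes` are made up for illustration, and whether you actually get AVX depends on the compiler version and on enabling the target feature (for example with `-C target-cpu=native`), so check the generated assembly rather than taking my word for it.

```rust
/// Straightforward dot product with a single accumulator.
/// The additions must happen in order, which limits vectorisation.
fn dot_naive(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Same computation with 8 independent accumulators (8 x f32 = 256 bits).
/// This deliberately changes the summation order, so results may differ
/// from `dot_naive` in the last bits for general inputs.
fn dot_lanes(a: &[f32], b: &[f32]) -> f32 {
    const LANES: usize = 8;
    // Only look at the common prefix so both slices chunk identically.
    let n = a.len().min(b.len());
    let (a, b) = (&a[..n], &b[..n]);

    let ca = a.chunks_exact(LANES);
    let cb = b.chunks_exact(LANES);
    // Elements that don't fill a whole chunk are handled separately.
    let (ra, rb) = (ca.remainder(), cb.remainder());

    let mut acc = [0.0f32; LANES];
    for (xa, xb) in ca.zip(cb) {
        for i in 0..LANES {
            acc[i] += xa[i] * xb[i];
        }
    }

    let tail: f32 = ra.iter().zip(rb).map(|(x, y)| x * y).sum();
    acc.iter().sum::<f32>() + tail
}
```

The second version is the kind of rewrite a human can choose to make because they know a slightly different rounding is acceptable; the optimiser can't assume that on its own.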
Meanwhile a human can make assumptions the compiler can't (e.g. you document that a function should only be given arrays whose length is a multiple of 256 bits), and therefore theoretically has more room to optimise.
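As a sketch of what that kind of documented assumption could look like in safe Rust: the function below is an AXPY-style update (`y += a * x`) that requires the length to be a multiple of 8 `f32`s, i.e. 256 bits. The name `axpy_256` and the `LANES` constant are hypothetical, not from any existing crate. Checking the precondition once up front means the loop body only ever sees full chunks and needs no tail handling; whether LLVM turns it into AVX instructions still depends on the target features you compile with.

```rust
/// Number of f32 values in 256 bits.
const LANES: usize = 8;

/// Computes y[i] += a * x[i] for every element.
///
/// Precondition (documented, and checked here): `x` and `y` have the same
/// length, and that length is a multiple of `LANES`.
/// Panics if the precondition is violated.
fn axpy_256(y: &mut [f32], a: f32, x: &[f32]) {
    assert_eq!(x.len(), y.len(), "x and y must have the same length");
    assert!(x.len() % LANES == 0, "length must be a multiple of 8 f32s");

    // Every chunk is exactly LANES long, so there is no remainder to handle.
    for (yc, xc) in y.chunks_exact_mut(LANES).zip(x.chunks_exact(LANES)) {
        for i in 0..LANES {
            yc[i] += a * xc[i];
        }
    }
}
```

Calling it with, say, two 16-element slices works; a 15-element slice panics at the assertion instead of silently taking a slower path.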
Of course, if the motivation is about having fun/learning rather than creating a competitor to BLAS then the answer is a resounding "YES!".
BLAS built with the IBM Fortran compiler beats the straight C version. IBM's Fortran compiler does the best AVX optimisation I have seen so far. I have been calling it from Rust (via LAPACK) and getting great performance.