Performance: strided sum of iterator (and rayon)

The word you're looking for here is "vectorization".

See point 4 in the following post for why this usually doesn't happen automatically with f64: