For some reason, the float version just doesn't optimize like the integer version does.
Is this a "-ffast-math" type of problem? Maybe LLVM doesn't want to vectorize the loop because reordering the floating-point additions could change the result slightly.
Edit: It's a -ffast-math type of problem, documented here
test zipdot_f32_checked_counted_loop ... bench: 1,347 ns/iter (+/- 664)
test zipdot_f32_default_zip ... bench: 1,392 ns/iter (+/- 13)
test zipdot_f32_unchecked_counted_loop ... bench: 1,343 ns/iter (+/- 371)
test zipdot_f32_zipslices ... bench: 1,342 ns/iter (+/- 466)
test zipdot_f32_ziptrusted ... bench: 1,342 ns/iter (+/- 387)
test zipdot_i32_checked_counted_loop ... bench: 380 ns/iter (+/- 113)
test zipdot_i32_default_zip ... bench: 1,401 ns/iter (+/- 27)
test zipdot_i32_unchecked_counted_loop ... bench: 308 ns/iter (+/- 154)
test zipdot_i32_zipslices ... bench: 380 ns/iter (+/- 134)
test zipdot_i32_ziptrusted ... bench: 301 ns/iter (+/- 148)
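
To make the reassociation point concrete, here is a minimal sketch. The function names `dot_naive` and `dot_lanes` and the lane width of 8 are my own choices for illustration; they are not the benchmark code above. Float addition is not associative, so a single-accumulator loop forces the additions into one strict order. Writing the independent accumulators out by hand makes the reordering explicit in the source, which should leave the optimizer free to vectorize without any fast-math flag. The result may differ from the naive version in the last bits.

```rust
/// Straightforward dot product: one accumulator, so every addition
/// depends on the previous one and the order is fixed.
fn dot_naive(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Hypothetical variant with 8 independent partial sums (the lane count
/// is an arbitrary choice). The additions are reassociated by hand, so
/// the result can differ slightly from `dot_naive` on general inputs.
fn dot_lanes(a: &[f32], b: &[f32]) -> f32 {
    // Trim both slices to the common length, like `zip` would.
    let n = a.len().min(b.len());
    let (a, b) = (&a[..n], &b[..n]);

    let mut acc = [0.0f32; 8];
    let mut ca = a.chunks_exact(8);
    let mut cb = b.chunks_exact(8);
    for (xa, xb) in ca.by_ref().zip(cb.by_ref()) {
        for i in 0..8 {
            acc[i] += xa[i] * xb[i];
        }
    }

    // Combine the partial sums, then handle the tail (fewer than 8 elements).
    let mut sum: f32 = acc.iter().sum();
    for (x, y) in ca.remainder().iter().zip(cb.remainder()) {
        sum += x * y;
    }
    sum
}
```

On inputs where every intermediate value is exactly representable, the two functions agree. On general data they can disagree in the low bits, and that difference is what the compiler is not allowed to introduce on its own.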