That got me wondering, so I timed debug and release builds of several Rust implementations of a convolution over 20 million elements, with both serial and parallel execution. Here are the results:
| Implementation | Debug time (ms) | Release time (ms) | Speed-up (debug / release) |
|---|---:|---:|---:|
| zso::convolution | 380640 | 24195 | 16 |
| zicog::convolution_slow | 410088 | 8379 | 49 |
| dodomorandi::convolution | 475962 | 8086 | 59 |
| zicog::convolution_safe | 368243 | 7980 | 46 |
| Abjorn3::convolution | 900673 | 6619 | 136 |
| pthm::convolution | 462066 | 1234 | 374 |
| zicog::convolution | 428833 | 1228 | 349 |
| alice::convolution_serial | 450238 | 1238 | 363 |
| zicog::convolution_fast | 399970 | 1222 | 327 |
| alice::convolution_parallel | 120601 | 375 | 321 |
Really: the fastest implementations are over 300 times slower in debug mode.
It also highlights that the compiler can have a really hard time optimizing your code if you don't write it "just so": even in release mode, zso::convolution takes about 20 times as long as the fastest serial versions.
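To illustrate what "just so" can mean, here is a minimal sketch (not the code from the repo) of the same "valid" 1D convolution written two ways. The function names `convolve_indexed` and `convolve_windows`, the `f32` element type and the test data in `main` are all hypothetical. The indexed version does a bounds-checked `[]` access per multiply unless the optimizer can prove the index is in range; the `windows`/`zip` version works on slices whose lengths are known to match, which often lets release builds drop those checks. In debug builds neither is optimized, and the iterator adaptors add call overhead of their own.

```rust
use std::hint::black_box;
use std::time::Instant;

/// Index-based "valid" 1D convolution: the kernel is flipped and the output
/// has signal.len() - kernel.len() + 1 elements. Each `[]` access is
/// bounds-checked unless the optimizer proves it is in range.
fn convolve_indexed(signal: &[f32], kernel: &[f32]) -> Vec<f32> {
    let k = kernel.len();
    if k == 0 || signal.len() < k {
        return Vec::new();
    }
    let n = signal.len() - k + 1;
    let mut out = vec![0.0f32; n];
    for i in 0..n {
        let mut acc = 0.0f32;
        for j in 0..k {
            acc += signal[i + j] * kernel[k - 1 - j];
        }
        out[i] = acc;
    }
    out
}

/// The same computation using `windows` and `zip`: each window has exactly
/// kernel.len() elements, so no per-element indexing is needed.
fn convolve_windows(signal: &[f32], kernel: &[f32]) -> Vec<f32> {
    if kernel.is_empty() {
        return Vec::new(); // `windows(0)` would panic
    }
    signal
        .windows(kernel.len())
        .map(|w| w.iter().zip(kernel.iter().rev()).map(|(a, b)| a * b).sum::<f32>())
        .collect()
}

fn main() {
    // Hypothetical test data: 1 million samples and an 8-tap kernel,
    // far smaller than the 20 million elements used for the real timings.
    let signal: Vec<f32> = (0..1_000_000).map(|i| (i % 100) as f32).collect();
    let kernel = [0.125f32; 8];

    let t = Instant::now();
    let a = convolve_indexed(black_box(signal.as_slice()), black_box(&kernel[..]));
    println!("indexed: {:?}", t.elapsed());

    let t = Instant::now();
    let b = convolve_windows(black_box(signal.as_slice()), black_box(&kernel[..]));
    println!("windows: {:?}", t.elapsed());

    // Both versions sum the products in the same order, so results match.
    assert_eq!(a, b);
}
```

Running it with `cargo run` and then `cargo run --release` shows the debug/release gap on your own machine; `black_box` only keeps the compiler from optimizing the inputs away as constants.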
The code is here if you want to play with it: GitHub - ZiCog/rust_convolution