Taking your hint to heart I made some changes:
- The array is now 8192 by 8192, so as to be more in line with the Intel example of 8000 by 8000.
- That made it easier to tweak the block size for best performance, which seems to be 64 on my PC.
- Changed f32 to i32, as per the Intel example.
- Used inttypes.h in C so that the array element size is explicitly defined, and size_t for the indices.
- Changed the execution time reporting to milliseconds.
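For anyone following along, here is a rough sketch of the loop-blocking idea these changes are tuning. It is not the code from the repository: I am assuming the kernel is a transposing add (`a[i][j] += b[j][i]`) on flat row-major arrays, the function names are made up for illustration, and `MAX` is shrunk to 1024 so it runs quickly (the measurements in this post use 8192).

```rust
use std::time::Instant;

// Hypothetical sizes for illustration; the post uses MAX = 8192, BLOCK_SIZE = 64.
const MAX: usize = 1024;
const BLOCK_SIZE: usize = 64;

/// Naive version: `b` is read column-wise, so each read jumps MAX elements ahead.
fn transpose_add_naive(a: &mut [i32], b: &[i32]) {
    for i in 0..MAX {
        for j in 0..MAX {
            a[i * MAX + j] = a[i * MAX + j].wrapping_add(b[j * MAX + i]);
        }
    }
}

/// Blocked version: the same arithmetic, visited one BLOCK_SIZE x BLOCK_SIZE
/// tile at a time so the tiles of `a` and `b` are reused while still cached.
fn transpose_add_blocked(a: &mut [i32], b: &[i32]) {
    for ii in (0..MAX).step_by(BLOCK_SIZE) {
        for jj in (0..MAX).step_by(BLOCK_SIZE) {
            for i in ii..(ii + BLOCK_SIZE).min(MAX) {
                for j in jj..(jj + BLOCK_SIZE).min(MAX) {
                    a[i * MAX + j] = a[i * MAX + j].wrapping_add(b[j * MAX + i]);
                }
            }
        }
    }
}

fn main() {
    let b: Vec<i32> = (0..(MAX * MAX) as i32).collect();
    let mut a1 = vec![1i32; MAX * MAX];
    let mut a2 = a1.clone();

    let t = Instant::now();
    transpose_add_naive(&mut a1, &b);
    println!("naive:   {}ms", t.elapsed().as_millis());

    let t = Instant::now();
    transpose_add_blocked(&mut a2, &b);
    println!("blocked: {}ms", t.elapsed().as_millis());

    // Both orderings must produce the same result.
    assert_eq!(a1, a2);
}
```

The best `BLOCK_SIZE` depends on the cache sizes of the machine, which is why it needs tweaking per PC.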
After all that not much has changed:
Rust on x86_64:
$ RUSTFLAGS="-C opt-level=3 -C debuginfo=0 -C target-cpu=native" cargo run --release
MAX: 8192,
BLOCK_SIZE: 64,
do_it_0: 808ms
do_it_1: 803ms
do_it_2: 201ms
do_it_3: 282ms
do_it_4: 202ms
do_it_5: 188ms
do_it_6: 188ms
MAX: 8192
BLOCK_SIZE: 64
do_it_0: 836ms
do_it_1: 164ms
C on x86_64:
$ gcc -Wall -O3 -o loop_blocking loop_blocking.c
$ ./loop_blocking
MAX: 8192
BLOCK_SIZE: 64
do_it_0: 812ms
do_it_1: 170ms
What we see on x86_64 is that Rust using unsafe indexing can pretty much match the C. Without unsafe it is a bit behind. Amazingly, the all-iterator style suggested by Clippy, do_it_3, is not so far behind either.
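To make the three styles being compared concrete, here is a minimal sketch of each on the same assumed transposing-add kernel. The function names are hypothetical and do not correspond one-to-one to the do_it_N functions in the repository; `n` is the side length of a square row-major array.

```rust
/// Safe indexing: accesses are bounds-checked unless the optimizer can
/// prove they are in range.
fn add_checked(a: &mut [i32], b: &[i32], n: usize) {
    for i in 0..n {
        for j in 0..n {
            a[i * n + j] = a[i * n + j].wrapping_add(b[j * n + i]);
        }
    }
}

/// Unsafe indexing: bounds checks are skipped, so the lengths are
/// validated once up front instead.
fn add_unchecked(a: &mut [i32], b: &[i32], n: usize) {
    assert_eq!(Some(a.len()), n.checked_mul(n));
    assert_eq!(a.len(), b.len());
    for i in 0..n {
        for j in 0..n {
            // SAFETY: i < n and j < n, so both indices are below n * n,
            // which equals the length of both slices (asserted above).
            unsafe {
                let x = a.get_unchecked_mut(i * n + j);
                *x = (*x).wrapping_add(*b.get_unchecked(j * n + i));
            }
        }
    }
}

/// Iterator style: rows of `a` come from `chunks_exact_mut`; the
/// transposed read of `b` still needs an index.
fn add_iter(a: &mut [i32], b: &[i32], n: usize) {
    if n == 0 {
        return; // chunks_exact_mut panics on a chunk size of zero
    }
    for (i, row) in a.chunks_exact_mut(n).enumerate() {
        for (j, x) in row.iter_mut().enumerate() {
            *x = (*x).wrapping_add(b[j * n + i]);
        }
    }
}

fn main() {
    let n = 4;
    let b: Vec<i32> = (0..(n * n) as i32).collect();
    let mut a1 = vec![0i32; n * n];
    let mut a2 = a1.clone();
    let mut a3 = a1.clone();
    add_checked(&mut a1, &b, n);
    add_unchecked(&mut a2, &b, n);
    add_iter(&mut a3, &b, n);
    // All three styles compute the same thing; only the codegen differs.
    assert_eq!(a1, a2);
    assert_eq!(a1, a3);
}
```

How much the bounds checks cost in practice depends on what the compiler manages to hoist or eliminate, which is exactly what the timings above are probing.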
Rust on ARM aarch64:
$ RUSTFLAGS="-C opt-level=3 -C debuginfo=0 -C target-cpu=native" cargo run --release
...
MAX: 8192,
BLOCK_SIZE: 64,
do_it_0: 15060ms
do_it_1: 15081ms
do_it_2: 11525ms
do_it_3: 12351ms
do_it_4: 11512ms
do_it_5: 11431ms
do_it_6: 11424ms
MAX: 8192
BLOCK_SIZE: 64
do_it_0: 9453ms
do_it_1: 3021ms
C on ARM aarch64:
$ gcc -Wall -O3 -o loop_blocking loop_blocking.c
$ ./loop_blocking
MAX: 8192
BLOCK_SIZE: 64
do_it_0: 13527ms
do_it_1: 3020ms
What we see on ARM here is that Rust can match C if the code is generated by c2rust. Those are the last do_it_0 and do_it_1 results in the Rust output.
Meanwhile, the Rust that any human would write, the first do_it_0 to do_it_6, is horribly slower.
Sorry if the naming of the various functions is confusing. It evolved that way and I'm not sure how to fix it just now.
All the code is updated here, if anyone has any suggestions: GitHub - ZiCog/loop_blocking: Experiments in achieving C performance in Rust when manipulating arrays.