Clippy driving me to insanity, insisting on iterators

The only obvious difference I see between the c2rust version and all of the hand-coded ones is the use of c_int instead of usize. Now, obviously you should compare the generated assembly to see what's actually going on... but it might be worth changing the c_ints to usizes to see what effect that has. (Or, for that matter, changing the C code from int to size_t.)
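To make the suggestion concrete, here is a rough sketch of the two indexing styles. The first function only imitates the general shape of c2rust output (a c_int counter cast to isize for pointer arithmetic); it is not taken from the actual generated code, and both function names are invented for illustration. It also assumes the slice length fits in a c_int.

```rust
use std::os::raw::c_int;

/// C-style loop with a `c_int` counter, loosely in the style of translated C.
/// Assumption: `data.len()` fits in a `c_int`.
fn sum_with_c_int(data: &[i32]) -> i32 {
    let n = data.len() as c_int;
    let p = data.as_ptr();
    let mut total = 0;
    let mut i: c_int = 0;
    while i < n {
        // SAFETY: 0 <= i < data.len(), so the offset stays inside the slice.
        total += unsafe { *p.offset(i as isize) };
        i += 1;
    }
    total
}

/// The same loop with a native `usize` index and bounds-checked indexing.
/// Clippy's `needless_range_loop` lint would suggest an iterator here.
fn sum_with_usize(data: &[i32]) -> i32 {
    let mut total = 0;
    for i in 0..data.len() {
        total += data[i];
    }
    total
}
```

Both return the same result; whether the index type changes the generated assembly is exactly what would need checking.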

Taking your hint to heart I made some changes:

  1. The array is now 8192 by 8192, to be more in line with the Intel example of 8000 by 8000.

  2. That made it easier to tweak the block size for best performance, which seems to be 64 on my PC.

  3. Changed f32 to i32 as per the Intel example.

  4. Used inttypes.h in C so that the array element size is well defined, and size_t for the indices.

  5. Changed the execution time reporting to milliseconds.
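For anyone following along without opening the repository, here is a minimal sketch of what loop blocking looks like with these settings. It assumes the kernel is a transposed add, a[i][j] += b[j][i], as in the Intel example, and stores the arrays as flat row-major slices. The function names, the smaller MAX, and the time_ms helper are made up for illustration and are not the code in the repository.

```rust
use std::time::Instant;

// Illustrative sizes only; the post uses MAX = 8192 and BLOCK_SIZE = 64.
const MAX: usize = 256;
const BLOCK_SIZE: usize = 64;

/// Naive version: a[i][j] += b[j][i], walking `b` column-wise.
fn add_transposed_naive(a: &mut [i32], b: &[i32]) {
    for i in 0..MAX {
        for j in 0..MAX {
            a[i * MAX + j] += b[j * MAX + i];
        }
    }
}

/// Blocked version: the same arithmetic, visited one
/// BLOCK_SIZE x BLOCK_SIZE tile at a time to improve cache reuse.
fn add_transposed_blocked(a: &mut [i32], b: &[i32]) {
    for i in (0..MAX).step_by(BLOCK_SIZE) {
        for j in (0..MAX).step_by(BLOCK_SIZE) {
            for ii in i..(i + BLOCK_SIZE).min(MAX) {
                for jj in j..(j + BLOCK_SIZE).min(MAX) {
                    a[ii * MAX + jj] += b[jj * MAX + ii];
                }
            }
        }
    }
}

/// Runs `f` once and returns the elapsed wall-clock time in milliseconds.
fn time_ms<F: FnOnce()>(f: F) -> u128 {
    let start = Instant::now();
    f();
    start.elapsed().as_millis()
}
```

Something like `println!("{}ms", time_ms(|| add_transposed_blocked(&mut a, &b)))` would then print a figure in the same format as the results below, with `a` and `b` each holding MAX * MAX elements.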

After all that not much has changed:

Rust on x86_64:

$ RUSTFLAGS="-C opt-level=3 -C debuginfo=0 -C target-cpu=native" cargo run  --release
MAX:        8192,
BLOCK_SIZE: 64,
do_it_0:    808ms
do_it_1:    803ms
do_it_2:    201ms
do_it_3:    282ms
do_it_4:    202ms
do_it_5:    188ms
do_it_6:    188ms
MAX:        8192
BLOCK_SIZE: 64
do_it_0:    836ms
do_it_1:    164ms

C on x86_64:

$ gcc -Wall -O3 -o loop_blocking loop_blocking.c
$ ./loop_blocking
MAX:        8192
BLOCK_SIZE: 64
do_it_0:    812ms
do_it_1:    170ms

What we see on x86_64 is that Rust using unsafe indexing can pretty much match the C. Without unsafe it is a bit behind. Amazingly, the all-iterator style suggested by Clippy, do_it_3, is not so far behind.
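As a rough illustration of the three styles being compared (bounds-checked indexing, unsafe unchecked indexing, and iterators), here is a sketch on the same assumed kernel, a[i][j] += b[j][i]. These are not the do_it_N functions from the repository: the names are invented, the blocking is left out to keep it short, and each function assumes n > 0 and slices of at least n * n elements.

```rust
/// Safe indexing: each access is bounds-checked unless the
/// optimizer can prove the index is in range.
fn add_transposed_indexed(a: &mut [i32], b: &[i32], n: usize) {
    for i in 0..n {
        for j in 0..n {
            a[i * n + j] += b[j * n + i];
        }
    }
}

/// Unchecked indexing: the bounds are asserted once up front,
/// then the inner loop skips the per-access checks.
fn add_transposed_unchecked(a: &mut [i32], b: &[i32], n: usize) {
    assert!(a.len() >= n * n && b.len() >= n * n);
    for i in 0..n {
        for j in 0..n {
            // SAFETY: i < n and j < n, so both indices are < n * n <= len.
            unsafe {
                *a.get_unchecked_mut(i * n + j) += *b.get_unchecked(j * n + i);
            }
        }
    }
}

/// Iterator style: each row of `a` is zipped with the matching
/// column of `b`, reached by striding through `b` in steps of `n`.
fn add_transposed_iter(a: &mut [i32], b: &[i32], n: usize) {
    assert!(n > 0 && a.len() >= n * n && b.len() >= n * n);
    for (i, row) in a.chunks_exact_mut(n).take(n).enumerate() {
        for (x, y) in row.iter_mut().zip(b[i..].iter().step_by(n)) {
            *x += *y;
        }
    }
}
```

All three compute the same result; only the way the elements are reached differs, which is what the timings above are measuring.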

Rust on ARM aarch64:

$ RUSTFLAGS="-C opt-level=3 -C debuginfo=0 -C target-cpu=native" cargo run  --release
...
MAX:        8192,
BLOCK_SIZE: 64,
do_it_0:    15060ms
do_it_1:    15081ms
do_it_2:    11525ms
do_it_3:    12351ms
do_it_4:    11512ms
do_it_5:    11431ms
do_it_6:    11424ms
MAX:        8192
BLOCK_SIZE: 64
do_it_0:    9453ms
do_it_1:    3021ms

C on ARM aarch64:

$ gcc -Wall -O3 -o loop_blocking loop_blocking.c
$ ./loop_blocking
MAX:        8192
BLOCK_SIZE: 64
do_it_0:    13527ms
do_it_1:    3020ms

What we see on ARM here is that Rust can match C if the code is generated by c2rust; those are the last do_it_0 and do_it_1 functions in the output.

Meanwhile, the Rust that any human would write, do_it_0 to do_it_6, is horribly slow by comparison.

Sorry if the naming of the various functions is confusing. It evolved that way, and I'm not sure how to fix it just now.

All code updated here: https://github.com/ZiCog/loop_blocking if anyone has any suggestions.
