Understanding target-cpu

ManuelCostanzo · September 30, 2020, 8:43pm

Hello to all,

I have a big question about target-cpu. On my server, the native target-cpu resolves to "cannonlake". I ran a sweep over all the possible target-cpus, and to my surprise "knl" was the one that gave me the best performance. This doesn't make much sense to me, so I want to ask: what could have happened?

The program is an n-body simulation; the mathematical operations it performs are nothing unusual.
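For reference, here is a minimal sketch of the kind of loop an n-body program typically spends its time in. The actual program was not posted, so the function name `accumulate_accel`, the separate coordinate slices, and the softening term `soft2` are all assumptions made for illustration. The inner loop updates independent elements, which at least gives the auto-vectorizer a chance; whether it emits wide vector instructions for a given target-cpu can only be confirmed by reading the assembly.

```rust
/// Hypothetical n-body kernel (not the poster's code), with G = 1.
/// Adds to `ax`, `ay`, `az` the acceleration each body feels from all bodies.
/// `soft2` must be > 0 so that the i == j term is finite (it then contributes 0).
pub fn accumulate_accel(
    x: &[f64],
    y: &[f64],
    z: &[f64],
    m: &[f64],
    soft2: f64,
    ax: &mut [f64],
    ay: &mut [f64],
    az: &mut [f64],
) {
    let n = m.len();
    // Re-slice everything to the same length so bounds checks are easier to elide.
    let (x, y, z) = (&x[..n], &y[..n], &z[..n]);
    let (ax, ay, az) = (&mut ax[..n], &mut ay[..n], &mut az[..n]);

    for j in 0..n {
        let (xj, yj, zj, mj) = (x[j], y[j], z[j], m[j]);
        // Each iteration over i touches a different output element.
        for i in 0..n {
            let dx = xj - x[i];
            let dy = yj - y[i];
            let dz = zj - z[i];
            let r2 = dx * dx + dy * dy + dz * dz + soft2;
            let s = mj / (r2 * r2.sqrt()); // m_j / r^3
            ax[i] += s * dx;
            ay[i] += s * dy;
            az[i] += s * dz;
        }
    }
}
```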

Obviously I always compile with --release and opt-level=3.

Can anyone help me understand this? If I move to a different server, will I have to run with every target-cpu and keep the best one? Does it make sense to do so?

Thank you.

kornel · September 30, 2020, 9:30pm

Unless you've benchmarked it very very carefully in a very controlled way (not just running the programs one after another with different settings), it's likely to be a random result.

Performance of modern CPUs is very noisy and uneven. They perform a very dynamic balancing act of thermal and energy distribution across cores, vary their frequency with temporary turbo boosts, and have multiple levels of caches and branch predictors that are sensitive to bit patterns in memory addresses.
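As a rough illustration of measuring more carefully, here is a minimal sketch that does some untimed warm-up runs, times the workload repeatedly, and reports sorted samples so you can compare the median and the spread between builds. The names `time_runs` and `median` are made up for this example, and it only uses the standard library; a dedicated benchmarking crate would do a more thorough job.

```rust
use std::time::{Duration, Instant};

/// Call `f` `warmup` times without timing it, then `runs` more times with timing.
/// Returns the timed samples sorted from fastest to slowest.
pub fn time_runs<F: FnMut()>(mut f: F, warmup: usize, runs: usize) -> Vec<Duration> {
    for _ in 0..warmup {
        f();
    }
    let mut samples: Vec<Duration> = (0..runs)
        .map(|_| {
            let start = Instant::now();
            f();
            start.elapsed()
        })
        .collect();
    samples.sort();
    samples
}

/// Median of an already sorted, non-empty slice
/// (the upper of the two middle values when the length is even).
pub fn median(sorted: &[Duration]) -> Duration {
    sorted[sorted.len() / 2]
}
```

If the gap between the first and last sample for one target-cpu is as large as the gap between the medians of two target-cpus, the difference is probably noise.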

ManuelCostanzo · September 30, 2020, 10:04pm

Yes, I checked and I get the same results. My program is very regular, so I always get the same times too.

Is it bad practice to use a target-cpu other than the native one? Am I missing something? I understand that target-cpu=knl is for Knights Landing, and I am not on that platform (though the instructions are similar).
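One way to see how far "similar" goes is to ask the CPU directly. The sketch below uses the standard library's `is_x86_feature_detected!` macro to report what the machine running the binary supports, and `cfg!(target_feature = ...)` to report what the binary was compiled to assume. The function names and the short feature list are arbitrary choices for illustration; the AVX-512 family has many more sub-features than the one checked here.

```rust
/// Features the CPU running this binary reports at run time (x86 only).
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
pub fn detected_features() -> Vec<(&'static str, bool)> {
    vec![
        ("sse2", is_x86_feature_detected!("sse2")),
        ("avx2", is_x86_feature_detected!("avx2")),
        ("fma", is_x86_feature_detected!("fma")),
        ("avx512f", is_x86_feature_detected!("avx512f")),
    ]
}

/// On other architectures there is nothing to report in this sketch.
#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
pub fn detected_features() -> Vec<(&'static str, bool)> {
    Vec::new()
}

/// True if the compiler was allowed to assume AVX2 when building this binary,
/// for example through `-C target-cpu=...` or `-C target-feature=+avx2`.
pub fn compiled_for_avx2() -> bool {
    cfg!(target_feature = "avx2")
}
```

If a binary is compiled to assume a feature that the run-time list shows as unsupported, any code path that actually uses it can fault with an illegal instruction.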

Cocalus · October 1, 2020, 3:42am

You would have to look at the resulting assembly to verify. But it's possible the auto-vectorizer is weighted toward using AVX-512 instructions with a knl target, since that was the main goal of that platform. Cannon Lake, being newer, may have a superset of the AVX-512 instructions (there are a ton of variations), but if it's missing any that the compiler chooses to use, the program could crash with an illegal instruction.

The reason the cannonlake target may avoid AVX-512 is that if the CPU uses enough AVX-512 instructions, it needs to reduce its clock speed to stay stable (they do twice the work and produce more heat, so roughly a 30% clock reduction to "double" the FLOPS; AVX2 can do this to a lesser extent as well). So for the average program there's a threshold where enough AVX-512 could slow the clock without gaining enough to make up for it (the average program doesn't run all cores at 100%). But when most of the cores are doing heavy AVX-512 work (like your simulation), that can more than make up for the frequency reduction.
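To put a number on that threshold, here is a toy model, not a measurement. It assumes a fraction of the run time is spent in code that the wider vectors speed up by a fixed factor, and that the whole program runs at the reduced clock once wide vectors are in use. The 2x gain and the 0.7 clock ratio are the figures from this post; the function names and the model itself are assumptions for illustration.

```rust
/// Run time relative to a baseline without wide vectors (baseline = 1.0).
/// `vector_fraction`: share of baseline time spent in vectorizable code (0.0..=1.0).
/// `width_gain`: speedup of that code from wider vectors (e.g. 2.0).
/// `clock_ratio`: reduced clock divided by normal clock (e.g. 0.7).
pub fn relative_time(vector_fraction: f64, width_gain: f64, clock_ratio: f64) -> f64 {
    ((1.0 - vector_fraction) + vector_fraction / width_gain) / clock_ratio
}

/// Vectorizable fraction at which the wide-vector build merely breaks even.
/// Obtained by solving relative_time(f, g, c) = 1 for f.
pub fn break_even_fraction(width_gain: f64, clock_ratio: f64) -> f64 {
    (1.0 - clock_ratio) / (1.0 - 1.0 / width_gain)
}
```

Under these assumptions, `break_even_fraction(2.0, 0.7)` comes out at 0.6: a program that spends less than about 60% of its time in code that benefits would get slower, while a compute-bound simulation that spends nearly all of its time there would approach 0.5 / 0.7, or roughly 0.71 of the baseline run time.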

ManuelCostanzo · October 1, 2020, 9:21am

Excellent!!! Thank you for clarifying this for me!

