KCacheGrind/QCacheGrind
These are GUIs. I had actually never used them before today. Learning them had been on my todo list for a while, so I took this opportunity to do it.
QCacheGrind and KCacheGrind are mostly the same. QCacheGrind is just the pure Qt version of KCacheGrind (Qt is a GUI framework). There are some differences in the GUI's layout, and I like QCacheGrind better, so I'll refer to that for the rest of this.
The same symbol as in the perf screenshots, in QCacheGrind:
QCacheGrind with shortened types
Zooming into specific functions
What is QCacheGrind/callgrind
QCacheGrind is a GUI that visualizes profile data created by callgrind.
Callgrind is a tool that's part of valgrind that analyzes callgraphs. Valgrind is in essence a virtual machine. It's a lot more complicated than that; see the Wikipedia page for the technical details. The important bit is that, best case scenario, your program runs 75% slower than usual while callgrind is doing its analysis. It's usually far slower though. And even if your program is multithreaded, callgrind won't run it on more than 1 core, because valgrind only lets one thread execute at a time.
So basically, it's very powerful, but you pay for that power with very long profile generation time.
Under callgrind, it took my toy raytracing program 32.8 seconds to run with a screen resolution of "160x80" pixels. Outside of callgrind, the same program took 1.8 seconds when using 1 thread.
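To put those two timings side by side, here's a quick sketch of the arithmetic. The helper name slowdown_factor is just something I made up for illustration; it isn't part of valgrind or any other tool.

```shell
# slowdown_factor INSTRUMENTED_SECS NATIVE_SECS
# Prints how many times longer the instrumented run took, to one decimal.
# (Hypothetical helper, only here to show the arithmetic.)
slowdown_factor() {
    awk -v slow="$1" -v fast="$2" 'BEGIN { printf "%.1f\n", slow / fast }'
}

# The two timings from my raytracer run:
slowdown_factor 32.8 1.8    # prints 18.2
```

So for this particular program, the callgrind run was roughly 18 times slower than the native single-threaded run.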
Thankfully, you only need to create a profile once.
Using QCacheGrind
Install QCacheGrind and valgrind using your distro's package manager. callgrind should come with the valgrind package. Also install graphviz to get callgraphs.
To record the profile:
$ cargo build
$ valgrind --tool=callgrind target/debug/<name of your binary> <any args to said binary>
$ # Go make a cup of coffee/tea while you wait...
$ qcachegrind
Callgrind will write a "callgrind.out.<PID>" file when it completes. QCacheGrind will then search the current directory for compatible files and open them in the GUI.
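If you profile more than once, the "callgrind.out.<PID>" files pile up in the same directory. Here's a minimal sketch for picking out the newest one so you can hand it to qcachegrind explicitly. The function name latest_callgrind_profile is hypothetical, and it assumes your profile file names don't contain newlines (it parses ls output).

```shell
# latest_callgrind_profile [DIR]
# Prints the most recently modified callgrind.out.* file in DIR
# (default: the current directory). Prints nothing and returns
# non-zero if there is no such file.
# (Hypothetical helper, not something valgrind ships.)
latest_callgrind_profile() {
    dir="${1:-.}"
    # ls -t sorts by modification time, newest first.
    newest=$(ls -t "$dir"/callgrind.out.* 2>/dev/null | head -n 1)
    [ -n "$newest" ] && printf '%s\n' "$newest"
}

# Usage (left as a comment because it opens the GUI):
#   qcachegrind "$(latest_callgrind_profile)"
```

Passing the file on the command line should open just that profile instead of whatever QCacheGrind finds in the current directory.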
On the left you've got the search. Search for a symbol and click it, and in the bottom right window you'll have a bunch of tabs with information about the callees. There's a tab called "Call Graph". You can maximize that window by dragging the border upwards.
There's a lot of customization here. And you don't have to restart the whole program to apply those customizations (*cough* perf *cough*). On the top left of the whole window, you've got options for toggling:
- Detect Cycles (I still don't understand how the cycles work and how to interpret them...)
- Relative cost
- Relative to parent
This combined with "Relative cost" is basically equivalent to --call-graph=fractal in perf. The difference is that in QCacheGrind each symbol's cost is relative to the currently selected symbol, whereas in perf each symbol's cost is relative to that symbol's parent.
- Shorter type names
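To make the "Relative to parent" difference concrete, here's a small worked example of the arithmetic as I understand it. The costs and the call chain main -> render -> trace are made up, and pct is a hypothetical helper; neither tool exposes anything by that name.

```shell
# pct PART WHOLE: print PART as a percentage of WHOLE, to one decimal.
# (Hypothetical helper, only here to show the arithmetic.)
pct() {
    awk -v part="$1" -v whole="$2" 'BEGIN { printf "%.1f%%\n", 100 * part / whole }'
}

# Made-up inclusive costs for a call chain main -> render -> trace.
main_cost=1000
render_cost=800
trace_cost=400

# QCacheGrind style, with main selected: everything is relative to main.
pct "$render_cost" "$main_cost"     # render: 80.0%
pct "$trace_cost"  "$main_cost"     # trace:  40.0%

# perf fractal style: each symbol is relative to its own parent.
pct "$render_cost" "$main_cost"     # render vs. main:   80.0%
pct "$trace_cost"  "$render_cost"   # trace  vs. render: 50.0%
```

The first level looks identical in both. The numbers only diverge once you go two or more levels below the selected symbol.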
And there's a bunch of options for configuring the call graph visualization if you right click.
The call graphs have a ton of info. I found these helpful for deciphering them:
https://stackoverflow.com/questions/30713699/how-to-interpret-kcachegrind-graphs
https://stackoverflow.com/questions/23786152/how-to-interpret-results-from-kcachegrind
Final part, I promise
Optimizer/Profiler Noise
You'll notice weird functions in both perf and QCacheGrind. It'll show an arrow pointing from atan2 to 0x7fd67.... Sometimes it's just the optimizer doing its thing, inlining and reordering things, and you can just push past it. In this case, it's some symbol belonging to C's libm.
In perf, you can press "a" to be taken to the annotated assembly of a symbol if it's available. Here perf instead says:
00005583af28bcc3 /usr/lib/libm-2.32.so
This same info in QCacheGrind is in the top right window under the "Source Code" tab.
The function is located in this ELF object: 'libm-2.32.so'
Thank you for coming to my TED talk.