Is it possible to print the callgraph of a Cargo workspace?

I would like to display (and ideally navigate, but that's optional) the callstack of my Rust program. I found cargo-call-stack, but it seems that it doesn't work if the project uses a workspace (I opened an issue).

Is there any way to print the callstack of a Rust program?

There are several profiling programs available that also provide callgraphs. Often these programs are not specific to Rust and can work for many languages (like C, C++, Go, Rust, Java, etc.). Usually they rely on debug symbols.
So these programs won't mind that you're using a Cargo workspace since they do all their magic by running the binary and performing runtime analysis of said binary.

What OS are you running? If you're on Linux, there is perf and there is callgrind/KCachegrind (part of the valgrind suite of software). I could expand on these pieces of software more if you're using Linux.

If you're not using Linux, I think you should make a new topic where you ask people what profiling software they use for <your OS>, and that you need callgraph support. A pointed question like that will probably get more answers.

Yes, I'm using Linux :slight_smile:

First things first, both of these require debug symbols. The dev profile has debug symbols enabled, but you might not want to base your analysis off of the unoptimized version, since the optimizer is going to do various transformations to your program (including inlining). This means that the conclusions you get from analyzing an unoptimized program might be very different.
So you'll want to enable some optimizations for the dev profile. In your Cargo.toml:

[profile.dev]
#debug = true # This is the default for dev profiles
opt-level = 2
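
To see what inlining does to a call graph, here is a minimal sketch of a toy program you could profile. All function names are hypothetical and not from the posts above. `#[inline(never)]` asks the compiler to keep a function as its own symbol, so it should still appear as a separate node once optimizations are on, while a small helper like `scale` will most likely be folded into its caller and vanish from the graph.

```rust
use std::hint::black_box;

/// Hypothetical hot function. `#[inline(never)]` asks the compiler to keep it
/// as a separate symbol, so a profiler can show it as its own node.
#[inline(never)]
fn shade(x: f64) -> f64 {
    x.sin().atan2(x.cos())
}

/// Tiny helper that the optimizer will most likely inline; if it does, its
/// cost is attributed to the caller instead.
#[inline]
fn scale(x: f64) -> f64 {
    x * 0.001
}

/// Hypothetical outer loop, also kept as its own symbol.
#[inline(never)]
fn render(pixels: u32) -> f64 {
    let mut acc = 0.0;
    for i in 0..pixels {
        // black_box keeps the optimizer from precomputing the whole loop.
        acc += shade(scale(black_box(i as f64)));
    }
    acc
}

fn main() {
    println!("{}", render(1_000_000));
}
```

Comparing the profile of this program with and without the `#[inline(never)]` attributes is a quick way to get a feel for how much the optimizer reshapes the graph.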

I also want to note that I'm no expert on these profilers. They have a lot of configuration options available and a lot of functionality and data.

Screenshots:

Perf

Perf with --call-graph=graph: use a graph tree, displaying absolute overhead rates.

Perf with --call-graph=fractal: like graph, but displays relative rates. Each branch of the tree is considered as a new profiled object.

With that out of the way:


What is perf?

Perf is a statistical profiler. The gist is that several times per second perf will interrupt or "sample" your program so it can take a peek at what's going on. This allows the program you profile to run at near full speed.
During these samples, perf will look at the call stack to analyze what your program is doing and where it is. Perf has a lot of other functionality, more than I could cover in this post.
POTENTIAL DEALBREAKER: Perf is entirely terminal UI based. If you don't like this, see KCacheGrind.
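
To make the sampling idea concrete, here is a minimal sketch of how a list of sampled call stacks can be turned into per-function overhead. This is not perf's actual implementation; the `Sample` type, the `overhead` function, and the stack contents are all made up for illustration. "Self" counts samples where the function was at the top of the stack, "inclusive" counts samples where it appeared anywhere on the stack.

```rust
use std::collections::{HashMap, HashSet};

/// One sample: the call stack seen at a single interrupt, outermost frame first.
type Sample = Vec<&'static str>;

/// Returns, per function, (self %, inclusive %) of all samples.
fn overhead(samples: &[Sample]) -> HashMap<&'static str, (f64, f64)> {
    let total = samples.len() as f64;
    let mut counts: HashMap<&'static str, (u32, u32)> = HashMap::new();
    for stack in samples {
        // Inclusive: count a function once per sample it appears in.
        let mut seen = HashSet::new();
        for &func in stack {
            if seen.insert(func) {
                counts.entry(func).or_default().1 += 1;
            }
        }
        // Self: only the innermost frame was actually executing.
        if let Some(&leaf) = stack.last() {
            counts.entry(leaf).or_default().0 += 1;
        }
    }
    counts
        .into_iter()
        .map(|(func, (own, incl))| {
            (func, (100.0 * own as f64 / total, 100.0 * incl as f64 / total))
        })
        .collect()
}

fn main() {
    // Four hypothetical samples of a toy raytracer.
    let samples: Vec<Sample> = vec![
        vec!["main", "render", "trace_ray"],
        vec!["main", "render", "trace_ray"],
        vec!["main", "render", "trace_ray", "atan2"],
        vec!["main", "write_image"],
    ];
    let result = overhead(&samples);
    for name in ["main", "render", "trace_ray", "atan2", "write_image"] {
        let (own, incl) = result[name];
        println!("{name:<12} self {own:>5.1}%  inclusive {incl:>5.1}%");
    }
}
```

With these four samples, `trace_ray` ends up with 50% self and 75% inclusive overhead, while `main` has 0% self and 100% inclusive.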

Setup to use perf

Before using perf, you'll need to do some system configuration. Otherwise, when you try to use it, you'll get this text:

$ perf record
Error:
You may not have permission to collect system-wide stats.

Consider tweaking /proc/sys/kernel/perf_event_paranoid,
...

Create /etc/sysctl.conf if it doesn't exist on your system, and append kernel.perf_event_paranoid = -1 to it. Then either reboot or run sudo sysctl --system to load this configuration.


Using perf

First, install perf using your distro's package manager.
To use perf, you first have to record a profile of your program running:

$ cargo build
$ perf record -a --call-graph dwarf target/debug/<name of your binary> <any args to said binary>

--call-graph dwarf enables callgraphs using the debug data
This will write a file called perf.data to your current directory.
To view the report, run $ perf report. You'll be presented with a screen of text with the symbol name on the right and the overhead (aka "cost") of that symbol on the left. Press ? to get keybindings help. The important ones are:

  • up/down to move up and down
  • / to search for a symbol/function name
  • + to expand/collapse the callgraph

The man page, perf-report, is pretty good. There's a lot of configuration you can do. I mainly use the --call-graph=<...> option of perf report:

  • perf report --call-graph=0.5 to filter out all calls whose cost is below 0.5%. I usually use perf report --call-graph=1 to filter out the noise I don't care about
  • perf report --call-graph=fractal to calculate overheads of a symbol relative to the symbol's caller
  • The above combined is: perf report --call-graph=fractal,1
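
The difference between absolute (graph) and relative (fractal) overhead can be sketched with a small worked example. This only illustrates the arithmetic and is not how perf computes its report; the `Node` type, the `percentages` function, and the sample counts are all hypothetical.

```rust
/// A node in a call tree, with the number of samples that landed in its subtree.
struct Node {
    name: &'static str,
    samples: u32,
    children: Vec<Node>,
}

/// Appends (name, absolute %, relative-to-parent %) for `node` and its subtree.
fn percentages(
    node: &Node,
    total: u32,
    parent: u32,
    out: &mut Vec<(&'static str, f64, f64)>,
) {
    // Absolute: share of all samples in the profile ("graph" style).
    let absolute = 100.0 * node.samples as f64 / total as f64;
    // Relative: share of the caller's samples ("fractal" style).
    let relative = 100.0 * node.samples as f64 / parent as f64;
    out.push((node.name, absolute, relative));
    for child in &node.children {
        percentages(child, total, node.samples, out);
    }
}

/// Hypothetical profile: 1000 samples in total.
fn example_tree() -> Node {
    Node {
        name: "main",
        samples: 1000,
        children: vec![Node {
            name: "render",
            samples: 800,
            children: vec![
                Node { name: "trace_ray", samples: 600, children: vec![] },
                Node { name: "shade", samples: 200, children: vec![] },
            ],
        }],
    }
}

fn main() {
    let tree = example_tree();
    let mut rows = Vec::new();
    percentages(&tree, tree.samples, tree.samples, &mut rows);
    for (name, absolute, relative) in rows {
        println!("{name:<10} absolute {absolute:>5.1}%  relative {relative:>5.1}%");
    }
}
```

Here `trace_ray` accounts for 60% of all samples, but 75% of the samples spent under its caller `render`.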

KCacheGrind/QCacheGrind

These are GUIs. I had actually never used these before today. Learning them had been on my todo list for a while so I took this opportunity to learn about them.
QCacheGrind and KCacheGrind are mostly the same. QCacheGrind is just the purely Qt (Qt is a GUI framework) version of KCacheGrind. There are some differences in the GUI's layout, and I like QCacheGrind better so I'll refer to that for the rest of this.

The same symbol as the Perf screenshots in QCacheGrind:

QCacheGrind with shortened types

Zooming into specific functions

What is QCacheGrind/callgrind

QCacheGrind is a GUI that visualizes profile data created by callgrind.
Callgrind is a tool that's part of valgrind that analyzes callgraphs. Valgrind is in essence a virtual machine. It's a lot more complicated than that; see the Wikipedia page for the technical details. The important bit is that, best case scenario, your program runs 75% slower than usual while callgrind is doing its analysis. It's usually far slower though. And even if your program is multithreaded, callgrind won't run your program on more than 1 core.

So basically, it's very powerful, but you pay for that power with very long profile generation time.

In callgrind, it took my toy raytracing program 32.8 seconds to run with a screen resolution of "160x80" pixels. Outside of callgrind, the same program took 1.8 seconds when using 1 thread, so that's roughly an 18x slowdown.

Thankfully, you only need to create a profile once.

Using QCacheGrind

Install QCacheGrind and valgrind using your distro's package manager. callgrind should come with the valgrind package. Also install graphviz to get callgraphs.
To record the profile:

$ cargo build
$ valgrind --tool=callgrind target/debug/<name of your binary> <any args to said binary>
$ # Go make a cup of coffee/tea while you wait...
$ qcachegrind

Callgrind will write a "callgrind.out.<PID>" file when it completes. Then QCacheGrind will search the current directory for compatible files and open them in the GUI.
On the left you've got your search. Search for a symbol and click it, and on the bottom right window you'll have a bunch of tabs with information about the callees. There's a tab called "Call Graph". You can maximize that window by dragging the border upwards.

There's a lot of customization here. And you don't have to restart the whole program to apply those customizations (*cough* perf *cough*). On the top left of the whole window, you've got options for toggling

  • Detect Cycles (I still don't understand how the cycles work and how to interpret them...)
  • Relative cost
  • Relative to parent
    This combined with "Relative cost" is basically equivalent to --call-graph=fractal in perf. The difference is that in QCacheGrind each symbol's cost is relative to the currently selected symbol, whereas in perf each symbol's cost is relative to that symbol's parent.
  • Shorter type names

And there's a bunch of options for configuring the call graph visualization if you right click.

The call graphs have a ton of info. I found these helpful for deciphering them:



Final part, I promise :wink:

Optimizer/Profiler Noise

You'll notice weird functions in both perf and QCacheGrind. It'll show an arrow pointing from atan2 to 0x7fd67.... Sometimes it's just the optimizer doing its thing, inlining and reordering things, and you can just push past it. In this case, it's some symbol belonging to C's libm.
In perf, you can press a to be taken to the annotated assembly of a symbol if it's available. Here perf instead says:

00005583af28bcc3  /usr/lib/libm-2.32.so

This same info in QCacheGrind is in the top right window under the "Source Code" tab.

The function is located in this ELF object: 'libm-2.32.so'

Thank you for coming to my TED talk.


You can also use speedscope.app. It shows you a flamegraph. You can open a file containing the output of perf script, or if you want to save disk space, you can first run that output through the stackcollapse-perf.pl script from https://github.com/brendangregg/FlameGraph (the original flamegraph implementation).
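
The reason collapsing saves space is that identical stacks are merged into one line with a count. A minimal sketch of that idea follows, producing the "frame;frame;frame count" folded format that flamegraph tools consume. It is a simplified stand-in and does not parse real perf script output; the `fold` function and the stacks are made up for illustration.

```rust
use std::collections::BTreeMap;

/// Merges identical stacks (outermost frame first) into folded lines of the
/// form "outer;inner;leaf count".
fn fold(samples: &[Vec<&str>]) -> Vec<String> {
    // BTreeMap keeps the output sorted, which makes it deterministic.
    let mut counts: BTreeMap<String, u64> = BTreeMap::new();
    for stack in samples {
        *counts.entry(stack.join(";")).or_insert(0) += 1;
    }
    counts
        .into_iter()
        .map(|(stack, count)| format!("{stack} {count}"))
        .collect()
}

fn main() {
    // Hypothetical samples; two of them share the same stack.
    let samples = vec![
        vec!["main", "render", "trace_ray"],
        vec!["main", "render", "trace_ray"],
        vec!["main", "render", "trace_ray", "atan2"],
        vec!["main", "write_image"],
    ];
    for line in fold(&samples) {
        println!("{line}");
    }
}
```

Four samples become three lines here; on a real profile with many thousands of samples, most stacks repeat, so the reduction is much larger.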

Hey, great breakdown @ArifRoktim. That's rather useful. I recently kind of broke open the same tools for the first time. I didn't know perf has a full terminal UI, that's great!

If you want to use perf without needing a terminal UI, though, try out Hotspot. It's super easy to install the AppImage:

It gives you interactive flamegraphs and some other information by loading the perf recording files.


Flamegraphs are a good option that I forgot about. If you want to be able to do

$ cargo flamegraph

see


Were you using it only for profile generation?
Or were you using it with its --stdio interface? Cuz the stdio interface is a little harder to use.

Just a little. :stuck_out_tongue:


So far I had just been feeding it to hotspot so I hadn't gotten farther with it than perf record. Hotspot doesn't do everything, though, so it's good to know about the terminal interface.


Thanks a lot @ArifRoktim for the detailed answers, it was very informative. You should consider posting this in a blog post, I think that it's a great introduction to perf and valgrind/cachegrind.

I realize that my question was not totally clear. What I want is the static callgraph of my program. My goal is to be able to understand the relations between the various parts of my code. Unfortunately, the output of perf and cachegrind, especially if you use the release version, isn't really usable for this task (even if this very information is absolutely critical when doing perf analysis). You can see that many functions have strange names (all the ones that result from monomorphisation, and the various closures are even worse). Many functions will also not be reported since they have been totally removed by the inliner. So I don't think I will be able to use those tools to extract the graph of dependencies and usage from my codebase, as I was initially hoping.

When you first talked about perf, I thought that maybe it had an option that I didn't know about to statically generate the call graph of a program (that's highly probable given how powerful that tool is). I just re-read my first post, and while the question in the title is right, I wrongly used the term "callstack" in the OP instead of "callgraph". The confusion is absolutely my own mistake.
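
As a small illustration of why monomorphisation and closures produce names that don't match the source one-to-one, here is a sketch with hypothetical functions. A single generic function in the source becomes one compiled copy per concrete type, and a closure gets a compiler-generated type. `std::any::type_name` is used only to peek at how the compiler names these things; its output is meant for diagnostics and is not guaranteed to match the symbol names a profiler shows.

```rust
use std::any::type_name;

/// Generic function: the compiler emits one copy per concrete `T` it is used with.
fn largest<T: PartialOrd + Copy>(items: &[T]) -> T {
    let mut best = items[0];
    for &item in items {
        if item > best {
            best = item;
        }
    }
    best
}

/// Returns the compiler's diagnostic name for the type of `_value`.
fn name_of<T>(_value: &T) -> &'static str {
    type_name::<T>()
}

fn main() {
    let double = |x: i32| x * 2;

    // One function in the source, two distinct instantiations in the binary.
    println!("{}", name_of(&largest::<i32>));
    println!("{}", name_of(&largest::<f64>));
    // Closures have anonymous, compiler-generated types.
    println!("{}", name_of(&double));

    println!("{} {}", largest(&[1, 5, 3]), largest(&[0.5, 0.25]));
    println!("{}", double(21));
}
```

A static call graph drawn from the source would show a single `largest` node, whereas a profile of the compiled binary sees each instantiation (and each closure) as a separate symbol.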
