I've got multiple instances of it running on various devices, but the most important instance doing most of the work is nowadays running on this Intel N100 mini PC powered by Linux Mint 22.1.
It's running stably and does what it's meant to do, but it uses more CPU cycles than I wish it would (40-50% of one of my 4 CPU cores, according to htop), so I'd like to trace / profile what it's spending those CPU cycles on.
I don't want to restart my main instance more often than absolutely necessary, or even worse have it not run for more than a few seconds at most - because that would produce ugly holes in my graphs.
But since this main instance is the only one that's seeing enough load to even notice any relevant CPU usage, I don't think I can use anything but that to gather the profiling information I'm looking for.
I do have tokio-console set up and running, but either I'm not knowledgeable enough to control / interpret its output, or it can't tell me which portions of my code cause the high CPU load that htop reports. I think the reason I don't see anything using too much CPU in tokio-console might be that most of my CPU load is actually coming from my #[tokio::main] thread/task.
Does anybody have any pointers on how to do that profiling?
Compile with debug info enabled, and while the program is running take a recording that you can view later. (The software is installed from a package probably called linux-perf, if Mint is like other distros.)
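A minimal sketch of that workflow as shell helpers, under a few assumptions: the project builds with cargo, `perf` is on the PATH, and the helper names (`build_with_debuginfo`, `record_system`, `show_report`) are made up for illustration. `CARGO_PROFILE_RELEASE_DEBUG=true` is Cargo's environment override for `debug = true` under `[profile.release]`. System-wide recording with `-a` usually needs root or a relaxed `perf_event_paranoid` setting.

```shell
# Build an optimized binary that still carries debug info.
build_with_debuginfo() {
    CARGO_PROFILE_RELEASE_DEBUG=true cargo build --release
}

# Sample all CPUs (-a) with call stacks (-g) for as long as `sleep` runs.
# The samples are written to ./perf.data.
record_system() {
    secs="${1:-10}"
    perf record -a -g -o perf.data -- sleep "$secs"
}

# Browse the recording afterwards in perf's text UI.
show_report() {
    perf report -i perf.data
}

# Usage:
#   build_with_debuginfo
#   record_system 10
#   show_report
```

The `sleep` here is not the thing being profiled; it only bounds how long perf keeps sampling.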
# apt install linux-perf
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
Package linux-perf is not available, but is referred to by another package.
This may mean that the package is missing, has been obsoleted, or
is only available from another source
E: Package 'linux-perf' has no installation candidate
But it seems the binary you say I should use is already installed by some other package:
# perf -v
perf version 6.8.12
Can you explain the command line options you told me to use? How do I tell perf which program I want to profile?
UPDATE: ahh, I guess your example can be used to profile sleep 10 so I'd have to replace this with my program.
Does this work with software that does not (and is not meant to) ever finish on its own?
perf report tells me:
Cannot load tips.txt file, please install perf!
My system knows of references to a package named perf, just as with linux-perf, but it doesn't know how to install that one either.
UPDATE2: Oh, so I guess the sleep 10 only makes it record for 10 seconds, but it records events from all software running on this system (that has this kind of profiling enabled?). At least perf report shows stuff about many different binaries (including 1 line about my project), though I don't know how to interpret it...
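To narrow the recording down to one program that is already running (so no restart is needed), perf can attach to a PID with `-p`. A minimal sketch, assuming the helper name `record_pid` and the binary name `myprog` are placeholders, and that the running binary was built with debug info:

```shell
# Attach to an already-running process and sample it for a bounded time.
#   -F 99  sample at 99 Hz
#   -g     record call stacks
#   -p     attach to this PID only instead of the whole system
record_pid() {
    pid="$1"
    secs="${2:-30}"
    perf record -F 99 -g -p "$pid" -o perf.data -- sleep "$secs"
}

# Usage (myprog is a hypothetical binary name):
#   record_pid "$(pidof myprog)" 30
#   perf report -i perf.data
```

perf detaches when the `sleep` ends and the profiled process keeps running, so this also works for software that never exits on its own.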
Perf can be quite complex, because it is extremely flexible and made for experts. I recommend reading its man pages if you want to know what the flags do, e.g. man perf-record and man perf-report.
In hotspot in particular, the flame graph view will be a good starting point. For Rust code I recommend using its latest release for better unwinding. This may be newer than what exists in your distro's package manager, but there are AppImages in the releases on GitHub.
Do note a gotcha with any profiler if you have a CPU with both Intel P and E cores (or some other mixed core arrangement like some ARM CPUs have): they generate separate performance counters, and it becomes very difficult to interpret the data. I recommend pinning your program to one core type during profiling so the data is easier to understand.
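A minimal sketch of pinning with `taskset` from util-linux, assuming the helper name `pin_pid` is made up and that you have already worked out which CPU IDs belong to which core type (for example from `lscpu --extended`); the list `0-3` below is only a placeholder:

```shell
# Pin an already-running process to a list of CPUs.
#   -a  apply to all threads of the process
#   -c  interpret the mask as a CPU list such as "0-3"
#   -p  operate on an existing PID instead of launching a command
pin_pid() {
    pid="$1"
    cpus="$2"
    taskset -a -c -p "$cpus" "$pid"
}

# Usage (PID lookup and CPU list are assumptions for illustration):
#   pin_pid "$(pidof myprog)" 0-3
```

On a CPU where all cores are of the same type this step should not be necessary.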
If you use --call-graph dwarf, frame pointers won't be used. To use frame pointers for faster unwinding you need --call-graph fp. That will cause inlined functions to not show up as separate call frames though.
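The two unwinding modes can be sketched side by side. This assumes a cargo project; the helper names are made up for illustration, and `-C force-frame-pointers=yes` is the rustc codegen option that keeps frame pointers in your own crates:

```shell
# Build with frame pointers kept, plus debug info for symbol names.
build_with_frame_pointers() {
    RUSTFLAGS="-C force-frame-pointers=yes" \
    CARGO_PROFILE_RELEASE_DEBUG=true \
    cargo build --release
}

# Record a running process with the chosen unwinder:
#   fp     cheap frame-pointer walk, inlined functions are folded into callers
#   dwarf  copies stack snapshots and unwinds later; larger perf.data
record_callgraph() {
    mode="$1"
    pid="$2"
    secs="${3:-30}"
    perf record -F 99 --call-graph "$mode" -p "$pid" -o perf.data -- sleep "$secs"
}

# Usage (myprog is a hypothetical binary name):
#   record_callgraph fp    "$(pidof myprog)" 30
#   record_callgraph dwarf "$(pidof myprog)" 10
```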
I tried the suggested GUI perf visualizer "hotspot", since I thought it might show me easier-to-interpret visualizations, but I'm still completely lost...
Please help. In my attempts to cache some stuff that I suspected costs a lot of CPU (finding regex matches), I seem to have also created a "memory leak". I only used HashMap & HashSet, so repeated inserts should not cause ever-growing memory, but somehow I still managed to make it do exactly that. No idea why or where.
Compiling with stable and disabling the tokio-console feature got my CPU usage down from ~130% (quad-core CPU) to ~40%, and memory usage, which was ~1% on startup and ~12% after running for ~4 days, down to 0.1% (of 16GB).
Though I would still love to find out which portions of my code are responsible for the majority of that 40% CPU usage.
All those question marks indicate that you don't have debug info for whatever that is. Make sure that you enabled debug info for your program; I don't think it is on by default in release builds.
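One way to check whether a binary actually carries debug info is to look for a `.debug_info` section. A minimal sketch, assuming `readelf` from binutils is installed; the helper name `has_debug_info` and the binary path are placeholders:

```shell
# Succeeds if the ELF file contains a .debug_info section,
# i.e. DWARF debug info was emitted and not stripped.
has_debug_info() {
    readelf -S "$1" | grep -q '\.debug_info'
}

# Usage (path is an assumption for illustration):
#   has_debug_info ./target/release/myprog && echo "debug info present"
```

If the section is missing, setting `debug = true` under `[profile.release]` in Cargo.toml and rebuilding should add it, unless a later step strips the binary.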
It could also be some system library perhaps (glibc or something else), but then it will usually say which library and only have question marks for the function in that library that is being called.
Another option is that it is in the kernel and you are missing kallsyms for your kernel.
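Whether kernel symbols are visible depends partly on two sysctl knobs. A small sketch for inspecting them, where the helper name `show_perf_restrictions` and its optional directory argument are made up for illustration:

```shell
# Print the kernel settings that most often hide kernel symbols from perf.
#   kptr_restrict        non-zero can hide kernel addresses in /proc/kallsyms
#                        (2 hides them even from root)
#   perf_event_paranoid  higher values restrict what unprivileged users may profile
show_perf_restrictions() {
    dir="${1:-/proc/sys/kernel}"
    for knob in kptr_restrict perf_event_paranoid; do
        printf '%s=%s\n' "$knob" "$(cat "$dir/$knob")"
    done
}

# Usage:
#   show_perf_restrictions
# To relax the first knob until the next reboot (as root):
#   sysctl -w kernel.kptr_restrict=0
```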
Newer versions of hotspot have selectable colour filters at the top, which can help indicate whether it is the kernel, a system library, or your code, but the drop-down is missing in your screenshot.
It could perhaps also be that you recompiled between running and analysing (or are running and analysing on different systems, which is possible but complicated to set up).
Ah, yes, that will be it. I usually compile on my development laptop, since most of my target devices aren't powerful enough to comfortably compile on device. But for my main instance, which gets enough load to make checking where the CPU load clusters worthwhile, I can compile on the device.
Where you compile isn't the issue (other than paths in debug info). What matters is whether perf and hotspot are run on the same system.
There is a settings page where you can specify the sysroot, kallsyms path, etc. This is what you would need to engage with if perf record and hotspot are run on different machines.
A bit hard to tell what exactly happened remotely then. There is a log window on the first tab in hotspot that lists issues such as missing debug info. Worth looking at.