Profiling in a Rust application


#1

I’m working on a project that has some modules written in Rust and some in Python. The problem is this: with Python I can use cProfile for profiling and get information such as the number of calls, the time spent in each call, the names of the functions called, etc., but for Rust I can’t find a similar tool.
I’ve tried flame and cpuprofiler, but neither of them works the way I expect. I just want to profile some code and see which functions are called in it.

So if anyone has tried profiling a Rust application and getting information about the functions called, please let me know.


#2

As always in performance analysis, the answer depends on what exactly you want to know. For example, the exact “number of calls to a function” is a very expensive quantity to measure, so most profilers prefer to give you an approximate “percentage of time spent in the function” instead.
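
To make the cost of exact counting concrete, here is a minimal sketch of hand-rolled counting instrumentation, roughly the bookkeeping a counting profiler has to perform on every single call. All names (`profile`, `CallTimer`, `STATS`, `calls_of`) are hypothetical and do not come from any profiling crate. Each instrumented call reads the clock twice and takes a global lock, which for a tiny function like `fib` likely costs more than the function body itself; a sampling profiler avoids this by only interrupting the program periodically.

```rust
use std::collections::BTreeMap;
use std::sync::Mutex;
use std::time::{Duration, Instant};

/// Hypothetical per-function statistics: number of calls and total (inclusive) time.
#[derive(Default, Debug, Clone, Copy)]
struct Stats {
    calls: u64,
    total: Duration,
}

/// Global table keyed by function name. Every instrumented call locks it once.
static STATS: Mutex<BTreeMap<&'static str, Stats>> = Mutex::new(BTreeMap::new());

/// Guard that records one call and its elapsed time when it goes out of scope.
struct CallTimer {
    name: &'static str,
    start: Instant,
}

impl Drop for CallTimer {
    fn drop(&mut self) {
        let elapsed = self.start.elapsed();
        let mut stats = STATS.lock().unwrap();
        let entry = stats.entry(self.name).or_default();
        entry.calls += 1;
        entry.total += elapsed;
    }
}

/// Start timing one call to the function called `name`.
fn profile(name: &'static str) -> CallTimer {
    CallTimer { name, start: Instant::now() }
}

/// Number of calls recorded so far for `name` (0 if never called).
fn calls_of(name: &str) -> u64 {
    STATS.lock().unwrap().get(name).map_or(0, |s| s.calls)
}

/// Deliberately naive recursive function to instrument.
fn fib(n: u64) -> u64 {
    let _timer = profile("fib"); // dropped (and recorded) when `fib` returns
    if n < 2 { n } else { fib(n - 1) + fib(n - 2) }
}

fn main() {
    println!("fib(20) = {}", fib(20));
    println!("fib was called {} times", calls_of("fib"));
    for (name, s) in STATS.lock().unwrap().iter() {
        // Times are inclusive, so nested recursive calls are counted more than once.
        println!("{name}: {} calls, {:?} total", s.calls, s.total);
    }
}
```

Note that the recorded times are inclusive, so for a recursive function the "total" adds up far more than the program's wall-clock time, one of the reasons why interpreting exact per-function timings is less simple than it looks.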

But in my opinion, the best “default choice” these days is a hardware-assisted sampling profiler. These tools are either operating-system-specific or hardware-specific; here are some that you can look up:

  • Linux: perf (aka perf_events)
  • Windows: WPA (or its xperf(view) ancestor)
  • macOS: Xcode Instruments
  • Intel: VTune Amplifier
  • AMD: CodeXL
  • NVIDIA: NSight

Personally, I spend most of my time using Linux these days, so perf is the tool I am most proficient with. A while ago I wrote a small tutorial about perf + Rust on this forum: Profilers and how to interprete results on recursive functions.

One thing I have learned since writing that tutorial is that --call-graph=dwarf is a better default than --call-graph=lbr, because LBR only works if your hardware is recent enough and your application’s call chains are not too deep (~16 frames or fewer). I would now recommend starting with DWARF, then trying LBR, checking that LBR mode works and that the results are similar, and only switching to LBR in that case.
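
One possible way to run that DWARF-versus-LBR comparison is on a small test program whose hot loop deliberately sits deep in the call stack. This is only a sketch: the names `descend` and `leaf_work`, the depth of 40 and the iteration count are arbitrary choices, not anything perf requires. Built in release mode with debug symbols, a DWARF call graph should show the chain of `descend` frames above `leaf_work`; if the LBR report shows a much shorter chain, the stacks are probably being truncated.

```rust
use std::hint::black_box;

/// Recurse `depth` levels before doing any work, so that the hot loop
/// runs at the bottom of a call chain deeper than ~16 frames.
#[inline(never)]
fn descend(depth: u32, iterations: u64) -> u64 {
    if depth == 0 {
        leaf_work(iterations)
    } else {
        // black_box hides the result from the optimizer, so the recursion
        // is not rewritten into a loop and each level keeps its own frame.
        black_box(descend(depth - 1, iterations)).wrapping_add(1)
    }
}

/// CPU-bound loop where nearly all samples should land.
#[inline(never)]
fn leaf_work(iterations: u64) -> u64 {
    let mut acc: u64 = 0;
    for i in 0..iterations {
        acc = acc.wrapping_mul(6364136223846793005).wrapping_add(black_box(i));
    }
    acc
}

fn main() {
    // 40 `descend` frames, plus `main` and the runtime's own frames.
    let result = descend(40, 200_000_000);
    println!("{result}");
}
```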


#3

Thanks, HadrienG, for your suggestion. But for profiling a complex application, these tools don’t seem efficient. Do you know of a framework or tool that supports this kind of profiling? Seeing the function names, the number of calls and the time spent in each function is really necessary for me. Maybe there are profilers still in development, or commercial products?


#4

Define “efficient”. If you are talking about overhead, then on typical applications sampling has much less overhead than counting or tracing (which I guess is what you are asking for).

If you want a counting profiler for a compiled language like Rust or C, you can give callgrind (part of Valgrind) a try, but be prepared for a ~10x increase in CPU time, which will heavily bias the relative CPU vs I/O costs in your performance profile.

Like with perf, you will need to compile your application with debugging symbols.


#5

Thanks again for your reply. I tried callgrind, but I cannot understand the information it gives. For instance, I have a hello world application: I ran cargo build --release and then valgrind --tool=callgrind target/release… to profile it. When I run callgrind_annotate on the output file, I cannot understand its statistics: they seem to be almost all system calls, and there is no sign of the function I called to print with println!(). Could you explain what these statistics are, please?


#6

Have you built your program with debugging symbols? The simplest way to do so is to add “-g” to the RUSTFLAGS environment variable. As a longer-term solution, you can modify your Cargo.toml to permanently enable debug symbols in your release builds (“debug = true” in the “[profile.release]” section).

If you are calling into external libraries (C, C++), you will also need to install debugging symbols for these in order to get detailed profiles. But the way to do this is OS-dependent and from the profile you posted it looks like you use macOS, so I cannot help much there.

Another limitation of callgrind is that, since it operates in user mode, it cannot provide a good analysis of syscalls and multi-process applications. Only profilers with OS kernel integration (like perf, Xcode Instruments…) can do this correctly.

It may also be that your application is actually mostly composed of syscalls. You are talking about a hello world program, so if it’s just a println, most of your CPU time is actually spent loading and setting up the application process. Try adding a loop to increase the time spent in the actual application code 🙂
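
Following that suggestion, a hello world extended with a CPU-bound loop might look like the sketch below. The function name `sum_of_squares` and the loop counts are arbitrary illustrative choices. `std::hint::black_box` (stable since Rust 1.66) and `#[inline(never)]` are there so that a release build neither optimizes the work away nor inlines the function into `main`, which would hide it from a per-function profile.

```rust
use std::hint::black_box;

/// Deliberately CPU-bound work so that it clearly dominates the profile.
#[inline(never)]
fn sum_of_squares(n: u64) -> u64 {
    (0..n).map(|i| black_box(i).wrapping_mul(i)).fold(0, u64::wrapping_add)
}

fn main() {
    println!("Hello, world!");
    let mut total: u64 = 0;
    // Repeat the work many times so that process start-up and the
    // println! syscall become a small fraction of the total cost.
    for _ in 0..1_000 {
        total = total.wrapping_add(sum_of_squares(black_box(100_000)));
    }
    println!("total = {total}");
}
```

Built with debug symbols and run under valgrind --tool=callgrind, callgrind_annotate should then attribute most instructions to sum_of_squares (shown with its module path) rather than to process start-up. Given callgrind’s slowdown, you may want to lower the iteration counts.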


#7

Very cool, thanks!