I am developing a multi-threaded Rust application in which many threads continuously run a computationally heavy optimization algorithm. I am looking for performance bottlenecks in each of the algorithms, but I am having a hard time producing correct profiling information and flamegraphs.
Every flamegraph that I create seems to be missing stack information from some of the running threads.
Does anyone know a good way of profiling threads individually in Rust? Currently I am thinking about just learning perf and relying solely on that, but cargo flamegraph seems a little more convenient.
Instrumenting your code "intrusively" gives you fine-grained control over what gets captured in the flamegraph. I've used the profiling crate for this in the past, but there are also pprof, firestorm, and tracing-flame.
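To make the idea concrete, here is a rough std-only sketch of what intrusive instrumentation boils down to. It is not the API of any of the crates above: ScopeTimer, stats, optimize_step, and run_workers are hypothetical names I made up for illustration, and optimize_step is just a stand-in for your algorithm. A guard records elapsed time when it is dropped, keyed by thread name and scope label, so every thread's work is accounted for separately.

```rust
use std::collections::HashMap;
use std::sync::{Mutex, OnceLock};
use std::thread;
use std::time::{Duration, Instant};

/// Total time and call count per (thread name, scope label).
type Stats = HashMap<(String, &'static str), (Duration, u64)>;

/// Global table shared by all threads (hypothetical helper).
fn stats() -> &'static Mutex<Stats> {
    static STATS: OnceLock<Mutex<Stats>> = OnceLock::new();
    STATS.get_or_init(|| Mutex::new(HashMap::new()))
}

/// Hypothetical guard: measures the time from creation until drop.
struct ScopeTimer {
    label: &'static str,
    start: Instant,
}

impl ScopeTimer {
    fn new(label: &'static str) -> Self {
        ScopeTimer { label, start: Instant::now() }
    }
}

impl Drop for ScopeTimer {
    fn drop(&mut self) {
        let elapsed = self.start.elapsed();
        // Key the measurement by the name of the thread that ran the scope.
        let thread_name = thread::current().name().unwrap_or("unnamed").to_string();
        let mut map = stats().lock().unwrap();
        let entry = map
            .entry((thread_name, self.label))
            .or_insert((Duration::ZERO, 0));
        entry.0 += elapsed;
        entry.1 += 1;
    }
}

/// Stand-in for one step of the optimization algorithm (assumption).
fn optimize_step(seed: u64) -> u64 {
    let _timer = ScopeTimer::new("optimize_step");
    (0..100_000u64).fold(seed, |acc, x| acc.wrapping_mul(31).wrapping_add(x))
}

/// Spawns `n` named worker threads that each run ten instrumented steps.
fn run_workers(n: usize) {
    let handles: Vec<_> = (0..n)
        .map(|i| {
            thread::Builder::new()
                .name(format!("worker-{i}"))
                .spawn(move || {
                    for step in 0..10 {
                        std::hint::black_box(optimize_step(i as u64 + step));
                    }
                })
                .expect("failed to spawn thread")
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
}

fn main() {
    run_workers(4);
    for ((thread_name, label), (total, calls)) in stats().lock().unwrap().iter() {
        println!("{thread_name} {label}: {calls} calls, {total:?} total");
    }
}
```

The crates mentioned above do the same kind of bookkeeping with far less overhead and with output a flamegraph viewer can read, so treat this only as a picture of the mechanism. Giving each thread a name via thread::Builder is worth doing either way, since it makes per-thread results much easier to tell apart.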