Opt-level >= 2 removes debug symbols needed in perf profiling

LBR call graphs are awesome when applicable (10kHz sampling with negligible overhead, hurray!), but they come with a number of prerequisites:

  • You need a recent enough CPU (Haswell+ for Intel, I don't know for other CPU manufacturers)
  • You need a recent enough kernel (LBR call graph itself was introduced in 4.1, and then you need CPU model specific support so your kernel must be recent enough w.r.t. your CPU)
  • Your stack traces need to be shallow enough, or they will be truncated, which makes the "Children" time in perf report meaningless (IIRC the limit is 16 frames on Haswell and 32 frames on Skylake, i.e. the number of LBR entries the CPU has)
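
The kernel-version prerequisite is the easiest one to check mechanically. Here is a minimal Python sketch of that check, assuming the 4.1 cutoff mentioned above; the function names are made up for illustration, and it says nothing about whether your kernel has support for your specific CPU model, which you still need to verify separately.

```python
import platform
import re

# Minimum kernel version with LBR call graph support, per the list above.
MIN_KERNEL = (4, 1)


def parse_kernel_release(release):
    """Extract (major, minor) from a release string like '5.15.0-91-generic'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def kernel_may_support_lbr(release):
    """True if the kernel is at least 4.1. CPU-model support is NOT checked."""
    return parse_kernel_release(release) >= MIN_KERNEL


if __name__ == "__main__":
    release = platform.release()
    print(release, "->", kernel_may_support_lbr(release))
```

A `True` here only means the kernel is new enough in principle; the CPU and stack-depth prerequisites still apply.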

Because of these caveats, I have stopped recommending LBR by default. I now suggest trying DWARF first, and only then LBR as an optimization whose output is checked against the DWARF profile for correctness.
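
To make the DWARF-first workflow concrete, here is a small Python sketch that just builds and prints the two `perf record` command lines. The binary path, output file names, and the 1kHz DWARF sampling rate are placeholders I picked for illustration (only the 10kHz LBR rate comes from the discussion above), so adjust them to your setup.

```python
import shlex


def perf_record_cmd(target, mode, freq_hz, output):
    """Build a `perf record` command line for the given call graph mode."""
    if mode not in ("dwarf", "lbr"):
        raise ValueError(f"unsupported call graph mode: {mode!r}")
    return [
        "perf", "record",
        "--call-graph", mode,   # stack unwinding method
        "-F", str(freq_hz),     # sampling frequency in Hz
        "-o", output,           # keep the two profiles in separate files
        "--", *target,
    ]


if __name__ == "__main__":
    # Hypothetical binary; a Rust release build would typically live here.
    target = ["./target/release/my_app"]
    steps = [
        # Step 1: DWARF as the reference profile (rate is an assumption).
        perf_record_cmd(target, "dwarf", 1000, "perf.dwarf.data"),
        # Step 2: LBR at the higher rate, to be compared against step 1.
        perf_record_cmd(target, "lbr", 10000, "perf.lbr.data"),
    ]
    for cmd in steps:
        print(shlex.join(cmd))
```

Open each file with `perf report -i <file>` and compare the hot call chains; if the LBR profile shows visibly shorter or missing parent frames, the stacks are probably being truncated and the DWARF profile is the one to trust.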
