Profiling code with Rayon

Rayon seems to work recursively, which produces very long and chaotic stack traces: (rayon_core::job + rayon_core::join + catch_unwind) × hundreds of levels deep.

I'm trying to profile my Rayon-heavy code using Instruments.app on macOS. Instruments is oriented towards drilling down into the heaviest functions, based on stack traces. However, when Rayon is used, the function calls are spread all over the stack traces at various depths, wherever Rayon found an opportunity to run more work, so all the trace data is mixed up. The "Invert Call Tree" option helps a bit, but I prefer the top-down view.

Is there a mode/setting/tool for Rayon that makes it run in a more "boring" way, e.g. unwinding stacks to the top of a thread instead of jumping deeper? (some performance loss from this is OK)

There's a similar request in rayon#591, and I don't have a good answer for this. We could definitely use more people thinking about how to improve this sort of thing!

I don't think we can "unwind" stacks as you suggest for execution, because in general recursive rayon jobs may be referencing that stack. Maybe parallel iterators in particular could use a flatter scope+spawns instead of recursive joins, but we wouldn't want that all the time because it requires heap allocation.


Yes, of course. I was thinking about making it only opt-in, even if that's a compile-time setting.