I've narrowed down optimization opportunities to a few obvious costly calls I'm making, but for the life of me I can't figure out what this function call is implying:
Is the flamegraph showing you what is calling drop_in_place?
Looking at the example flamegraph on Samply's readme, you can ask a similar question:
"Why is std::thread::park_timeout taking up so much time?"
Looking at the bars underneath (in the flamegraph), we can see it's called in a function associated with downloading data. It makes sense that rustup would spend time downloading data, as that's what it's supposed to be doing.
Is the application you're running dealing with a lot of owned Strings?
Each to_string call allocates a new String that later has to be dropped, so if you start by reducing the to_string calls where you can, you may see the time spent in drop calls go down as well.
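As a rough illustration (not your actual code; the function names and the word-counting task are made up for the example), here is the kind of change I mean. The first version creates an owned String per item just to compare it, so every iteration pays for an allocation and a drop. The second borrows the existing &str and does neither.

```rust
// Hypothetical example: count how many words equal a keyword.

// Allocating version: `to_string` creates an owned String for every word,
// and each of those Strings is dropped (its heap buffer freed) at the end
// of the loop iteration.
fn count_matches_owned(words: &[&str], keyword: &str) -> usize {
    let mut count = 0;
    for word in words {
        let owned: String = word.to_string(); // heap allocation
        if owned == keyword {
            count += 1;
        }
        // `owned` goes out of scope here, so its destructor runs.
    }
    count
}

// Borrowing version: compares the existing &str values directly,
// so there is nothing to allocate and nothing to drop.
fn count_matches_borrowed(words: &[&str], keyword: &str) -> usize {
    words.iter().filter(|word| **word == keyword).count()
}

fn main() {
    let words = ["alpha", "beta", "alpha", "gamma"];
    assert_eq!(count_matches_owned(&words, "alpha"), 2);
    assert_eq!(count_matches_borrowed(&words, "alpha"), 2);
}
```

Both functions return the same result; only the second avoids the per-item String. In an optimized build the compiler may remove some of this on its own, so it's worth re-profiling after the change rather than assuming the win.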
The main cost of a String or Vec destructor is freeing its heap-allocated memory. On macOS, there is an added cost of zeroing out the freed memory (source).
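If the Strings can't be avoided entirely, one common way to cut down on the freeing is to reuse a single buffer instead of building a fresh String each time. This is only a sketch with made-up function names and a made-up "item-N" label format; whether it applies depends on what your code does with the strings.

```rust
use std::fmt::Write; // brings `write!` support for String into scope

// Fresh String per iteration: `format!` allocates a new buffer each time,
// and that buffer is freed when `label` is dropped at the end of the loop body.
fn total_len_fresh(n: usize) -> usize {
    let mut total = 0;
    for i in 0..n {
        let label = format!("item-{}", i);
        total += label.len();
    }
    total
}

// Reused buffer: `clear` empties the String but keeps its capacity, so the
// heap buffer is freed only once, when `label` is dropped after the loop.
fn total_len_reused(n: usize) -> usize {
    let mut total = 0;
    let mut label = String::new();
    for i in 0..n {
        label.clear();
        write!(label, "item-{}", i).expect("writing to a String should not fail");
        total += label.len();
    }
    total
}

fn main() {
    // "item-0", "item-1", "item-2" are 6 bytes each.
    assert_eq!(total_len_fresh(3), 18);
    assert_eq!(total_len_reused(3), 18);
}
```

The same idea works for a Vec: calling clear keeps the allocation around for the next iteration.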