Have you built your program with debugging symbols? The simplest way to do so is to add “-g” to the RUSTFLAGS environment variable. For a longer-term solution, you can modify your Cargo.toml to permanently enable debug symbols in your release builds.
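For the Cargo.toml route, a minimal sketch could look like the following. It assumes you are profiling the release profile; adjust the section name if you use a custom profile.

```toml
# Keep optimizations on, but emit debug info so the profiler can map
# addresses back to function names and source lines.
[profile.release]
debug = true
```

The one-off equivalent is to set RUSTFLAGS to “-g” for a single build (for example `RUSTFLAGS="-g" cargo build --release` in a POSIX shell).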
If you are calling into external libraries (C, C++), you will also need to install debugging symbols for those libraries in order to get detailed profiles. How to do this is OS-dependent, though, and from the profile you posted it looks like you are on macOS, so I cannot help much there.
Another callgrind limitation is that, since it operates in user mode, it cannot provide a good analysis of syscalls and multi-process applications. Only profilers with OS kernel integration (like perf, Xcode Instruments…) can do this correctly.
It may also be that your application actually spends most of its time in syscalls. You are talking about a hello world program, so if it’s just a println, most of your CPU time is spent loading and setting up the application process rather than running your code. Try adding a loop to increase the time spent in the actual application code.
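As a rough illustration of that last suggestion, here is a small sketch of a hello world with some artificial CPU work added. The `busy_work` function and the iteration count are made up for this example; any loop that the optimizer cannot remove would do.

```rust
use std::hint::black_box;

/// Hypothetical CPU-bound workload: sums the squares of 0..iterations.
/// It exists only to give the profiler some user code to attribute time to.
fn busy_work(iterations: u64) -> u64 {
    let mut acc: u64 = 0;
    for i in 0..iterations {
        // black_box discourages the optimizer from folding the loop away;
        // wrapping arithmetic keeps debug and release builds consistent.
        acc = acc.wrapping_add(black_box(i).wrapping_mul(i));
    }
    acc
}

fn main() {
    // Arbitrary count; raise it if startup still dominates the profile.
    let total = busy_work(10_000_000);
    println!("Hello, world! checksum = {total}");
}
```

With debug symbols enabled, `busy_work` should then appear by name in the profile, unless it was inlined into `main`, in which case the time will be attributed there.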