I'm writing a project to run some experiments with prefetch intrinsics.
As part of that, I have some benchmarks I'm struggling to make sense of. I'd like to inspect the binaries to see if there's some auto-vectorization or some other difference in the produced code that explains the oddities I'm noticing.
Often when you run cargo test or cargo bench it'll print the path to the executable containing the tests/benchmarks.
$ cargo test
Compiling scad-compiler v0.1.0 (~/Documents/scad-rs/crates/compiler)
Finished test [unoptimized + debuginfo] target(s) in 2.46s
Running unittests src/lib.rs (~/Documents/scad-rs/target/debug/deps/scad_runtime-b1e7fa9756c915ec)
running 1 test
test value::tests::builtin_function_partialeq_only_works_on_identity ... ok
test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
In this case, the executable would be ~/Documents/scad-rs/target/debug/deps/scad_runtime-b1e7fa9756c915ec (the bit after "Running unittests").
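If you'd rather not fish the path out of the log, cargo can also report it directly: `--no-run` builds the test binary without executing it, and `--message-format=json` emits one JSON object per artifact, where test binaries carry an `executable` field. A sketch (the `jq` filter is my own addition, not something cargo provides):

```shell
# Build the test binary without running it, and ask cargo for
# machine-readable output. Compiler-artifact messages for test
# targets include an "executable" field with the binary's path.
cargo test --no-run --message-format=json \
  | jq -r 'select(.executable != null) | .executable'
```

This prints one path per test binary, which you can then feed straight to objdump.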
Now let's say I wanted to see the machine code that was generated for my value::tests::builtin_function_partialeq_only_works_on_identity function.
From there, I can pass it to objdump and ask it to --disassemble my code and --demangle symbol names. The demangling is important because then I can search for Rust names.
$ objdump --disassemble --demangle scad_runtime-b1e7fa9756c915ec | less
Using less's builtin search function to find tests::builtin_function_, I can skip past a couple call instructions before finding the section containing the function's machine code.
I think the key difference is that I'm checking an executable I know will contain my function, whereas you are checking the rlib, which may not necessarily contain machine code for your function (e.g. the definitions of generic functions are embedded as metadata so they can be compiled to machine code by later crates).
It wasn't much of a concern in my example because Rust's test harness stores a list of function pointers, which means the compiler needs to generate at least one copy of builtin_function_partialeq_only_works_on_identity so we can get a pointer to it.
Another thing that can sometimes help is adding #[no_mangle] so the symbol name is more predictable, or writing dummy functions annotated with #[inline(never)] that just call your function. That way you won't forget to remove #[inline(never)] from your performance-sensitive function and accidentally commit it to master.
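As a sketch of that pattern (all names here are made up for illustration, not from your project):

```rust
// Hypothetical performance-sensitive function; in real code this stays
// attribute-free so the optimizer can inline it however it likes.
fn values_are_identical(a: u64, b: u64) -> bool {
    a == b
}

// Dummy wrapper that exists only to be easy to find in the disassembly:
// #[no_mangle] keeps the symbol name readable, and #[inline(never)]
// forces a standalone copy of the call to survive optimization.
#[no_mangle]
#[inline(never)]
pub fn inspect_me(a: u64, b: u64) -> bool {
    values_are_identical(a, b)
}

fn main() {
    assert!(inspect_me(1, 1));
    assert!(!inspect_me(1, 2));
}
```

After building, searching the objdump output for `inspect_me` lands you right next to the (possibly inlined) body of the function you actually care about.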
Once I've dumped my assembly, I can roughly see which instructions are executed, where the jumps go (thanks to --visualize-jumps in particular), and so on. What's the next step? Are there more tools that help me make sense of the generated assembly?
In particular, I'm wondering how the dumped assembly maps to the original code. I know that the Godbolt compiler explorer has an interface that shows those mappings with color-coding. Is there some equivalent in the console?
If you want to do a more detailed analysis, it might be time to crack out a fully functional disassembler/reverse engineering tool.
One command-line tool that I used a while back is radare, which is kinda like the vim of reverse engineering: it's super powerful and you can be really productive once you're fluent with it, but there's a non-trivial learning curve.
Otherwise, you can always use gdb to trace execution and see jumps.
I don't know of a Godbolt-like CLI tool off the top of my head, but in theory it's possible because Godbolt just uses debug information to associate instructions with places in the source code. I wouldn't be surprised if Radare had a similar function; otherwise, some debuggers will show disassembled machine code with the corresponding source code as a comment next to it.
You can probably use the --section flag if you know which section your function will be in, but that might be tricky to find out in practice. Normally I just pipe into less and use search to find the things I care about.