Is it possible to see how code translates to assembler?

I've heard that a reasonable metric of efficiency is how many lines of assembler a piece of code converts into. In Python, for example, an addition translates to ~10 operations in its own bytecode and to about a hundred assembler lines, whereas C or Rust generate exactly one, or a handful of them (I've not seen the actual code).

I wonder what tools are available to test this, and whether there's a way to convert Rust into assembler and see which line converts to which instructions.

For relatively short snippets, you can always check them on Compiler Explorer. Don't forget to add -O to the compiler options (as in the rightmost tab) to see roughly what cargo build --release would produce.

For longer examples or multi-module builds, there's cargo-show-asm.
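
As a minimal sketch of something to paste into Compiler Explorer, here is the addition from the question as a standalone function. The name `add` is made up for illustration. With optimizations on, I'd expect this to come out as a single add-like instruction plus a return on x86-64, but check the output yourself, since it depends on the target and compiler version.

```rust
// Hypothetical example function, not taken from any real crate.
// `pub` stops the compiler from discarding the function as unused, and
// `#[inline(never)]` keeps it as a separate symbol that is easy to find
// in the assembly listing.
//
// Locally, assuming the file is saved as add.rs, the assembly can be
// written to add.s with:
//   rustc --crate-type=lib -O --emit=asm add.rs
#[inline(never)]
pub fn add(a: u64, b: u64) -> u64 {
    // Wrapping addition, so debug and release builds behave the same
    // on overflow and no panic path is generated.
    a.wrapping_add(b)
}
```

Without -O the same function is usually much longer, which is why comparing unoptimized output tells you little about real performance.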


When profiling code with a tool like samply (macOS, Linux), perf (Linux) or Instruments (macOS), there is usually an option to see the assembly for a function, annotated with the number of samples in which execution was at a particular instruction. In samply, for example, double-clicking a function brings up the asm view.

In addition to looking at the overall volume of code for a function, I tend to look at things like:

  • Has the compiler auto-vectorized a loop using SIMD instructions?
  • Did loops get unrolled?
  • Are the hotspots in the math operations, memory operations or loop overhead?
  • Did branches get eliminated?
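
A small loop to try these checks on, as a sketch: the function name `sum_squares` is hypothetical. Integer loops like this are common candidates for auto-vectorization and unrolling at -O, but whether that actually happens depends on the target CPU features and compiler version, so treat the assembly as the source of truth.

```rust
// Hypothetical example: sum of squares over a slice of integers.
// Wrapping arithmetic is used so no overflow-panic branches are needed.
#[inline(never)]
pub fn sum_squares(xs: &[u32]) -> u32 {
    xs.iter()
        .map(|&x| x.wrapping_mul(x))
        .fold(0u32, |acc, v| acc.wrapping_add(v))
}
```

In the output, look for vector registers (for example xmm/ymm on x86-64) in the loop body, and for a loop body that handles several elements per iteration. A floating-point version of the same loop often does not vectorize the same way, because the compiler is not allowed to reorder float additions.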

Just be aware as you're doing this that with an optimizing compiler (for any language), there's no consistent conversion of a line of source code to instructions. An optimizing compiler interprets your code as a program for an "abstract machine", and is only responsible for outputting a program that causes your real machine to do the same thing as the abstract machine would do.

As a result, some source code will output no assembler at all, while other source code will output different assembler depending on other information the optimizing compiler has to hand. For example, if you're indexing a slice, a naïve compiler has to output bounds checks and panic on out-of-bounds for every indexing operation, while the optimizing compiler may know from a past operation that the slice must be at least N items long, and skip the bounds checks if the index is less than N.
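
A sketch of the bounds-check case, with made-up function names. Both functions return the same result. The second one slices first, which gives the optimizer a single length check it can reuse for the later indexing. Whether the individual checks really disappear is up to the compiler, so compare the two in the assembly rather than taking it on faith.

```rust
// Hypothetical example: each index is a separate operation that may
// panic, so the compiler may emit more than one bounds check here.
#[inline(never)]
pub fn sum_first_four_naive(xs: &[u32]) -> u32 {
    xs[0].wrapping_add(xs[1]).wrapping_add(xs[2]).wrapping_add(xs[3])
}

// Hypothetical example: one up-front check that the slice has at least
// four items. After it, `head` is known to have length 4, so indices
// 0..=3 can be proven in bounds.
#[inline(never)]
pub fn sum_first_four_sliced(xs: &[u32]) -> u32 {
    let head = &xs[..4]; // panics here if xs.len() < 4
    head[0].wrapping_add(head[1]).wrapping_add(head[2]).wrapping_add(head[3])
}
```

The same source-level indexing expression can therefore cost a compare and branch in one function and nothing in another, which is why counting instructions per source line is unreliable.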
