I've heard that a reasonable metric of efficiency is how many lines of assembler a line of source code converts into. In Python, for example, an addition translates to ~10 operations in its own bytecode and to about a hundred assembler lines, whereas C or Rust generate exactly one, or a handful of them (I've not seen the actual code).
I wonder what tools are available to test this, and whether there's a way to convert Rust into assembler and see which line converts to which instructions.
For relatively short snippets, you can always check them on Compiler Explorer; don't forget to add -O to the compiler flags (as in the rightmost tab) to see roughly the equivalent of cargo build --release.
For longer examples or multi-module builds, there's cargo-show-asm.
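As a starting point, here is a minimal sketch of the kind of snippet you could paste into Compiler Explorer or inspect locally. The function names are hypothetical, picked only for illustration. Locally, something like `rustc --crate-type=lib -C opt-level=3 --emit asm file.rs` writes a `.s` file next to the source; cargo-show-asm installs a `cargo asm` subcommand that, as far as I know, takes the path of the function you want to see.

```rust
// Hypothetical example functions; the names are not from any library.

/// Plain wrapping addition. `pub` plus `#[inline(never)]` keeps it as a
/// standalone symbol in a library crate, so it shows up in the asm listing.
/// With optimizations on, this typically becomes a single add-like
/// instruction followed by a return (exact output depends on target and
/// compiler version).
#[inline(never)]
pub fn add(a: u64, b: u64) -> u64 {
    a.wrapping_add(b)
}

/// Same addition, but panicking on overflow. The listing for this one
/// usually also contains a branch to the panic path, which is a good
/// reminder that "one line" of source is not "one instruction".
#[inline(never)]
pub fn add_checked(a: u64, b: u64) -> u64 {
    a.checked_add(b).expect("overflow in add_checked")
}
```

Comparing the two listings side by side, with and without optimizations, gives a feel for how much the flags matter.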
When profiling code using a tool like samply (macOS, Linux), perf (Linux) or Instruments (macOS), there is usually an option to see the assembly for a function, annotated with the number of samples where execution was at a particular instruction. In samply, for example, double-clicking a function brings up the asm view.
In addition to looking at the overall volume of code for a function, I tend to look at things like:
Has the compiler auto-vectorized a loop using SIMD instructions?
Did loops get unrolled?
Are the hotspots in the math operations, memory operations or loop overhead?
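To make the vectorization question concrete, here is a small sketch of two loops worth comparing in the asm view. The function names are hypothetical. My expectation (an assumption about typical LLVM behaviour, not a guarantee) is that the integer loop gets vectorized and unrolled at a high optimization level, while the float loop usually stays scalar because reordering float additions could change the result.

```rust
// Hypothetical example functions for comparing loop codegen.

/// Integer sum of squares with wrapping arithmetic. Wrapping integer
/// addition is associative, so the optimizer is free to reorder it and
/// will often emit SIMD instructions plus an unrolled loop body.
pub fn sum_squares(xs: &[u32]) -> u32 {
    xs.iter()
        .map(|&x| x.wrapping_mul(x))
        .fold(0u32, |acc, x| acc.wrapping_add(x))
}

/// Float sum. Floating-point addition is not associative, so the compiler
/// has to keep the additions in source order; expect a scalar loop here
/// unless you restructure the code yourself.
pub fn sum_f32(xs: &[f32]) -> f32 {
    xs.iter().sum()
}
```

On x86-64, register names like xmm/ymm with packed instructions (for example paddd) in the first function are the usual sign that vectorization happened.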
Just be aware as you're doing this that with an optimizing compiler (for any language), there's no consistent conversion of a line of source code to instructions. An optimizing compiler interprets your code as a program for an "abstract machine", and is only responsible for outputting a program that causes your real machine to do the same thing as the abstract machine would do.
As a result, some source code will output no assembler at all, while other source code will output different assembler based on other information the optimizing compiler has to hand. For example, if you're indexing a slice, a naïve compiler has to output bounds checks and panic on out-of-bounds for every indexing operation, while the optimizing compiler may know from a past operation that the slice must be at least N items long, and skip the bounds checks if the index is less than N.
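Both effects can be sketched in a few lines. The function names below are hypothetical, and what the optimizer actually does with them depends on the compiler version and flags, so treat the comments as what I would expect to see rather than a promise.

```rust
// Hypothetical example functions illustrating the two points above.

/// The multiplication and addition feed a value that is never used, so an
/// optimizing build is free to emit no instructions for that line at all.
pub fn unused_work(x: u32) -> u32 {
    let _scratch = x.wrapping_mul(3).wrapping_add(7); // result is discarded
    x
}

/// Four indexing operations, each of which may need its own bounds check,
/// because the compiler has no earlier information about `s.len()`.
pub fn sum4_indexed(s: &[u32]) -> u32 {
    s[0].wrapping_add(s[1])
        .wrapping_add(s[2])
        .wrapping_add(s[3])
}

/// Same computation, but the up-front assert tells the compiler that the
/// slice has at least 4 items. The per-index checks are then provably
/// redundant and can be dropped, leaving a single length comparison.
pub fn sum4_asserted(s: &[u32]) -> u32 {
    assert!(s.len() >= 4);
    s[0].wrapping_add(s[1])
        .wrapping_add(s[2])
        .wrapping_add(s[3])
}
```

Counting the branches to the panic handler in the listing for each of the two sum4 functions is a quick way to check whether the hint was picked up.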