Rust vs C++ Theoretical Performance

This is a bit of a strange question, e.g. how close to optimised is C or C++ or Fortran or ...? They're certainly not fully optimised as compilers keep improving the speed of their generated code, and, more broadly, hardware keeps getting new functionality that the compilers (and languages/their libraries) can use.

The reference compiler for Rust, rustc (also the only one, at the moment), uses LLVM, which is an industrial-strength C/C++ optimiser (used by the Clang C and C++ compilers, among others), so one way to answer the question is that Rust is as optimised as those languages.

I guess a more interesting way to look at it is how much room for improvement Rust has, i.e. how much information/flexibility the compiler has that isn't currently being used fully, or even at all. There's a fairly non-trivial amount of this (mostly either because we haven't invested the effort in informing LLVM, or LLVM cannot understand the constraint, or because it needs non-trivial changes in the compiler and we've been focusing on semantics for the first releases), e.g., off the top of my head:

  • information about & references being non-null isn't fully utilised
    • it gets lost easily, especially via the null-pointer optimisation of Option<&T> when it is returned from an inlined function (this appears most often with the slice iterator, where next returns Option<&T> and inlining is relied on for speed, e.g. How to “zip” two slices efficiently); see the first sketch after this list
    • it is hard to inform LLVM of this fact in all circumstances
  • (non-)aliasability of references is similar
  • the representations of enums and structs are open to being tweaked by the compiler to optimise (e.g. reordering struct fields to make them smaller, reducing the size of allocations and reducing cache pressure); see the layout sketch after this list
  • we do "filling drop", where moves and drops of a value will overwrite the values memory location with a specific byte pattern to handle conditional moves (this also involves adding an extra field to structs with Drop implementations, increasing their size), implement "dynamic drop" semantics using flags on the stack rather than zeroing · Issue #5016 · rust-lang/rust · GitHub
  • the optimisation passes LLVM uses are essentially the defaults used for C/C++, but there might be a better set/ordering for Rust code (which has different idioms); see Investigate activating the IRCE LLVM pass · Issue #22987 · rust-lang/rust · GitHub
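To make the first point concrete, here is a minimal sketch (my illustration, not part of the answer above) of the null-pointer optimisation: because a & reference can never be null, Option<&T> can reuse the null bit pattern for None and stays pointer-sized, whereas an Option of a nullable raw pointer needs a separate discriminant. Exact sizes depend on the target, but on a typical 64-bit target this prints 8/8 and 8/16:

```rust
use std::mem::size_of;

fn main() {
    // &u32 can never be null, so Option<&u32> reuses the all-zero bit
    // pattern for None and stays pointer-sized ("null-pointer optimisation").
    println!("&u32:               {} bytes", size_of::<&u32>());
    println!("Option<&u32>:       {} bytes", size_of::<Option<&u32>>());

    // A raw pointer may legitimately be null, so Option<*const u32> needs a
    // separate discriminant and (typically) doubles in size.
    println!("*const u32:         {} bytes", size_of::<*const u32>());
    println!("Option<*const u32>: {} bytes", size_of::<Option<*const u32>>());
}
```

The problem described above is that this "cannot be null" knowledge is easy to lose once the Option crosses a non-inlined function boundary, which is why the slice iterator's next leans so heavily on inlining.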
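And a hedged sketch of the layout point: with #[repr(C)] the fields are laid out in declaration order (as C requires), while the default representation leaves the compiler free to reorder them. Whether it actually does so depends on the rustc version; the program below just prints the two sizes so you can check. With reordering, the second struct can shrink from 12 to 8 bytes:

```rust
use std::mem::size_of;

// Declaration order is preserved, as in C:
// 1 (a) + 3 padding + 4 (b) + 1 (c) + 3 padding = 12 bytes.
#[allow(dead_code)]
#[repr(C)]
struct CLayout {
    a: u8,
    b: u32,
    c: u8,
}

// Default representation: the compiler is allowed to reorder fields
// (e.g. to b, a, c), which would shrink this to 8 bytes.
#[allow(dead_code)]
struct RustLayout {
    a: u8,
    b: u32,
    c: u8,
}

fn main() {
    println!("#[repr(C)] layout: {} bytes", size_of::<CLayout>());
    println!("default layout:    {} bytes", size_of::<RustLayout>());
}
```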

Those are rustc issues, not touching rustc's standard library at all.

(NB. the first three at least are things that Rust can do/knows but C cannot, theoretically allowing Rust to be faster than C for programs/problems that happen to rely on that information.)
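As a hedged illustration of the aliasing point (again my example, not part of the original answer): two &mut references are guaranteed by the borrow checker not to overlap, the kind of guarantee C code only gets from the restrict qualifier and standard C++ doesn't have at all. Whether rustc actually forwards this to LLVM as noalias metadata has varied between versions, which is exactly the "not fully utilised" part:

```rust
// `a` and `b` are both &mut, so the borrow checker guarantees they cannot
// alias; the compiler is (in principle) free to keep `*b` in a register
// across the two writes through `a` instead of reloading it. Equivalent C
// code needs `restrict` to promise the same thing.
fn add_twice(a: &mut u32, b: &mut u32) {
    *a += *b;
    *a += *b;
}

fn main() {
    let (mut x, mut y) = (1, 2);
    add_twice(&mut x, &mut y);
    println!("{} {}", x, y); // prints "5 2"
}
```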

Comparing language speeds in so few words misses many facets of performance, e.g.:

  • for C++ vs. C especially, C++ is pretty close to source-compatible with C, so their performance should be able to be basically identical.
  • C and C++ benchmarks often rely on various non-standard compiler extensions for performance.
  • different compilers will have different performance, e.g. C++ code compiled with GCC (or Intel's compilers) can easily be faster than C code compiled with Clang.
  • different CPUs will have different performance characteristics, which can interact with many things: the compiler may not understand certain CPUs as deeply as others; a language may make assumptions that are true/efficient on certain CPUs but not others (e.g. older ones vs. newer ones, or x86 vs. ARM); a library may have processor-specific optimisations (e.g. assembly) for certain CPUs.
  • different languages focus on certain tasks, e.g. Fortran is particularly good at numeric/array programming, often faster than C, but has had less effort put into other areas.
  • having a high-performance/zero-overhead FFI means one can reach the speeds of existing libraries in other languages by just using those libraries (e.g. Rust code can use GMP for arbitrary-precision arithmetic just as fast as C code can); see the FFI sketch after this list
  • different powers of abstraction will strongly influence the relationship between microbenchmarks and real-world code, e.g. C++ code (and Rust, to a lesser extent) can use powerful metaprogramming tricks to compile easy-to-write code down to high-performance machine code, while in C the human has to do all the tricks manually, which is fine for a small benchmark but gets annoying/unmaintainable for larger codebases. Eigen is a neat example of this, as are libraries like Thrust that make writing GPGPU code easy. (This plays into the previous point too: one can optimise the use of existing libraries based on the sequence of operations and knowledge about their behaviour, similar to how compilers can optimise x + x to x << 1 if x is, say, u32.)
  • similarly, different powers of abstraction will influence the reusability of code. Real-world performance is likely to be much improved if optimised data structures can be written once, rather than people having to invent their own (slower) versions. Generics are a big point here: even something as simple as std::vector<T> (C++) or Vec<T> (Rust) would be annoying to write in C as soon as it is needed for more than one type T (very likely in any code base of moderate size). A typical way to do this in C is to write a single version that stores void* pointers, but that forces more allocations, adds indirection (increasing cache pressure), and loses type-safety, increasing the scope for bugs; another typical way is to use a linked list rather than a vector, which has the same problems and more. This is even more visible with more complicated data structures such as associative maps (trees and hashmaps; Rust uses a fairly fancy hashing algorithm internally), and then there's the whole next level of optimisation/complexity with concurrent data structures (e.g. using atomics correctly with maximum performance is hard). See the generics sketch after this list.
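To make the FFI point above concrete, here is a minimal sketch (my illustration, assuming the usual case where a Rust program using std links the C standard library) that calls C's strlen directly: the call compiles down to a plain C function call with no wrapper layer, which is what "zero-overhead" means here. Using GMP works the same way through its C API, just with more declarations.

```rust
use std::os::raw::c_char;

// Declare the C function we want to call; the signature mirrors the C
// declaration `size_t strlen(const char *s)`.
extern "C" {
    fn strlen(s: *const c_char) -> usize;
}

fn main() {
    // A NUL-terminated byte string, as C expects.
    let s = b"hello from Rust\0";
    // Calling into C is `unsafe` because the compiler cannot check the C
    // function's contract (here: a valid, NUL-terminated pointer).
    let len = unsafe { strlen(s.as_ptr() as *const c_char) };
    println!("strlen reported {} bytes", len); // 15
}
```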
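And a small sketch of the generics point (again just an illustration): one generic definition is written once and monomorphised into specialised code for each element type it is used with, with no void* indirection and no loss of type-safety, which is the same thing Vec<T> does for storage.

```rust
// Written once; the compiler generates a specialised copy for each T this
// is instantiated with, so the u32 and f64 versions below are as direct as
// hand-written type-specific code (no boxing, no void* indirection).
fn largest<T: PartialOrd + Copy>(items: &[T]) -> Option<T> {
    let mut best = *items.first()?;
    for &item in &items[1..] {
        if item > best {
            best = item;
        }
    }
    Some(best)
}

fn main() {
    println!("{:?}", largest(&[3_u32, 7, 2]));       // Some(7)
    println!("{:?}", largest(&[2.5_f64, 0.1, 9.0])); // Some(9.0)
}
```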

(The last two bullets mean that even if the core language performance, whatever that means, of C++/Rust is slower than that of C, real-world code in those languages can run faster than real-world C because it is much easier to get fast code.)

That said, one can still compare existing benchmarks as some sort of vague upper bound on the possible future performance of the language/compilers/standard library.

Those are mostly intended as a general warning against summarising language performance so briefly, but a few of those points work in Rust's favour (e.g. abstractions; the performance of Fortran is driven largely by its control over aliasing, which Rust has too; zero-cost FFI), and a few currently work against it (e.g. no Rust compiler with a GCC or Intel backend, fewer specialised optimisations in the standard library).

Rust aims to have the same philosophy as C++: you don't pay for what you don't use. I.e. if we add a new feature X that has a certain cost, and you don't use X, then you shouldn't see that cost at all.
