Rust vs C++ Theoretical Performance

I've seen a few benchmarks, and it seems like Rust is roughly neck and neck with C++ as far as runtime speed is concerned, maybe even slightly faster. Which is incredible considering it is also memory safe!

What I'm wondering is, how close to being optimized do you think Rust is? It's only been 6 months since version 1 was released. Considering how Rust works with ownership and what not, it seems like, by forcing users to program in certain ways, it could theoretically be optimized to be even faster than C (which is faster than C++). Or will bloat from future features slow it down a little?

I realize we won't know for sure till it happens. Thoughts?

1 Like

What I'm wondering is, how close to being optimized do you think Rust is? […] Or will bloat from future features slow it down a little?

tl;dr I think carefully written Rust code can be really fast.

Brain dump below.

It is my understanding that, from a language perspective, great care has been taken to ensure there is no 'bloat' that slows things down. For example: for more than a year now, people have been talking about adding something like inheritance/extendable enums to the language – and about half of the discussion so far has been about how the proposed design would perform and which optimisations were possible.

it could theoretically be optimized to be even faster than C

One example: Rust's binary representation is largely unspecified. Unless you explicitly add #[repr(C)] to a struct, Rust is free to lay out its fields however it wants (to ensure the best alignment and the smallest size in memory). There is a nice issue in the RFC repo proposing more enum optimisations.
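A minimal sketch of what that freedom buys (exact sizes depend on the compiler version and target, so treat the numbers as illustrative):

use std::mem::size_of;

// Default (Rust) representation: the compiler may reorder fields to minimise padding.
struct Reordered {
    a: u8,
    b: u32,
    c: u8,
}

// #[repr(C)]: fields stay in declaration order, as C requires, so padding is
// inserted after `a` and after `c` to keep `b` aligned.
#[repr(C)]
struct CLayout {
    a: u8,
    b: u32,
    c: u8,
}

fn main() {
    // Typically 8 bytes for the default layout vs. 12 bytes for the C layout.
    println!("default repr: {} bytes", size_of::<Reordered>());
    println!("repr(C):      {} bytes", size_of::<CLayout>());
}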

Now that I think about it, I'm quite certain that the answer to your original question depends very much on your definition of "theoretical performance". Would you say that the possibilities I described above increase Rust's theoretical performance? I'm not so sure – what the Rust compiler does is 'trivial' – from a theoretical/computer-science standpoint – to do manually in C/C++ code.

If you are using "theoretical" in a more "practical" sense (:wink:), I think the story is more in Rust's favour. Given its stronger semantics in regard to who owns what memory, how long it needs to live and where it is being written to, the compiler has more data to work with and thus more possibilities to apply optimisations. (Additionally, in practice this means that it's easier for library authors to write code that has no allocations but can still be trusted.)
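For a feel of the kind of information involved, here is a small sketch (whether a given compiler version actually exploits it is a separate question):

// `dst` is an exclusive &mut, so the compiler knows it cannot alias `src`;
// in principle it can reorder and vectorise these writes without inserting
// runtime overlap checks, which a C compiler cannot assume without `restrict`.
fn add_into(dst: &mut [f32], src: &[f32]) {
    for (d, s) in dst.iter_mut().zip(src.iter()) {
        *d += *s;
    }
}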

5 Likes

One thing that Rust has that makes better code generation possible is that things are truly immutable / mutable, which the compiler can reason about. (const in C/C++ is merely documentation; the compiler rarely does any optimisations based on it, as you can just cast it away if you want to.)
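A sketch of the guarantee Rust's shared references give that C's const does not (again, how far a given compiler exploits it is another matter):

// For the duration of this borrow, nothing (absent UnsafeCell) can change `*x`,
// so both reads must see the same value and the second load can be elided.
// A C `const int *` gives no such guarantee: the pointee may change elsewhere.
fn read_twice(x: &u32) -> u32 {
    *x + *x
}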

As LLVM has support for doing optimisations based on exactly this kind of information, it may well help. That being said, you have to take care when writing Rust code to get the same/similar performance as C/C++. For example, if you loop over an array using an index you will pay the cost of bounds checks, while if you use iterators instead you don't (as iterators are guaranteed to be within the range of the data).
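A minimal sketch of the two styles (in practice the optimiser can often elide the checks in the indexed version too, but the iterator form doesn't rely on that):

// Indexing: each `xs[i]` is bounds-checked unless the optimiser can prove it in range.
fn sum_indexed(xs: &[u32]) -> u32 {
    let mut total = 0;
    for i in 0..xs.len() {
        total += xs[i];
    }
    total
}

// Iterators: no per-element bounds check, since the iterator cannot step out of range.
fn sum_iter(xs: &[u32]) -> u32 {
    xs.iter().sum()
}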

1 Like

I'm wondering if there are any optimizations based on the Rust aliasing rules. Are Rust references restrict in the C sense?

Or you can use get_unchecked(). Sometimes you need to, to prevent double checking.
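A small sketch of the kind of place where that helps – one up-front check instead of a bounds check on every access:

fn dot(a: &[u32], b: &[u32]) -> u32 {
    assert!(b.len() >= a.len());
    let mut total = 0;
    for i in 0..a.len() {
        // SAFETY: the assert above guarantees `i < a.len() <= b.len()`.
        total += a[i] * unsafe { *b.get_unchecked(i) };
    }
    total
}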

1 Like

Ah. Didn't know about that one. Thanks for the tip :slight_smile:

Yes, and rustc informs LLVM of this fact in some places (e.g. function arguments). fn foo(_x: &u8) {} compiles to the following with optimisations:

define void @_ZN3foo20h97a57185fa41be28eaaE(i8* noalias nocapture readonly dereferenceable(1)) unnamed_addr #0 {
entry-block:
  ret void
}

The various words after the i8* represent the attributes rustc associates with & pointers. noalias in particular is restrict.

(This is the main reason that mutating data behind a & pointer, e.g. via &Cell<T>, needs to use UnsafeCell: references that point to data with an UnsafeCell aren't and cannot be considered noalias.)
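For instance, Cell is the safe wrapper around UnsafeCell that permits exactly this kind of shared-reference mutation (a small sketch):

use std::cell::Cell;

// `Cell<T>` contains an `UnsafeCell<T>`, which tells the compiler the data may be
// mutated through a shared reference, so such references must not be marked noalias.
fn bump(counter: &Cell<u32>) {
    counter.set(counter.get() + 1);
}

fn main() {
    let c = Cell::new(0);
    bump(&c);
    bump(&c);
    assert_eq!(c.get(), 2);
}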

This is a bit of a strange question, e.g. how close to optimised is C or C++ or Fortran or ...? They're certainly not fully optimised as compilers keep improving the speed of their generated code, and, more broadly, hardware keeps getting new functionality that the compilers (and languages/their libraries) can use.

The reference compiler for Rust, rustc (it's also the only one, at the moment), uses LLVM, which is an industrial strength C/C++ optimiser (used by the clang C and C++ compiler, among others), so one way to answer the question is that Rust is as optimised as those languages.

I guess a more interesting way to look at it is how much room for improvement Rust has, something like how much information/flexibility the compiler has that isn't currently being used fully or even at all. There's a fairly non-trivial amount of this (mostly either because we haven't invested the effort in informing LLVM / because LLVM cannot understand the constraint, or because it needs non-trivial changes in the compiler and we've been focusing on semantics for the first releases), e.g., off the top of my head:

  • information about & being non-null isn't fully utilised
    • it gets lost easily, especially via the null-pointer optimisation of Option<&T> when returned from an inlined function (this appears most often with the slice iterator, where next returns Option<&T>, and inlining is relied on for speed, e.g. How to “zip” two slices efficiently); there's a small sketch of this optimisation below
    • it is hard to inform of this fact in all circumstances
  • (non-)aliasability of references is similar
  • the representations of enums and structs are open to being tweaked by the compiler to optimise (e.g. reordering struct fields to make them smaller, reducing sizes of allocations and reducing cache pressure)
  • we do "filling drop", where moves and drops of a value will overwrite the value's memory location with a specific byte pattern to handle conditional moves (this also involves adding an extra field to structs with Drop implementations, increasing their size); see implement "dynamic drop" semantics using flags on the stack rather than zeroing · Issue #5016 · rust-lang/rust · GitHub
  • the optimisation passes LLVM uses are essentially the default used for C/C++, but there might be a better set/ordering for Rust code (e.g. different idioms), e.g. Investigate activating the IRCE LLVM pass · Issue #22987 · rust-lang/rust · GitHub

Those are rustc issues, not touching rustc's standard library at all.

(NB. the first three at least are things that Rust can do/knows but C cannot, theoretically allowing Rust to be faster than C for programs/problems that happen to rely on that information.)
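As a small illustration of the null-pointer optimisation mentioned above – Option<&T> needs no extra tag, because a reference can never be null:

use std::mem::size_of;

fn main() {
    // `None` is encoded as the (otherwise impossible) null bit pattern, so the
    // Option adds no size at all; this is the optimisation whose "non-null"
    // knowledge the compiler can lose track of after inlining.
    assert_eq!(size_of::<Option<&u32>>(), size_of::<&u32>());
    println!("&u32: {} bytes, Option<&u32>: {} bytes",
             size_of::<&u32>(), size_of::<Option<&u32>>());
}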

Comparing language speeds so briefly seems to miss many facets of performance, e.g.:

  • for C++ vs. C especially, C++ is (pretty close to) source-compatible with C, so they should be able to be basically identical.
  • C and C++ benchmarks often rely on various non-standard compiler extensions for performance.
  • different compilers will have different performance, e.g. C++ code compiled with GCC (or Intel's compilers) can easily be faster than C code compiled with Clang.
  • different CPUs will have different performance characteristics which can interact with many things, e.g. the compiler may not understand certain CPUs as deeply as others, a language may make assumptions that are true/efficient on certain CPUs but not others (e.g. older ones vs. newer ones, or x86 vs. ARM), a library may have processor-specific optimisations, e.g. assembly, for certain CPUs.
  • different languages focus on certain tasks, e.g. Fortran is particularly good at numeric/array programming, often faster than C, but has had less effort put into other areas.
  • having a high-performance/zero-overhead FFI means one can reach the speeds of existing libraries in other languages by just using those libraries (e.g. Rust code can use GMP for arbitrary precision arithmetic just as fast as C code can)
  • different powers of abstraction will strongly influence the relationship between microbenchmarks and real-world code, e.g. C++ code (and Rust to a lesser extent) can do powerful metaprogramming tricks to compile easy-to-write code to high-performance machine code, while the human has to do all the tricks manually in C which is fine for a small benchmark but gets annoying/unmaintainable for larger codebases. Eigen is a neat example of this, as are libs like Thrust that make writing GPGPU code easy. (This plays into the previous point too: one can optimise the use of existing libraries based on the sequence of operations & knowledge about the behaviour, similar to how compilers can optimise x + x to x << 1 if x is, say, u32.)
  • similarly, different powers of abstraction will influence the reusability of code. Real-world performance is likely to be much improved if optimised data structures can be written once, rather than people having to invent their own (slower) versions. Generics are a big point here (see the sketch below): even something as simple as std::vector<T> (C++) or Vec<T> (Rust) would be annoying to write in C as soon as it is needed for more than one type T (very likely in any code base of moderate size). A typical way to do this in C is to write a single version that stores void* pointers, but that forces more allocations and adds indirection (increasing cache pressure) – oh, and loses type safety, increasing the scope for bugs. Another typical approach is to use a linked list rather than a vector, which has the same problems and more. This is even more visible with more complicated data structures like associative maps (trees and hashmaps – Rust uses a fairly fancy algorithm internally), and then there's the whole next level of optimisation/complexity with concurrent data structures (e.g. using atomics correctly with maximum performance is hard).

(The last two mean that even if the core language performance, whatever that means, of C++/Rust is slower than C, real-world code in the former runs faster than real-world C because it is much easier to get fast code.)
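A tiny sketch of the generics point – one generic definition, with specialised (monomorphised) copies generated per type, and no boxing or void*-style indirection:

// The compiler emits a separate, fully specialised copy of this function for each
// concrete T it is instantiated with, so the generic abstraction costs nothing at runtime.
fn largest<T: PartialOrd + Copy>(items: &[T]) -> Option<T> {
    let mut best = *items.first()?;
    for &item in &items[1..] {
        if item > best {
            best = item;
        }
    }
    Some(best)
}

fn main() {
    // Same source, two monomorphised versions: one for u32, one for f64.
    assert_eq!(largest(&[3u32, 7, 2]), Some(7));
    assert_eq!(largest(&[1.5f64, 0.25]), Some(1.5));
}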

That said, one can still compare existing benchmarks as some sort of vague upper bound on the possible future performance of the language/compilers/standard library.

Those are mostly intended as a general warning against summarising language performance so briefly, but a few of those points are points in Rust's favour (e.g. abstractions, the performance of Fortran the language is driven largely by controlling aliasing which Rust does too, zero-cost FFI), and a few are currently against (e.g. no Rust compiler with a GCC or Intel backend, fewer specialised optimisations in the standard library).

Rust aims to have the same philosophy as C++: don't pay for what you don't use. I.e. if we add a new feature X that has a certain cost and you don't use X, you shouldn't see that cost at all.
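A sketch of what that looks like in practice – the iterator pipeline is the "feature", and the goal (usually achieved, though it depends on the optimiser) is that it compiles down to the same machine code as the plain loop, so the abstraction itself costs nothing:

// High-level version: iterator adapters, a closure, zip.
fn dot_iter(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

// Hand-written version: explicit loop and indexing.
fn dot_loop(a: &[f32], b: &[f32]) -> f32 {
    let n = a.len().min(b.len());
    let mut acc = 0.0;
    for i in 0..n {
        acc += a[i] * b[i];
    }
    acc
}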

12 Likes