Is Rust really faster than Java because of memory management?

I am a long-time Java programmer and have recently become intrigued by Rust.

One claim that I see repeated on a lot of websites about Rust is that Rust will be "blazingly fast" compared to Java because of how it does memory management. However, it seems to me that in many cases that matter, Java and Rust do very similar things:

  • Both Java and Rust put primitives on the stack, which is cleaned up trivially by moving the stack pointer.
  • Rust can put complex objects on the stack, but as soon as a collection is involved, the "meat" of the complex object will live on the heap anyway.
  • Rust releases the heap memory for an object at the end of its lifetime, whereas Java garbage-collects it at an unspecified time.

I can see that Rust's management of the heap may be more memory-efficient (don't need to allocate extra workspace for the GC) and the time investment for allocating and releasing heap memory may be more distributed, but will a Rust program really spend less total time on heap management over the course of a complete run than the comparable Java program?

1 Like

The root difference regarding performance is that in Rust, not every struct is placed in its own heap location. For example, if I have a HashMap<MyKey, MyValue> collection of length N, in Java it requires at least 2N + 1 allocations, while in Rust a single allocation is enough. The benefit is not just fewer allocations to track, but more importantly a significant reduction in cache misses.

Another benefit is zero-cost abstractions. For example, modern Java has the Optional<T> type, but using it everywhere is discouraged because it hurts performance a lot. In Rust, however, using Option<T> is usually faster than having two separate fields, a bool and a T.
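A quick way to see this zero-cost property (a minimal sketch): Option<Box<T>> uses the never-valid null pointer to represent None, so the Option wrapper adds no size at all.

```rust
// Niche optimization: Option<Box<u32>> reuses the null-pointer bit pattern
// for None, so it is exactly the size of Box<u32>.
fn main() {
    assert_eq!(
        std::mem::size_of::<Option<Box<u32>>>(),
        std::mem::size_of::<Box<u32>>()
    );
    // A type with no spare bit patterns does need an extra discriminant:
    assert!(std::mem::size_of::<Option<u64>>() > std::mem::size_of::<u64>());
}
```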

14 Likes

Often, but not always. Java is faster at certain workloads - for example, if you allocate a large number of really small objects, then Java is likely to be faster at doing so because of how its memory management works. However, in those cases where Java makes a lot of allocations, you can often get away with far fewer allocations in Rust, and doing nothing in Rust is faster than allocating memory in Java.

Also, code that just does a bunch of computations on preallocated arrays of primitives will probably be about the same speed due to Java's JIT compilation.

17 Likes

In general, and from memory, people that have done this sort of analysis show that:

  • GC languages use more memory (even counting only live memory) - sometimes significantly - so you're at a higher risk of thrashing, which is far worse under a GC
  • long-lived processes often show higher throughput in GC languages (not massively - on the order of 10%)
  • short-lived processes are often massively slower in GC languages (on the order of 10x, depending on a lot of factors)
  • Most notable for most cases is the lack of consistent performance.

I'm on my phone at the moment, so I don't have references right now, sorry.

6 Likes

In short, yes.

Long answer:
To see why, let's see how Rust vs (some general version of) GC works.

In Rust, as you mentioned, there is allocation on the stack as well as on the heap. Generally speaking you have fine-grained control over precisely when de/allocation happens, which is useful in eg constrained environments like embedded systems, but also on servers where RTT matters a lot. There are plenty of blog posts from larger companies that rewrote some of their services in Rust from eg Golang precisely because of sudden latency spikes incurred by Golang's GC - and Golang has a GC that seems to do a relatively good job.
Then there is the expectation of memory management. With Rust, people generally expect to have to manually manage their memory, and are pleasantly surprised by the actual status quo: more often than not, the default de/allocation behavior is perfectly suitable, which drastically reduces the actual amount of work on that front.

Then there are the GC languages, eg Java or Golang.
All the same kinds of allocations that Rust does are done in those languages as well. You have fine-grained control over allocation, but not over deallocation; that is left to a runtime GC algorithm.
And that brings us to the biggest technical difference: the runtime GC algorithm. You see, on top of all those allocations and deallocations, an algorithm executes at runtime, in parallel with your code, that decides when to release memory that the program has made inaccessible.
This carries not only a memory burden (how large that burden is seems to be improving, but you will always need more than 100% of the RAM that an equivalent Rust program would), but also a cost in runtime CPU cycles to execute the GC algorithm (eg mark-and-sweep).
Then there's the programmer perception that they don't need to pay attention to memory in such languages, which never works out for systems that are complicated enough.

5 Likes

Thanks for the fast and informative answers! So let me try to summarize:

  • Rust collections lay out their elements in a contiguous area of the heap, whereas Java collections only store references to elements that may be scattered across the heap. This improves cache locality and reduces the number of object allocations that are needed. (Edit: This applies not only to collections, but also to other data types including structs.)
  • Rust makes the decision on when each object can be deallocated at compile time, whereas Java's GC spends time at runtime to make such decisions.

Does that capture your main points?

1 Like

Yes, but it's not just collections - it's types that contain other types in general, including classes/structs.

3 Likes

There is a lot which can be said about the performance differences, but I would like to push back on the premise of the question. A blanket statement like "Rust is faster than Java" is, in the general case, meaningless, almost certainly false, and actively harmful. Rust can be significantly faster under certain workloads, and significantly slower under others. Indirection-heavy code with dynamic and very uncertain object lifetimes running in a long-lived process will very likely be faster in Java. It's just that the structure of Rust steers you away from writing the kind of code which will be slow, and if you can fit that mold (or if you are willing to put effort into architecture and performance optimizations) your program will be significantly faster. But as they say, you can write Java in any language. If you just blindly copy the Java patterns, papering over the lifetime issues with a lot of reference counting and locks, then your Rust code can easily be significantly slower than equivalent Java, which optimizes for that use case and has semantics that allow the optimizer greater freedom.

9 Likes

I think you actually missed the important point: the main difference lies not with the languages, but with the people.

In reality you can write Java code which would, very often, be faster than Rust. Just put all the objects in one large array and use indexes to manage the data - the JIT will do an amazing job.

But that's extremely unidiomatic. The idiomatic way is the factory-factory-factory pattern. And Dependency Injection. And lots of overhead. In Rust I have never seen anything like RequestProcessorFactoryFactory.

Because Rust makes you care about memory, and that shows. But no, there is no magic in Rust: carefully written and thought-out Java code can be as fast as Rust, or faster. It's just that very few people write carefully and thoughtfully in Java, while in Rust that's the norm.

1 Like

Hey, that premise wasn't my premise - it is what literally half of the Rust tutorials I found start with.

I don't doubt that you can write bad or unidiomatic code in any language. What I wanted to understand is in what exact way Rust memory management differs from Java's (for realistic usecases) that would make optimal Rust code potentially faster than optimal Java code. I'm not interested in starting a religious war - I'm trying to ask a technical question and I'm looking for technical answers.

At the same time, Rust invites you to pass a lot of data by value, which can cause copying that wouldn't happen in Java. Is this not a performance concern?

Copying small pieces of data is super cheap, and most of it will be optimized away by the compiler anyway. Chasing references is quite expensive: it defeats all the mechanisms that modern hardware relies on for performance - branch prediction, caches, data locality, SIMD.

Java JIT goes to great lengths to eliminate pointer chasing wherever it can.

It was a performance concern when Java was developed. The fastest CPU back then was a Pentium at 100MHz. Single-core. And PC100 SDRAM latency was 20ns.

Today we have CPUs which go up to 4-5GHz and have 4-16 cores, while our fastest DDR5 RAM has a latency of… 10ns.

The relative price of copying vs referencing has changed radically. It was a gradual process: What Every Programmer Should Know About Memory was written more than 10 years ago.

But today… we live in a world where a single CPU core can do approximately a thousand simple operations in the time it spends waiting for one single “unlucky” memory access.

Sure, not every time you follow a pointer do you end up “unlucky”. There are three (sometimes four) levels of cache, and if you accessed that pointer recently it will probably be cheap… but if that's the case, then chances are high you will access that same pointer yet again, which means the win from copying (where you don't need to follow the pointer at all) increases, too.

Note that this is a fundamental issue, dictated by physics: unless we invent some kind of short-distance warp drive, fast copying and slow pointer chasing are here to stay.

The guys who invented OOP and Java are not idiots. They just lived in a different time and solved different problems.

4 Likes

It's interesting that the tutorials you used stated that, while the response here has mostly been that unqualified statements like "X is more Y than Z" are inaccurate at best.

1 Like

Given that some of the people who respond here probably wrote the tutorials, I should probably be more precise and say "literally half the 'I love Rust because ...' pages", not necessarily the tutorials themselves. I do get the sense that there is a lot of excitement about Rust out there that is not always backed up by technical depth, and I'm really glad to have gotten such useful answers here.

I think there's a few different questions in play here:

  1. Is Rust "faster" than Java?
  2. Does Rust manage memory faster than Java?
  3. Is the way Rust manages memory responsible for the performance improvement over Java?

I think it's really hard to give a confident answer to the first question as it is so broad. Even more importantly, let's say Rust is faster than Java for the average program - what you probably care about is "is Rust faster than Java for my program?". Even deciding what "faster" means is difficult: do you care about the startup/warmup costs associated with Java? If you're writing a long-lived web server, probably not. If you're writing AWS Lambda/Azure Functions, you definitely do. Do you care about memory consumption? Is your program running on a 128-core 1TB-RAM server or a 40MHz microcontroller with no allocator? The details of your particular situation matter! In general, I would say that Rust is often faster than Java, but there are absolutely cases where the opposite is true. As @alice pointed out, most Java GCs' allocators are extremely efficient, and any kind of benchmark that forces lots of allocations will probably favor Java over Rust.

Leading into the second question, I would say that modern JVMs typically have very efficient allocation and deallocation strategies and are probably on average faster than your OS's libc allocator (which Rust uses by default). However, Java programs tend to allocate much more heavily than Rust programs do. In addition, you also pay other costs associated with your GC such as having a background thread that periodically runs to garbage collect as well as write barriers that spread the cost of having a GC all over your program. In short, in my opinion, Java tends to be more efficient at managing heap memory but Rust tends to use (much) less heap to begin with.

Finally, I'm not really sure I would attribute any performance advantage Rust has over Java to differences in memory management (especially heap like you pointed out). The following all seem like much bigger differences to me:

  • Rust is ahead-of-time compiled with a production-grade optimizer (LLVM). Java is (typically?) JIT compiled with a tiered interpretation/JIT system. In order to get good performance, you need to warm your Java program up and give the JIT time to optimize it. Rust gives you good performance as soon as your binary runs.
  • Java encourages dynamic dispatch both in the language (nothing is final by default) as well as the ecosystem (DI, programming to interfaces not implementations, etc) which has a cost and is more difficult to optimize. Rust encourages static dispatch which is easy to optimize as well as easier on your CPU's branch predictor.
  • Rust gives you fine grained control over both memory layout (you control when heap allocations occur as well) and code (you can write inline assembly to get exactly the instruction sequence you want). AFAIK, Java doesn't give you either.
  • Java is designed to insulate you from details of the underlying platform (what OS your program is running under, what architecture the machine is, how big pointers are, what size atomics are available, etc) while Rust's std library makes it easy to write cross-platform programs but never hides the underlying platform from you.
  • Java doesn't have user-definable value types (yet). Any idiomatic Java program is going to spend a lot of time chasing pointers and, on modern computers, that's one of the most expensive things you can do. Rust strongly encourages you to keep your data together in memory and to not create huge object graphs that result in your CPU mostly just waiting for memory.

16 Likes

There's a case where Java's approach to this can be superior: Java's tracing JIT can compile away dynamic dispatch based on which types are actually used in practice. So, in a problem where dynamic dispatch is necessary and frequent — or at least, used in both Rust and Java implementations of the program — Java might yield better performance. Effectively, the JIT can use the program's data/state as an input to the compilation process, not just its source code — so for example it might succeed in JIT-compiling an interpreter's interpretation of an expression tree that was provided to it at run time.

Caveats:

  • This does require the VM to “guess correctly” about what your program will be doing in the future. (But so does optimization during ahead-of-time compilation, unless it's profile-guided optimization.)
  • The kind of program which benefits from this tracing and dynamic dispatch elimination is likely to be the kind of program which has lots of references and thus require lots of pointer-chasing to execute, which hits the memory performance limits already discussed.

6 Likes

I can write fast, GC-free Java - namely ByteBuffers in a pool (I did that for an image processing pipeline). You can also compact strings into large byte arrays and maintain offsets in an int array. You can use fixed-length 'struct' blocks within a ByteBuffer to emulate a C array. Pass serialized binary JSON structures around (appending to the end or mutating ints in place).
Also, as others have stated, pure float-array processing is vectorized in Java, just like in llvm/gcc - so Rust isn't much faster there.

However, doing most of these hacks in Java treats it like an inferior C. You no longer have all the normal Java goodness (like field offsets). You wind up using Unsafe more than you'd like.

In Rust, we get C-style struct arrays (even better with enum arrays / Option arrays). We get contiguous HashMap and BTreeMap (unlike C/C++). We get owned structs that are 100% on the stack in function call parameters (if it ain't a Box, it's on the stack). So zero alloc/free overhead thus far.

The ONLY time you hit malloc is with Vec, Box, Rc, and Arc. If you are writing linked lists or trees in Rust - find a different solution. But you can create on-stack arrays and pass mutable slices around (if you keep under 1MB). So one heap object max per parameter (most Java things have dozens, hundreds, or millions of associated heap objects).

Vec is only really an issue with strings, because strings are, in my opinion, the hardest allocation problem in any language. Java has a minor leg up here because most strings are transient. Unless! You have a hundred million long-lived strings in Java (think of an XSLT-transforming web server from the 90s - that brings back PTSD: a major GC every 5 seconds). Here Rust might win out, because in Java each string gets moved a dozen or so times (once after each major GC).
Even though Rust needs a heap malloc and free, Java has to maintain FOUR allocations over the life cycle of a string: the transient CharsetEncoder instance, the source byte array, the destination char array (possibly needing to be copied if the UTF-8 size estimate was too small), and the String wrapper object. Rust just needs pre-reserved stack and one dynamic heap allocation (with no scan if you use unsafe - the safe version MIGHT perform a second copy under the sweet Cow).

One of my favorite features of Rust is the string slice capability. You can kind of do it in Java, but it creates more heap objects and may keep the parent char array alive too long. For parsers, this Rust capability was the BOSS for me. (nom is a good example, but I tend to write my own, because it's so simple - zero copy, baby.)