My immediate guess is that it's because the Rust code spends time allocating space for, and copying string data into, the fields, whereas the Java code just stores a reference to a String object that already exists before the loop starts running.
So the JVM is not slower than LLVM, and is sometimes faster,
at least for something like allocation and processing over data (I mean, not math)? Or maybe you can write faster code in Java without the heavy hand-tuning and advanced optimization you'd do in Rust?
I mean, in this particular case, the code is simply not doing the same thing, and Java happens to be doing less work.
That said, it is true that there are cases where the JVM will be faster. In particular, the JVM tends to be faster at making a large number of small heap allocations.
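For example, a micro-benchmark in this spirit (just a sketch; exact numbers depend on the allocator and machine) stresses exactly that pattern on the Rust side, where every Box::new goes through the general-purpose allocator rather than a pointer bump:

use std::time::Instant;

fn main() {
    let start = Instant::now();
    // one million tiny, short-lived heap allocations
    let mut boxes = Vec::with_capacity(1_000_000);
    for i in 0..1_000_000u64 {
        // each Box::new goes through the general-purpose allocator
        // (free lists / size classes), not a simple pointer bump
        boxes.push(Box::new(i));
    }
    println!("{} allocations in {:?}", boxes.len(), start.elapsed());
}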
Mostly because the Rust code does more work. Here's what each version is doing underneath:
Rust

1. Repeat steps 2~8 three times.
2. Record the current time.
3. Allocate heap memory of 100 * 56 bytes.
4. Repeat step 5 100 times.
5. Allocate two heap buffers of 5 bytes each and fill them with the fixed value "fname" (see the sketch after this list).
6. The iterate-and-validate part will likely be optimized out entirely, since it doesn't produce any observable behavior.
7. Print the time elapsed since step 2.
8. Free all the memory allocated in steps 3 and 5.
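For illustration, here is roughly what step 5 looks like written out by hand, assuming the benchmark's Class owns two String fields (that layout also matches the 100 * 56 bytes in step 3: two 24-byte Strings plus a padded i32 on a 64-bit target):

struct Class {
    fname: String, // 24 bytes on 64-bit: pointer, length, capacity
    lname: String,
    age: i32,
}

impl Class {
    fn new(fname: &str, lname: &str, age: i32) -> Self {
        Class {
            // String::from allocates a fresh 5-byte heap buffer and
            // copies the literal into it: two allocations and two
            // copies per instance
            fname: String::from(fname),
            lname: String::from(lname),
            age,
        }
    }
}

fn main() {
    let c = Class::new("fname", "fname", 1);
    // 24 + 24 + 4, padded to 56: the figure in step 3
    println!("{} bytes per Class", std::mem::size_of::<Class>());
    println!("{} {} {}", c.fname, c.lname, c.age);
}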
Java

1. Repeat steps 2~8 three times. I'll ignore the warmup part here.
2. Record the current time.
3. Allocate GC heap memory of 100 * 16 bytes (for the array itself; Java references are fat) plus a few bytes (for the ArrayList instance itself).
   - Allocating on the GC heap is very cheap, especially on a single thread. Normally it's just bumping a global offset (see the bump-allocator sketch after this list).
4. Repeat step 5 100 times.
5. Allocate GC heap memory of a few bytes (for the CClass instance) and fill it with known pointer values pointing to the String instances the JVM pre-allocated for those string literals.
6. I'm not sure whether the JVM optimizes out the iterate-and-check part, but it's only 100 elements, so it should be cheap.
7. Print the time elapsed since step 2.
8. The process terminates quickly, which means the GC likely never runs. This saves the time that would be spent freeing memory.
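To show why bump allocation is so cheap, here is a minimal sketch of the idea in Rust. The BumpArena type is made up for illustration and is not how the JVM actually implements its heap, but the core operation is the same: advance an offset.

struct BumpArena {
    buf: Vec<u8>,
    offset: usize,
}

impl BumpArena {
    fn new(capacity: usize) -> Self {
        BumpArena { buf: vec![0; capacity], offset: 0 }
    }

    // "Allocating" is just advancing an offset: a bounds check and
    // an add. No free lists, no size classes, no syscalls.
    fn alloc(&mut self, size: usize) -> Option<&mut [u8]> {
        if self.buf.len() - self.offset < size {
            return None; // a real GC would trigger a collection here
        }
        let start = self.offset;
        self.offset += size;
        Some(&mut self.buf[start..self.offset])
    }
}

fn main() {
    let mut arena = BumpArena::new(1024);
    for _ in 0..100 {
        // each "object" costs only the offset bump
        arena.alloc(5).expect("arena full");
    }
    println!("used {} of {} bytes", arena.offset, arena.buf.len());
}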
The critical part is that allocating on the GC heap is really cheap. If most of your task is allocating memory, a GC language will be more performant. But many real-world workloads do more than just allocate, which can reduce this benefit.
In your case, the issue has to do with Java having garbage collection and only needing to copy references.
In Rust, you need to be clear about whether your structures should own or borrow their elements. Your structure contains an owned String, which means spending time copying the contents of the string literal into a new allocation every time you create an instance.
You can avoid this cost by storing &str in the structure rather than String:
fn main() {
    for _ in 0..3 {
        let now = std::time::Instant::now();
        // ===============================================
        let mut v = Vec::with_capacity(100);
        for i in 0..100 {
            // &str fields: no per-instance heap allocation here
            v.push(Class::new("fname", "fname", i % 30));
        }
        for i in 0..100 {
            let cls = v.get(i).unwrap();
            cls.validation();
        }
        // ===============================================
        let now2 = std::time::Instant::now();
        println!("==> {}", now2.duration_since(now).as_nanos());
    }
}

// Class borrows its strings instead of owning them.
struct Class<'a> {
    fname: &'a str,
    lname: &'a str,
    age: i32,
}

impl<'a> Class<'a> {
    pub fn new(fname: &'a str, lname: &'a str, age: i32) -> Self {
        Class { fname, lname, age }
    }

    pub fn validation(&self) -> bool {
        self.age > 0
    }
}
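With &'a str, each Class stores just a pointer and a length into the string literal (which lives in static memory), so Class::new never touches the allocator. That's much closer to what the Java version does when it stores references to the pre-allocated String objects.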
Thanks for your answer.
Then why does some Java software use off-heap memory to avoid the GC heap?
OK, it creates garbage, but allocating on the GC heap improves performance in many cases.
It depends; for performance you should only trust benchmarks. But note that pause time is not the only perf stat a GC has. Nowadays GCs do much more of their work in parallel, and CPU is a shared resource: using more of it in GC threads reduces the process's throughput. Also, 10 ms is not instant; your benchmark above prints less than 0.01 ms. The interval between GC pauses also matters: a collector that pauses for 10 ms every 1 ms would be useless.
10 milliseconds is pretty darn long considering that 1. today's computers have clock cycles and instructions that work at the nanosecond scale, and 2. 10 milliseconds is already too slow if you need e.g. a real-time 120 FPS refresh rate in a game (1000 ms / 120 frames is about 8 milliseconds for all the computation of a single frame, including GC pauses).
There are real-time GC algorithms that deamortize the cost of collection: every time you do an allocation, do a little bit of tracing and collecting, with work proportional to the amount of new memory you're allocating (see the sketch below). All GC languages should switch to doing that.
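Here is a toy sketch of that deamortization idea. All names are hypothetical, and a real incremental collector also needs write barriers and a guarantee that tracing finishes before memory runs out; this only shows the "bounded work per allocation" principle.

struct IncrementalGc {
    heap: Vec<usize>,   // toy "objects": we store only their sizes
    scan_cursor: usize, // how far incremental tracing has progressed
}

impl IncrementalGc {
    fn new() -> Self {
        IncrementalGc { heap: Vec::new(), scan_cursor: 0 }
    }

    fn alloc(&mut self, size: usize) {
        // do a bounded slice of tracing work, proportional to the
        // allocation size, so no single alloc pays for a full pause
        let budget = size / 8 + 1;
        for _ in 0..budget {
            if self.scan_cursor < self.heap.len() {
                // a real collector would mark this object's children
                self.scan_cursor += 1;
            }
        }
        self.heap.push(size);
    }
}

fn main() {
    let mut gc = IncrementalGc::new();
    for i in 0..1_000usize {
        gc.alloc(8 + i % 64);
    }
    println!("traced {} of {} objects", gc.scan_cursor, gc.heap.len());
}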