Why is a Rust executable smaller than a Kotlin/Native executable?

Considering that both Kotlin/Native and Rust are based on LLVM, I expected their output and performance to be close!

So I created two simple Hello World programs, one in Kotlin and one in Rust.

main.kt:

fun main() {
    println("Hello, world!")
}

main.rs:

fn main() {
    println!("Hello, world!");
}

Then I generated the executables for both using:
kotlinc-native main.kt for Kotlin and cargo build for Rust

Then I checked the resulting binaries using:

ls -S -lh | awk '{print $5, $9}'

and found that the file generated by Kotlin/Native is 1.48× the size of the file generated by Rust.

Any idea why this difference?


Kotlin is still garbage collected, right?
My guess would be it's because of the bigger runtime.

4 Likes

As an aside: file sizes on trivial programs aren't terribly interesting, unless you're talking about multiple orders of magnitude difference, or you're trying to target a pathologically space-constrained system.

That they both use LLVM is also irrelevant.

13 Likes

Correct, Kotlin is still garbage collected.

BTW, even for a "hello world" executable there's still a lot of unused code in the binary. If you care about executable size, you should (see the sketch after the list):

  • build with --release
  • enable LTO
  • switch to system allocator
  • strip the executable (due to some bugs, even in release mode, Rust may include megabytes of debug symbols)
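
For reference, a minimal sketch of how those points map onto commands and Cargo.toml settings (the binary name hello_world is only an example; the allocator switch is shown in full a few posts down):

# Cargo.toml
[profile.release]
lto = true            # link-time optimization across the whole binary

$ cargo build --release               # optimized build
$ strip target/release/hello_world    # drop debug symbols from the binary
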
9 Likes

Thanks.
So the executable can be even smaller than what I reported!
Do you have reference links for what you mentioned: LTO, the system allocator, and executable stripping?

I already used --release but forgot to mention it in my post.

I think the expectation about performance here is probably unfounded. They are both compiled with LLVM, yes, but how the runtime and object model work matters a lot, probably more than which particular codegen backend is used.

For purely number-crunching applications, where you don't do any allocation and just do a lot of math (not explicitly vectorized), the performance should indeed be similar.

However, as soon as you start using allocation, "objects", and the standard library, the differences in runtime, memory management, and allocation patterns should make a big difference, and Kotlin and Rust are very different in these respects.

Also, I would naively expect Kotlin/JVM to be faster than Kotlin/Native for typical workloads: Kotlin's object model is basically the Java object model, and the JVM is heavily optimized for dealing with it.

5 Likes

Initial setup:

$ cargo new hello_world

Build with:
$ cargo build

→ 589,004 bytes

Optimization Step 1:

Build with:
$ cargo build --release

→ 586,028 bytes

Optimization Step 2:

Change contents of main.rs to:

use std::alloc::System;

// Opt out of the default allocator (jemalloc on toolchains of that era)
// in favor of the lighter system allocator.
#[global_allocator]
static A: System = System;

fn main() {
    println!("Hello, world!");
}

→ 335,232 bytes

Optimization Step 3:

Add …

[profile.release]
lto = true

to Cargo.toml.

→ 253,752 bytes

Optimization Step 4:

Strip executable via …
$ strip target/release/hello_world

→ 177,608 bytes

(Note: each Optimization Step N here builds on Optimization Steps 1 through N - 1 as well, i.e. each reported size includes all of the previous optimizations.)

Savings: 589,004 → 586,028 → 335,232 → 253,752 → 177,608 bytes

24 Likes

Step 5:

You should also add opt-level = "z" to the [profile.release] section of Cargo.toml.
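
For concreteness, a sketch of a release profile combining this with the earlier LTO setting ("z" optimizes purely for size, "s" is a slightly milder alternative):

[profile.release]
lto = true
opt-level = "z"   # optimize for size rather than speed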

Step 6:

To drop the size even further, you can use Xargo to build the Rust stdlib from source when you build the application, which allows you to apply lto and opt-level to the stdlib build, stripping out a lot of the stdlib you don't need.
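
A rough sketch of that workflow, assuming a nightly toolchain and an explicit target triple (x86_64-unknown-linux-gnu is only an example, and the exact Xargo.toml options are best taken from the Xargo README):

$ rustup override set nightly      # Xargo needs nightly and the rust-src component
$ rustup component add rust-src
$ cargo install xargo
$ xargo build --release --target x86_64-unknown-linux-gnu

As I understand it, the [profile.release] settings then apply to the rebuilt stdlib as well.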

13 Likes

I've been meaning to try this for some of my performance-critical crates. I would think that it might speed up cases of many Vec calls.

What's the easiest way to do this? Do I need to list every stdlib crate that I want recompiled in my Cargo.toml, or is there a single option to Xargo that will do it? (Target CPU is just my host CPU.)

1 Like

I did not understand this statement, my friend.

By changing main.rs and Cargo.toml as you recommended, and using cargo build --release, the size I got is 238K, which is less than the 253K you mentioned. Any idea what could be the reason for this difference in numbers?

After using strip in addition to the above, the final file size is 169K.

1 Like

Different architecture and/or different versions of Rust and/or different versions of LLVM and/or …

6 Likes

He means every optimization step depends on preceding steps, unless I misunderstood.

1 Like

And is that true?

In his figures, every step includes the previous steps recursively.

I'm not sure if there are any technical dependencies between those optimizations.

This is what I meant, yes.

Neither do I.

I also tried the new size optimization option of rustc, but the size was the same as with strip.

You'll probably need a more realistic example than just printing Hello World to see a difference. In this example, most of the binary is probably the statically linked stdlib. You'd need Xargo to help make that smaller by building it from source.

What is the format/code for this option?

See "Compiler" in the 1.28.0 release notes:
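
For reference, on the command line this is rustc's -C opt-level codegen flag, which accepts "s" and "z" for size optimization; in Cargo it corresponds to the opt-level = "z" profile setting shown earlier:

$ rustc -C opt-level=z main.rs   # optimize purely for size
$ rustc -C opt-level=s main.rs   # size, slightly less aggressive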

2 Likes